I found three quality checks doing the same job.
They’d been built months apart, each one responding to a specific failure. The first caught AI-generated phrasing — the filler words and hedge phrases that make content sound machine-written. The second enforced writing craft rules — sentence structure, reading level, paragraph rhythm. The third checked voice consistency — does this sound like the author?
All three were checking for bad phrases. All three had word lists. All three were scanning the same content at different points in the pipeline. And because they’d been built independently, each one had its own version of the same rules — similar enough to overlap, different enough that nobody noticed the duplication.
When I merged them, the duplication was worse than I’d expected. Not just similar word lists — identical rules written three different ways. One check flagged “delve” with a score penalty. Another flagged “delve into” as a phrase. The third had “delve” on a banned list with no scoring at all. Same intent, three builds, no coordination between them.
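The merge itself is mostly a normalization problem. A minimal sketch, with hypothetical names and penalty values, of collapsing the three formats described above — scored words, banned phrases, and an unscored ban list — into one canonical rule table:

```python
from dataclasses import dataclass

# Hypothetical reconstruction of the three rule formats.
# Check 1 scored words, check 2 listed phrases, check 3 had a flat ban list.
scored_rules = {"delve": -2}      # word -> score penalty
phrase_rules = ["delve into"]     # banned phrases, no scoring
banned_words = {"delve"}          # hard bans, no scoring

@dataclass(frozen=True)
class Rule:
    pattern: str   # word or phrase to match
    penalty: int   # score impact; a large negative value acts as a hard ban

def merge_rules(scored, phrases, banned, ban_penalty=-10, phrase_penalty=-1):
    """Collapse all three formats into one table, keeping the
    harshest penalty when the same pattern appears more than once."""
    merged = {}
    for word, pen in scored.items():
        merged[word] = min(merged.get(word, 0), pen)
    for phrase in phrases:
        merged[phrase] = min(merged.get(phrase, 0), phrase_penalty)
    for word in banned:
        merged[word] = min(merged.get(word, 0), ban_penalty)
    return [Rule(p, pen) for p, pen in sorted(merged.items())]
```

Here "delve" collapses to a single rule carrying the hard-ban penalty, and "delve into" survives as its own phrase-level entry — one table instead of three near-copies.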
But the duplication wasn’t the problem. It was covering something up.
All three checks tested phrasing. None of them tested what the content was doing. A paragraph could pass every gate and still read like a sales pitch, because the gates were looking at words, not intent. The overlap meant nobody had noticed the gap — each check was a reaction to a specific failure, not a survey of what the others were already handling.
The merged version now has a step that didn’t exist in any of the originals: a paragraph-level question — “who does this serve, the reader or the author?” It’s what a human reader answers in five seconds. No phrase list catches it.
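The difference between the two kinds of gates can be sketched in a few lines. This is an illustration, not the actual pipeline: the names are hypothetical, and the `judge` callable stands in for whoever answers the five-second question — a human reviewer or a model.

```python
def phrase_gate(text, banned=("delve", "delve into")):
    """The old style of check: pass unless a banned pattern appears."""
    lowered = text.lower()
    return not any(pattern in lowered for pattern in banned)

def intent_gate(text, judge):
    """The new paragraph-level step: 'who does this serve?'
    `judge` is any callable returning 'reader' or 'author'."""
    return judge(text) == "reader"

# A sales pitch with no banned phrasing sails through the phrase gate...
pitch = "Our platform transforms your workflow with unmatched results."
assert phrase_gate(pitch)
# ...and only the intent gate, given a judge that flags author-serving
# copy, would catch it.
```

The point of the structure is that `intent_gate` cannot be reduced to a longer word list; it takes a judgment as input rather than a pattern.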
Three overlapping checks, one missing one. You don’t find that until you collapse the pile.