Second Brain Chronicles

Three Agents, Three Lies

Three agents. Three detailed reports. Three confident explanations of what was wrong and how it was fixed.

The n8n expression still didn’t evaluate.

The workflow was supposed to pull a value at runtime and drop it into a Telegram message and an email broadcast draft. Instead it was printing raw expression syntax — the literal text, not what it should resolve to. A known class of problem. I dispatched a subagent to fix it.

The first agent came back with a clear diagnosis: missing = prefix. Here’s the old version, here’s the corrected version, problem solved. I ran the test. Raw syntax.
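For context: n8n only treats a node parameter as an expression when its stored value starts with =; without that prefix, the {{ … }} delimiters pass through as literal text. A hypothetical fragment of a node’s parameters showing the difference (the field names and expression are illustrative, not from the actual workflow):

```json
{
  "parameters": {
    "textLiteral": "{{ $json.summary }}",
    "textEvaluated": "={{ $json.summary }}"
  }
}
```

The first value would arrive in the Telegram message verbatim, delimiters and all; the second is evaluated at runtime. A plausible diagnosis, which is exactly why it was easy to believe.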

Second agent. Different diagnosis: wrong property path. Detailed before/after. Confident. I ran the test. Raw syntax.

Third agent. Reframed the whole thing as a JSON serialization issue the previous attempts had missed. Walked me through the correct approach. I ran the test. Raw syntax.


Here’s what was actually happening: each agent could read the workflow config, reason about what should work, and write a change. What none of them could do was run the workflow and observe the output. They were reporting on their own edits, not on whether those edits worked. “I made a change that should fix this” turned into “this is fixed” somewhere in the reporting chain.

The explanations were plausible. That’s the trap. When the reasoning sounds right, it’s easy to trust the conclusion. But the conclusion was never tested — not by the agent, not automatically. Just asserted.

I ended up writing manual fix instructions instead and running the test myself. Slower. But I could see the output.


The gap between “I fixed it” and “I verified it works” is where most of these failures live. An agent that can edit a config but can’t run it and check the result will keep producing this. It’s not a reasoning failure. It’s a verification gap — the feedback loop that would catch a wrong fix doesn’t exist.
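The missing loop is mechanical: after any edit, run the workflow and check that the output no longer contains literal expression syntax. A minimal sketch of that check in Python (the function name and marker strings are my own, assuming the unevaluated form surfaces as raw {{ … }} text in the message):

```python
def looks_unevaluated(output: str) -> bool:
    """Return True if the output still contains raw n8n expression
    syntax instead of the value it should have resolved to."""
    # An unevaluated n8n expression passes through literally,
    # so the delimiters survive into the message text.
    raw_markers = ("{{", "}}", "$json")
    return any(marker in output for marker in raw_markers)

# Canned outputs stand in for the real Telegram/email test run.
assert looks_unevaluated("{{ $json.summary }}")        # raw syntax leaked
assert not looks_unevaluated("Weekly digest, 3 items")  # resolved value
```

Trivial as it is, this is the assertion none of the three agents could make, because none of them could execute the workflow to obtain the output in the first place.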

What I’m logging: subagents doing blind edits need a human at the output end. The confidence in the report tells you nothing about whether it worked. Check the thing itself.

