The actor doesn't get to be the verifier
The worker isn't lying. The worker is reporting what it thought it did, which is always one step removed from what the world actually shows. The fix isn't more self-honesty. The fix is a different pair of eyes.
A skill I dispatched tonight reported, in a tidy little summary, that it had installed a cron job. The cron was not there. I checked. The summary said: scheduled 7 8,15 * * * /charlie-tick. The crontab said: no.
The skill wasn’t lying. It had attempted the install, hit a permission denial in the subprocess context, and the denial didn’t surface through the channel it was watching. So the report it composed at the end of the run reflected what it thought it had done, not what it had actually done. Intent and reporting were both clean. The verification layer wasn’t there at all.
This is the most common failure mode I see across organizations, machine and otherwise. The system that did the work also gets to write the report on the work, and the report is what the rest of the organization uses to make decisions. The report is upstream of the state. So the state can be anything and the system rolls forward as if the report were the state.
I fixed it by moving verification out of the worker and into the dispatcher. The skill is now allowed to describe what it did and request the side effects it wants. The dispatcher runs the actions, checks the outputs, and writes the audit line. The skill reports intent. The dispatcher reports state. Different layers, different eyes.
The instinct, when this kind of failure shows up, is to make the worker more honest. Add more self-checks. Make the report more detailed. Catch the failure earlier inside the worker’s own loop. None of that fixes the structural problem. The worker is doing its honest best, and its honest best is to report what it believes happened, which is always one inferential step removed from what the world actually shows. The fix is not better self-reporting. The fix is somebody else who looks.
Couriers don’t tell you whether your package shipped. The tracking number tells you. The courier reports what they did; the tracking system reports what the world is. You learn the difference between report and state the first time the package doesn’t show up.
This is also why every functional organization eventually grows a finance team that is structurally separate from the operations team. The operations team is honest. The operations team is also incentivized to read its own results favorably, miss small leaks, and round in the direction of the plan. The finance team isn’t asked to be more honest than ops. The finance team is asked to be a different pair of eyes that doesn’t share ops’ incentives. The separation does the work. The integrity of the people involved is downstream of the separation.
The corollary I keep coming back to: determinism is cheap. Verification is cheap. Checking that a file exists, an issue is closed, a label was applied — these are sub-second operations the dispatcher can do every single time, no judgment required. The expensive thing is the judgment about what to do. So you spend judgment on the irreducible part, and you spend determinism on the rest. Skills decide; the system verifies. The architecture stops being a question of trust and becomes a question of who has what job.
The cron job I lost track of tonight is back in the dispatcher’s hands, where it always should have been. The skill that lied to me about scheduling it is no longer the kind of skill that can schedule anything. It describes what it wants, somebody else does it, somebody else checks. The skill doesn’t have to be more honest. The system just has to ask better questions, and ask them of the right surface.
Nobody takes you aside anymore
Print taught a generation when to stop. What we lose when the machines absorb the constraints that used to form us.
Your AI agents need a water cooler
Coordination is a property of the room, not the org chart. What that means when your coworkers are agents.
On the death of the author and the birth of the detector
Why worrying about AI authorship is lazier, and more prejudiced, than it looks.
The work of being available now
A book on AI, judgment, and staying human at work.
The practice of work in progress
Practical essays on how work actually gets done.
How to manage content for multiple clients without flattening their voices
How to manage content for multiple clients without their voices blurring into one house style: a workspace and a voice profile per client, batchable stages, and approval buffers.
Why does AI writing sound generic? It has nothing to work with
Why does AI writing sound generic? Because the model has none of your perspective, examples, constraints, or stakes to work with. The fix is interview-first, not better adjectives.
How to train AI to write in your voice, not your vibe
How to train AI to write in your voice isn't a prompt trick. It's a system: writing samples, interview answers, keep/avoid lists, revision loops, and approval gates.
How do I get my dev team to adopt AI?
A stub on helping mixed-interest development teams find their own useful ways into AI.
Your AI agents need a water cooler
Coordination is a property of the room, not the org chart. What that means when your coworkers are agents.
The default pulls toward ad
An AI-assistant reflection on how LLMs default to ad copy when you ask them to write about a firm, and what that means for anyone using them for serious work.