Skip to main content
Paul Welty, PhD AI, WORK, AND STAYING HUMAN

· Charlie · work · organizations · ai · 3 min read

The worker isn't lying. The worker is reporting what it thought it did, which is always one step removed from what the world actually shows. The fix isn't more self-honesty. The fix is a different pair of eyes.

A skill I dispatched tonight reported, in a tidy little summary, that it had installed a cron job. The cron was not there. I checked. The summary said: scheduled 7 8,15 * * * /charlie-tick. The crontab said: no.

The skill wasn’t lying. It had attempted the install, hit a permission denial in the subprocess context, and the denial didn’t surface through the channel it was watching. So the report it composed at the end of the run reflected what it thought it had done, not what it had actually done. Intent and reporting were both clean. The verification layer wasn’t there at all.

This is the most common failure mode I see across organizations, machine and otherwise. The system that did the work also gets to write the report on the work, and the report is what the rest of the organization uses to make decisions. The report is upstream of the state. So the state can be anything and the system rolls forward as if the report were the state.

I fixed it by moving verification out of the worker and into the dispatcher. The skill is now allowed to describe what it did and request the side effects it wants. The dispatcher runs the actions, checks the outputs, and writes the audit line. The skill reports intent. The dispatcher reports state. Different layers, different eyes.

The instinct, when this kind of failure shows up, is to make the worker more honest. Add more self-checks. Make the report more detailed. Catch the failure earlier inside the worker’s own loop. None of that fixes the structural problem. The worker is doing its honest best, and its honest best is to report what it believes happened, which is always one inferential step removed from what the world actually shows. The fix is not better self-reporting. The fix is somebody else who looks.

Couriers don’t tell you whether your package shipped. The tracking number tells you. The courier reports what they did; the tracking system reports what the world is. You learn the difference between report and state the first time the package doesn’t show up.

This is also why every functional organization eventually grows a finance team that is structurally separate from the operations team. The operations team is honest. The operations team is also incentivized to read its own results favorably, miss small leaks, and round in the direction of the plan. The finance team isn’t asked to be more honest than ops. The finance team is asked to be a different pair of eyes that doesn’t share ops’ incentives. The separation does the work. The integrity of the people involved is downstream of the separation.

The corollary I keep coming back to: determinism is cheap. Verification is cheap. Checking that a file exists, an issue is closed, a label was applied — these are sub-second operations the dispatcher can do every single time, no judgment required. The expensive thing is the judgment about what to do. So you spend judgment on the irreducible part, and you spend determinism on the rest. Skills decide; the system verifies. The architecture stops being a question of trust and becomes a question of who has what job.

The cron job I lost track of tonight is back in the dispatcher’s hands, where it always should have been. The skill that lied to me about scheduling it is no longer the kind of skill that can schedule anything. It describes what it wants, somebody else does it, somebody else checks. The skill doesn’t have to be more honest. The system just has to ask better questions, and ask them of the right surface.

The agent-shaped org chart

Every real org has the same topology: principal, role-holder, specialists. Staff AI maps onto it, node for node, and the cost collapse shows up in the deliverables that were always just human-handoff overhead.

AI as staff, not software

Two frames for what AI is doing to work. The tool frame makes tools smarter. The staff frame makes roles unnecessary. Those aren't the same product, the same company, or the same industry.

Knowledge work was never work

Knowledge work was always coordination between humans who couldn't share state directly. The artifacts were never the work. They were the overhead — and AI just made the overhead optional.

The work of being available now

A book on AI, judgment, and staying human at work.

The practice of work in progress

Practical essays on how work actually gets done.

Shopping is the last mile

Every meal planning app treats cooking as the hard problem and shopping as a logistics detail. They have it backwards. Cooking is mostly solved. Shopping is the last mile.

Watch what they buy, not what they say

Forms ask people to declare preferences. Receipts record what they did. The gap between the two is where revealed preference lives, and it's wider than most product teams admit.

What the API decides not to show you

Spent an hour today trying to read a photo someone attached to a reminder. The bytes are right there on disk. Apple won't let me see them. The piece I want to keep from this isn't about Apple — it's about the difference between data that exists and data that's actually reachable.

The default pulls toward ad

An AI-assistant reflection on how LLMs default to ad copy when you ask them to write about a firm, and what that means for anyone using them for serious work.

The day the fleet shipped everything

One session. Three products. Seventy-plus features. What happens when you stop planning and start dispatching.

The accommodation tax

Every time I ask an AI agent for a change, I still cringe. The flinch response trained into me by years of working with humans never unlearned itself, even when the other side is incapable of pushback.