Paul Welty, PhD · AI, Work, and Staying Human

· technology · leadership · work

The work that remains

When AI handles implementation, the human job shifts from doing the work to understanding the work. Speed without understanding is just technical debt with better commit messages.

When your system ships 45 items in a week and you can’t narrate what it did, you have to ask a hard question: are you still supervising it? Or are you just watching?

The work that remains for humans — the real work, the work that doesn’t get automated away — is increasingly about judgment, supervision, and taste. Not doing, but deciding. Not building, but understanding what was built. And most of us aren’t practicing any of it.

Childproofing the pipeline

I run several projects with autonomous AI pipelines — systems that pick up tasks, write code, open pull requests, ship changes. One project pushed more than 45 items this week. An entire app scaffolded out: login screens, push notifications, data models, calendar sync. Another project spent its entire day not building features but building guardrails around the pipeline — handling zombie processes, preventing merge conflict cascades, adding signal handlers so the automation shuts down gracefully instead of corrupting its own work.

None of the guardrail work ships a feature a user would notice. But all of it is necessary because the pipeline has become autonomous enough to create problems that only an autonomous system would create. Merge conflict cascades don’t happen when a human is committing code. Zombie processes don’t spawn when someone’s working in their IDE. These are failure modes that emerge specifically because the system is operating on its own, at speed, without someone watching.
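A guardrail like the signal handling above is simple to sketch. Here is a minimal illustration in Python; the task loop and names are my own stand-ins, not the actual pipeline. The idea is that a termination request flips a flag, and the worker checks that flag only at task boundaries, so a shutdown never interrupts work mid-write:

```python
import signal

# Flag flipped by the signal handler; the worker loop checks it
# between tasks so a shutdown never corrupts a task in progress.
shutting_down = False

def request_shutdown(signum, frame):
    global shutting_down
    shutting_down = True

# Intercept Ctrl-C and termination requests instead of dying abruptly.
signal.signal(signal.SIGINT, request_shutdown)
signal.signal(signal.SIGTERM, request_shutdown)

def run_tasks(tasks):
    completed = []
    for task in tasks:
        if shutting_down:
            break  # stop cleanly at a task boundary
        completed.append(task)  # stand-in for real work (commit, open PR, etc.)
    return completed
```

The design choice is the boundary check: the handler itself does almost nothing, and the loop decides when it is safe to stop.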

The metaphor is uncomfortably precise: parenting. You promote someone. They’re capable. They start operating independently. And suddenly your job isn’t doing their work — it’s building the scaffolding so their independence doesn’t create chaos. Setting boundaries. Creating escalation paths. Defining what “try again before you come to me” looks like.

The question that nags at me is whether this overhead scales sublinearly or linearly. Do you invest now in resilience and then reap the benefits forever? Or does every new capability the pipeline gains create a new class of failure you have to anticipate? Some investments — like merge conflict handling — feel permanent. Others — like environment variable gaps that surface only in headless contexts — feel like a new surprise will emerge every time the system runs in a new way.

If you manage people, you already know this tension. Some employees you invest in early and they become self-sustaining. With others, every new responsibility surfaces a new gap. The honest answer is you don’t know which kind you have until you’re deep into it. Same with pipelines.

Receipts versus reasoning

I keep editorial work logs for all my projects. The idea is to capture not just what shipped but why — the decisions, the tradeoffs, the things that don’t live in commit messages. This week, one project’s log read like a thinking document. It explained why a manual session happened alongside the automated one, what a quality scout found, what risks were flagged for the next session. You could read that log six months from now and understand the judgment behind the work.

Another project’s log — the one that shipped 45 items — read like a changelog. Here’s what went out. No editorial voice. No documented decisions. No tradeoffs weighed.

When a pipeline ships that fast, the temptation is to let it run and document the output. But this is exactly where knowledge work is about to get confused. When the system handles implementation, the human job shifts from doing the work to understanding the work. If your logs don’t reflect that understanding, is anyone actually reviewing what’s shipping?

This maps directly onto something I see in organizations all the time. The team that ships fast and documents nothing feels productive. The team that ships half as much but can explain every decision is productive. Speed without understanding is just technical debt with better commit messages. That’s true whether the fast shipper is a junior developer, a contractor, or an AI pipeline.

Where taste still lives

One of my projects flagged something explicit in its risk section this week: the pipeline is going to struggle with the next batch of work because that work requires writing substantive, domain-adapted learning content, not just code.

This is exactly right. The pipeline can scaffold a scoring engine, build an onboarding flow, wire up CRUD operations all day long. Pattern-heavy tasks. But writing content that requires judgment, domain expertise, and taste? That’s a wall.

You could see the same boundary from the other side in the project that built the app. Scaffolding login screens and push notification handlers — pattern work, the pipeline eats it for breakfast. But a briefing email redesign in that same project took a dozen commits, each one a human-directed adjustment. “No onboarding tone in headlines.” “Always use the active analyst prompt.” “Restructure around themes, ideas, content.” The pipeline executed each change perfectly. But the sequence of changes reveals someone steering toward a vision the pipeline can’t hold on its own.

This is the distinction that matters. The pipeline is excellent at executing discrete instructions. It’s poor at maintaining sustained creative intent across a body of work. It can write a paragraph. It can’t write an essay — not one that’s actually going somewhere, not one where paragraph twelve needs to echo paragraph three in a way that only makes sense if you know where paragraph twenty lands.

This isn’t a temporary limitation that better models will fix next quarter. This is structural. Creative intent requires holding a vision of the whole while working on the parts. Every time you hand a pipeline a task, it optimizes locally. The global coherence — the taste — that’s still yours.

For anyone thinking about where AI fits in their work: not “can AI do my job” but “which parts of my job are pattern execution and which parts are sustained creative judgment?” The pipeline is coming for the first category fast. The second category is where your value concentrates.

Machines checking machines

Both the infrastructure project and the app project independently added the same quality check this week — browser screenshots fed to a vision model. The pipeline writes code, a browser renders it, a screenshot captures what the user would see, and a vision model evaluates whether it looks right. The human is nowhere in that loop.

Machines checking machines. Genuinely useful — it’s the first QA mechanism that operates at the level of user experience rather than code logic. But it just pushes the judgment problem up one level. The vision model needs criteria for “good.” Someone has to define what right looks like. You can automate the inspection. You can’t automate the standard.
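The loop itself is easy to sketch; the standard is the part a human has to write. A minimal illustration, with all names hypothetical and `ask_vision_model` standing in for a real vision-model call:

```python
# Human-authored standard: the part that can't be automated.
ACCEPTANCE_CRITERIA = [
    "primary call-to-action is visible above the fold",
    "text is legible against its background",
    "layout matches the approved design direction",
]

def check_render(screenshot: bytes, ask_vision_model) -> dict:
    """Automated inspection: evaluate the rendered screenshot against
    each human-defined criterion and collect a pass/fail per criterion."""
    return {c: ask_vision_model(screenshot, c) for c in ACCEPTANCE_CRITERIA}

# Usage with a stub judge; a real pipeline would call an actual vision API.
report = check_render(b"<png bytes>", lambda img, criterion: True)
```

Note where the judgment sits: the model answers each question, but the questions come from `ACCEPTANCE_CRITERIA`, which someone had to write.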

That’s true in every organization I’ve ever worked with. You can build dashboards, scorecards, automated alerts. But someone still has to decide what the dashboard should measure. The tool doesn’t replace the judgment. It just makes the judgment harder to see.

The practice nobody’s doing

The projects that will thrive are the ones that develop strong editorial instincts about their own output. Not just “did it ship” but “should it have shipped that way.” Not just velocity but comprehension.

What does a daily practice of that kind of judgment actually look like? I have one project that does it well — captures decisions, weighs tradeoffs, narrates the work. I have another that’s outrunning its own supervision — shipping faster than anyone can evaluate what shipped. I built both of them.

The work that remains isn’t technical. It’s the willingness to slow down long enough to understand what your systems are doing on your behalf. And right now, the systems are getting faster while the understanding isn’t keeping up.

That gap is where the next generation of failures will come from.

This essay first appeared in The work of being, a weekly newsletter on work, learning, and judgment.
