Paul Welty, PhD AI, WORK, AND STAYING HUMAN


The difference between shipping and finishing

Shipping is mechanical. Finishing is a judgment call. And most organizations have quietly made it impossible to tell the difference.

Duration: 5:09 | Size: 4.73 MB


Hey, it’s Paul. Tuesday, March 3, 2026.

I want to talk about something that showed up across four different projects today, all at once. The same pattern, wearing different clothes.

Here’s the first thing I noticed. Three projects hit a milestone today — not the “close an issue” kind, the “close the milestone” kind. And there’s a real difference. I found milestones sitting at 17 out of 17, 54 out of 54, fully complete. Every issue resolved. But nobody had closed the milestone. The work was done. The acknowledgment wasn’t.

That’s not a process failure. That’s avoidance. Closing a milestone means declaring something finished, and declaring something finished means you’re now accountable for what finished looks like. Did it work? Did it do what you said it would do? Is it good enough? Those are judgment calls. And judgment calls are the thing most organizations have systematically trained people to avoid.

Shipping is mechanical. You merge a PR. You deploy. You check a box. Finishing is different. Finishing means standing behind the whole and saying: this is what we intended, and this is what we delivered. When those two things don’t match, closing the milestone is the moment you have to deal with that gap instead of just adding another ticket to the backlog.

So here’s the question worth sitting with: how much of your team’s “continuous improvement” is actually continuous avoidance of ever having to say “done”?
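Closing is still a judgment call, but finding the candidates is mechanical. Here's a minimal sketch of that check, assuming a GitHub-style issue tracker (the episode doesn't name one); `fully_resolved_but_open` and `fetch_open_milestones` are my names, not anything from the projects discussed:

```python
import json
import urllib.request


def fully_resolved_but_open(milestones):
    """Milestones where every issue is closed but nobody has closed the milestone itself."""
    return [
        m["title"]
        for m in milestones
        if m["state"] == "open"
        and m["open_issues"] == 0
        and m["closed_issues"] > 0
    ]


def fetch_open_milestones(owner, repo):
    """Fetch open milestones from the GitHub REST API (field names per that API)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/milestones?state=open"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

The filter only surfaces the avoidance; a human still has to render the verdict and click close.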

Second thing. I spent the afternoon trying to get a language model to write social media posts that fit inside a character limit. Bluesky gives you 300 characters. Mastodon gives you 500. Seems straightforward. Tell the model the limit, ask it to stay under.

Fifty percent of the posts came back over limit. Not by a little. By a lot. And no amount of prompting fixed it — bolder instructions, examples, threats, nothing. LLMs cannot count characters. It’s not a training gap. It’s structural. The model doesn’t see characters. It sees tokens. Asking it to count characters is like asking someone to estimate the weight of a word by looking at the font.

So I stopped asking. Generate the post. Measure it programmatically. If it’s over, hand it back with the exact number and say “shorten to 218 characters.” That worked immediately.
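The loop is simple enough to sketch. This is my reconstruction, not the actual pipeline: `generate` stands in for whatever model call you use, and the retry count and truncation fallback are assumptions:

```python
def fit_to_limit(generate, prompt, limit, max_tries=3):
    """Generate a post, measure it ourselves, and feed back the exact target length.

    The model can't count characters, but it can shorten toward a number we give it.
    """
    text = generate(prompt)
    for _ in range(max_tries):
        if len(text) <= limit:
            return text
        text = generate(f"Shorten this to at most {limit} characters:\n\n{text}")
    # Last resort if the model never lands under the limit: hard truncate.
    return text[:limit]
```

The key design choice is that the length check never lives in the prompt. The code measures; the model only rewrites.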

The broader pattern here matters: stop asking systems to do what they structurally can’t. Verify, then correct. This applies to a lot more than AI. Think about how many organizations ask people to self-report their own productivity, or estimate their own project timelines, or assess their own blind spots. You’re asking for a capability the system doesn’t have. The fix isn’t better instructions. It’s an external check.

Third. An API I built was doing something subtly wrong. When you sent it a social post with three separate fields — the text, the URL, and the hashtags — it was concatenating them into one field before storing them. From the server’s perspective, this was fine. One field, all the data, nothing lost.

From the caller’s perspective, it was a disaster. You can’t budget characters for a post if you don’t control what gets added to the content after you send it. The URL is 75 characters. The hashtags are 40. The server was silently eating 115 characters of your budget, and you had no way to know that without reading the source code or reverse-engineering the response.

The fix was simple. Store the fields separately. Let the thing that publishes — the part that actually knows the platform rules — handle the assembly. But here’s what interests me about this: the API worked perfectly for months. It passed every test. It handled every request. The problem only became visible when a second system tried to use it for something the original designer hadn’t anticipated.
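A sketch of the fixed shape, with names and the exact assembly format invented for illustration (the real API's fields and platforms may differ; the 300 and 500 limits are the ones from the episode):

```python
from dataclasses import dataclass

# Character limits per platform, from the episode.
PLATFORM_LIMITS = {"bluesky": 300, "mastodon": 500}


@dataclass
class Post:
    # Stored as three separate fields; the server never concatenates them.
    text: str
    url: str
    hashtags: str


def assemble(post, platform):
    """Assemble at publish time, where the platform rules are actually known."""
    limit = PLATFORM_LIMITS[platform]
    overhead = len(post.url) + len(post.hashtags) + 2  # two separating spaces
    budget = limit - overhead  # the caller can see exactly what's left for text
    if len(post.text) > budget:
        raise ValueError(f"text exceeds budget of {budget} characters")
    return f"{post.text} {post.url} {post.hashtags}"
```

Because the fields stay separate until publish, the budget arithmetic is explicit instead of silently eaten by the server.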

That’s the pattern with invisible coupling. It doesn’t feel like coupling until someone else has to work with your assumptions. In organizations, this shows up constantly. One team defines a process that works fine internally. Another team has to integrate with it and discovers that the process embeds assumptions that were never written down and never tested. Not because anyone was careless. Because when you’re the only consumer of your own work, you never have to make your assumptions explicit.

Here’s the last thing. I was writing a blog post today about how “your biggest problems are the ones running fine.” And while I was writing it, the engineering work happening in parallel was proving the thesis. A site silently serving 2,000 orphaned images. Thirty-five blog posts with midnight timestamps because AI-generated frontmatter never bothered with real times. A Zapier pipeline that had been broken for weeks, invisible because the workaround — doing nothing — was silent.

None of these were hard to fix. Most took minutes once discovered. The problem was never complexity. The problem was that nothing forced discovery. When a system runs without errors, it generates confidence. That confidence becomes structural. People build plans around it. They staff around it. They assume it into their budgets.
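Forcing discovery can be as small as a scheduled check. As a toy example, here is one way to flag the midnight-timestamp symptom; the function name and the `(slug, timestamp)` input shape are mine, not from the actual site:

```python
from datetime import datetime


def suspicious_timestamps(posts):
    """Flag posts timestamped exactly midnight, often a sign the time was never really set."""
    flagged = []
    for slug, ts in posts:
        dt = datetime.fromisoformat(ts)
        if (dt.hour, dt.minute, dt.second) == (0, 0, 0):
            flagged.append(slug)
    return flagged
```

The check itself is trivial; the point is that it runs without anyone having to get curious first.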

By the time you discover the problem, the cost isn’t fixing the thing. It’s unwinding every decision that was made on the assumption that the thing was fine.

So what are you not looking at because it’s running fine? What milestones are sitting at 100% that nobody’s had the nerve to close? What assumptions are embedded in the systems you share with other teams that have never been tested by anyone but you?

The work of finishing isn’t technical. It’s the willingness to look at something whole and render a verdict. Most of us would rather ship one more feature than do that.

Why customer tools are organized wrong

This article reveals a fundamental flaw in how customer support tools are designed—organizing by interaction type instead of by customer—and explains why this fragmentation wastes time and obscures the full picture you need to help users effectively.

Infrastructure shapes thought

The tools you build determine what kinds of thinking become possible. On infrastructure, friction, and building deliberately for thought rather than just throughput.

Server-side dashboard architecture: Why moving data fetching off the browser changes everything

How choosing server-side rendering solved security, CORS, and credential management problems I didn't know I had.

The work of being available now

A book on AI, judgment, and staying human at work.

The practice of work in progress

Practical essays on how work actually gets done.

Your process was built for a different speed

When work changes velocity, governance systems don't just fall behind. They become theater. And theater is worse than nothing—it gives you the feeling of control without any of the substance.

Nothing is finished until you say it is

Continuous delivery removed the endings from work. That felt like progress. But without formal completion, you lose the ability to say what you actually accomplished — and more importantly, what you're done thinking about.

Your biggest problems are the ones running fine

The most dangerous failures in any system — technical or organizational — aren't the ones throwing errors. They're the ones that appear to work perfectly. And they'll keep appearing to work perfectly right up until they don't.


The silence that ships

Three projects independently discovered the same bug pattern today — code that reports success when something important didn't happen. The most dangerous failures don't look like failures at all.

What happens when the pipeline doesn't need you

So here's something I noticed today that I want to sit with. I run several projects that use autonomous pipelines — AI systems that pick up tasks, write code, open pull requests, ship changes. One ...