Paul Welty, PhD
AI, WORK, AND STAYING HUMAN

development

The second project problem

Your system works. Then you try it somewhere else and it falls apart. The gap between 'works here' and 'works anywhere' is where most automation dies — and most organizations never look.

Duration: 4:32 | Size: 4.16 MB


Hey, it’s Paul. Saturday, March 7th.

I spent today watching an autonomous system break. Not because it was bad — it had just passed every test I threw at it. Ten successful runs in a row. Every issue resolved, every pull request merged, costs under control. By any reasonable measure, it worked.

Then I pointed it at a different project. And it broke on the first API call.

Here’s what happened. I’ve been building a pipeline where AI agents handle the full lifecycle of a software issue: read the spec, write the code, run the tests, review the changes, merge them. It’s been running on its home project for days. I was feeling good about it. So I ran it against a different codebase — a web app with a Python backend and a Next.js frontend. Different structure, different conventions, but the same pipeline.

Three bugs in the first three minutes. The API key lookup was hard-coded to one environment variable name. The codebase map was four megabytes — sixty-two thousand lines of node_modules and build artifacts that nobody had noticed because the home project was small. And a label the system needed didn’t exist on the target repository because the setup script didn’t create it.
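The first bug has a simple shape. Here is a sketch of it in Python, next to a more portable version. The variable names and fallback order are my illustration, not the pipeline's actual code:

```python
import os

def get_api_key_fragile():
    # Hard-coded to one environment variable name: works on the home
    # project, raises KeyError the first time another project names it
    # differently.
    return os.environ["HOME_PROJECT_API_KEY"]

def get_api_key_portable(candidates=("API_KEY", "HOME_PROJECT_API_KEY")):
    # Check a configurable list of names, and fail with a message that
    # says what was tried instead of a bare KeyError on a foreign project.
    for name in candidates:
        value = os.environ.get(name)
        if value:
            return value
    raise RuntimeError(f"No API key found; tried: {', '.join(candidates)}")
```

The fix isn't clever. The point is that the fragile version was invisible until the environment changed under it.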

None of these were subtle. None were edge cases. They were the kind of thing you’d catch in the first five minutes of using the tool somewhere new. But I hadn’t used it somewhere new. I’d been testing it on itself.
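The codebase-map bug has an equally unglamorous fix: prune dependency and build directories before they can bloat the map. A minimal sketch, with an assumed skip list rather than the pipeline's real one:

```python
from pathlib import Path

# Dependency and build output, not source. The exact set is a guess;
# the home project was small enough that it never mattered.
SKIP_DIRS = {"node_modules", ".git", "dist", "build", "__pycache__"}

def codebase_map(root):
    """Return a newline-separated listing of source files under root,
    pruning directories that would turn a navigation aid into a dump."""
    root = Path(root)
    lines = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        if path.is_file():
            lines.append(str(rel))
    return "\n".join(lines)
```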

This is a pattern I’ve seen in every organization I’ve consulted with, and it has nothing to do with AI. You build a process. You refine it. You test it in the environment where you built it. It works beautifully. Then someone in a different department, a different office, a different country tries to use it, and it falls apart. Not because the process was wrong, but because your environment was baked into the process and you couldn’t see it.

The technical term is overfitting. You optimize so well for your training data that you lose the ability to generalize. But the human version is more interesting. It’s not just that you optimize for your environment — it’s that your environment becomes invisible to you. The API key convention, the project structure, the size of the codebase map. These aren’t decisions you made. They’re things you stopped noticing.

I think this is why the second project is always the real test. Not the tenth project. Not the enterprise rollout. The second one. Because the second project is close enough to the first that you expect it to work, but different enough that all your invisible assumptions become visible. The sixty-two thousand lines of node_modules in a file that was supposed to be a navigation aid. The environment variable that was there on one project and missing on another.

What’s interesting is the nature of the bugs. Adding the blocked label — a simple concept, “this issue can’t be worked on yet because it depends on something else” — required changes in four separate places in the codebase. Outcome detection. Priority ordering. Skip lists. Cleanup logic. Each one was discovered by running the system and watching it fail. Not by reading the code. Not by reasoning about it. By running the loop and paying attention.

Four integration points for one label. And the code wasn’t structured to make this visible. You couldn’t look at a diagram and say “ah yes, when you add a new state, you need to update these four places.” You had to discover it empirically, one crash at a time.
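One way to make that coupling visible is to declare each state once, in a single table, and have every consumer read from it. This is a hypothetical restructuring, not the system's actual code; the field names stand in for the four integration points:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StateSpec:
    label: str
    workable: bool   # feeds the skip list
    priority: int    # feeds priority ordering (lower runs sooner)
    terminal: bool   # feeds outcome detection and cleanup

# One table instead of four scattered agreements: adding "blocked"
# is one row, and every consumer reads the same record instead of
# keeping its own private copy of the string.
STATES = {
    s.label: s
    for s in (
        StateSpec("ready",   workable=True,  priority=0, terminal=False),
        StateSpec("blocked", workable=False, priority=9, terminal=False),
        StateSpec("done",    workable=False, priority=9, terminal=True),
    )
}

def next_workable(labels):
    """Pick the highest-priority label the pipeline may act on, or None."""
    candidates = [STATES[l] for l in labels if l in STATES and STATES[l].workable]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.priority).label
```

The table doesn't remove the coupling. It just puts all four agreements on one screen, so adding a state is a row instead of four crashes.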

This is how complexity actually works in organizations. Not as a big obvious thing you can point at, but as invisible coupling between parts of the system that look independent. The label is just a string. The outcome detector is just a function. The skip list is just a set. They don’t reference each other. They don’t know about each other. But they all need to agree, and when they don’t, the system behaves in ways nobody predicted.

Every manager has experienced this. You change one policy, and three unrelated workflows break. You reorganize one team, and a reporting chain that nobody documented stops functioning. The coupling was always there. It just wasn’t visible until you changed something.

So here’s the practical lesson. When you build a system — any system, technical or organizational — don’t ask “does it work?” Ask “does it work somewhere else?” Take whatever you’ve built, strip it of the environment it grew up in, and put it in a different context. Not a hostile context. Just a different one.

The bugs you’ll find won’t be in your logic. They’ll be in your assumptions. The things you stopped seeing because they were always true in your world. And those are the bugs that will kill you when you try to scale.

The four-megabyte codebase map worked fine when the codebase was small. The hard-coded environment variable worked fine when every project used the same name. The missing label worked fine when every repository had been set up by the same person. None of these are engineering failures. They’re the natural result of building in one place and never leaving.

Your system doesn’t work until it works somewhere you didn’t build it. Everything before that is just a demo.

Why customer tools are organized wrong

This article reveals a fundamental flaw in how customer support tools are designed—organizing by interaction type instead of by customer—and explains why this fragmentation wastes time and obscures the full picture you need to help users effectively.

Infrastructure shapes thought

The tools you build determine what kinds of thinking become possible. On infrastructure, friction, and building deliberately for thought rather than just throughput.

Server-side dashboard architecture: Why moving data fetching off the browser changes everything

How choosing server-side rendering solved security, CORS, and credential management problems I didn't know I had.

The work of being available now

A book on AI, judgment, and staying human at work.

The practice of work in progress

Practical essays on how work actually gets done.

The smartest code you'll ever delete

The most dangerous kind of waste isn't the thing that doesn't work. It's the thing that works beautifully and shouldn't exist.

The first real user breaks everything

Your product works until someone actually uses it. The gap between 'works in dev' and 'works for a person' is where most systems fail — and most organizations avoid looking.

The loop nobody bothers to close

Most systems observe. Almost none learn. The difference is a feedback loop — and the boring cleanup work that makes it possible.

The difference between shipping and finishing

Shipping is mechanical. Finishing is a judgment call. And most organizations have quietly made it impossible to tell the difference.

Your biggest problems are the ones running fine

The most dangerous failures in any system — technical or organizational — aren't the ones throwing errors. They're the ones that appear to work perfectly. And they'll keep appearing to work perfectly right up until they don't.