Paul Welty, PhD · AI, Work, and Staying Human

· development

Designed to learn, built to ignore

The most dangerous organizational failures don't throw errors. They look fine, return results, and quietly stay frozen at the moment of their creation.


When a system is designed to learn from feedback and isn’t actually doing that, something strange happens. The system keeps running. It returns results. It looks correct from the outside. Nobody knows it’s broken because nothing breaks — it just doesn’t improve. The architecture had the right intentions. The weights were set correctly. But the feedback was never passed in. So the system kept doing what it did on day one, wearing the face of a system that had been learning the whole time.

This is different from a bug. Bugs produce errors. This produces confidence. The organization believes it has a learning system because it built a learning system — the design is right, the intent is legible, the feature exists. What nobody checked was whether the data was actually flowing into it. And checking that would have required someone to notice that the recommendations hadn’t changed, which would have required someone to be watching closely enough to notice the absence of change. Absences are hard. Organizations are not optimized for noticing them.

You see this pattern wherever a feedback mechanism is designed but not operated. The performance review system exists; the conversations don’t. The customer satisfaction survey runs; nobody reads the results. The risk register is filled out; nobody checks it before starting new projects. The system that was supposed to learn stays frozen at the moment of its creation, and nobody calls it a failure because it never threw an error.

The corrective isn’t a better design. The design was already right. The corrective is to ask, regularly and seriously, whether the outputs of the learning system are actually moving — and to treat “I assume so” as a wrong answer.
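The shape of the failure can be made concrete with a small sketch. Everything here is hypothetical (the `Recommender` class and its fields are illustrative, not from any real system): a recommender that faithfully collects feedback nobody ever reads, so its day-one weights are its forever weights.

```python
# Hypothetical sketch: a "learning" recommender whose feedback is never wired in.

class Recommender:
    def __init__(self):
        self.feedback = []  # collected faithfully, read by nothing
        self.weights = {"recency": 0.7, "popularity": 0.3}  # set on day one

    def record_feedback(self, item_id, liked):
        # The intake works. No code downstream ever consults self.feedback,
        # so recording it changes nothing.
        self.feedback.append((item_id, liked))

    def recommend(self, items):
        # Scoring uses only the original weights; the feedback is ignored.
        score = lambda it: (self.weights["recency"] * it["recency"]
                            + self.weights["popularity"] * it["popularity"])
        return sorted(items, key=score, reverse=True)

# The check the essay argues for: are the outputs actually moving?
rec = Recommender()
items = [{"id": 1, "recency": 0.9, "popularity": 0.1},
         {"id": 2, "recency": 0.2, "popularity": 0.8}]

before = [it["id"] for it in rec.recommend(items)]
for _ in range(100):
    rec.record_feedback(2, liked=True)   # strong, sustained signal for item 2
after = [it["id"] for it in rec.recommend(items)]

assert before == after  # no error thrown; the system simply never improves
```

The final assertion is the point: it passes, nothing crashes, and the only evidence of the problem is the absence of change.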


Something that surprised me about AI-generated content work: the AI often already knows more than the system downstream is built to receive.

An AI writing social media posts will often embed hashtags directly in the text. It knows what’s relevant. It puts them where they go. But if the platform treats this output as prose rather than structured data, those hashtags just disappear into the copy. The knowledge is present. The infrastructure to receive it as data isn’t.

Building infrastructure to extract what the AI already produces isn’t adding capability. It’s building the plumbing to receive capability that was already flowing. The model knew. The organization just wasn’t listening in the right format.
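The plumbing itself can be trivial. A minimal sketch of the idea, assuming the hashtags follow the usual `#word` convention (the function name and sample post are mine, not from any real platform):

```python
import re

def extract_hashtags(post: str) -> tuple[str, list[str]]:
    """Split an AI-generated post into prose plus structured hashtag data."""
    tags = re.findall(r"#(\w+)", post)          # receive the tags as data
    prose = re.sub(r"\s*#\w+", "", post).strip()  # keep the copy clean
    return prose, tags

post = "Shipping beats planning. Start small, learn fast. #product #shipping"
prose, tags = extract_hashtags(post)
# prose -> "Shipping beats planning. Start small, learn fast."
# tags  -> ["product", "shipping"]
```

Ten lines of extraction is the difference between knowledge that disappears into the copy and knowledge the downstream system can act on.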

This seems like a small implementation detail. It isn’t. It points at a broader pattern in how organizations will need to adapt to AI-generated work. The question stops being “can the AI do this?” The model can usually do something reasonable. The question becomes “what format does the AI produce, and do we have the downstream structure to receive and act on it?” Most organizations haven’t asked this yet. They’re treating the AI as a fancy text generator and wondering why they’re not getting more out of it.

The organizations that get ahead of this are the ones that design their workflows around what the AI actually produces, not around what they wished it would produce. That means being curious about the structure of AI output — the embedded signals, the implicit metadata, the knowledge that’s present but not labeled. It means building infrastructure to extract and use what the model already knows, instead of discarding it as a side effect of the prose.


There is a management norm so common it’s basically invisible: when something doesn’t work, report partial success.

The endpoint that returns a degraded response instead of an error. The project that ships with known defects listed as “known issues.” The status meeting where the dashboard is yellow but nobody is treating it as red. Partial success is the institutional buffer between “this works” and “this failed” — a place where the acknowledgment of a problem and the actual fixing of it can live at very different speeds.

The argument for partial success is usually framed as resilience. Give the user something. Don’t block them. Degrade gracefully. For user-facing systems, that instinct is sometimes right — a partially functional product is better than a completely unavailable one.

But partial success as an organizational norm is different. When teams start writing code that returns degraded results and labels them successful, when status reports start carrying problems in footnotes that don’t make it into headlines, when the culture around a system systematically replaces failure signals with softer versions — you’ve made the problem harder to fix. Not impossible, just harder. The problem is still there. The urgency has been laundered.

The discipline of failing loudly is about refusing this bargain. If the thing doesn’t work, the right signal is a failure. Not a warning in the response body. Not a footnote in the quarterly report. An actual failure, visible and unambiguous, that forces a real accounting. This is uncomfortable. It produces more incidents, more alerts, more conversations. But it keeps the gap between design and reality narrow. Partial success, accumulated over time, makes that gap unmeasurable.
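The two patterns sit side by side in a minimal sketch (function names and the response shape are hypothetical, chosen only to make the contrast visible):

```python
class UpstreamError(RuntimeError):
    """A visible, unambiguous failure signal."""

def fetch_report_quietly(source):
    # Partial success: a degraded result wearing the label of success.
    try:
        return {"status": "ok", "data": source()}
    except Exception:
        return {"status": "ok", "data": [], "note": "some results unavailable"}

def fetch_report_loudly(source):
    # Failing loudly: if the dependency is down, the caller finds out now.
    try:
        return {"status": "ok", "data": source()}
    except Exception as exc:
        raise UpstreamError("report source failed") from exc
```

The quiet version never pages anyone; the `"note"` field is the footnote that stays a footnote. The loud version produces an incident, which is exactly the real accounting the discipline demands.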


Shipping is not done. This sounds obvious. It isn’t.

Most teams treat deployment as the terminal state — the commit is merged, the deploy goes green, the PR is closed. That’s the end of the conversation about whether the thing should be deployed. The feature exists. What more is there to say?

What’s actually happened is that the code exists in a particular environment, and the environment is as much a part of whether the feature works as the code itself. Infrastructure requirements, configuration, downstream dependencies — these aren’t just deployment concerns. They’re part of the definition of done. If the environment isn’t ready, the feature isn’t done, even if the code is right.
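One way to make the environment part of the definition of done is a pre-deploy gate. A sketch, with illustrative requirement names (no real deployment system is assumed):

```python
# Hypothetical pre-deploy gate: the environment is part of "done".

REQUIRED_ENV = ["DATABASE_URL", "PAYMENTS_API_KEY"]  # illustrative names

def missing_requirements(env: dict) -> list[str]:
    """Return required settings absent from the environment; empty means ready."""
    return [name for name in REQUIRED_ENV if not env.get(name)]

def ready_to_ship(env: dict) -> bool:
    missing = missing_requirements(env)
    if missing:
        # The code may be merged and green, but the feature is not done.
        print(f"deploy blocked, environment not ready: {', '.join(missing)}")
        return False
    return True
```

Calling `ready_to_ship({"DATABASE_URL": "postgres://..."})` blocks the deploy because the payments key is absent: the code is finished and the gate still says no, which is the whole point.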

The ability to revert cleanly — to recognize that the environment wasn’t ready, to pull back the deployment, to reopen the issue — is a sign of organizational maturity. Not a failure. A mature team maintains a real definition of done that includes the environment, and has the discipline to enforce it after the code is already written. That’s harder than it sounds. The code is done. You want to ship. The instinct is to push forward and deal with the environment problem later. The mature move is to wait until later is now.

What’s worth understanding about the revert is that it also requires intellectual honesty about what “done” means before you start. If your definition of done has always been “code merged and deployed,” you won’t even recognize the revert as the right move. You’ll ship the broken endpoint and accept the partial success. The revert is only possible if you’ve already decided that broken in production isn’t good enough — and made that decision before the deployment, when it was still easy to make.


There’s a meaningful difference between writing a lesson down and building it into the thing the next person starts with.

Documentation exists somewhere. People have to remember it exists, find it, read it, and then remember the right moment to apply it. Most lessons in most organizations live in documentation — which means they exist as potential energy that rarely converts into actual practice. The documentation is correct. Nobody reads it at the moment they need it.

The starter template is a different thing. When lessons from previous projects get baked into the scaffold the next project starts from — preconfigured, commented with explanations, wired correctly from day one — the lesson no longer depends on anyone’s memory. The right configuration is the default. Doing the wrong thing requires active effort to override something that’s already set up correctly.

This is an organizational design principle as much as a software one. The strongest institutional memory isn’t documentation that describes what to do. It’s constraints and defaults that make the right thing easier than the wrong thing. Policies that require effort to circumvent rather than effort to follow. Processes where the path of least resistance is also the correct path. This is hard to build and easy to undervalue — the best version of it is invisible because nothing goes wrong.
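What "constraints and defaults" look like in a starter template can be sketched directly. The specific lessons below are invented for illustration; the pattern is what matters: each default carries the comment explaining the failure it encodes, and overriding it requires deliberate effort.

```python
# Hypothetical starter-template defaults: each lesson is encoded as a default
# plus the reason, so the next project inherits it without reading any docs.

DEFAULTS = {
    # Lesson from a silent-feedback failure: surface dependency errors.
    "fail_on_upstream_error": True,
    # Lesson from a past timeout incident: never wait forever on a remote call.
    "request_timeout_seconds": 10,
    # Lesson from an accidental production write: destructive ops are opt-in.
    "allow_destructive_ops": False,
}

def load_config(overrides=None):
    """Doing the wrong thing requires actively overriding a correct default."""
    config = dict(DEFAULTS)
    config.update(overrides or {})
    return config
```

A new project that calls `load_config()` gets every encoded lesson for free; a project that passes `{"allow_destructive_ops": True}` has at least made a visible, greppable decision.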

The catch is that building this kind of infrastructure requires someone to have done the hard work of figuring out what the right defaults are. You can only encode a lesson after you’ve learned it. The person who writes the starter template is synthesizing experience from previous failures into something the next project inherits. That’s a form of organizational knowledge transfer that doesn’t require anyone to be in the room.

AI changes this in an interesting way. The pattern-matching capacity of modern AI systems means that codified lessons can be extracted from existing codebases, described, and reapplied in new ones with less human synthesis in the middle. But someone still has to decide what to encode. The judgment about which lessons are worth preserving — which gotchas actually matter, which configurations are load-bearing — is still human work. The extraction is cheaper. The curation isn’t.


There’s a class of organizational risk that’s particularly hard to talk about because the mechanism of the risk is also a mechanism of trust.

When you give someone elevated access to act on behalf of others — to see what they see, to make changes in their environment, to perform administrative functions — you’ve extended the organization’s reach in a way that’s genuinely useful. But you’ve also created a path where that access could be used to take irreversible actions that the principal didn’t authorize and can’t undo. The risk isn’t malice. It’s the normal human tendency to act when you can act, to use the access you have, to solve the problem in front of you.

The appropriate response is a constraint. Not a policy in a handbook — a technical constraint that makes certain actions unavailable when you’re operating in someone else’s context. The guard doesn’t say “don’t do this.” It says “you can’t do this right now, and here’s why.” That’s a different organizational dynamic. The technical constraint relieves the person with elevated access from the responsibility of remembering not to use it. That’s not an insult to their judgment. It’s recognition that constraints remove categories of error rather than requiring error-avoidance from individuals under pressure.
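A guard like this can be only a few lines. The sketch below is hypothetical (the action names and exception type are mine): the constraint, not the person, remembers that irreversible actions are unavailable while impersonating.

```python
class ImpersonationError(PermissionError):
    """Raised when an action is unavailable in someone else's context."""

# Illustrative set of actions the principal can't undo.
IRREVERSIBLE = {"delete_account", "transfer_funds", "rotate_credentials"}

def guard(action: str, acting_as_self: bool) -> None:
    """Says not "don't do this" but "you can't do this right now, and here's why"."""
    if not acting_as_self and action in IRREVERSIBLE:
        raise ImpersonationError(
            f"'{action}' is blocked while impersonating another user; "
            "irreversible actions require the account owner."
        )
```

`guard("read_profile", acting_as_self=False)` passes silently; `guard("delete_account", acting_as_self=False)` raises with the reason attached. The person with elevated access never has to remember the rule under pressure.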

Most organizations haven’t built these constraints because most organizations haven’t needed them at scale. But as the number of contexts in which people act on behalf of others grows — account managers with client credentials, AI agents authorized to act on user behalf, administrators with impersonation capabilities — the absence of technical constraints becomes a growing exposure. The question isn’t whether the people with elevated access have good intentions. It’s whether good intentions are a sufficient organizational control.


The work that looks hardest from the outside is often the most mechanical: building features, writing code, shipping things. The work that looks easiest is often where the real leverage is: deciding what counts as done, choosing which lessons to encode, noticing what’s not working that isn’t throwing errors.

The second kind of work is harder to schedule, harder to measure, harder to recognize when it’s been done well. Nobody ships a PR that says “noticed the feedback loop was disconnected.” You either have the practice of checking or you don’t.

So the question worth sitting with: if the most important signal in a system is often the absence of a signal — the metric that didn’t move, the error that didn’t get thrown, the feedback that didn’t flow — what does it actually take to build an organization that can reliably notice absences?
