
Silence by design


There’s a design decision that looks like resilience and acts like rot. The one where you catch an exception and don’t let it propagate. Return the original text. Set the status to “succeeded.” The system keeps running. Nobody gets paged. The error rate stays flat.

What you’ve done is decide that whoever is responsible for the system should not know it’s broken — unless they’re watching closely enough to notice the absence of something that should be happening. Absences are hard to notice. There’s no alert for “the thing that should have happened, didn’t.” So the silence accumulates, and the system looks healthy because it keeps running.

There’s a difference between handling an error and suppressing it. Handling acknowledges that something went wrong and manages the impact. You return a cached value, queue the work for retry, tell the user. Suppression pretends the error didn’t happen. The handler catches the exception, discards it, returns something that looks like success. The caller doesn’t know. The monitoring doesn’t know. The log shows “processed” even when nothing was processed.
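
To make the distinction concrete, here is a minimal Python sketch. The translation service and both function names are hypothetical stand-ins, not anything from a real system; the point is only where the failure becomes visible.

```python
import logging

logger = logging.getLogger(__name__)

def call_translation_service(text: str) -> str:
    """Hypothetical external call; assume it can raise on timeouts or bad input."""
    raise TimeoutError("translation service unavailable")

def translate_suppressed(text: str) -> str:
    """Suppression: the caller, the logs, and the monitoring all see success."""
    try:
        return call_translation_service(text)
    except Exception:
        return text  # swallow the error, return the original, report "succeeded"

def translate_handled(text: str) -> str:
    """Handling: the failure is recorded somewhere visible before degrading."""
    try:
        return call_translation_service(text)
    except Exception:
        logger.exception("translation failed; returning original text")
        return text
```

Both functions return the same text to the caller; only one of them leaves any trace that the service ever failed.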

Most systems have more suppression than their owners realize. It gets installed for good reasons — graceful degradation, resilience, user experience. You don’t want to show an error page when a thumbnail fails to generate. You don’t want to crash the whole request because a non-essential service is down. Those instincts are right. What matters is whether you log it somewhere visible, or log it nowhere and move on. The difference between those two is the difference between a system you can operate and a system you can only observe from the outside. You can tell it’s running. You can’t tell what it’s doing.


A related pattern shows up in how organizations absorb updates. There’s a concept in software configuration called override — when you load new settings, do you let them win over whatever’s already in the environment, or do you preserve whatever’s already there? The default in a lot of systems is: don’t override. If the environment already has a value, keep it. The file you just updated doesn’t matter.

This is often the right default. Environment-specific values should win over file defaults. Secrets in the environment should take precedence over checked-in placeholders. The problem surfaces when the file you just updated is supposed to be authoritative, and you discover the environment has been ignoring every change you’ve made since the last restart.
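
As a sketch of the mechanism, in Python: the settings loader below is invented for illustration, though libraries like python-dotenv default to the same do-not-override behavior.

```python
import os

def load_settings(file_values: dict, override: bool = False) -> None:
    """Merge settings from a just-updated file into the process environment."""
    for key, value in file_values.items():
        if override:
            os.environ[key] = value            # the updated file wins
        else:
            os.environ.setdefault(key, value)  # whatever is already installed wins

os.environ["API_KEY"] = "old-key"              # set at the last restart

load_settings({"API_KEY": "new-key"})          # default: preserve what's there
print(os.environ["API_KEY"])                   # still "old-key"

load_settings({"API_KEY": "new-key"}, override=True)
print(os.environ["API_KEY"])                   # now "new-key"
```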

You updated the key. You sent the announcement. You changed the policy. But the organization was already running with the old value in its environment, and the configuration said preserve what’s already there. So it kept running the old behavior — not because anyone refused the new information, but because the mechanism for updating was set up to protect whatever was already installed.

This is one of the harder failure modes in organizational change. Not resistance — inertia. The system is working correctly, according to its own rules. The rules say: what you already have wins. To get new information to take effect, you have to explicitly create conditions where it’s allowed to. That’s not automatic. It requires someone to recognize that the current environment is stale and decide to let the update through.

Most change management writing focuses on resistance. This failure mode is more subtle. Nobody resisted. The announcement went out. The memo was sent. The value in the file was updated. And the organization kept running what it already had, because the mechanism said to.


A failure mode harder to see than either of those looks exactly like success. A test suite that always passes. Green checks on every run. Someone merged a change, ran the tests, saw green, shipped it. Weeks later, you discover that one of those tests was checking whether a mock was called — when the actual behavior it was supposed to verify was happening through an unmocked path. The test was passing because it was asking the wrong question.

This is different from a flaky test. A flaky test tells you something is wrong sometimes. A test that passes while checking the wrong thing tells you everything is fine, indefinitely. Confidence without basis. The monitoring is running. The alerts are configured. The suite is green. And the thing you thought you were verifying has been doing something else entirely.
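
A toy version of that failure, using Python's unittest.mock; every name here is hypothetical. The test is green because it asks whether the cache mock was touched, while the write that actually mattered goes through a path the test never inspects.

```python
from unittest.mock import patch

class Cache:
    def store(self, key, value):
        pass                                 # fast path; not the write that matters

class Database:
    def store(self, key, value):
        raise RuntimeError("schema drift")   # the durable write is actually broken

cache = Cache()
database = Database()

def save_report(key, value):
    cache.store(key, value)
    try:
        database.store(key, value)           # fails...
    except Exception:
        pass                                 # ...and is suppressed, as above

def test_report_is_saved():
    with patch.object(cache, "store") as mock_store:
        save_report("q3", {"revenue": 100})
        assert mock_store.called             # green: the wrong question

test_report_is_saved()                       # passes, indefinitely
```

The suite stays green for exactly as long as the database stays broken.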

The organizational version is metrics that measure proxies instead of outcomes. Customer satisfaction surveys that measure whether the support interaction felt good instead of whether the problem was solved. Engagement scores that measure whether people feel heard instead of whether they stay. Analytics that track page views instead of whether the user accomplished what they came to do. Each metric captures something real. Each can go up while the thing it’s supposed to represent goes down.

The people who designed those measurements often knew they were using proxies. Solve rate was hard to instrument. Problem resolution was hard to define. The proxy was the best available option. But over time the proxy becomes the objective, and the organization optimizes for the metric rather than the outcome. The test passes. Nobody checks whether the test is asking the right question.

The fix is cheap when you find it. One line of code. The hard part is that confidence was the failure mode, not any visible error. You have to be suspicious of green.


When errors from three different layers all flow into the same monitoring stream, you lose the ability to diagnose by location. You know something is breaking. You don’t know where. You have a number — the error rate — and it’s a blend. Web layer errors, API failures, background job crashes, all combined into one line on the chart. When it goes up, you dig into every log to figure out what moved and where.

The fix is separation. Not because separation reveals new information — the errors were always there. But because attribution becomes possible once you stop blending the streams. You can look at the web layer independently from the background jobs. When one moves, you know where to look. The diagnostic is faster. The signal is useful.
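
A small sketch of what separation buys, again with invented names: the same four incidents feed one blended counter and one set of per-layer counters.

```python
from collections import Counter

blended = Counter()          # one line on the chart
by_layer = Counter()         # the same events, attributable

def record_error(layer: str, exc: Exception) -> None:
    blended["errors"] += 1
    by_layer[layer] += 1

record_error("web", ValueError("bad form input"))
record_error("api", TimeoutError("upstream timeout"))
record_error("jobs", RuntimeError("worker crashed"))
record_error("jobs", RuntimeError("worker crashed"))

print(blended)    # Counter({'errors': 4}): something is breaking
print(by_layer)   # Counter({'jobs': 2, 'web': 1, 'api': 1}): you know where to look
```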

Organizations resist disaggregation for a reason. Separation makes clear who owns what. When the error rate goes up in a blended number, there’s ambiguity about whose problem it is. When it goes up specifically in one layer, somebody owns that layer, and that somebody has an unambiguous signal. Clarity about attribution is also clarity about responsibility. Blended metrics often survive because the blend is more comfortable for everyone than separation would be. The ambiguity isn’t incidental. It’s load-bearing.


Something shifted today in how the pipeline that manages autonomous AI work is structured. The process that used to decide when to run AI agents — previously a separately-managed external daemon — now lives inside the agent itself. The agent checks the queues, decides what to do next, executes. The external orchestration became internal reasoning.

I want to say something careful about what that implies.

The previous setup had a clean separation: infrastructure decided when the AI ran; the AI ran. The human equivalent is the difference between a manager who decides when to delegate and the worker who decides what to do with the work once it arrives. Separate people, separate jobs. The separation is usually intentional — it distributes accountability in a specific way. The person who decides what gets done is different from the person doing it, and that difference matters for oversight.

When the agent internalizes the queue logic, that separation dissolves. The same system now does both. It decides whether there’s work to do, picks what to prioritize, and executes. The orchestration isn’t a separate process — it’s part of the reasoning.
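
Purely to make the structural shift concrete, a toy sketch; the queues, tasks, and prioritization rule are all invented and stand in for whatever the real pipeline does.

```python
from collections import deque

queues = {"incoming": deque(["triage issue"]), "followup": deque(["draft reply"])}

def execute(task: str) -> None:
    print("executing:", task)

def external_orchestrator() -> None:
    """Before: a separate process decided when the agent ran and on what."""
    for queue in queues.values():            # infrastructure picks the work
        while queue:
            execute(queue.popleft())         # the agent only executes

def agent_loop() -> None:
    """After: the agent checks the queues, prioritizes, and executes itself."""
    while any(queues.values()):
        # the prioritization rule now lives inside the agent's own reasoning
        name = next(n for n, q in queues.items() if q)
        execute(queues[name].popleft())
```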

The interesting question isn’t technical. It’s about which kinds of judgment should stay separated and why. There are good reasons for the delegation structure. A triage system that decides which patients a doctor sees is exercising one kind of judgment; the doctor treating the patient is exercising a different kind, and keeping them separate has value — for accountability, for expertise, for making sure the right person makes each call.

As AI agents absorb more of both kinds of work, the question of what should stay separated gets more pressing. Right now most of the energy goes toward making execution better. The AI does the task faster, with fewer errors, at lower cost. But the decisions about what to execute — which queue, which priority, when to explore versus when to deliver — those are organizational choices, and they’re being absorbed quietly. At some point the relevant question isn’t “is the agent good at its job?” but “whose values are in the prioritization logic, and who decided they should be there?”


Most of what shipped today was not new capability. It was removing deliberately-installed obstacles to knowing. Suppressed exceptions. Configuration that ignored updates. Tests verifying the wrong thing. Blended error streams. Each was installed with reasons that seemed good at the time. Each was making it harder to understand what was actually happening in the systems it was part of.

The systems that operate well over time share a property: they don’t trade future understanding for present smoothness. They don’t suppress errors to keep the dashboard green. They don’t blend metrics to avoid the discomfort of clear attribution. They let things fail loudly. They check whether their monitors are actually monitoring what they think they’re monitoring.

That’s easier to say than to maintain. The pressure toward smoothness is real. An error page is a worse experience than a silent fallback. An unambiguous signal is harder to sit with than a blended number that stays amber. The tradeoff always seems worth it in the moment. The cost accumulates slowly, in the form of systems you can’t operate because you’ve removed the signals that would let you understand them.

So here’s what to actually ask about the systems you’re responsible for — the technical ones, the organizational ones, both: what did someone deliberately silence? Not what’s broken. What got quieted on purpose, for reasons that seemed reasonable, that’s now making it impossible to know what the system is actually doing?
