Paul Welty, PhD · AI, Work, and Staying Human


What your systems won't tell you



The most dangerous gap in any organization isn’t between what you know and what you don’t. It’s between what your systems know and what they’re willing to say. I keep finding this pattern — tools, dashboards, processes, reports that have the information someone needs but suppress it, reformat it, or bury it so deep that nobody acts on it. Not because the system is broken. Because the system was designed to be quiet.

Today I found a briefing email that marked itself as “generated” even when the email never actually sent. The generation step succeeded. The delivery step failed. And the system reported success — because it was measuring the wrong thing. This is a design choice, not a bug. Someone decided that “generated” means “we did our part,” regardless of whether the recipient ever received anything. And that decision rippled outward: the operator sees a green status, moves on, and the person who was supposed to get that briefing never knows it existed.

Here’s the thing: this isn’t a story about email systems. This is a story about how organizations define success. When you let each component report its own status in isolation, you get a dashboard full of green lights and a system that doesn’t work. The generation service says “I generated.” The delivery service says nothing because it crashed. And the aggregate view shows: everything’s fine.
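The failure mode above can be sketched in a few lines. This is a hypothetical reconstruction, not the actual briefing system's code: the point is that "success" is computed from the end-to-end outcome, not from whichever step happened to finish.

```python
# Hypothetical sketch: each step records its own status, but overall
# success is defined as "the recipient got the briefing," not merely
# "we produced one." The render/deliver callables are stand-ins.
from dataclasses import dataclass


@dataclass
class BriefingResult:
    generated: bool = False
    delivered: bool = False

    @property
    def succeeded(self) -> bool:
        # The contract the original system got wrong: generation
        # alone is not success.
        return self.generated and self.delivered


def send_briefing(render, deliver) -> BriefingResult:
    result = BriefingResult()
    body = render()
    result.generated = True
    try:
        deliver(body)
        result.delivered = True
    except Exception:
        # Delivery failed; the status stays honest instead of
        # reporting a green light.
        pass
    return result
```

With this shape, a crashed delivery step can no longer masquerade as a green status, because no component gets to declare victory on its own.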


I want to talk about contracts. Not legal contracts — system contracts. The promises one part of a system makes to another about what it will produce, what format it will use, what it guarantees. Most systems have these contracts, but they’re implicit. They exist in the heads of the people who built them. They survive as long as those people are around to explain them.

Today I watched a project cross the line from implicit to explicit. Phantasmagoria is a Stellaris mod generator — AI produces YAML event definitions, a renderer turns them into game-ready code, a linter validates the output. For months, “valid YAML” meant whatever the generator happened to produce and the renderer happened to accept. Nobody had written down the actual rules. What fields are required? What values are legal? What does the renderer support on purpose versus what it tolerates by accident?

So the team wrote a contract document. Added known-good and known-bad example files. Taught the validator to point authors toward the contract when something fails. Drew a clear boundary between the renderer — which is the stable public surface — and the generator, which is the Phantasmagoria-specific layer that produces the YAML.
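An explicit contract can be enforced in code, not just described in a document. The sketch below is illustrative only: the field names and the contract-document path are hypothetical stand-ins, since the real Phantasmagoria contract is defined in its own repository.

```python
# A minimal sketch of an explicit contract check. The required fields
# ("id", "title", "options") and the doc path are hypothetical; the
# input is a parsed YAML event represented as a dict.
REQUIRED_FIELDS = {"id", "title", "options"}
CONTRACT_DOC = "docs/contract.md"  # hypothetical location


def validate_event(event: dict) -> list[str]:
    """Return contract violations, each pointing authors at the contract."""
    errors = []
    for field in sorted(REQUIRED_FIELDS - event.keys()):
        errors.append(
            f"missing required field '{field}' (see {CONTRACT_DOC})"
        )
    return errors
```

The detail worth copying is the error message: every failure tells the author where the contract lives, so the validator teaches the rules instead of just enforcing them.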

This matters because implicit contracts are dependencies that nobody admits exist. They’re the senior engineer who just knows how the deployment works. The team lead who carries the context about why that module looks like that. The institutional knowledge that makes onboarding take six months instead of six weeks. When the contract lives in someone’s head, you can’t test it, enforce it, or teach it to anyone new.

Making the contract explicit immediately revealed a gap: the documentation now describes rules tighter than what the validator actually enforces. The documented contract and the enforced contract diverge. But here’s the crucial difference — now you can see the gap. You can file an issue about it. You can measure it. An explicit gap is a manageable problem. An implicit gap is a time bomb with no timer.

I think about this constantly with consulting clients. How many organizations run on contracts that nobody has written down? The way decisions get made. Who has authority over what. What “done” means for a deliverable. These are contracts. They’re binding. They have consequences. And most of them are implicit, carried by people who don’t realize they’re the only ones who know.


Let me tell you about false positives, because there’s a lesson here about trust that extends well beyond code. I run automated scouts across my projects — they search for known anti-patterns and create issues for what they find. Today a scout flagged sixteen HTTP calls as missing timeout parameters. Sixteen. That’s a significant finding. A missing timeout on an HTTP call means your system can hang forever waiting for a response that never comes.

Except all sixteen calls already had timeouts. Every single one. The scanner was matching requests.post( on one line without seeing timeout= on the next line of the same multi-line function call. A grep sees text, not structure. It matched a pattern that looked like the problem but wasn’t.

The immediate waste is obvious — someone investigates, confirms it’s nothing, closes the issue. But the real cost is downstream. The next time this scanner flags something, will you investigate with the same urgency? Or will you assume it’s another false positive? One bad batch of alerts makes every future alert from that system slightly less credible. This is the cry-wolf problem applied to tooling, and it’s remarkably hard to recover from.

The fix is technical — use AST-based analysis that parses code structure instead of matching text patterns. But the principle is universal. Any detection system faces the tradeoff between sensitivity and specificity. A scanner that catches everything but flags too many false positives trains people to ignore it. QA processes that flag too many non-issues get rubber-stamped. Incident reports full of noise stop being read.
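To make the text-versus-structure distinction concrete, here is a small sketch of what AST-based detection looks like, assuming the scanner targets Python code using the `requests` library. Because it walks the parsed syntax tree, a multi-line call with `timeout=` on a later line is seen as a single call and correctly passes.

```python
# Sketch: instead of grepping for "requests.post(", parse the source
# and inspect each call's keyword arguments. Multi-line calls are no
# longer false positives, because the tree sees them as one node.
import ast


def calls_missing_timeout(source: str) -> list[int]:
    """Return line numbers of requests.get/post/... calls lacking timeout=."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "requests"
                and node.func.attr in {"get", "post", "put", "delete"}):
            continue
        if not any(kw.arg == "timeout" for kw in node.keywords):
            flagged.append(node.lineno)
    return flagged
```

Run against the exact case from the anecdote, the multi-line call with its timeout on the next line is not flagged, while a genuinely timeout-less call is.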

The goal isn’t more signals. It’s more credible signals. And credibility is built through specificity — each alert should carry enough context that the recipient trusts it was worth their attention. When you design a monitoring system, a review process, or a reporting structure, the question isn’t “what can we detect?” It’s “what can we detect with enough confidence that someone will actually act on it?”


There’s a quieter pattern underneath all of this: the question of where judgment lives in your system. Today four projects independently grew their test suites — 160 new tests combined. One project went from 66 to 105. Another added 112 in a single session. These aren’t vanity metrics. Each test encodes a judgment that someone made about what matters.

When you write a test that verifies unauthenticated requests get rejected from every API endpoint, you’re not just checking today’s behavior. You’re encoding the decision that this should always be true. You’re taking a judgment call — “auth matters on these routes” — and making it permanent, automatic, and independent of any particular person remembering to check. Tests are institutional memory that doesn’t forget, doesn’t get tired, and doesn’t skip the check because it’s Friday afternoon.
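That kind of encoded judgment looks something like this. The endpoint list and the client interface are hypothetical stand-ins for whatever framework test client a real project uses; the shape is what matters: the list of protected routes is data, and the check runs against all of them automatically.

```python
# Sketch: the judgment "auth matters on these routes" encoded as a
# check. PROTECTED_ENDPOINTS and the client are hypothetical; client
# stands in for a framework's test client returning a status code.
PROTECTED_ENDPOINTS = ["/api/briefings", "/api/users", "/api/reports"]


def check_auth_required(client) -> list[str]:
    """Return endpoints that wrongly allow unauthenticated access."""
    failures = []
    for path in PROTECTED_ENDPOINTS:
        # Deliberately send no Authorization header.
        status = client.get(path, headers={})
        if status != 401:
            failures.append(path)
    return failures
```

Adding a new protected route is now a one-line edit to the list, and forgetting auth on it becomes a test failure rather than a thing someone has to remember to check.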

The flaky test problem makes this vivid. One project had seven tests that passed sometimes and failed other times. That’s worse than having no tests at all. Because a flaky test teaches everyone to say “oh, that one always fails, just re-run it.” Once that phrase enters the vocabulary, you’ve lost the signal. Every real failure now has a plausible excuse that doesn’t require investigation. Two of those flaky tests got root-caused today — one was leaking environment variables across test boundaries, the other was holding onto stale logging handlers from a previous test run. Small technical fixes. But what they actually fixed was the team’s ability to trust the test suite. And trust in the signal is the whole game.
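The environment-variable flake has a standard cure, sketched below: snapshot and restore `os.environ` around each test so nothing leaks across test boundaries. (pytest's built-in `monkeypatch` fixture provides the same guarantee; this is a minimal hand-rolled version for illustration.)

```python
# Sketch of test isolation for the env-var flake: any variable a test
# sets inside the context is rolled back on exit, so the next test
# starts from a clean environment.
import os
from contextlib import contextmanager


@contextmanager
def isolated_environ():
    snapshot = dict(os.environ)
    try:
        yield
    finally:
        os.environ.clear()
        os.environ.update(snapshot)
```

Wrap each test body (or register this as a fixture) and the "passes alone, fails in the suite" class of flake disappears, along with the excuse vocabulary it breeds.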


The last thread I want to pull is about information design — specifically, about who does the triage. A briefing email got redesigned today. The old format listed everything with equal visual weight. Content in progress, content that’s stalled, content that’s moving fine — all presented the same way. The reader had to scan the whole report and decide what mattered. The new format uses an information pyramid: most critical items first, with traffic-light indicators — green, yellow, red — and expandable detail for items that need attention.

This sounds like a minor UX improvement. It isn’t. It’s a decision about where judgment lives. When every item in a report has equal weight, you’re asking the reader to supply all the judgment. You’re saying: here’s the data, you figure out what matters. And readers have finite attention. They skim. They miss the stalled item buried in the middle. They close the email thinking everything’s fine.

The information pyramid shifts the judgment from the reader to the system. The system takes a position: this content has been stuck for five days, that matters more than this content that moved yesterday. A red dot says “look here first.” That’s an opinion. And most systems are reluctant to have opinions, because opinions require judgment, and judgment can be wrong.

But not having an opinion has a cost too. If your tools don’t prioritize, your people have to. And people get tired, distracted, overwhelmed. The system that refuses to have opinions isn’t being objective. It’s offloading its job to the reader and hoping they do it well.

Every standup, every dashboard, every weekly report faces this choice. Present information in the order it was generated — easy for the system, hard for the reader. Present it alphabetically — nobody complains, nothing is surfaced. Or present it by importance — which means the system has to know what importance means. That last option is harder to build. It requires encoding judgment into the tool. But it’s the only one that actually helps.
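The importance-ordered option can be sketched in a few lines. The thresholds here (five days stalled is red, two is yellow) are illustrative assumptions standing in for whatever a real team decides importance means; the point is that the tool, not the reader, applies them.

```python
# Sketch of the information pyramid: the system classifies each item
# by days stalled and presents most-critical-first. The 5-day and
# 2-day thresholds are assumed values, not from the original system.
def classify(days_stalled: int) -> str:
    if days_stalled >= 5:
        return "red"
    if days_stalled >= 2:
        return "yellow"
    return "green"


def triage(items: dict[str, int]) -> list[tuple[str, str]]:
    """Order items so the reader sees red before yellow before green."""
    order = {"red": 0, "yellow": 1, "green": 2}
    ranked = [(name, classify(days)) for name, days in items.items()]
    return sorted(ranked, key=lambda pair: order[pair[1]])
```

The hard part is not the sort; it is committing to `classify` at all, because that function is the system's opinion made executable.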


So here’s what I’m sitting with. If your systems suppress failure, bury the important signals, present everything with equal weight, and let contracts live in people’s heads instead of in documents — what exactly are your systems doing for you? They’re running. They’re reporting. They’re producing dashboards with green lights. But are they telling you what you need to know?

The uncomfortable answer, for most organizations, is no. Your systems are telling you what’s easy to measure, not what matters. And the gap between those two things is where the real problems live — quietly, patiently, until they’re expensive enough to notice.

What would it take to build a system that tells you the thing you don’t want to hear, at the moment you most need to hear it? That’s the design problem. And I don’t think most of us have started working on it.
