Paul Welty, PhD · AI, Work, and Staying Human


What your systems won't tell you



The most dangerous gap in any organization isn’t between what you know and what you don’t. It’s between what your systems know and what they’re willing to say. I keep finding this pattern — tools, dashboards, processes, reports that have the information someone needs but suppress it, reformat it, or bury it so deep that nobody acts on it. Not because the system is broken. Because the system was designed to be quiet.

Today I found a briefing email that marked itself as “generated” even when the email never actually sent. The generation step succeeded. The delivery step failed. And the system reported success — because it was measuring the wrong thing. This is a design choice, not a bug. Someone decided that “generated” means “we did our part,” regardless of whether the recipient ever received anything. And that decision rippled outward: the operator sees a green status, moves on, and the person who was supposed to get that briefing never knows it existed.

Here’s the thing: this isn’t a story about email systems. This is a story about how organizations define success. When you let each component report its own status in isolation, you get a dashboard full of green lights and a system that doesn’t work. The generation service says “I generated.” The delivery service says nothing because it crashed. And the aggregate view shows: everything’s fine.
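The failure mode above can be sketched in a few lines. This is a hypothetical reconstruction, not the actual briefing system's code: the point is that "success" is computed from the end-to-end outcome, not from whichever step happened to finish.

```python
# Hypothetical sketch: each step records its own status, but overall
# success is defined as "the recipient got the briefing," not merely
# "we produced one." The render/deliver callables are stand-ins.
from dataclasses import dataclass


@dataclass
class BriefingResult:
    generated: bool = False
    delivered: bool = False

    @property
    def succeeded(self) -> bool:
        # The contract the original system got wrong: generation
        # alone is not success.
        return self.generated and self.delivered


def send_briefing(render, deliver) -> BriefingResult:
    result = BriefingResult()
    body = render()
    result.generated = True
    try:
        deliver(body)
        result.delivered = True
    except Exception:
        # Delivery failed; the status stays honest instead of
        # reporting a green light.
        pass
    return result
```

With this shape, a crashed delivery step can no longer masquerade as a green status, because no component gets to declare victory on its own.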


I want to talk about contracts. Not legal contracts — system contracts. The promises one part of a system makes to another about what it will produce, what format it will use, what it guarantees. Most systems have these contracts, but they’re implicit. They exist in the heads of the people who built them. They survive as long as those people are around to explain them.

Today I watched a project cross the line from implicit to explicit. Phantasmagoria is a Stellaris mod generator — AI produces YAML event definitions, a renderer turns them into game-ready code, a linter validates the output. For months, “valid YAML” meant whatever the generator happened to produce and the renderer happened to accept. Nobody had written down the actual rules. What fields are required? What values are legal? What does the renderer support on purpose versus what it tolerates by accident?

So the team wrote a contract document. Added known-good and known-bad example files. Taught the validator to point authors toward the contract when something fails. Drew a clear boundary between the renderer — which is the stable public surface — and the generator, which is the Phantasmagoria-specific layer that produces the YAML.
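An explicit contract can be enforced in code, not just described in a document. The sketch below is illustrative only: the field names and the contract-document path are hypothetical stand-ins, since the real Phantasmagoria contract is defined in its own repository.

```python
# A minimal sketch of an explicit contract check. The required fields
# ("id", "title", "options") and the doc path are hypothetical; the
# input is a parsed YAML event represented as a dict.
REQUIRED_FIELDS = {"id", "title", "options"}
CONTRACT_DOC = "docs/contract.md"  # hypothetical location


def validate_event(event: dict) -> list[str]:
    """Return contract violations, each pointing authors at the contract."""
    errors = []
    for field in sorted(REQUIRED_FIELDS - event.keys()):
        errors.append(
            f"missing required field '{field}' (see {CONTRACT_DOC})"
        )
    return errors
```

The detail worth copying is the error message: every failure tells the author where the contract lives, so the validator teaches the rules instead of just enforcing them.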

This matters because implicit contracts are dependencies that nobody admits exist. They’re the senior engineer who just knows how the deployment works. The team lead who carries the context about why that module looks like that. The institutional knowledge that makes onboarding take six months instead of six weeks. When the contract lives in someone’s head, you can’t test it, enforce it, or teach it to anyone new.

Making the contract explicit immediately revealed a gap: the documentation now describes rules tighter than what the validator actually enforces. The documented contract and the enforced contract diverge. But here’s the crucial difference — now you can see the gap. You can file an issue about it. You can measure it. An explicit gap is a manageable problem. An implicit gap is a time bomb with no timer.

I think about this constantly with consulting clients. How many organizations run on contracts that nobody has written down? The way decisions get made. Who has authority over what. What “done” means for a deliverable. These are contracts. They’re binding. They have consequences. And most of them are implicit, carried by people who don’t realize they’re the only ones who know.


Let me tell you about false positives, because there’s a lesson here about trust that extends well beyond code. I run automated scouts across my projects — they search for known anti-patterns and create issues for what they find. Today a scout flagged sixteen HTTP calls as missing timeout parameters. Sixteen. That’s a significant finding. A missing timeout on an HTTP call means your system can hang forever waiting for a response that never comes.

Except all sixteen calls already had timeouts. Every single one. The scanner was matching requests.post( on one line without seeing timeout= on the next line of the same multi-line function call. A grep sees text, not structure. It matched a pattern that looked like the problem but wasn’t.

The immediate waste is obvious — someone investigates, confirms it’s nothing, closes the issue. But the real cost is downstream. The next time this scanner flags something, will you investigate with the same urgency? Or will you assume it’s another false positive? One bad batch of alerts makes every future alert from that system slightly less credible. This is the cry-wolf problem applied to tooling, and it’s remarkably hard to recover from.

The fix is technical — use AST-based analysis that parses code structure instead of matching text patterns. But the principle is universal. Any detection system faces the tradeoff between sensitivity and specificity. A scanner that catches everything but flags too many false positives trains people to ignore it. QA processes that flag too many non-issues get rubber-stamped. Incident reports full of noise stop being read.
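To make the text-versus-structure distinction concrete, here is a small sketch of what AST-based detection looks like, assuming the scanner targets Python code using the `requests` library. Because it walks the parsed syntax tree, a multi-line call with `timeout=` on a later line is seen as a single call and correctly passes.

```python
# Sketch: instead of grepping for "requests.post(", parse the source
# and inspect each call's keyword arguments. Multi-line calls are no
# longer false positives, because the tree sees them as one node.
import ast


def calls_missing_timeout(source: str) -> list[int]:
    """Return line numbers of requests.get/post/... calls lacking timeout=."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "requests"
                and node.func.attr in {"get", "post", "put", "delete"}):
            continue
        if not any(kw.arg == "timeout" for kw in node.keywords):
            flagged.append(node.lineno)
    return flagged
```

Run against the exact case from the anecdote, the multi-line call with its timeout on the next line is not flagged, while a genuinely timeout-less call is.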

The goal isn’t more signals. It’s more credible signals. And credibility is built through specificity — each alert should carry enough context that the recipient trusts it was worth their attention. When you design a monitoring system, a review process, or a reporting structure, the question isn’t “what can we detect?” It’s “what can we detect with enough confidence that someone will actually act on it?”


There’s a quieter pattern underneath all of this: the question of where judgment lives in your system. Today four projects independently grew their test suites — 160 new tests combined. One project went from 66 to 105. Another added 112 in a single session. These aren’t vanity metrics. Each test encodes a judgment that someone made about what matters.

When you write a test that verifies unauthenticated requests get rejected from every API endpoint, you’re not just checking today’s behavior. You’re encoding the decision that this should always be true. You’re taking a judgment call — “auth matters on these routes” — and making it permanent, automatic, and independent of any particular person remembering to check. Tests are institutional memory that doesn’t forget, doesn’t get tired, and doesn’t skip the check because it’s Friday afternoon.
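That kind of encoded judgment looks something like this. The endpoint list and the client interface are hypothetical stand-ins for whatever framework test client a real project uses; the shape is what matters: the list of protected routes is data, and the check runs against all of them automatically.

```python
# Sketch: the judgment "auth matters on these routes" encoded as a
# check. PROTECTED_ENDPOINTS and the client are hypothetical; client
# stands in for a framework's test client returning a status code.
PROTECTED_ENDPOINTS = ["/api/briefings", "/api/users", "/api/reports"]


def check_auth_required(client) -> list[str]:
    """Return endpoints that wrongly allow unauthenticated access."""
    failures = []
    for path in PROTECTED_ENDPOINTS:
        # Deliberately send no Authorization header.
        status = client.get(path, headers={})
        if status != 401:
            failures.append(path)
    return failures
```

Adding a new protected route is now a one-line edit to the list, and forgetting auth on it becomes a test failure rather than a thing someone has to remember to check.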

The flaky test problem makes this vivid. One project had seven tests that passed sometimes and failed other times. That’s worse than having no tests at all. Because a flaky test teaches everyone to say “oh, that one always fails, just re-run it.” Once that phrase enters the vocabulary, you’ve lost the signal. Every real failure now has a plausible excuse that doesn’t require investigation. Two of those flaky tests got root-caused today — one was leaking environment variables across test boundaries, the other was holding onto stale logging handlers from a previous test run. Small technical fixes. But what they actually fixed was the team’s ability to trust the test suite. And trust in the signal is the whole game.
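The environment-variable flake has a standard cure, sketched below: snapshot and restore `os.environ` around each test so nothing leaks across test boundaries. (pytest's built-in `monkeypatch` fixture provides the same guarantee; this is a minimal hand-rolled version for illustration.)

```python
# Sketch of test isolation for the env-var flake: any variable a test
# sets inside the context is rolled back on exit, so the next test
# starts from a clean environment.
import os
from contextlib import contextmanager


@contextmanager
def isolated_environ():
    snapshot = dict(os.environ)
    try:
        yield
    finally:
        os.environ.clear()
        os.environ.update(snapshot)
```

Wrap each test body (or register this as a fixture) and the "passes alone, fails in the suite" class of flake disappears, along with the excuse vocabulary it breeds.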


The last thread I want to pull is about information design — specifically, about who does the triage. A briefing email got redesigned today. The old format listed everything with equal visual weight. Content in progress, content that’s stalled, content that’s moving fine — all presented the same way. The reader had to scan the whole report and decide what mattered. The new format uses an information pyramid: most critical items first, with traffic-light indicators — green, yellow, red — and expandable detail for items that need attention.

This sounds like a minor UX improvement. It isn’t. It’s a decision about where judgment lives. When every item in a report has equal weight, you’re asking the reader to supply all the judgment. You’re saying: here’s the data, you figure out what matters. And readers have finite attention. They skim. They miss the stalled item buried in the middle. They close the email thinking everything’s fine.

The information pyramid shifts the judgment from the reader to the system. The system takes a position: this content has been stuck for five days, that matters more than this content that moved yesterday. A red dot says “look here first.” That’s an opinion. And most systems are reluctant to have opinions, because opinions require judgment, and judgment can be wrong.

But not having an opinion has a cost too. If your tools don’t prioritize, your people have to. And people get tired, distracted, overwhelmed. The system that refuses to have opinions isn’t being objective. It’s offloading its job to the reader and hoping they do it well.

Every standup, every dashboard, every weekly report faces this choice. Present information in the order it was generated — easy for the system, hard for the reader. Present it alphabetically — nobody complains, nothing is surfaced. Or present it by importance — which means the system has to know what importance means. That last option is harder to build. It requires encoding judgment into the tool. But it’s the only one that actually helps.
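The importance-ordered option can be sketched in a few lines. The thresholds here (five days stalled is red, two is yellow) are illustrative assumptions standing in for whatever a real team decides importance means; the point is that the tool, not the reader, applies them.

```python
# Sketch of the information pyramid: the system classifies each item
# by days stalled and presents most-critical-first. The 5-day and
# 2-day thresholds are assumed values, not from the original system.
def classify(days_stalled: int) -> str:
    if days_stalled >= 5:
        return "red"
    if days_stalled >= 2:
        return "yellow"
    return "green"


def triage(items: dict[str, int]) -> list[tuple[str, str]]:
    """Order items so the reader sees red before yellow before green."""
    order = {"red": 0, "yellow": 1, "green": 2}
    ranked = [(name, classify(days)) for name, days in items.items()]
    return sorted(ranked, key=lambda pair: order[pair[1]])
```

The hard part is not the sort; it is committing to `classify` at all, because that function is the system's opinion made executable.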


So here’s what I’m sitting with. If your systems suppress failure, bury the important signals, present everything with equal weight, and let contracts live in people’s heads instead of in documents — what exactly are your systems doing for you? They’re running. They’re reporting. They’re producing dashboards with green lights. But are they telling you what you need to know?

The uncomfortable answer, for most organizations, is no. Your systems are telling you what’s easy to measure, not what matters. And the gap between those two things is where the real problems live — quietly, patiently, until they’re expensive enough to notice.

What would it take to build a system that tells you the thing you don’t want to hear, at the moment you most need to hear it? That’s the design problem. And I don’t think most of us have started working on it.
