Paul Welty, PhD | AI, WORK, AND STAYING HUMAN

The loop nobody bothers to close

Most systems observe. Almost none learn. The difference is a feedback loop — and the boring cleanup work that makes it possible.

Duration: 6:43 | Size: 6.15 MB


Hey, it’s Paul. Wednesday, March 4th.

I built a system today that learns from its own output. Not in the dramatic, Skynet way people imagine when they hear that phrase. In the boring way. The way that actually works.

Here’s what happened. I have a project — a reading tool — that scores articles for relevance. It pulls from RSS feeds, runs each article through a scoring model, and surfaces the ones most likely to matter to a specific reader. The score was always static: one prompt, one pass, one number. If the model got it wrong, tough luck. The reader would just stop using it.

So I closed the loop. Now, when a reader clicks an article, bookmarks it, votes it up, subscribes to the feed it came from — those signals get recorded. Periodically, a small language model reviews the accumulated pattern and writes a summary of what this person actually cares about. That summary gets injected into the scoring prompt next time around. The next batch of articles arrives scored with the reader’s real behavior baked in.
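In code, the shape of that loop might look something like the sketch below. All the names here are mine, not the actual tool's, and the small-model call is stubbed out; the point is the plumbing: record signals, periodically distill them into a summary, inject the summary into the next scoring prompt.

```python
from dataclasses import dataclass, field

# Hypothetical signal record; the real tool's schema isn't shown in the post.
@dataclass
class Signal:
    article_url: str
    kind: str  # "click", "bookmark", "upvote", "subscribe"

@dataclass
class ReaderProfile:
    signals: list = field(default_factory=list)
    summary: str = ""  # rewritten periodically by a small language model

    def record(self, signal: Signal) -> None:
        self.signals.append(signal)

def summarize_with_llm(signals: list) -> str:
    """Placeholder for the periodic small-model pass that turns raw
    engagement signals into a prose summary of reader interests."""
    # A real implementation would call a language model here.
    kinds = sorted({s.kind for s in signals})
    return f"Reader shows engagement via: {', '.join(kinds)}."

def build_scoring_prompt(article_title: str, profile: ReaderProfile) -> str:
    # The closed loop: the behavior-derived summary rides along with
    # the next scoring request, so scores reflect real engagement.
    return (
        f"Reader profile: {profile.summary}\n"
        f"Score this article for relevance (0-10): {article_title}"
    )

profile = ReaderProfile()
profile.record(Signal("https://example.com/a", "click"))
profile.record(Signal("https://example.com/a", "bookmark"))
profile.summary = summarize_with_llm(profile.signals)
prompt = build_scoring_prompt("Control theory for product teams", profile)
```

The structure is what matters: the summary is state that accumulates between scoring runs, not something recomputed from scratch each time.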

It’s not magic. It’s a feedback loop. Observe, analyze, adjust, repeat. The kind of thing control theory figured out in the 1940s. And yet almost nobody builds them. Not because they’re hard. Because the prerequisite is boring.

Before I could build any of this, I had to spend the morning stripping HTML tags out of article summaries. Normalizing URLs so the same article from three different RSS feeds didn’t show up three times in the database. Fixing character encoding issues where em dashes became garbled byte sequences. Truncating titles that some feeds let run to 500 characters. Deduplicating articles that shared the same content but came through different syndication paths.
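A minimal version of that cleanup pass, in Python's standard library. These are simplified stand-ins, not the actual pipeline: the tag-stripping regex would be a real HTML parser in production, and the URL normalization here just lowercases the host and drops query strings, trailing slashes, and fragments.

```python
import html
import re
from urllib.parse import urlsplit, urlunsplit

def strip_tags(text: str) -> str:
    """Remove HTML tags and unescape entities so the scoring model
    sees prose, not markup (a crude stand-in for a real parser)."""
    return html.unescape(re.sub(r"<[^>]+>", "", text)).strip()

def normalize_url(url: str) -> str:
    """Canonicalize a URL so the same article arriving from different
    feeds collapses to one database row."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), "", ""))

def truncate_title(title: str, limit: int = 120) -> str:
    """Cap runaway titles at a sane length."""
    return title if len(title) <= limit else title[:limit - 1] + "…"

def dedupe(articles: list) -> list:
    """Keep one record per normalized URL, merging syndication copies."""
    seen = {}
    for a in articles:
        seen.setdefault(normalize_url(a["url"]), a)
    return list(seen.values())

# The same article via two syndication paths collapses to one record.
articles = [
    {"url": "https://Example.com/post?utm_source=rss", "title": "Hello"},
    {"url": "https://example.com/post/", "title": "Hello"},
]
clean = dedupe(articles)
```

Each function is trivial on its own. The value is that every downstream consumer, including the scoring model, can now assume the invariants they enforce.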

Cleanup. The kind of work that doesn’t feel like progress because nothing visible changes when you’re done.

But you cannot build intelligence on top of dirty inputs. If the articles have broken HTML in their summaries, the scoring model hallucinates about formatting artifacts instead of evaluating content. If the URLs aren’t normalized, the engagement signals scatter across duplicate records — the system can’t learn that you liked an article when it thinks you read three different articles. If the encoding is wrong, the model sees garbage where it should see language.

This is true in organizations too, and it’s the part everyone wants to skip. Executives ask: “How do we use AI to make better decisions?” The honest answer is almost always: “First, clean your data.” Clean your CRM. Deduplicate your contacts. Standardize your naming conventions. Reconcile the three different spreadsheets that track the same metrics using different definitions.

Nobody wants to hear that. It’s not a keynote-worthy answer. It doesn’t fit on a strategy slide. So they build dashboards on top of garbage data and wonder why the insights don’t feel actionable. They buy AI tools and feed them conflicting inputs and blame the model when the outputs are incoherent.

The intelligence was never the bottleneck. The data quality was.

Here’s the related problem. Monitoring is not the same thing as learning. I also instrumented two projects today with analytics and error tracking — PostHog for usage patterns, Sentry for runtime errors. Observation tools. They tell you what happened. And observation is valuable. You need it. It’s step one.

But observation is only step one. Learning is step two. And the gap between them is where most systems — and most organizations — permanently stall.

Think about how a typical company handles its error logs. Something breaks. An alert fires. Someone investigates, fixes it, moves on. That’s monitoring. Now imagine the system itself reviewed those errors, identified patterns, created tickets for the recurring ones, and routed them to the right team with full context. That’s closing the loop — taking what you observed and turning it into action without waiting for a human to notice the pattern.

I built a triage command today that does exactly this. It scans for new errors across multiple projects, deduplicates against existing tickets so it doesn’t create noise, and files GitHub issues with the full stack trace, event count, and a link back to the source. The distance between “error detected” and “error tracked” went from hours of manual checking to zero.
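The core of such a triage command can be sketched in a few lines. This is my reconstruction, not the actual code: the event and ticket types are hypothetical, and a real version would pull events from the Sentry API and file issues through the GitHub API rather than returning dicts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ErrorEvent:
    fingerprint: str   # stable hash of the stack trace, used for dedup
    message: str
    count: int         # how many times this error has fired
    link: str          # back-reference to the source in the error tracker

def triage(events: list, existing: set) -> list:
    """Turn observed errors into tracked issues, skipping fingerprints
    that already have a ticket so the loop adds signal, not noise."""
    issues = []
    for e in events:
        if e.fingerprint in existing:
            continue  # already tracked; don't create a duplicate ticket
        issues.append({
            "title": f"[triage] {e.message}",
            "body": f"Events: {e.count}\nSource: {e.link}",
        })
        existing.add(e.fingerprint)  # dedupe within this run too
    return issues

events = [
    ErrorEvent("abc123", "TypeError in scorer", 14, "https://sentry.example/1"),
    ErrorEvent("abc123", "TypeError in scorer", 14, "https://sentry.example/1"),
    ErrorEvent("def456", "Timeout fetching feed", 3, "https://sentry.example/2"),
]
new_issues = triage(events, existing={"def456"})
```

The dedup set is what keeps the loop quiet: the first occurrence of a fingerprint files a ticket, and everything after it is silently absorbed.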

It still takes a human to actually fix the error. The loop isn’t fully closed. But each step you eliminate between observation and action makes the system faster, and more importantly, makes the patterns visible. When error triage is automated, you stop losing signal to the randomness of who happened to check the dashboard that morning.

What’s interesting is how resistant organizations are to this. There’s a deep, unexamined assumption that every step between seeing a problem and doing something about it requires human judgment. Sometimes it does. But for a system that reads error logs and creates tickets? The judgment is already built into the output format — a human still decides whether and how to fix it. The triage step was always just labor, never judgment. Automating it doesn’t remove oversight. It frees up the oversight for the parts that actually need it.

One more thing. I ran agents in parallel today across five projects and closed twenty-six issues. At one point, four agents were simultaneously grinding through different parts of the same codebase. At another, five agents worked on separate issues, each using an isolated copy of a repository.

It mostly worked. Except once. One agent branched off the repository before other agents had merged their changes. Its isolated copy didn’t include fixes that had already shipped. So it carried forward bugs that were already solved, and its work had to be reconciled against a codebase that had moved on without it.

The lesson isn’t that parallelism is dangerous. The lesson is that independence is an assumption, not a fact. And the more you parallelize — whether that’s AI agents or human teams — the more carefully you have to verify that assumption.
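Verifying the assumption can be mechanical. Here's a toy model of the check, with the mainline history as a plain list of commit ids; in a real setup you'd ask git whether the agent's base commit is still the head of main before letting it start.

```python
def verify_independence(agent_base: str, mainline: list) -> str:
    """Before an agent starts, confirm its isolated copy includes
    everything already merged; if not, restart it from the head.
    Toy model: mainline is a list of commit ids, newest last."""
    head = mainline[-1]
    if agent_base != head:
        # Independence assumption failed: work merged after branching.
        # Re-base the agent rather than let it carry forward fixed bugs.
        return head
    return agent_base

mainline = ["a1", "b2", "c3"]  # c3 merged after the agent branched at b2
base = verify_independence("b2", mainline)
```

The check costs seconds. Skipping it is how an agent spends an afternoon re-fixing bugs that were already fixed.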

In software, this shows up as merge conflicts. In organizations, it shows up as teams building features that contradict each other, or making decisions based on stale context because nobody told them about the meeting that changed the direction last Tuesday.

Scale makes the coordination problem worse, not better. Every new parallel workstream you add is another place where the assumption of independence might be wrong.

So here’s the pattern underneath all of this. The intelligent work — the learning loops, the self-improving models, the autonomous triage — always depends on the unglamorous work being done first. Clean the inputs. Close the gap between observing and acting. Verify that the things you think are independent actually are. Then build.

Every system I know — software, organizational, personal — has at least one feedback loop that nobody bothered to close. A place where signals are collected and never analyzed. Where errors are logged and never triaged. Where user behavior is tracked and never fed back into the product that generated it.

What would change if you found yours?

Why customer tools are organized wrong

This article reveals a fundamental flaw in how customer support tools are designed—organizing by interaction type instead of by customer—and explains why this fragmentation wastes time and obscures the full picture you need to help users effectively.

Infrastructure shapes thought

The tools you build determine what kinds of thinking become possible. On infrastructure, friction, and building deliberately for thought rather than just throughput.

Server-side dashboard architecture: Why moving data fetching off the browser changes everything

How choosing server-side rendering solved security, CORS, and credential management problems I didn't know I had.

The work of being available now

A book on AI, judgment, and staying human at work.

The practice of work in progress

Practical essays on how work actually gets done.

Your process was built for a different speed

When work changes velocity, governance systems don't just fall behind. They become theater. And theater is worse than nothing—it gives you the feeling of control without any of the substance.

The difference between shipping and finishing

Shipping is mechanical. Finishing is a judgment call. And most organizations have quietly made it impossible to tell the difference.

Nothing is finished until you say it is

Continuous delivery removed the endings from work. That felt like progress. But without formal completion, you lose the ability to say what you actually accomplished — and more importantly, what you're done thinking about.

Build for the loop, not the lecture

A junior developer used to wait days for mentor feedback. Now that loop closes in seconds. When feedback is scarce, you batch your questions. When feedback is abundant, learning becomes continuous. AI changes the supply side of learning—most of our systems weren't designed for this.
