When your agents start breaking each other's code
Two agents modified the same file independently and created database locks. The fleet closed 135 issues in one day, and ran straight into the coordination problem that comes with that pace.
Duration: 8:51 | Size: 8.11 MB
When two people edit the same document at the same time, we call it a collaboration problem. When two AI agents edit the same file independently and one of them creates database locks that crash the production engine, we call it Tuesday.
This happened today in Authexis. The content model was being rewritten — a JSONB blob flattened into proper columns, the stage machinery replaced with a simple status field, 38 commits in a single day. Two parallel agents were working on different issues that both touched the same function. Both added dual-write logic to update_stage_field. The result: double UPDATEs that created database locks and hung the engine.
Nobody coordinated them. Nobody told agent B that agent A had already modified that function. There was no pull request review, no merge conflict, no warning. Both agents did exactly what their issue specs said. Both were correct in isolation. Together, they broke production.
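The failure mode is easy to reconstruct in miniature. Here is a minimal sketch using SQLite; the table, column names, and the update_stage_field signature are hypothetical stand-ins for Authexis's actual schema. Each agent's line is correct alone; merged, every call writes the same row twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, stage TEXT, status TEXT)")
conn.execute("INSERT INTO content VALUES (1, 'draft', NULL)")

def update_stage_field(conn, content_id, value):
    # Agent A's dual-write: keep the legacy stage column in sync during the migration.
    conn.execute("UPDATE content SET stage = ? WHERE id = ?", (value, content_id))
    # Agent B's dual-write, added independently hours later: sync the new status column.
    # Each line satisfies its own issue spec; together, every call now issues two
    # UPDATEs on the same row, doubling how long the row lock is held per transaction.
    conn.execute("UPDATE content SET status = ? WHERE id = ?", (value, content_id))

update_stage_field(conn, 1, "review")
row = conn.execute("SELECT stage, status FROM content WHERE id = 1").fetchone()
print(row)  # ('review', 'review')
```

SQLite on a single connection serializes this, so the sketch only shows the shape of the bug; on a server database, two concurrent sessions calling a doubled-up function like this hold their row locks across both writes, widening the window in which they block each other.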
This is the coordination problem that nobody talks about when they talk about AI agents. The discourse is about making agents smarter, giving them better tools, longer context windows, more sophisticated reasoning. All individual capability. But the moment you have two agents working on the same codebase, you have an organizational problem, not a capability problem. And organizational problems are not solved by making individuals smarter.
The second thing worth examining is what happened everywhere else while Authexis was having its coordination crisis. Dinly shipped a post-meal feedback loop and a pantry inventory system — 18 issues in one session, test count doubled. Eclectis fixed XSS vulnerabilities and added 28 handler tests. Scholexis went from zero tests to 75 and fixed an IDOR vulnerability that would have let an attacker delete another user’s data. Prakta’s serve-don’t-show experience went end-to-end. Phantasmagoria replaced its entire generation architecture. SimpleBooks completed its Rails-to-Next.js port.
135 issues closed across 11 projects. In one day.
This number should probably worry me more than it does. Not because the work is bad — the work is good. Real security fixes. Real architectural improvements. Real tests catching real bugs. But 135 issues means 135 changes to 11 codebases that nobody reviewed holistically. Each one was correct within its spec. But specs don’t know about each other. The Authexis database lock proved that.
The third insight is about testing as a fleet-wide pattern. Scholexis went from zero tests to 75. Dinly went from 33 to 65. Eclectis from 135 to 163. That's 135 new tests across three projects in one day, all discovered independently by scout agents running parallel scans.
Nobody mandated a testing initiative. No one wrote a memo saying “all projects need better test coverage.” Each scout looked at its own codebase and independently concluded “this critical path is untested.” The pattern emerged because the same methodology — the scout skill with its parallel scan dimensions — was applied to different codebases and found the same category of problem everywhere.
This is how standards actually form in organizations. Not through mandate. Through independent discovery of the same truth. When three separate teams all conclude they need more tests, that’s stronger than a CTO saying “write more tests.” It’s convergent evolution. The conclusion is more robust because it was reached independently.
The fourth thing is about architectural rewrites happening simultaneously. Authexis rebuilt its content model. Phantasmagoria switched from roller-first to narrative-first generation. Prakta completed the serve-don’t-show experience. Three fundamental redesigns of how each product works, all shipping on the same day.
In a traditional organization, a rewrite is a quarter-long initiative. You plan it, you allocate resources, you sequence it against other priorities, you do the migration in phases. Here, three rewrites shipped in parallel, each driven by a single agent session that understood its own codebase deeply enough to make the change in one sitting.
The advantage is speed. The risk is exactly what happened in Authexis. When a rewrite ships in one session, there’s no time for the broader team to adjust. The function that agent B assumed was unchanged had been rewritten by agent A three hours earlier. In a quarter-long migration, that information would have propagated through code reviews, standup meetings, shared documentation. In a one-day rewrite, the information gap is measured in hours, and hours is enough for a production incident.
The fifth thing: the fleet moved to a server today. All 11 sessions now run on speedy-gonzales, a Mac Mini on Tailscale. Dev servers accessible remotely. Discord channel on the supervisor. Orchestrate agents cycling. The fleet no longer depends on my laptop being open.
This is a quiet change that restructures everything. When the fleet ran on my laptop, I was the infrastructure. If I closed the lid, everything stopped. Now the fleet runs while I sleep. The agents work overnight. The orchestrate loop dispatches issues at 2am. The daily rollover writes work logs at the day boundary.
The implications are still settling. When you can close your laptop and the work continues, your relationship to the work changes. You stop being the engine and start being the steering wheel. The daily rhythms shift — instead of “what should I build today?” it becomes “what did the fleet build while I was away, and is any of it wrong?”
That’s the real coordination problem. Not two agents editing the same file. That’s a technical bug with a technical fix — better locking, sequential merges, file-level ownership. The deeper problem is: when the fleet works faster than you can review, how do you stay confident that the work is correct? 135 issues in a day means 135 decisions you didn’t make. Some of them were security fixes that matter. Some of them were architectural choices that compound. And you find out about all of them in the morning, after they’ve already shipped to main.
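The file-level ownership mitigation can be sketched in a few lines. This is a hypothetical orchestrator-side ledger, not the fleet's actual dispatcher: before an issue is handed to an agent, the files it declares are claimed, and a second agent asking for the same file is refused until the first releases.

```python
class FileOwnership:
    """Hypothetical dispatch-time ledger mapping file paths to the agent holding them."""

    def __init__(self):
        self._owner = {}  # path -> agent id

    def claim(self, agent, paths):
        # Refuse the whole claim if any requested file belongs to another agent.
        conflicts = {p: self._owner[p] for p in paths
                     if self._owner.get(p) not in (None, agent)}
        if conflicts:
            raise RuntimeError(f"files already claimed: {conflicts}")
        for p in paths:
            self._owner[p] = agent

    def release(self, agent):
        self._owner = {p: a for p, a in self._owner.items() if a != agent}

ledger = FileOwnership()
ledger.claim("agent-a", ["engine/content.py"])
try:
    ledger.claim("agent-b", ["engine/content.py"])  # same file: rejected
except RuntimeError as e:
    print(e)
ledger.release("agent-a")
ledger.claim("agent-b", ["engine/content.py"])  # now allowed
```

The refusal is the point: agent B learns about agent A at dispatch time, not in production.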
The sixth thing: today we also built a marketing agency. Not a marketing tool. An agency. With roles (strategist, writer, launch manager), a methodology (NVN atoms — noun-verb-noun, hermetically sealed engagements), client onboarding, deliverable templates, and a review cycle that feeds corrections back into the methodology.
The core insight was Leibniz’s. The monad has no windows. Each engagement takes all its inputs at the start — the role doc, the guide, the template, the client documents — and produces a deliverable at the end. No side channels during execution. No interrupting the strategist mid-analysis to ask a question. If the inputs are incomplete, the dependency resolver catches it before the monad fires.
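The "no windows" rule reduces to a small check before anything runs. A hedged sketch, assuming the engagement's inputs arrive as a dict; the input names mirror the ones listed above, but the resolver's actual shape is a guess.

```python
# Inputs every engagement must declare up front; names follow the list above.
REQUIRED_INPUTS = ["role_doc", "guide", "template", "client_documents"]

def resolve(inputs: dict) -> dict:
    # Block the engagement before it starts if any input is absent or empty.
    missing = [name for name in REQUIRED_INPUTS if not inputs.get(name)]
    if missing:
        raise ValueError(f"engagement blocked, missing inputs: {missing}")
    return inputs  # the monad fires only with a complete input set

resolve({"role_doc": "strategist.md",
         "guide": "competitive-analysis.md",
         "template": "deliverable.md",
         "client_documents": ["eclectis/brief.md"]})
```

Everything after this check runs with no side channels: a failed resolution is a refusal to start, not a mid-flight interruption.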
The first real engagement worked. The strategist role produced a competitive analysis for Eclectis. An external review found gaps. Those gaps became patterns in the strategist’s role doc and the competitive analysis guide. The next engagement for any client will be better because of this one. The methodology compounds.
What made this work wasn’t AI capability. It was structure. The role doc, the guide, and the template are just markdown files. They could be loaded into any LLM. The intelligence isn’t in the model. It’s in the documents the model reads. And those documents get smarter after every engagement because corrections update them.
This is the same principle as the testing convergence. The value isn’t in the individual agent. It’s in the methodology that the agents execute. The scout skill finds the same categories of problems across all codebases because the skill encodes what to look for. The strategist produces a competitive analysis because the guide encodes how. The model provides the reasoning. The documents provide the judgment.
Today the fleet closed 135 issues, added 135 tests, shipped three architectural rewrites, migrated to a server, and ran its first marketing engagement. The number that matters isn’t 135. It’s the one database lock that two agents created together that neither would have created alone.
The question I’m sitting with: is the coordination problem solvable at all, or is it just the tax you pay for parallel execution? Every system that runs agents in parallel will eventually have two of them touch the same resource. Locks, queues, ownership rules, review gates — these are all mitigations, not solutions. The fundamental tension is that agents work fastest when they’re independent, and independence means they don’t know what each other did.
Maybe the answer isn’t coordination. Maybe it’s detection. Don’t try to prevent the conflict. Build systems that catch it immediately when it happens. The database lock was caught within hours, not weeks. In a traditional organization, that dual-write bug could have lurked for months.
Speed of discovery might matter more than prevention of errors. Build fast, break fast, catch fast. 135 issues and one production incident might be a better ratio than 20 issues and zero incidents, if what you’re optimizing for is learning, not safety.
I’m not sure about that yet. But I notice the fleet is teaching me faster than I could learn on my own.