Paul Welty, PhD · AI, Work, and Staying Human


Work log synthesis: February 23, 2026

Cross-project synthesis for February 23, 2026

When the Machine Runs Faster Than You Can Think

What happens when your execution bottleneck disappears overnight? The authexis log shows six feature branches landing on main in a single day—circuit breakers, payment funnels, security hardening, admin tooling—while simultaneously prepping ten more issues with full specs and API audits. The polymathic-h log shows a newsletter essay written, edited through two test rounds, reformatted for LinkedIn, and scheduled for publication, all in one session. Both projects are running a similar pattern: AI agents grinding through implementation while humans work one layer up, reviewing code and writing specifications. But the logs reveal something more interesting than velocity. They show what breaks when you move this fast.

The Review Bottleneck Emerges When Implementation Accelerates

The authexis two-tab workflow—tab1 grinding code, tab2 reviewing PRs and prepping issues—produced six merged features in one day, but the review process caught three critical bugs that would have shipped broken: a missing trial expiration cron that would have let trials run forever, checkout sessions that didn’t verify workspace membership, and hardcoded subscription statuses that ignored Stripe’s actual state. The review agent kept reading stale local files instead of actual PR diffs, requiring manual verification via gh pr diff. The human wasn’t just rubber-stamping AI output. The human was the last line of defense against shipping a payment system that would have let anyone subscribe to anyone else’s workspace.
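What makes these bugs hard to catch is that each one is an absent check, not a wrong one. A minimal sketch of the two missing guards, with hypothetical names and a hypothetical trial length (the log doesn't show the actual authexis implementation):

```python
from datetime import datetime, timedelta, timezone

TRIAL_DAYS = 14  # hypothetical; the log doesn't state the trial length


def checkout_allowed(user_id: str, workspace_id: str, memberships: dict) -> bool:
    """The missing guard: only members of a workspace may start checkout for it."""
    return workspace_id in memberships.get(user_id, set())


def trial_expired(trial_started_at: datetime, now: datetime = None) -> bool:
    """The check the missing cron would run: trials end instead of running forever."""
    now = now or datetime.now(timezone.utc)
    return now - trial_started_at > timedelta(days=TRIAL_DAYS)
```

Each guard is a one-line predicate, which is exactly why its absence is invisible in a large diff: the bug is a line that isn't there, and no diff highlights a missing line you never wrote.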

This isn’t a story about AI making mistakes. It’s a story about what happens when code review becomes the constraint. When implementation was slow, you could review as you went. When six PRs land in quick succession, you need a different process. The log notes “verify Vercel deployment succeeded and no runtime regressions” as a risk item—the system moved so fast that verification became asynchronous. The traditional “code, review, merge, verify” loop collapsed into “code and merge, then verify everything at once.” That’s a fundamentally different risk profile.

The polymathic-h newsletter went through two test email rounds with “iterative edits,” and the key edit was structural: moving the punch from the ending to the opening because “the newsletter format needs to hit immediately.” That’s not an AI failure—that’s a human editorial judgment about medium-specific constraints. But it happened during review, not during drafting. The AI wrote a perfectly good essay with a slow build. The human recognized that the distribution channel (email, LinkedIn) required a different structure. Review isn’t just catching bugs. It’s applying context the AI doesn’t have.

Specification Work Splits Into Two Modes With Different Cognitive Costs

The authexis log shows ten issues prepped with “detailed specs including API surface audits, file lists, and acceptance criteria.” That’s not writing code. That’s writing instructions for something else to write code. The work split into two distinct modes: grinding (tab1 implements from specs) and thinking (tab2 reviews output and writes new specs). The log explicitly names this: “tab1 (grinder) implemented and merged code, tab2 (thinker) reviewed PRs and prepped the entire upcoming issue queue.” The human isn’t writing code anymore. The human is writing specifications detailed enough that an AI can write code from them.

But specification work has its own bottleneck: you can’t spec what you haven’t designed. The Apple app roadmap was “fully specced out for grinding” across six phases (GH-389 through GH-393, plus GH-409), with API audits confirming all endpoints exist. That’s not a few sentences of direction. That’s architecture: which endpoints, which models, which screens, in which order. The polymathic-h log shows a different flavor of the same problem: the newsletter essay was “adapted from the Feb 22 daily reflection,” which means the specification (what to write about, what argument to make) came from a previous thinking session. You can’t automate the decision to write about specification bottlenecks. You can only automate the execution once you’ve decided.

The cognitive load didn’t disappear. It moved. Writing code is no longer the expensive part. Deciding what code to write is the expensive part. The authexis log created milestones v1-apple and v1-marketing to “separate web app, Apple app, and marketing work”—that’s portfolio management, not implementation. The human is now doing product management and architecture, with implementation as a fast background process. The question is whether humans can sustain that level of decision-making without the natural pacing that implementation used to provide.

Trust Boundaries Harden When You Can’t Inspect Everything

The authexis log flags a specific security concern: “The briefing_template.py change trusts admin MOTD/tips HTML (skip escaping)—if the admin API is ever called directly without DOMPurify, this is an XSS vector.” That’s a trust boundary: the code assumes sanitization happens upstream, at the admin UI layer. If that assumption breaks, the system is vulnerable. The review process caught missing SSRF protection in feed discovery and added “hostname resolution and private IP rejection,” plus XSS protection in email link replacement with “scheme rejection for javascript: URIs.” These are all trust boundary decisions: what inputs do we trust, and what do we validate?
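The guards the review added are concrete enough to sketch. This is a generic version of hostname resolution plus private-IP and scheme rejection, not the actual authexis code:

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_feed_url(url: str) -> bool:
    """Reject URLs usable for SSRF or script injection: non-HTTP schemes
    (including javascript:), and hostnames that resolve to private,
    loopback, link-local, or reserved addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable hostnames are rejected, not retried
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

A production version also has to pin the resolved IP for the actual fetch; otherwise the DNS answer can change between the check and the request, reopening the hole.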

When you’re writing code by hand, you see every line. You know where the boundaries are because you drew them. When an AI writes the code, you’re reviewing diffs, not writing from scratch. The authexis log notes that review agents “kept reading stale local files instead of actual PR diffs, requiring manual verification”—the human had to actively work to see what actually changed. That’s a different relationship with the codebase. You’re not building it. You’re inspecting what was built. The trust boundary isn’t just between user input and system internals. It’s between human intent and AI implementation.

The polymathic-h log shows a simpler version of the same pattern: “Newsletter ed. 10 is draft: false with a future date (Feb 24). It won’t render on the live site until Cloudflare builds with --buildFuture or the date passes.” That’s a trust boundary between the content system and the build system. The human has to remember that future-dated content requires a specific build flag, or it won’t appear. The system won’t warn you. It’ll just silently not publish. When you’re moving fast, these invisible boundaries become traps. The polymathic-h log explicitly tracks this: “Need to push and deploy before or at send time Tuesday.” The human is now responsible for remembering the incantations that make the automation work.
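The trap is easy to encode as a pre-deploy check. A sketch of the rendering rule, assuming Hugo-style draft and --buildFuture semantics, which is what the log appears to describe:

```python
from datetime import date


def will_render(post_date: date, build_date: date,
                build_future: bool = False, draft: bool = False) -> bool:
    """A post renders only if it isn't a draft and either the build
    allows future dates or the post's date has already arrived."""
    if draft:
        return False
    return build_future or post_date <= build_date
```

Run against the scheduled edition (Feb 24 post, Feb 23 build, no flag), this returns False, which is the silent non-publish the log is warning about. A check like this in CI would turn the silence into a loud failure.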

Automation Debt Accumulates Faster Than Automation Itself

Both logs show the same pattern: the work is fast, but the automation around the work lags behind. The authexis log notes “Review agents reading stale local files instead of PR diffs caused false rejections—need a more reliable review workflow.” The polymathic-h log lists “Newsletter automation as polymathic-h hooks—no issue created yet” and “CLAUDE.md still references non-existent paulos newsletter test/send commands.” The systems are running fast, but the tooling around them is still manual. The human is typing commands, creating campaigns, posting to LinkedIn by hand.

This is automation debt: the gap between what could be automated and what is automated. The polymathic-h log explores options for “automating LinkedIn posting alongside newsletter sends: Zapier has a Brevo ‘campaign status updated’ trigger that could feed a LinkedIn post action, or we could go direct via LinkedIn API from paulos. Decision deferred.” That’s not a technical problem. That’s a prioritization problem. Building the automation takes time away from writing newsletters. But not building it means every newsletter requires manual LinkedIn posting. The debt compounds.

The polymathic-h log shows the same dynamic: “paulos notify editorial hook still untested on real content commits.” The hook exists, but it’s not wired up. The system could notify on content changes, but it doesn’t yet. The human has to remember to check. Every manual step is a potential failure point. The authexis log flags “verify all 6 merged features work on production” as a next-session task—that’s manual QA because automated verification isn’t in place yet. The faster you ship, the more manual verification you need, unless you invest in automation. But investing in automation slows down shipping. The debt accumulates faster than you can pay it down.

Questions This Raises

  • When review becomes the bottleneck, do you need a different review process, or do you need to slow down implementation to match review capacity?
  • If specification work is now the expensive part, what tools and practices make specification faster without sacrificing quality?
  • How do you maintain trust boundaries in a codebase where you’re reviewing diffs instead of writing from scratch?
  • What’s the right ratio of automation investment to feature velocity when automation debt is growing faster than automation itself?
  • Can humans sustain high-level decision-making (architecture, product strategy, editorial direction) without the pacing that manual implementation used to provide, or do we need artificial pacing mechanisms?

What Matters About This

The pattern across both projects isn’t about AI capabilities. It’s about what happens to human work when AI handles execution. The bottleneck moved from implementation to specification, from writing code to deciding what code to write, from drafting to reviewing, from building to inspecting. That’s not a small shift. It changes what skills matter, what work is expensive, and where mistakes happen.

The authexis log shows a payment system that almost shipped without trial expiration, without workspace verification, without SSRF protection—not because the AI is bad at coding, but because the human didn’t specify those requirements explicitly enough, or didn’t catch their absence during review. The polymathic-h log shows a newsletter that needed structural editing after drafting because the AI didn’t know the distribution context. These aren’t implementation failures. They’re specification and review failures. The work moved up the stack, and the failure modes moved with it.

Where This Could Go

Immediate actions:

  • Authexis: Build automated verification for merged PRs (smoke tests, runtime checks) to catch regressions without manual QA
  • Polymathic-h: Create GH issue for newsletter send automation and decide on LinkedIn integration approach (Zapier vs API)
  • Both: Document trust boundaries explicitly in specs (what’s validated where, what’s assumed safe)
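The first bullet can start as small as a table of endpoints and expected statuses. A minimal sketch, with the endpoint paths invented for illustration and the HTTP client injected so the same logic runs against production or a test stub:

```python
def run_smoke_checks(checks, fetch_status):
    """Hit each endpoint after deploy and collect anything that deviates
    from its expected HTTP status. fetch_status is any callable that maps
    a path to a status code (real HTTP client in production, stub in tests)."""
    failures = []
    for path, expected in checks:
        actual = fetch_status(path)
        if actual != expected:
            failures.append((path, expected, actual))
    return failures


# Invented endpoints for illustration only
SMOKE_CHECKS = [
    ("/api/health", 200),
    ("/api/billing/checkout", 401),  # unauthenticated checkout must be rejected
]
```

Note that the second check asserts a rejection: a smoke suite for a payment system should verify not just that endpoints respond, but that the trust boundaries hold.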

Longer-term:

  • Design a review process optimized for high-velocity AI implementation (diff-focused, trust-boundary-aware, automated verification)
  • Build tooling that makes specification faster (templates, API surface scanners, automated acceptance criteria generation)
  • Pay down automation debt systematically (one manual step automated per sprint) before it becomes a coordination crisis
