Paul Welty, PhD AI, WORK, AND STAYING HUMAN

Work log: March 1, 2026

What shipped today

Today was about two things: finishing the user-facing polish from yesterday’s telemetry/email sprint, and making a big leap forward on article discovery.

Search infrastructure overhaul. We replaced Google CSE with Serper.dev for article discovery (GH-716). This was Roland’s pain point: he needs international content for his music touring business, and Google CSE was returning exclusively US-centric results. After evaluating Kagi (closed beta, not viable), Tavily, and Exa.ai, Serper won on cost ($0.001/search vs $0.005), international support, and API simplicity. The swap was clean: POST to google.serper.dev/search with an API key header, parse organic results. We then added per-workspace search_countries, so Roland’s workspace now searches across US, Germany, Austria, UK, Spain, Italy, and France. His first scan pulled in 192 new articles from 232 found, more than tripling his library from 86 to 278 articles.
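
For reference, the request/parse/merge flow described above might look roughly like this. It is a sketch, not the code that shipped: the X-API-KEY header, the gl country parameter, and the organic/link/snippet field names reflect my understanding of Serper's API and should be checked against their docs, and the function names and the de-duplication by URL are assumptions made for illustration.

```python
import json
import urllib.request

SERPER_URL = "https://google.serper.dev/search"


def build_request(query: str, country: str, api_key: str) -> urllib.request.Request:
    """Build one POST request for a single query/country pair."""
    # "gl" as the country parameter is an assumption about Serper's API.
    body = json.dumps({"q": query, "gl": country}).encode("utf-8")
    headers = {"X-API-KEY": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(SERPER_URL, data=body, headers=headers, method="POST")


def parse_organic(payload: dict) -> list[dict]:
    """Keep only the fields we need from the 'organic' results list."""
    return [
        {"title": r.get("title", ""), "url": r["link"], "snippet": r.get("snippet", "")}
        for r in payload.get("organic", [])
        if r.get("link")  # skip entries without a URL
    ]


def _fetch_json(request: urllib.request.Request) -> dict:
    """Default transport: perform the HTTP call and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=15) as response:
        return json.load(response)


def search_all_countries(query, countries, api_key, fetch=None):
    """Run the query once per country and merge results, de-duplicating by URL."""
    fetch = fetch or _fetch_json  # injectable so the flow can be tested offline
    seen, merged = set(), []
    for country in countries:
        payload = fetch(build_request(query, country, api_key))
        for result in parse_organic(payload):
            if result["url"] not in seen:
                seen.add(result["url"])
                merged.append({**result, "country": country})
    return merged
```

Passing a fake fetch function makes it possible to exercise the merge logic without spending search credits.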

Scoring architecture fix. The multi-country search surfaced a latent bug: our batch scoring approach (sending all results to Claude in one prompt and expecting a massive JSON array back) collapsed at scale. With 229 results across 7 countries, Claude couldn’t reliably generate a 229-element JSON array within the token limit. We refactored to score each result individually with Haiku, which is simpler, more reliable, and free of the index-matching complexity of batch scoring. This also uncovered a cross-module dependency issue: content_research_sources.py still imported filter_with_claude under its old name, causing deploy failures until we tracked down and fixed the import.
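
The shape of the per-result pattern, as a minimal sketch: the model call is passed in as a plain callable (score_one is a hypothetical stand-in for whatever wraps the Haiku request), and the bare-integer 0-100 reply format is an assumption, not necessarily what our prompt asks for. The point is that each result carries its own score, so there are no indices to match, and one bad reply costs one item rather than the whole scan.

```python
from typing import Callable, Optional


def parse_score(reply: str) -> Optional[int]:
    """Parse a reply expected to be a bare integer 0-100; anything else is unscored."""
    text = reply.strip()
    if not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    return value if 0 <= value <= 100 else None


def score_individually(
    results: list[dict],
    score_one: Callable[[dict], str],
) -> list[dict]:
    """Score each result with its own model call; a failure affects only that item."""
    scored = []
    for result in results:
        try:
            score = parse_score(score_one(result))
        except Exception:
            score = None  # keep the item but mark it unscored
        scored.append({**result, "score": score})
    return scored
```

Unscored items come back with score set to None, so a later pass can retry them without rerunning everything else.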

Morning sprint items. Earlier in the day we shipped article sort-by-score (GH-717), fixed Google scan’s search term parsing (JSON decode + per-line splitting), repaired a corrupted Xcode project file, and landed the email/telemetry work from yesterday’s session (signed_in events, email_override fix, newsletter pitch in briefings).
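
One reading of the search term parsing fix (JSON decode, then per-line splitting), sketched under assumptions: the stored value may be a JSON array, a JSON string, or plain newline-separated text, and parse_search_terms is a hypothetical name rather than the function in the engine.

```python
import json


def parse_search_terms(raw: str) -> list[str]:
    """Return clean search terms from a JSON array or newline-separated text."""
    text = raw.strip()
    if not text:
        return []
    try:
        decoded = json.loads(text)
    except json.JSONDecodeError:
        decoded = None  # not JSON, fall back to treating it as plain text
    if isinstance(decoded, list):
        items = decoded
    elif isinstance(decoded, str):
        items = [decoded]
    else:
        items = [text]
    terms = []
    for item in items:
        # A single entry may itself hold several terms, one per line.
        for line in str(item).splitlines():
            if line.strip():
                terms.append(line.strip())
    return terms
```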

Completed

  • GH-716 — Evaluate search providers beyond Google CSE (Serper.dev selected and implemented)
  • GH-717 — Sort articles by AI score descending by default
  • GH-697 — User activity tracking and telemetry (signed_in event)
  • GH-713 — Include newsletter pitch in briefing email footer
  • GH-707 — Add topic CTA and UTM tracking to briefing emails
  • GH-714 — Hide scans page from non-admin users
  • GH-715 — Hide recent activity on dashboard for non-admin users
  • GH-700 — Investigate PostHog analytics gaps
  • GH-663 — Telemetry discussion (closed, superseded by GH-697)
  • GH-712 — Make dashboard recent activity user-facing (closed, superseded by GH-715)
  • GH-718, 719, 720, 721 — Deploy failure issues (all resolved)

Carry-over

  • Kelly Pentecost’s Google scan returning 0 results — her search terms may need investigation or her workspace may need search_countries configured
  • Tags field format inconsistency — some articles have tags stored as plain comma-separated strings instead of JSON arrays, causing parse failures in scripts
  • Temp .ts scripts in web/ — debugging scripts (check-articles.ts, check-scan.ts, check-final.ts, clear-requeue.ts) should be cleaned up
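
For the tags inconsistency, a tolerant reader is probably the cheapest stopgap until the stored data is migrated. A sketch, assuming the only two shapes in the wild are JSON arrays and plain comma-separated strings (normalize_tags is a hypothetical helper, shown in Python even though the failing scripts may live elsewhere):

```python
import json


def normalize_tags(raw) -> list[str]:
    """Accept a list, a JSON array string, or a comma-separated string."""
    if raw is None:
        return []
    if isinstance(raw, list):
        items = raw
    else:
        text = str(raw).strip()
        try:
            decoded = json.loads(text)
        except json.JSONDecodeError:
            decoded = None  # not JSON, treat as comma-separated
        items = decoded if isinstance(decoded, list) else text.split(",")
    # Trim whitespace and drop empty entries.
    return [str(tag).strip() for tag in items if str(tag).strip()]
```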

Risks

  • Serper API key exposure — key is in Railway env vars and engine .env. The .env is gitignored but worth confirming it stays that way.
  • Scoring cost at scale — individual Haiku calls per result means 232 API calls per scan for Roland. At Haiku pricing this is pennies, but worth monitoring as more workspaces come online with multi-country search.

Flags and watch-outs

  • Railway deploys from GitHub pushes — do NOT use railway redeploy CLI (it hangs in non-interactive mode). Just push to main.
  • When renaming exported functions in handler modules, check all importers — content_research_sources.py imports from google_search_scan.py and was missed initially.
  • Batch scoring with Claude doesn’t scale past ~30 results. The individual scoring pattern is the right approach going forward.
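
The importer check can be scripted rather than done by eye. A rough sketch using the standard library's ast module: it only catches "from module import name" statements, so attribute access after a plain "import module" would still need a text search, and find_importers is a hypothetical helper, not something in the repo.

```python
import ast
from pathlib import Path


def find_importers(root: str, module: str, name: str) -> list[tuple[str, int]]:
    """List (file, line) pairs where `name` is imported from `module`."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in ast.walk(tree):
            if not (isinstance(node, ast.ImportFrom) and node.module):
                continue
            # Match on the last path segment so package-qualified imports count too.
            if node.module.split(".")[-1] != module:
                continue
            if any(alias.name == name for alias in node.names):
                hits.append((str(path), node.lineno))
    return hits
```

Running it with the module google_search_scan and the name filter_with_claude before the rename would have been expected to flag content_research_sources.py.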

Next session

  • Check Kelly Pentecost’s workspace scan configuration — does she have search terms? Does she need search_countries?
  • Review Roland’s 192 new articles — are the international results actually relevant? Check score distribution and content quality.
  • Clean up temp .ts scripts from web/ directory.
  • Continue with briefings work (GH-705) or pick next from the queue.
