Work log: Phantasmagoria — March 28, 2026
What shipped today
The big theme today was pipeline architecture and AI-driven quality gates. Three major pieces landed.
Decoupled Stage 1 and Stage 2. Previously, generate_release.py --stage 2 would regenerate Stage 1 narratives before generating outcomes, which meant you couldn’t iterate on Stage 2 without risking Stage 1 drift. Now Stage 1 outputs are saved as snapshots in stage1/, and Stage 2 reads from those snapshots. This makes Stage 2’s input deterministic: Stage 1 is frozen once approved, and Stage 2 can be re-run freely against a stable narrative foundation.
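A minimal sketch of the snapshot idea, not the project's actual code. It assumes each approved narrative is stored as one JSON file under stage1/; the file format and the function names save_stage1_snapshot and load_stage1_snapshots are illustrative assumptions.

```python
import json
from pathlib import Path


def save_stage1_snapshot(snapshot_dir: Path, event_id: str, narrative: dict) -> Path:
    """Freeze an approved Stage 1 narrative to disk (hypothetical JSON layout)."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    path = snapshot_dir / f"{event_id}.json"
    path.write_text(json.dumps(narrative, indent=2, sort_keys=True), encoding="utf-8")
    return path


def load_stage1_snapshots(snapshot_dir: Path) -> dict[str, dict]:
    """Read every frozen narrative, keyed by file stem.

    Stage 2 would call only this function, so it never regenerates Stage 1.
    """
    if not snapshot_dir.is_dir():
        raise FileNotFoundError(f"no Stage 1 snapshots in {snapshot_dir}; run Stage 1 first")
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(snapshot_dir.glob("*.json"))
    }
```

Because the loader only reads from disk, repeated Stage 2 runs see identical Stage 1 input until someone deliberately rewrites a snapshot.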
Split Stage 3 into render (Stage 3) and validate (Stage 4). The old check_c3_to_ship() did both rendering and linting in one pass. Now rendering and validation are separate stages with their own contract gates (check_c3_to_c4() and check_c4_to_ship()). The linter was renamed to “validator” (stellaris_mod_validator.py) to better reflect its role. This also fixed renderer bugs — modifier directory output and on_action trigger syntax were broken.
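To illustrate the two-gate split, here is a hedged sketch. The gate names check_c3_to_c4() and check_c4_to_ship() come from the log above, but their bodies, arguments, and the GateResult type are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    """Outcome of one contract gate (hypothetical shape)."""
    passed: bool
    problems: list[str] = field(default_factory=list)


def check_c3_to_c4(rendered_files: dict[str, str]) -> GateResult:
    """Gate after rendering: something was rendered and no file is empty."""
    problems = [
        f"empty file: {name}"
        for name, text in rendered_files.items()
        if not text.strip()
    ]
    if not rendered_files:
        problems.append("renderer produced no files")
    return GateResult(passed=not problems, problems=problems)


def check_c4_to_ship(validator_errors: list[str]) -> GateResult:
    """Gate after validation: ship only when the validator reports no errors."""
    return GateResult(passed=not validator_errors, problems=list(validator_errors))
```

Keeping the gates separate means a render failure and a validation failure surface at different points, which is the practical benefit of the split.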
AI-based choice tension evaluation (Stage 2B+). This is the headline feature: after Stage 2A generates event outcomes, a new Stage 2B+ pass scores each event’s option set for tension on a 1-5 scale. Events scoring below 3 get rejected with specific critique, and Stage 2A regenerates with that feedback — up to 5 retries. The evaluator checks for dominant options (one choice clearly better than all others), reward stacking, tech parity violations, and punished caution. Both test events (The Cartographer’s Obsession and The Weighted Void) passed after 2-3 retries, showing the feedback loop catches and fixes real problems.
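The feedback loop described above can be sketched roughly as follows. The generate and evaluate callables stand in for Stage 2A and Stage 2B+ and are hypothetical. The sketch also assumes "up to 5 retries" means one initial draft plus at most five regenerations.

```python
from typing import Callable, Optional

MIN_TENSION = 3  # events scoring below this are rejected
MAX_RETRIES = 5  # regeneration attempts after the first draft (assumed reading)


def generate_with_tension_gate(
    generate: Callable[[Optional[str]], dict],
    evaluate: Callable[[dict], tuple[int, str]],
) -> tuple[dict, int]:
    """Regenerate an event until its option set scores at least MIN_TENSION.

    generate(critique) stands in for Stage 2A; evaluate(event) stands in for
    Stage 2B+ and returns (score on a 1-5 scale, critique text).
    Returns the accepted event and the number of retries used.
    """
    critique: Optional[str] = None
    for retries in range(MAX_RETRIES + 1):
        event = generate(critique)  # first call gets no critique
        score, critique = evaluate(event)
        if score >= MIN_TENSION:
            return event, retries
    raise RuntimeError(
        f"tension still below {MIN_TENSION} after {MAX_RETRIES} retries: {critique}"
    )
```

Passing the critique back into the generator is what separates this from a plain retry: each attempt is steered by the specific reason the previous one was rejected.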
Key design rules baked into the prompts: max 2 effects per option (down from 3), one premium reward per option (tech OR follow-up OR strong modifier — pick one), follow-up events reframed as gambles rather than free bonuses, and the follow-up used as a balancing lever (fewer other effects on the follow-up option since it already offers “more content”).
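The log says these rules live in the prompts. Purely as an illustration, the first two could also be expressed as a mechanical check like the one below. The option shape, the "kind" labels, and the choice to count a follow-up as one effect are all assumptions.

```python
MAX_EFFECTS_PER_OPTION = 2
PREMIUM_KINDS = {"tech", "follow_up", "strong_modifier"}  # hypothetical labels


def check_option_rules(option: dict) -> list[str]:
    """Return rule violations for one option (an empty list means it passes).

    Assumes a hypothetical shape: {"effects": [{"kind": str}, ...]}.
    """
    effects = option.get("effects", [])
    problems = []
    if len(effects) > MAX_EFFECTS_PER_OPTION:
        problems.append(f"{len(effects)} effects, max is {MAX_EFFECTS_PER_OPTION}")
    premium = [e["kind"] for e in effects if e["kind"] in PREMIUM_KINDS]
    if len(premium) > 1:
        problems.append("more than one premium reward: " + ", ".join(premium))
    return problems
```

A check like this can only count effects. Whether a follow-up reads as a gamble rather than a free bonus still needs the AI evaluator or a human reviewer.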
Completed
- Pipeline decoupling: Stage 1 snapshots, Stage 2 reads from frozen narratives
- Stage 3/4 split: render and validate as separate pipeline stages
- Renderer fixes: modifier directory, on_action syntax
- Playtest mode: zeroed out min_planets and min_years gates
- Stage 2B+ tension evaluation with anti-dominance rules
- 9 documentation files updated across all changes
- Created /ship skill for streamlined doc-update-commit-push workflow
Release progress
- v2: 5/5 closed (complete)
- v1.5: 18/18 closed (complete)
No open milestones — next release milestone hasn’t been created yet.
Carry-over
- Issue #249 (Stage 2B dominant option detection) is ready-for-prep: the core 2B+ evaluator shipped today, but the issue may need its spec updated to reflect what actually landed vs. what’s still needed
- Issue #247 (on_action syntax validation) is ready-for-dev with PR #248 open; it needs a rebase since main renamed linter to validator
- Issue #246 (common/ subdirectory validation) is ready-for-dev
- Generated events under data/releases/celestial_equinox/events/ were regenerated multiple times during testing and are not committed. Need human review of final output before committing.
- Backup files (SUMMARY_pre_stage1_reset_*, events_pre_stage1_reset_*) are untracked, intentionally not committed
Risks
- The 2B+ evaluator uses the same AI model (Claude Sonnet 4.6) for both generation and evaluation — there’s a risk of shared blind spots where the evaluator doesn’t catch patterns the generator favors
- Stage 1 narrative balance affects Stage 2 outcomes significantly — if Stage 1 options are structurally asymmetric (e.g., only one has a follow-up), Stage 2A has to work harder to create tension
Flags and watch-outs
- All pipeline stages use anthropic/claude-sonnet-4-6 via AI_VENDOR/AI_MODEL env vars. This is configurable but hasn’t been tested with other providers.
- The resource_gain_scaled vs modifier_value distinction is subtle: scaled is a one-time percentage hit on stockpile, modifier_value is an ongoing production multiplier. Docs were updated but this remains a common confusion point.
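A small numeric sketch of the scaled vs. modifier distinction as described above. It models only the arithmetic in that description, not the game engine or the project's schema. The function names, the signed-percent convention, and the monthly tick are assumptions.

```python
def apply_scaled_change(stockpile: float, percent: float) -> float:
    """One-time change: shift the current stockpile by a percentage of itself.

    A negative percent models a loss; the change is applied once.
    """
    return stockpile + stockpile * percent / 100


def income_with_modifier(base_income: float, percent: float, months: int) -> float:
    """Ongoing change: production is scaled on every tick the modifier lasts."""
    monthly = base_income + base_income * percent / 100
    return monthly * months


# Hypothetical numbers: 1000 stockpiled, 50 income per month.
after_hit = apply_scaled_change(1000, -10)       # one-off 10% loss of the stockpile
year_of_bonus = income_with_modifier(50, 10, 12)  # +10% production for 12 months
```

The one-time effect depends on how much is stockpiled at the moment it fires. The modifier depends on production rate and duration, so the two are not interchangeable when balancing options.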
Next session
- Review generated events: the celestial_equinox events under events/ need human review before committing. Run Stage 2 fresh if needed.
- Rebase PR #248: the on_action syntax validation PR needs a rebase after the linter→validator rename
- Close or update #249: the 2B+ evaluator shipped; decide if the issue scope is satisfied or if more work is needed
- Execute #246: common/ subdirectory validation is ready for dev
- Consider creating a v3 milestone: both v1.5 and v2 are fully closed, time to plan the next release