Work log: Phantasmagoria — March 28, 2026

What shipped today

The big theme today was pipeline architecture and AI-driven quality gates. Three major pieces landed.

Decoupled Stage 1 and Stage 2. Previously, generate_release.py --stage 2 would regenerate Stage 1 narratives before generating outcomes, which meant you couldn’t iterate on Stage 2 without risking Stage 1 drift. Now Stage 1 outputs are saved as snapshots in stage1/, and Stage 2 reads from those snapshots. This makes Stage 2’s input deterministic: Stage 1 is frozen once approved, and Stage 2 can be re-run freely against a stable narrative foundation.
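A minimal sketch of the snapshot idea, not the actual generate_release.py code. The function names, the narratives.json file name, and the dict shape are all assumptions made for illustration; only the stage1/ directory comes from the log above. The point it shows is that Stage 2 loads the frozen file and fails loudly if it is missing, rather than quietly regenerating Stage 1.

```python
import json
from pathlib import Path

SNAPSHOT_NAME = "narratives.json"  # hypothetical file name inside stage1/


def save_stage1_snapshot(release_dir: Path, narratives: dict) -> Path:
    """Write approved Stage 1 narratives to stage1/ so later stages can reuse them."""
    snapshot_dir = release_dir / "stage1"
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    path = snapshot_dir / SNAPSHOT_NAME
    # sort_keys keeps the file byte-stable across runs, which makes diffs readable
    path.write_text(json.dumps(narratives, indent=2, sort_keys=True), encoding="utf-8")
    return path


def load_stage1_snapshot(release_dir: Path) -> dict:
    """Read the frozen narratives; never fall back to regenerating Stage 1."""
    path = release_dir / "stage1" / SNAPSHOT_NAME
    if not path.exists():
        raise FileNotFoundError(f"No Stage 1 snapshot at {path}; run Stage 1 first")
    return json.loads(path.read_text(encoding="utf-8"))
```

Raising on a missing snapshot is a deliberate choice in this sketch: a silent fallback to regeneration would reintroduce the Stage 1 drift the decoupling was meant to remove.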

Split Stage 3 into render (Stage 3) and validate (Stage 4). The old check_c3_to_ship() did both rendering and linting in one pass. Now rendering and validation are separate stages with their own contract gates (check_c3_to_c4() and check_c4_to_ship()). The linter was renamed to “validator” (stellaris_mod_validator.py) to better reflect its role. The same change fixed two renderer bugs: broken modifier directory output and broken on_action trigger syntax.

AI-based choice tension evaluation (Stage 2B+). This is the headline feature: after Stage 2A generates event outcomes, a new Stage 2B+ pass scores each event’s option set for tension on a 1-5 scale. Events scoring below 3 get rejected with specific critique, and Stage 2A regenerates with that feedback, up to 5 retries. The evaluator checks for dominant options (one choice clearly better than all others), reward stacking, tech parity violations, and punished caution. Both test events (The Cartographer’s Obsession and The Weighted Void) passed after 2-3 retries, which suggests the feedback loop catches and fixes real problems.
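The generate, evaluate, regenerate loop can be sketched roughly as follows. This is an illustration under assumptions, not the pipeline's real code: generate and evaluate stand in for the Stage 2A and Stage 2B+ model calls, Verdict is a hypothetical result type, and "up to 5 retries" is read here as one initial draft plus at most five regenerations.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

MIN_TENSION = 3  # events scoring below this are rejected
MAX_RETRIES = 5  # regenerations allowed after the first draft (assumed reading)


@dataclass
class Verdict:
    score: int     # tension score on the 1-5 scale
    critique: str  # specific feedback handed to the next generation attempt


def generate_with_tension_gate(
    generate: Callable[[Optional[str]], dict],
    evaluate: Callable[[dict], Verdict],
) -> Tuple[dict, int]:
    """Return (accepted_event, retries_used); raise once retries are exhausted."""
    feedback: Optional[str] = None
    for attempt in range(MAX_RETRIES + 1):
        event = generate(feedback)   # Stage 2A stand-in; first call gets no feedback
        verdict = evaluate(event)    # Stage 2B+ stand-in
        if verdict.score >= MIN_TENSION:
            return event, attempt
        feedback = verdict.critique  # carry the critique into the next attempt
    raise RuntimeError(f"Tension gate not passed after {MAX_RETRIES} retries")
```

Passing the two stages in as callables keeps the loop testable with stubs, so the retry logic can be checked without any model calls.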

Key design rules baked into the prompts: max 2 effects per option (down from 3), one premium reward per option (tech OR follow-up OR strong modifier — pick one), follow-up events reframed as gambles rather than free bonuses, and the follow-up used as a balancing lever (fewer other effects on the follow-up option since it already offers “more content”).
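The first two rules above are mechanical enough to check in code as well as in prompts. The sketch below is hypothetical: the option dict shape and the "kind" labels (tech, follow_up, strong_modifier) are invented for illustration, and it assumes a follow-up counts toward the effect limit, which the log does not state.

```python
from typing import List

MAX_EFFECTS_PER_OPTION = 2
# Hypothetical labels for the three premium reward types named in the rules
PREMIUM_KINDS = {"tech", "follow_up", "strong_modifier"}


def check_option(option: dict) -> List[str]:
    """Return human-readable rule violations for one event option (empty if clean)."""
    effects = option.get("effects", [])
    problems = []
    if len(effects) > MAX_EFFECTS_PER_OPTION:
        problems.append(f"{len(effects)} effects, max is {MAX_EFFECTS_PER_OPTION}")
    premium = [e["kind"] for e in effects if e["kind"] in PREMIUM_KINDS]
    if len(premium) > 1:
        problems.append("more than one premium reward: " + ", ".join(premium))
    return problems
```

For example, an option that grants both a tech and a follow-up event stays within the effect limit but still violates the one-premium-reward rule, so check_option reports a single problem for it.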

Completed

Pipeline decoupling: Stage 1 snapshots, Stage 2 reads from frozen narratives
Stage 3/4 split: render and validate as separate pipeline stages
Renderer fixes: modifier directory, on_action syntax
Playtest mode: zeroed out min_planets and min_years gates
Stage 2B+ tension evaluation with anti-dominance rules
9 documentation files updated across all changes
Created /ship skill for streamlined doc-update-commit-push workflow

Release progress

v2: 5/5 closed (complete)
v1.5: 18/18 closed (complete)

No open milestones — next release milestone hasn’t been created yet.

Carry-over

Issue #249 (Stage 2B dominant option detection) is ready-for-prep — the core 2B+ evaluator shipped today, but the issue may need its spec updated to reflect what actually landed vs. what’s still needed
Issue #247 (on_action syntax validation) is ready-for-dev with PR #248 open — needs rebase since main renamed linter to validator
Issue #246 (common/ subdirectory validation) is ready-for-dev
Generated events under data/releases/celestial_equinox/events/ were regenerated multiple times during testing — not committed. Need human review of final output before committing.
Backup files (SUMMARY_pre_stage1_reset_*, events_pre_stage1_reset_*) are untracked, intentionally not committed

Risks

The 2B+ evaluator uses the same AI model (Claude Sonnet 4.6) for both generation and evaluation — there’s a risk of shared blind spots where the evaluator doesn’t catch patterns the generator favors
Stage 1 narrative balance affects Stage 2 outcomes significantly — if Stage 1 options are structurally asymmetric (e.g., only one has a follow-up), Stage 2A has to work harder to create tension

Flags and watch-outs

All pipeline stages use anthropic / claude-sonnet-4-6 via AI_VENDOR/AI_MODEL env vars — this is configurable but hasn’t been tested with other providers
The resource_gain_scaled vs modifier_value distinction is subtle: scaled is a one-time percentage hit on stockpile, modifier_value is an ongoing production multiplier. Docs were updated but this remains a common confusion point.
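For the AI_VENDOR/AI_MODEL flag above, one way to read the two variables in a single place is sketched here. Only the variable names and the anthropic / claude-sonnet-4-6 values come from this log; treating those values as defaults, and the ModelConfig and load_model_config names, are assumptions for illustration.

```python
import os
from typing import Mapping, NamedTuple


class ModelConfig(NamedTuple):
    vendor: str
    model: str


def load_model_config(env: Mapping[str, str] = os.environ) -> ModelConfig:
    """Resolve vendor and model from the environment, with assumed defaults."""
    vendor = env.get("AI_VENDOR", "anthropic").strip().lower()
    model = env.get("AI_MODEL", "claude-sonnet-4-6").strip()
    if not vendor or not model:
        # An explicitly empty variable is treated as a configuration error
        raise ValueError("AI_VENDOR and AI_MODEL must not be empty")
    return ModelConfig(vendor, model)
```

Accepting the environment as a parameter makes it easy to exercise other vendor and model combinations in tests without touching the real process environment.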

Next session

Review generated events — the celestial_equinox events under events/ need human review before committing. Run Stage 2 fresh if needed.
Rebase PR #248 — on_action syntax validation PR needs rebase after linter→validator rename
Close or update #249 — the 2B+ evaluator shipped; decide if the issue scope is satisfied or if more work is needed
Execute #246 — common/ subdirectory validation is ready for dev
Consider creating a v3 milestone — both v1.5 and v2 are fully closed, time to plan the next release

