Knowing, Doing, Deciding — applied to a fine-tuning lab ↗
Fifteen catalogued failures from the iris-ft-lab fine-tuning lab map onto the Agent Epistemic Integrity framework with a 6/3/6 split. Most pain accumulated at the upstream and downstream ends of the coupling chain. What the pattern says about training-instrument selection and trajectory-level evaluation.
collab-eval — Reward Design for Document Tasks ↗
Builder's notes for collab-eval — a task environment and grader harness for open-ended document manipulation tasks, built to make reward design explicit. The core question: what does it take to build a grading system an RL loop can trust?
When SFT and DPO could not teach "don't drop the row" ↗
The row-preservation chapter of the collab-eval fine-tuning trajectory: four SFT runs and one DPO run, none promoted. Stress data_preservation stayed flat at 0.20–0.25 across every cycle. SFT and DPO supervise pre-computed outputs, but row preservation is a generation-time policy decision.
Trace — Initial Version ↗
Builder's notes for Trace — a local-first witness agent that reads your AI conversation history and generates weekly narrative dispatches. Not a summary or dashboard, but a short-form piece of writing that surfaces patterns, tensions, and the distance between intention and action across time.
How Modern AI Reads: Three Ways to Solve the Attention Problem ↗
A learner's guide to the three main strategies modern AI uses to handle long context — dense attention, sliding window, and memory-augmented retrieval — and the tradeoffs each makes between speed, coverage, and cost.

Longer treatment: Trajectory-Level Eval Taxonomy →

Updated irregularly · Last touched April 2026