Things I've built, architected, or thought too hard about.
A local-first longitudinal agent that reads AI conversation history and generates weekly narrative dispatches. Runs entirely on Ollama (qwen3:14b). Succeeds only when you feel witnessed — never when a task is completed. The design question it answers: what does memory look like when the goal is not recall but becoming?
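The core loop is easy to sketch: gather the week's conversation excerpts, build a narrative prompt, and make one call to the local Ollama server. A minimal sketch, assuming Ollama's default endpoint; the prompt wording and function names here are illustrative, not the project's actual code:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_dispatch_prompt(snippets: list[str], week: str) -> str:
    """Assemble a weekly-dispatch prompt from conversation excerpts (hypothetical format)."""
    joined = "\n---\n".join(snippets)
    return (
        f"Here are excerpts from my AI conversations during {week}.\n"
        f"{joined}\n"
        "Write a short narrative dispatch: not a summary of tasks completed, "
        "but a reflection on what these conversations reveal about who I am becoming."
    )

def generate_dispatch(prompt: str, model: str = "qwen3:14b") -> str:
    """One non-streaming generation call against a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Everything stays on-device: the only network hop is to localhost.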
Designed a four-tier memory taxonomy (semantic, episodic, procedural, prospective) for agentic systems serving tens of millions of users. The prospective tier — forward-looking task memory — is the original contribution. Paired with a two-stage generation architecture that reduced hallucination and GPU costs significantly.
A multi-agent eval system with specialized roles: Proposer, Critic, Scorer, Synthesizer. Compressed two weeks of human annotation to one agent-day without sacrificing coverage. Built to run continuously, not just at release gates.
Three-layer framework for assessing agentic tool quality: contract quality, execution success rate, and orchestration triggering accuracy. Designed to be measurable, not just principled.
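"Measurable, not just principled" means each layer reduces to a number. The orchestration layer is the least obvious one: it is a binary classification problem over invocation decisions. A minimal sketch under that framing; the report shape and metric names are my own illustration:

```python
from dataclasses import dataclass

@dataclass
class ToolQualityReport:
    contract_score: float     # e.g. fraction of parameters with clear types and docs
    execution_success: float  # fraction of invocations returning a valid result
    trigger_accuracy: float   # fraction of correct call / no-call decisions

def trigger_accuracy(true_pos: int, true_neg: int, false_pos: int, false_neg: int) -> float:
    """Orchestration layer: did the agent invoke the tool exactly when it should have?

    true_pos  = tool called when it should be     false_pos = called when it shouldn't be
    true_neg  = not called when it shouldn't be   false_neg = missed when it should be
    """
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total if total else 0.0
```

Treating triggering as a confusion matrix also exposes the failure modes separately: over-eager tools inflate false positives, under-described tools inflate false negatives.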
Built knowledge graphs at billion-node scale for academic literature. Adopted by the OECD and the Stanford AI Index as infrastructure for science-of-science research. Learned that knowledge representation is never just a data problem — it is always also a question of what you think knowledge is. More selected works: A web-scale scientific taxonomy; Science-of-science studies.
Exploring preference optimization approaches — DPO, GRPO, and RLAIF — on small open-weight models to encode memory retrieval and generation behavior as learned policy rather than prompted behavior. Ongoing.
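Of the three, DPO has the most compact objective: it rewards the policy for widening its log-probability margin on the chosen response relative to a frozen reference model. A sketch of the per-pair loss in plain Python (real training would batch this over tensors); the argument names are mine:

```python
import math

def dpo_loss(
    logp_chosen: float,       # log-prob of preferred response under the policy
    logp_rejected: float,     # log-prob of dispreferred response under the policy
    ref_logp_chosen: float,   # same two quantities under the frozen reference model
    ref_logp_rejected: float,
    beta: float = 0.1,        # strength of the implicit KL constraint
) -> float:
    """Direct Preference Optimization loss for a single preference pair:
    -log sigmoid(beta * ((logp_w - ref_w) - (logp_l - ref_l)))."""
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)
```

At initialization the policy equals the reference, both margins are zero, and the loss sits at log 2; the appeal for memory behavior is that retrieval preferences become weights rather than prompt text that must survive every context window.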