Personal · 2026–present

A local-first longitudinal agent that reads AI conversation history and generates weekly narrative dispatches. Runs entirely on Ollama (qwen3:14b). Succeeds only when you feel witnessed — never when a task is completed. The design question it answers: what does memory look like when the goal is not recall but becoming?

Microsoft M365 Copilot · 2025–present

Designed a four-tier memory taxonomy (semantic, episodic, procedural, prospective) for agentic systems serving tens of millions of users. The prospective tier — forward-looking task memory — is the original contribution. Paired with a two-stage generation architecture that reduced hallucination and GPU costs significantly.
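As a rough illustration only, the four tiers could be modelled as a tagged record type. The sketch below is not the production design: the one-line tier glosses (other than the prospective one, which comes from the description above), the `MemoryItem` class, and the `due_for_followup` helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryTier(Enum):
    SEMANTIC = "semantic"        # assumed gloss: stable facts and preferences
    EPISODIC = "episodic"        # assumed gloss: records of specific past interactions
    PROCEDURAL = "procedural"    # assumed gloss: learned routines, how things get done
    PROSPECTIVE = "prospective"  # forward-looking task memory


@dataclass
class MemoryItem:
    tier: MemoryTier
    content: str


def due_for_followup(items):
    """Return only the prospective items, i.e. things still to be done."""
    return [m for m in items if m.tier is MemoryTier.PROSPECTIVE]


items = [
    MemoryItem(MemoryTier.SEMANTIC, "prefers short summaries"),
    MemoryItem(MemoryTier.PROSPECTIVE, "send draft Friday"),
]
pending = due_for_followup(items)
```

Keeping the tier as an explicit tag is one way to let retrieval treat forward-looking items differently from recall-oriented ones.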

Swarm-Based Evaluation Pipeline
Microsoft M365 Copilot · 2026–present

A multi-agent eval system with specialized roles: Proposer, Critic, Scorer, Synthesizer. Compressed two weeks of human annotation to one agent-day without sacrificing coverage. Built to run continuously, not just at release gates.
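One way to picture the four roles is as stages in a small pipeline. The sketch below is a toy, not the actual system: what each role does here (propose cases, filter them, score them, aggregate) is an assumption made for illustration, and every function body is a stand-in for what would be a model-backed agent.

```python
from statistics import mean


def proposer(topic):
    # Hypothetical: propose candidate test prompts for a topic.
    return [f"{topic}: case {i}" for i in range(3)]


def critic(cases):
    # Hypothetical: reject cases judged low-value (toy rule: drop case 0).
    return [c for c in cases if not c.endswith("case 0")]


def scorer(case, system):
    # Hypothetical: score 1.0 if the system under test returns any answer.
    return 1.0 if system(case) else 0.0


def synthesizer(scores):
    # Aggregate per-case scores into a single report.
    return {"n_cases": len(scores), "mean_score": mean(list(scores.values()))}


def run_eval(topic, system):
    cases = critic(proposer(topic))
    scores = {c: scorer(c, system) for c in cases}
    return synthesizer(scores)


report = run_eval("calendar", lambda prompt: "ok")
```

Because `run_eval` is an ordinary function with no release-specific state, a loop like this can be scheduled to run continuously rather than only at a gate.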

Tool & Skill Quality Framework
Microsoft M365 Copilot · 2026–present

Three-layer framework for assessing agentic tool quality: contract quality, execution success rate, and orchestration triggering accuracy. Designed to be measurable, not just principled.
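A minimal sketch of how the three layers might each reduce to a number follows. The specific definitions are assumptions for illustration, not the framework's actual metrics: contract quality is approximated as the share of hypothetical required spec fields that are filled in, and the `ToolCall` record and field names are invented for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical checklist of fields a tool contract should document.
REQUIRED_FIELDS = ("name", "description", "parameters", "returns")


@dataclass
class ToolCall:
    expected_tool: Optional[str]  # tool that should have fired (None = no tool)
    chosen_tool: Optional[str]    # tool the orchestrator actually picked
    succeeded: bool               # whether the call executed without error


def contract_quality(spec):
    """Fraction of required contract fields that are present and non-empty."""
    return sum(bool(spec.get(f)) for f in REQUIRED_FIELDS) / len(REQUIRED_FIELDS)


def execution_success_rate(calls, tool):
    """Among calls that invoked `tool`, the fraction that succeeded."""
    invoked = [c for c in calls if c.chosen_tool == tool]
    return sum(c.succeeded for c in invoked) / len(invoked) if invoked else 0.0


def triggering_accuracy(calls):
    """Fraction of cases where the chosen tool matched the expected one."""
    return sum(c.chosen_tool == c.expected_tool for c in calls) / len(calls)


calls = [
    ToolCall("search", "search", True),
    ToolCall("search", "search", False),
    ToolCall(None, "search", True),   # tool fired when none was expected
    ToolCall("mail", "mail", True),
]
```

Separating the three numbers keeps a well-documented tool that is triggered at the wrong time from hiding behind a good execution rate.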

Microsoft Research · prior

Built knowledge graphs at billion-node scale for academic literature. Adopted by the OECD and the Stanford AI Index as infrastructure for science-of-science research. Learned that knowledge representation is never just a data problem — it is always also a question of what you think knowledge is. More selected works: A web-scale scientific taxonomy; Science-of-science studies.

Memory-as-Policy
Personal research · 2025–present

Exploring preference optimization approaches — DPO, GRPO, and RLAIF — on small open-weight models to encode memory retrieval and generation behavior as learned policy rather than prompted behavior. Ongoing.
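For reference, the standard DPO objective for a single preference pair can be written in a few lines. This is a generic sketch of the published loss, not the training code used in this project; the function name, argument names, and the default `beta` are assumptions for illustration.

```python
import math


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the
    policy (pi_*) and under the frozen reference model (ref_*).
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)) rewritten as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))


# Policy identical to the reference: no preference learned yet.
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
# Policy has shifted probability toward the chosen response.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In a memory setting, "chosen" and "rejected" would be two candidate retrieval or generation behaviours for the same context, so the preference is absorbed into the weights instead of being restated in the prompt.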

More to come · Updated irregularly · Last touched April 2026