A Four-Tier Memory Taxonomy for Enterprise Agentic Systems
Why semantic, episodic, and procedural memory are necessary but not sufficient — and what the missing tier reveals about how agents fail at scale.
What Falls Through the Cracks
Enterprise agentic systems lose information at a specific, predictable point: the session boundary. A user asks an agent to draft a proposal, send it by Thursday, and schedule a follow-up if there's no response by Friday. The agent executes the first action. When the session ends, the Thursday deadline and Friday follow-up condition are gone — not because the model forgot them, but because the system had no designated place to store forward-looking commitments.
This failure is not a retrieval quality problem. It is a taxonomy problem. Standard memory architectures don't have a tier for "things the agent is supposed to do or check in the future." That gap has a name in cognitive science: prospective memory — memory for intended future actions, as distinct from memory for past facts, events, or behavioral patterns.
Four Tiers, Four Distinct Problems
Each tier answers a different question about the agent's cognitive state. The tiers are not alternatives — they are all necessary, and none is sufficient alone.
Semantic memory: Compressed, decontextualized facts about the user's world: roles, relationships, domains of expertise, organizational context, communication preferences. These are durable beliefs that should persist across all sessions and be invalidated only when a fact explicitly changes.
Episodic memory: Time-indexed records of specific events, interactions, decisions, and outcomes — the agent's personal experience stream. Episodic memory provides the context for interpreting current requests in light of prior history, including unresolved threads and past commitments made by the user.
Procedural memory: Behavioral patterns distilled from repeated observed actions — the user's implicit preferences for how tasks should be executed. Procedural memory is stored as structured skill definitions inferred from behavioral signals, not as passive embeddings of prior text. It answers: "Given this task type, how does this user prefer it done?"
Prospective memory: Forward-looking commitments, deadlines, pending decisions, and intended future actions — memory for things the agent is supposed to do or check. This is the tier absent from most deployed systems and the tier most consequential for always-on operation. In cognitive science, prospective memory is well-established as a distinct memory system; its absence in AI agent architectures is the primary reason agents drop deferred tasks at session boundaries.
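The key structural difference between the tiers can be made concrete. As a minimal sketch (all names are illustrative, not from any deployed system), only the prospective tier carries a due time and a resolution flag; the other three tiers never "expire" on a deadline:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

class Tier(Enum):
    SEMANTIC = "semantic"        # durable facts; invalidated only on explicit change
    EPISODIC = "episodic"        # time-indexed event records
    PROCEDURAL = "procedural"    # inferred skill / preference definitions
    PROSPECTIVE = "prospective"  # forward-looking commitments with deadlines

@dataclass
class MemoryRecord:
    tier: Tier
    content: str
    created_at: datetime
    # Only prospective records use these two fields.
    due_at: Optional[datetime] = None
    resolved: bool = False

    def is_pending(self, now: datetime) -> bool:
        """Pending = a prospective record that has not been resolved.
        Records in the other three tiers are never 'pending'."""
        return self.tier is Tier.PROSPECTIVE and not self.resolved
```

The point of the sketch is that expiry semantics live on the record type, not in the retrieval layer: a semantic fact and a Friday follow-up are different kinds of objects, not two rows in the same flat store.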
Against Existing Frameworks
The table below positions the four-tier taxonomy against the most commonly cited alternatives in the agent memory literature.
| Framework | Tiers / Types | Missing | Notes |
|---|---|---|---|
| This paper | Semantic · Episodic · Procedural · Prospective | — | Prospective as a first-class storage tier with distinct retrieval and expiry semantics |
| CoALA (2023) | Semantic · Episodic · Procedural · Working | Prospective | Working memory is treated as a storage tier; this paper treats it as a retrieval artifact |
| MemGPT (2023) | In-context (working) · External (archival) | Semantic · Procedural · Prospective | Engineering-focused; storage topology over cognitive taxonomy |
| Survey literature (2024–25) | Semantic · Episodic · Procedural | Prospective | Standard three-tier framing; adequate for session-scoped agents, insufficient for persistent ones |
| Cognitive science baseline | Semantic · Episodic · Procedural · Prospective | — | All four tiers well-established in human memory research; AI agent literature has lagged in adopting prospective |
Prospective Memory as the Steerability Primitive
The prospective memory tier is not only a storage concern. It is the primitive that makes long-horizon agent steerability tractable.
Steerability — the ability for authorized parties to correct an agent's behavior mid-task — requires a live representation of what the agent intends to do next. Without prospective memory, the agent's future action plan is implicit: it lives in the model's forward pass, which cannot be inspected, paused, or corrected without terminating the task entirely. With prospective memory as a first-class system artifact, human interrupts have a well-defined insertion point. Correction becomes additive rather than destructive.
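The "well-defined insertion point" can be illustrated with a toy prospective queue (the structure and field names are hypothetical). A human correction edits a pending intention in place; the running task is never terminated:

```python
# A toy prospective queue: each entry is a stored intention, not model state.
plan = [
    {"id": "send-proposal", "due": "Thu", "status": "pending"},
    {"id": "follow-up",     "due": "Fri", "status": "pending"},
]

def correct(plan, item_id, **changes):
    """Apply an authorized correction to a pending intention in place.
    Additive rather than destructive: the other intentions, and the
    task itself, are untouched."""
    for item in plan:
        if item["id"] == item_id and item["status"] == "pending":
            item.update(changes)
            return item
    raise KeyError(f"no pending intention {item_id!r}")

correct(plan, "follow-up", due="Mon")
```

Without an externalized queue like this, the same correction would require restarting the task and hoping the model re-derives the unchanged parts of its plan.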
This is the connection between memory architecture and governance that existing frameworks do not make explicit. The four-tier taxonomy is not just about recall quality — it is about whether a persistent agent's behavior can be understood and corrected by the humans responsible for it.
What This Implies for System Builders
Treat all four tiers as first-class storage concerns, not retrieval strategies. Each tier requires its own storage representation, expiry semantics, and conflict resolution logic. Systems that implement one flat memory store and rely on retrieval quality to sort out the rest will consistently drop prospective items — not because retrieval fails, but because prospective records have no natural similarity to the current query and will never surface via similarity search alone.
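Why prospective items never surface via similarity search can be shown in a few lines. In this sketch (a deliberately crude stand-in for real similarity search, with illustrative store names), the backward-looking tiers are queried by relevance, while the prospective tier is scanned by due time regardless of what the user asked:

```python
from datetime import datetime, timezone

# One store per tier, each with its own lookup path.
stores = {
    "semantic":    [],  # looked up by entity / similarity
    "episodic":    [],  # looked up by time range + similarity
    "procedural":  [],  # looked up by task type
    "prospective": [],  # looked up by due time; query similarity is irrelevant
}

def retrieve(query: str, now: datetime):
    """Substring match stands in for similarity search on the three
    backward-looking tiers; the prospective tier gets an unconditional
    due-time scan that ignores the query entirely."""
    hits = [r for t in ("semantic", "episodic", "procedural")
            for r in stores[t] if query.lower() in r["text"].lower()]
    due = [r for r in stores["prospective"]
           if r["due"] <= now and not r["done"]]
    return hits, due
```

A flat store has only the first code path, so a Friday follow-up with no textual overlap with the current query is structurally unreachable.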
Prospective memory requires proactive injection, not reactive retrieval. The failure mode is not "user asks about a deadline and the agent can't find it." The failure mode is "the deadline expires and the agent never notices." Prospective items must be surfaced on a cadence, not on demand. This requires a background monitoring process, not a retrieval pipeline.
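A background monitor of this kind reduces to a periodic pass over the prospective store. One tick of such a loop might look like the following sketch (the horizon and field names are assumptions, not a prescribed design):

```python
from datetime import datetime, timedelta, timezone

def tick(prospective_items, now, horizon=timedelta(hours=24)):
    """One monitoring pass, run on a schedule rather than on demand:
    surface everything overdue, plus everything due within the horizon."""
    overdue = [i for i in prospective_items
               if i["due"] <= now and not i["done"]]
    upcoming = [i for i in prospective_items
                if now < i["due"] <= now + horizon and not i["done"]]
    return overdue, upcoming
```

The essential property is that `tick` takes no query argument: it fires because time passed, not because the user asked.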
Source authority is a first-class provenance attribute. A formal commitment in a calendar invite carries different epistemic weight than a commitment mentioned casually in a chat message. Memory generation must track source authority, and the retrieval layer must calibrate confidence accordingly. Systems that treat all sources as equivalent produce memory entries whose reliability varies wildly but is indistinguishable at retrieval time.
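One simple way to realize this is to scale extraction confidence by a per-source authority weight. The weights below are purely illustrative assumptions, not calibrated values:

```python
# Illustrative authority weights; a real system would calibrate these.
AUTHORITY = {
    "calendar_invite": 0.95,  # formal, explicit commitment
    "email":           0.80,
    "chat_message":    0.50,  # casual mention
}

def memory_entry(text: str, source: str, extraction_conf: float) -> dict:
    """Attach provenance and a calibrated confidence to a generated memory.
    Final confidence = extraction confidence x source authority; unknown
    sources get a conservative default weight."""
    return {
        "text": text,
        "source": source,
        "confidence": extraction_conf * AUTHORITY.get(source, 0.3),
    }
```

The retrieval layer can then threshold or rank on `confidence`, so a casual chat remark never outweighs a calendar invite that says the opposite.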
The session boundary is the taxonomy stress test. A simple diagnostic: run the system across a session boundary and check which commitments, decisions, and intended future actions survive. If the answer is "only the ones the user explicitly re-states," the system has a prospective memory gap, regardless of how sophisticated its other memory tiers are.
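The diagnostic can be automated. A minimal harness, assuming a store exposing its prospective items (all names hypothetical), compares the set of open commitments before and after the boundary:

```python
def survives_boundary(store, end_session, start_session):
    """Snapshot open prospective items, cross the session boundary,
    and report which items are still visible without the user
    re-stating them. Returns (survived, dropped) as sets of ids."""
    before = {i["id"] for i in store["prospective"] if not i["done"]}
    end_session(store)
    start_session(store)
    after = {i["id"] for i in store["prospective"] if not i["done"]}
    return before & after, before - after
```

A system with a real prospective tier reports an empty `dropped` set; a session-scoped system drops everything that was not persisted.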
Research notes, half-baked ideas. Probably overthought, definitely over-architected.