Agent Epistemic Integrity: A Framework for Knowing, Doing, and Deciding Across the Session-to-Long-Running Transition

Iris Shen · April 2026 · iris-axon-lab.github.io

Abstract

As agentic systems move from single-session interactions to long-running operation, they encounter coupled failure modes that are poorly captured when memory, tool state, and steerability are treated as separate concerns. This paper introduces Agent Epistemic Integrity (AEI) as a conceptual architectural framework for reasoning about that coupling. AEI asks whether what an agent knows, does, and intends remains coherent, inspectable, and correctable across time.

The paper makes three claims. First, the session-to-long-running transition is the stress test that turns freshness, partial execution, and goal validity from hidden assumptions into explicit state-management problems. Second, prospective memory — a durable representation of forward commitments — provides a practical correction surface for human steering, though it is not by itself sufficient for safety or correctness. Third, long-running agents require trajectory-level evaluation rather than turn-level success alone.

The contribution here is primarily conceptual. This paper does not yet offer a complete formal state model, canonical schema, or proof system for AEI. Instead, it offers a systems framing that links persistent memory, capability state, and steerability into a single runtime integrity problem.

1 Introduction

Consider a long-running agent coordinating a vendor contract renewal over three weeks.

In week one, the agent is authorized to renew the contract by May 15 with a 10% cost-reduction target and no change to the existing SLA. In week two, another team adds a new security-certification requirement to the project documentation, but no one restates that requirement in the active chat. In week three, the agent prepares to retry an earlier proposal step without realizing that a prior email has already committed external state.

Nothing in this example is exotic. The agent can retrieve relevant text, call tools correctly, and still fail in a structurally predictable way. It may act on beliefs that were once true but are no longer current. It may repeat actions without reasoning over the side effects that already occurred. It may continue pursuing a goal whose validity conditions have changed without an explicit surface for correction.

This is the setting that motivates Agent Epistemic Integrity.

In session-bounded systems, many of these issues are partially hidden by the session boundary itself. Context is assumed to be fresh. Tool invocations are assumed to be locally bounded. Goals are assumed to remain valid for the duration of the interaction. As systems move into long-running operation — persistent assistants, multi-day workflows, background delegation, cross-session task execution — those assumptions stop holding. The result is not simply "more memory needed." It is the emergence of a coupled systems problem spanning memory, action, and steering.

The field has responded with capable partial frameworks. Memory architectures address retention and retrieval [CoALA, MemGPT]. Action frameworks address reasoning, tool use, and self-correction [ReAct, Reflexion]. Governance frameworks address oversight and interruptibility [OpenAI]. What the field lacks is a clear account of how these three domains constrain one another — and why that constraint becomes more visible, specifically and structurally, as systems transition from session-based to long-running operation.

This paper argues that the right unit of analysis is not memory alone, nor planning alone, nor governance alone, but their interaction. AEI is offered as a framework for that interaction. It makes three claims. First, the session-to-long-running transition turns freshness, partial execution, and goal validity from hidden assumptions into explicit state-management problems. Second, prospective memory provides a practical correction surface for human steering across that transition. Third, long-running agents require trajectory-level evaluation rather than turn-level success alone.

2 The Framework: Agent Epistemic Integrity

2.1 A Conceptual Architectural Property

Agent Epistemic Integrity (AEI) is the architectural property that an agent's active state remains sufficiently coherent, inspectable, and correctable for the actions it takes over time. The phrase "active state" is doing real work here. It includes not just retrieved facts, but the system's current uncertainty annotations, its understanding of what tools have already been used and with what effects, and its representation of what it is committed to doing next. A system may produce fluent outputs while lacking AEI if it cannot expose the state that justifies those outputs to inspection and correction.

AEI is therefore not a claim about omniscience. It is a claim about calibrated operation under incomplete information. A system can be uncertain and still maintain AEI if it behaves proportionally to that uncertainty. Conversely, a system can violate AEI even when individual answers sound correct, if it acts as though stale, partial, or weakly grounded state were fully reliable.

AEI is a system-level framing, not a model-level guarantee. It is not ensured by any individual component — not the retriever, not the planner, not the tool executor — and it cannot be evaluated on the basis of individual outputs. Its value is in directing architectural attention toward how the system represents epistemic state and exposes that representation for correction. No amount of fine-tuning produces AEI in a system whose architecture lacks the inspection and correction surfaces it requires.

2.2 Three Invariant Domains

The problem space of agentic systems decomposes, at the architectural level, into three domains.

Knowing is memory and context management — what the agent holds true about the world, itself, and its history. The problem is not storage but coherence over time: non-contradictory beliefs, recognition of supersession, and traceable provenance.

Doing is tool and capability management — what actions the agent can take, under what conditions, and what the cumulative effect of its invocations has been. The challenge is not enumeration but runtime state: capability availability is not binary, effects are not idempotent, and degradation is frequently unannounced.

Deciding is planning and reasoning — how the agent selects and sequences actions given its beliefs and capabilities. This spans single-step selection, plan decomposition, and the meta-level question of when to pause and solicit input.

These three domains are invariant in the sense that any agentic system must solve some version of each, and no solution in one substitutes for a solution in another. They correspond to the three components of a purposive system: representation, action, and deliberation.

2.3 The 3×2 Grid

The framework organizes these three domains against two deployment paradigms: the session-based paradigm, in which the agent operates within a bounded interaction window, and the long-running paradigm, in which the agent persists across arbitrary time horizons and may act during periods of reduced supervision. Figure 1 illustrates the six cells of the framework and the new architectural primitive each transition requires.

[Figure 1: the AEI 3×2 grid. Three domains (Knowing, Doing, Deciding) against two deployment paradigms (Session-Based and Long-Running), with the new architectural primitive each long-running cell requires — belief revision, capability state management, goal lifecycle management — and prospective memory as the unifying primitive across all three.]

Figure 1. The AEI 3×2 grid. The session-based column reflects what current architectures implicitly assume — and the assumptions each column silently encodes are shown in italic. The long-running column is where those assumptions fail, requiring new architectural primitives. Prospective memory (§3.4) is the unifying primitive across all three right-column cells. AEI is the architectural property governing the full grid.

2.4 Two Horizontal Constraints

Two concerns cut across all six cells and represent qualitatively distinct constraints on the AEI problem.

The Model Layer is the source of instability that any architecture must absorb. Foundation model behavior is not static: models are updated on schedules that may not be communicated to system operators. Any system capability that depends on specific model behavior — uncertainty calibration, tool-use syntax, reasoning patterns — is subject to silent regression. AEI requires that architectures treat model-layer instability as a design invariant, not an edge case.

The Economics Layer is the optimization constraint that any real deployment must satisfy. Context tokens are not free. Long-horizon tasks accumulate retrieval, reranking, and state-reconstruction costs across sessions. The economics layer does not ask which design maximizes epistemic integrity in principle; it asks which design maximizes integrity subject to a cost budget. This hard constraint shapes which architectures are actually deployable.

2.5 The Stable What / Volatile How Principle

The stable what is the invariant requirement — the architectural obligation that persists regardless of implementation. For Knowing: the agent must maintain non-contradictory beliefs and detect when they have been superseded. For Doing: it must track the cumulative state of its capability invocations and reason about safe resumption. For Deciding: it must maintain an explicit history of its intent states and surface changes for human review.

The volatile how is the mechanism — the specific implementation that satisfies the requirement today. A timestamp-weighted vector store, a Bayesian revision algorithm, a TTL-based staleness flag. Each is a reasonable answer to the stable what; none is the only answer. When models are upgraded, retrieval infrastructure changes, or better algorithms emerge, the how changes. The stable what does not.

A system organized around the stable what will survive those changes with its core behavior intact. A system organized around today's specific mechanism will require partial or complete redesign each time the mechanism evolves. AEI is offered as a stable what for the agentic domain: the mechanisms that satisfy it are a research and engineering agenda; the requirement itself is fixed.

3 The Session-to-Long-Running Transition

The session boundary is not a UX convenience. It is an architectural assumption baked into every layer of how agents are currently designed: how they retrieve context, how they invoke capabilities, and how they represent goals. Within a session, that assumption does its work silently — it absorbs complexity that would otherwise have to be handled explicitly. Remove it, and the complexity does not disappear. It surfaces, simultaneously, across all three domains of epistemic integrity.

A session is, in architectural terms, a freshness guarantee. When an agent begins a new session, its retrieved beliefs are implicitly current — staleness that predates the session boundary is outside the agent's concern. Its invoked tools are stateless with respect to prior runs — whatever happened before is not the agent's problem to reconcile. Its goal is stable by assumption — the user stated it moments ago, and nothing has had time to change it. The session is a coherence envelope: it bounds the space in which the agent must reason about time, consistency, and intent. Long-running operation tears that envelope away. The agent must supply its own coherence — continuously, across all three domains.

Domain   | Session-Scoped                                             | Long-Running                                                      | New Primitive Required
Knowing  | Retrieve what is relevant now; freshness assumed           | Detect staleness; revise superseded beliefs                       | Belief revision
Doing    | Select and invoke the right capability; state is ephemeral | Track cumulative state; reason about safe resumption and retry    | Capability state management
Deciding | Plan the next step; goal is stable                         | Track goals' validity conditions over time; surface intent drift  | Goal lifecycle management

3.1 Knowing: From Retrieval to Belief Revision

In a session-scoped system, the retrieval problem is a relevance problem: find the context most pertinent to the current query and surface it. Freshness is not a retrieval criterion because the session boundary enforces it structurally — nothing retrieved from a session that began moments ago can be meaningfully stale.

Long-running operation dissolves this guarantee. An agent running for days, weeks, or months carries beliefs about user preferences, organizational state, prior decisions, and external facts — beliefs that were accurate when formed and may not be accurate now. The agent has no external freshness signal to rely on. It must treat its own belief state as an object of ongoing epistemic management: tracking provenance, estimating decay, detecting inconsistency across newly arriving evidence, and revising beliefs when the evidence warrants.

This is a qualitatively different operation from retrieval. Retrieval asks which memory is most relevant. Belief revision asks which memories are still true — and what to do when the answer is no. Without a mechanism for belief revision, a long-running agent does not accumulate knowledge over time; it accumulates drift, surfacing outdated beliefs with the same confidence it would apply to fresh ones.

3.2 Doing: From Invocation to Capability State Management

The tool-use model underlying most current architectures is implicitly stateless. A capability is selected, invoked, and its result consumed within the scope of a single reasoning step. If the invocation fails, it either retries immediately or surfaces an error. There is no persistent record of partial progress, no accounting for what a prior invocation may have already accomplished, and no mechanism for reasoning about the relationship between current and prior invocations.

Within a session, this is largely acceptable — sessions are short, tools are fast, and failure modes are recoverable. Long-running operation changes the failure model completely. A task spanning hours or days may involve dozens of capability invocations. Partial completions are not aberrations; they are the normal operating condition. Retries do not reset cleanly — they must be aware of what has already been effected in the world. Tools that write external state — sending emails, modifying documents, updating records — are no longer idempotent in the session sense, and naive retry logic can compound errors rather than recover from them.

The required primitive is capability state management: an explicit, durable representation of what each tool invocation has accomplished, what remains, and what constraints govern resumption or retry. Without it, the long-running agent cannot distinguish between a task that has not started and one that is half complete, nor can it reason about whether a given action is safe to repeat.

3.3 Deciding: From Step Planning to Goal Lifecycle Management

In a session-scoped agent, the goal is a fixed point. The user stated it at the beginning of the session; it has not changed; the agent's job is to make progress toward it. Planning is a local operation — identify the next best action given the current state.

Long-running operation transforms the goal from a fixed point into a trajectory through time. A goal valid when issued may be partially satisfied, fully superseded, or simply expired by the time the agent acts on it. Organizational priorities shift. User circumstances change. Deadlines pass. New information renders prior goals incoherent. A long-running agent that cannot reason about the lifecycle of its own goals will faithfully execute against objectives that are no longer valid — completing tasks no one needed, optimizing for outcomes no one wants.

Goal lifecycle management is the primitive that addresses this: the agent must maintain an explicit history of its intent states, track the conditions under which each goal was issued, detect when those conditions have changed, and surface that detection for human review rather than silently continuing.
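The lifecycle check can be sketched as follows. This is an illustrative Python fragment, not a proposed API: `Goal`, `check_goal`, and the condition encoding are assumptions chosen to mirror the vendor-renewal example, and the key property is that drift is surfaced as a status for human review rather than acted on autonomously.

```python
from dataclasses import dataclass
from enum import Enum


class GoalStatus(Enum):
    ACTIVE = "active"
    NEEDS_REVIEW = "needs_review"   # validity conditions may be superseded
    EXPIRED = "expired"


@dataclass
class Goal:
    description: str
    issued_at: str                       # ISO date at issuance
    deadline: str                        # ISO date
    validity_conditions: dict[str, str]  # condition name -> value when issued
    status: GoalStatus = GoalStatus.ACTIVE


def check_goal(goal: Goal, observed: dict[str, str], today: str) -> GoalStatus:
    """Surface, rather than act on, detected drift: any observed condition
    that diverges from its value at issuance flags the goal for review."""
    if today > goal.deadline:  # ISO dates compare correctly as strings
        goal.status = GoalStatus.EXPIRED
    elif any(observed.get(k, v) != v for k, v in goal.validity_conditions.items()):
        goal.status = GoalStatus.NEEDS_REVIEW
    return goal.status
```

Note that `check_goal` only updates and reports status; re-scoping the goal remains a human decision, consistent with the steering model in §3.4.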

3.4 Prospective Memory as the Unifying Primitive

All three escalations — from retrieval to belief revision, from invocation to capability state management, from step planning to goal lifecycle management — share a common structural requirement. Each requires the agent to maintain a live, queryable model of its own intended future state: what it believes it will need to know, what it expects to do, and what it is committed to accomplishing. This is the function of prospective memory.

The term draws from cognitive psychology [Brandimonte], where it refers to the capacity to remember to perform an intended action at a future time or in response to a future cue. The engineering construct proposed here is related in motivation but broader in scope: a first-class, serializable, inspectable data structure that encodes not just pending actions but the validity conditions, provenance, and dependencies of the agent's forward commitments — and exposes them through a query interface to both the execution engine and the human oversight layer. The cognitive construct motivates the name; the architectural specification stands on its own terms.
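One way such a data structure could look, sketched in Python. The names (`Commitment`, `ProspectiveMemory`) and field choices are hypothetical; the point is that forward commitments carry validity conditions, provenance, and dependencies, and are serializable and queryable rather than implicit in a reasoning chain.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Commitment:
    """One prospective-memory entry: a forward commitment plus the
    metadata the text requires (validity conditions, provenance,
    dependencies), in a serializable, inspectable form."""
    action: str                      # what the agent intends to do
    due_by: str                      # ISO date
    validity_conditions: list[str]   # conditions under which this still holds
    provenance: str                  # who or what authorized it
    depends_on: list[str] = field(default_factory=list)  # ids of prior commitments
    flagged: bool = False            # marked potentially superseded


class ProspectiveMemory:
    def __init__(self) -> None:
        self._entries: dict[str, Commitment] = {}

    def record(self, cid: str, commitment: Commitment) -> None:
        self._entries[cid] = commitment

    def pending(self) -> list[str]:
        """Query interface: commitments still standing, not flagged for review."""
        return [cid for cid, c in self._entries.items() if not c.flagged]

    def serialize(self) -> str:
        """Durable, auditable representation for the oversight layer."""
        return json.dumps({cid: asdict(c) for cid, c in self._entries.items()})
```

Both the execution engine and the human oversight layer would consume the same `pending()` and `serialize()` views, which is what makes the structure a shared correction surface.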

Prospective memory is the natural insertion point for human steering. What a human most needs to inspect and correct is not the agent's past actions — which cannot be undone — but its forward intentions, which can. If those intentions are represented explicitly, they are queryable, auditable, and correctable before they commit. If they exist only implicitly within a chain of reasoning steps, human steering requires reconstructing the agent's intent from the outside — slow, error-prone, and architecturally fragile.

Steerability is not a property of the user interface. It is a property of the memory architecture. An agent designed from the start with prospective memory as a first-class primitive has structural support for steerability. An agent for which steering is added afterward is steerable only by workaround.

4 Implications for System Design

Current agentic systems are built around task completion. Their internal architecture reflects this: a planner emits steps, a tool executor runs them, and outputs accumulate until a termination condition is met. Epistemic state is implicit — carried in context, shaped by model behavior, and largely invisible to the system itself. When something goes wrong, post-hoc inspection of logs may reveal what happened but rarely why the agent was confident enough to proceed. This is not a logging gap. It is a design gap.

A system architected around epistemic integrity inverts the priority. Task output is still the product, but epistemic state is a first-class citizen of the runtime — tracked, surfaced, and made actionable at every layer. Three prescriptions follow, one per domain. A system that implements only a subset has added instrumentation, not epistemic integrity.

4.1 Prescription 1 (Knowing): Uncertainty as a First-Class Output

Every task output should be accompanied by an uncertainty audit trail — a structured, machine-readable record that distinguishes three epistemic modes across the trajectory: what the agent knew with high confidence, what it inferred from incomplete context, and what it assumed without verification. This is not logging. Logs record events; the uncertainty audit trail records the agent's epistemic posture at the moment of decision. These are different artifacts with different consumers.

Each substantive action in a task — a retrieval, a synthesis step, a tool invocation, a commitment to a subgoal — should emit a tagged epistemic annotation:

Tag       | Meaning                                             | Example annotation
confirmed | Grounded in retrieved or directly provided evidence | "Project deadline is April 30 — confirmed from calendar retrieval"
inferred  | Derived by reasoning from confirmed facts           | "Given the deadline and today's date, the task is running late"
assumed   | Taken as true without available grounding           | "Assuming vendor preferences from the prior session still apply"
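A minimal sketch of what such an annotation could look like in Python, together with one derived signal a downstream consumer might escalate on. The shape of `Annotation` and the name `assumption_density` are illustrative assumptions, not a specification.

```python
from dataclasses import dataclass
from typing import Literal

# The three epistemic modes from the table above.
Tag = Literal["confirmed", "inferred", "assumed"]


@dataclass
class Annotation:
    step: str   # which substantive action this annotates
    tag: Tag
    note: str   # human-readable justification


def assumption_density(trail: list[Annotation]) -> float:
    """Fraction of steps taken on unverified assumptions: one concrete
    signal a reviewer or orchestrator could use as an escalation trigger."""
    if not trail:
        return 0.0
    return sum(a.tag == "assumed" for a in trail) / len(trail)
```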

Downstream systems — human reviewers, orchestrating agents, risk management layers — can then operate on the audit trail as a first-class input: escalating on assumption density, routing high-inference steps to verification loops, or presenting the trail to the user as part of the task deliverable. In the long-running setting, where tasks span extended time horizons and may resume after model state has been reconstructed, the audit trail provides epistemic continuity that context windows alone cannot. A resuming agent that inherits an uncertainty audit trail knows not just where it left off but how confident to be about what it recorded.

4.2 Prescription 2 (Doing): Capability State as a Durable Runtime Artifact

Human interruption of agentic tasks is currently destructive by default. Without a well-defined insertion point, an interrupt either cancels in-flight work, corrupts task state, or is queued until the agent reaches a natural pause — at which point the moment for correction may have passed. The root cause is architectural: most systems have no explicit representation of what each tool invocation has accomplished, what remains, or what constraints govern safe resumption. Interruption has nowhere to land, and neither does a retry.

Capability state — the execution record of each tool invocation — should be a durable, queryable artifact maintained by the system rather than reconstructed from logs after the fact. At minimum, this record should encode: what the invocation was asked to do, what it has completed, what remains, whether resumption is safe or requires human review, and what side effects have already been committed to external systems. A system with capability state management can distinguish a safe retry from a dangerous one; without it, the agent is guessing.
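The minimum fields listed above can be sketched directly, along with one deliberately conservative resumption rule. This is a Python illustration under stated assumptions: `CapabilityState` and `resumption_policy` are invented names, and a real policy would be richer than "any committed side effect requires review".

```python
from dataclasses import dataclass, field
from enum import Enum


class Resumption(Enum):
    SAFE_TO_RETRY = "safe_to_retry"
    NEEDS_REVIEW = "needs_review"


@dataclass
class CapabilityState:
    """Durable record of one tool invocation, per the minimum fields in
    the text: what was requested, what completed, what remains, and what
    side effects have already been committed to external systems."""
    requested: str
    completed: list[str] = field(default_factory=list)
    remaining: list[str] = field(default_factory=list)
    external_side_effects: list[str] = field(default_factory=list)  # e.g. message ids


def resumption_policy(state: CapabilityState) -> Resumption:
    # Naive but explicit rule: once any external side effect is committed,
    # a blind retry is unsafe; a human (or a correction path) must decide.
    if state.external_side_effects:
        return Resumption.NEEDS_REVIEW
    return Resumption.SAFE_TO_RETRY
```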

This closes the Doing gap with the same structural logic that the uncertainty audit trail applies to Knowing. The form differs; the principle — making implicit state explicit and inspectable — is the same.

4.3 Prescription 3 (Deciding): Prospective Memory as the Steerability Surface

Prospective memory — the agent's live model of its own planned commitments — provides the steerability surface that the Deciding domain requires in the long-running setting. If the agent maintains an explicit, queryable representation of what it intends to do next (and why), then a human interrupt can target that representation directly: canceling a specific intended action, modifying a goal parameter, or injecting a new constraint before execution rather than after. This is the difference between corrective steering and emergency braking.

Engineering this surface requires that prospective memory be a durable, inspectable artifact — not an ephemeral planning state inside a single model call. It should be serializable, versioned, and accessible to both the human interface layer and the execution engine. An agent that cannot surface its own intentions cannot be steered; it can only be stopped.
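A correction targeting that surface can be sketched in a few lines of Python. The `Intent`/`steer` names and the dict-keyed store are illustrative assumptions; what matters is that the interrupt lands on an explicit representation of a forward intention before execution, not on in-flight work.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    action: str
    params: dict
    cancelled: bool = False


def steer(intents: dict[str, Intent], intent_id: str,
          cancel: bool = False, **param_updates) -> Intent:
    """A human correction targets the intent representation directly:
    cancel it, or rewrite its parameters, before anything executes."""
    intent = intents[intent_id]
    if cancel:
        intent.cancelled = True
    intent.params.update(param_updates)
    return intent
```

For example, `steer(intents, "send-proposal", price="revised")` modifies a goal parameter in place, and `steer(intents, "send-proposal", cancel=True)` is corrective steering rather than emergency braking: the execution engine simply never picks the intent up.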

Existing evaluation frameworks are predominantly turn-level: they measure whether a given output was correct given the preceding input. For agentic trajectories, this is systematically misleading. A sequence of individually plausible steps can constitute a catastrophically flawed trajectory — one in which uncertainty accumulated silently, assumptions went unflagged, and goal validity was never reconfirmed despite shifting conditions. Epistemic integrity therefore requires trajectory-level evaluation: measuring not just whether outputs were correct, but whether the epistemic journey that produced them was sound.

4.4 Cross-Cutting: Uncertainty-Gated Execution and Model Independence

Model independence. When the uncertainty audit trail, capability state record, and prospective memory surface are defined at the system level — not inside the model — they persist across model upgrades, swaps, and fine-tuning cycles. The stable what does not change when the model changes; only the volatile how by which epistemic state is produced changes. This is the stable-what principle made operational against the model layer.

Uncertainty-gated execution. The audit trail provides a principled basis for compute allocation: steps where the agent has high confirmed grounding proceed with shallow deliberation; steps where assumption density is high trigger deeper reasoning or human escalation. This is not a heuristic — it is a property of the audit trail made actionable. A system with epistemic integrity is, by construction, a system that avoids wasteful deep deliberation on steps it is already equipped to take — directly addressing the economics constraint from §2.4.
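The gating rule can be made concrete in a few lines. This Python sketch assumes the tag vocabulary from §4.1; the function name, the numeric budgets, and the 0.3 threshold are all placeholder choices, not recommendations.

```python
def deliberation_budget(trail: list[str], shallow: int = 1, deep: int = 5,
                        threshold: float = 0.3) -> int:
    """Allocate reasoning depth from the audit trail: mostly-confirmed
    steps proceed with shallow deliberation; assumption-heavy steps get
    deep deliberation (or, in a fuller system, human escalation).
    Trail entries are the tags 'confirmed' / 'inferred' / 'assumed'."""
    if not trail:
        return deep  # no epistemic record at all: be conservative
    density = sum(tag == "assumed" for tag in trail) / len(trail)
    return deep if density >= threshold else shallow
```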

5 Worked Example: A Multi-Week Vendor Negotiation

To make the framework concrete, consider a long-running agent coordinating a vendor contract renewal over three weeks — drafting correspondence, scheduling meetings, maintaining state across conversations, and surfacing decisions for human review.

Week 1

The user authorizes a goal: Renew the XYZ contract by May 15; target 10% cost reduction; maintain current SLA terms. The agent writes this into prospective memory as a goal object with a deadline, two explicit validity conditions (cost target, SLA), and an issuance timestamp. The uncertainty audit trail records the initial state of each condition as confirmed.

Week 2

The user, in a meeting with a separate team, learns of a new internal requirement: XYZ must commit to a security certification as part of any renewal. The user updates a project document but does not explicitly re-instruct the agent.

In a session-bounded architecture, this update is invisible to the agent — it has no session in which to receive it. In a long-running architecture without AEI, the agent continues executing the original goal, oblivious to the new constraint. It surfaces a successfully negotiated renewal without the certification term. The user discovers the gap too late to renegotiate without relational cost.

With AEI, three things happen differently. Belief revision, triggered by the agent's next ingestion pass over the project document, detects inconsistency between the original goal's validity conditions and newly observed evidence. The prospective memory entry for the goal is flagged: original conditions are marked potentially superseded. The agent does not autonomously re-scope the goal — that is outside its authorization. Instead, it emits a steerability signal: "Original goal conditions may be superseded by new requirement in project document; confirm scope before proceeding." The uncertainty audit trail records the assumption the agent would otherwise have made (assumed: original goal conditions unchanged) as an explicit annotation, making the decision pathway legible to any reviewer.

Week 3

An earlier tool invocation — a proposal email sent to the vendor — committed external state. The user now requests a revised proposal with updated pricing. A naive retry would issue a duplicate proposal, confusing the vendor. Capability state, however, records that the email was sent, with a message ID, and that any follow-up requires either a correction message or an explicit retraction. The execution path branches accordingly. The capability state record is updated to reflect the revised commitment chain.

The scenario is deliberately mundane. That is the point. AEI does not address exotic failure modes. It addresses failure modes that become systemic when session-bounded architectures meet multi-week operation, and it is in mundane workflows that the costs compound fastest and with the least visibility.

6 Limitations and Open Problems

AEI is meant as a design framework, not a complete theory. Several limits should be made explicit.

6.1 This Paper Is Not Yet a Full Formal Specification

This draft does not provide a canonical state tuple, update algebra, or proof obligations for AEI. It offers a conceptual decomposition and a set of architectural consequences. A more formal specification — defining precise state models, transition semantics, and measurable integrity conditions — remains future work. The framework is offered as a systems framing that sharpens the design target; it does not yet deliver the formal machinery needed to verify whether a given system meets it.

6.2 Better State Surfaces Do Not Fix Goal Quality

AEI assumes there is some meaningful goal to maintain and revise. It does not solve the problem of poorly specified or misaligned goals at issuance time. A system with perfect AEI can faithfully pursue a goal that was misspecified from the start. AEI improves traceable execution; it does not substitute for intentional alignment.

6.3 Calibration Remains Empirically Open

The uncertainty audit trail presupposes that agents can produce meaningful uncertainty estimates. Recent work on large language model calibration [Kadavath] finds that even instruction-tuned models can express high confidence in incorrect conclusions and hedge excessively on well-grounded ones. An uncertainty trail is only as useful as the uncertainty estimates attached to it. AEI creates the infrastructure in which calibration matters; it does not guarantee that calibration is solved.

6.4 Multi-Agent Composition Is Harder Than Single-Agent Integrity

AEI is developed primarily for a single agent operating with a defined set of capabilities and a coherent principal hierarchy. Real deployments increasingly involve networks of specialized agents, sub-agents spawned dynamically, and orchestrators that are themselves model-driven. In such settings, epistemic integrity becomes a compositional property. One agent's clean state can still be polluted by another agent's stale or overconfident outputs. This extension is a necessary direction for future work.

6.5 Explicit State Introduces Cost and Attack Surface

Persisting belief state, capability state, and prospective memory adds storage, latency, synchronization, and security burdens. It also creates new objects that can be tampered with if poorly secured: a poisoned belief can propagate through the audit trail as a confirmed annotation, and a manipulated prospective memory entry can redirect behavior while appearing legitimate to a human auditor [Greshake]. Those costs and risks are real. They are part of the engineering tradeoff, not a reason to avoid making state explicit altogether.

6.6 The Cold Start Problem for Prospective Memory

How does a long-running agent bootstrap a coherent prospective memory representation when first deployed, when transitioning between tasks, or when recovering from unexpected state loss? The framework identifies prospective memory as the right primitive without specifying how its contents should be initialized, validated, or recovered. That operational specification — a schema and lifecycle protocol for prospective memory objects — is a concrete engineering challenge this paper leaves open.

6.7 Adoption Is a Coordination Problem

The primitives proposed here — audit trail, capability state record, prospective memory surface — are architectural, not standards. Their practical value depends on adoption across the ecosystem: tool developers who surface capability state, orchestrators that consume uncertainty annotations, evaluation platforms that operate on trajectory-level inputs. A single system that implements these primitives in isolation gains internal benefits but cannot participate in the cross-system integrity that multi-agent deployments require. The path from architectural principle to deployed standard is a coordination and incentive problem that the quality of the specification alone does not close.

7 Conclusion

As agentic systems move from session-bounded interactions to persistent operation, the field needs a clearer vocabulary for what actually breaks. The central problem is not simply that long-running agents need more memory. It is that what they know, what they have already done, and what they are still trying to do must remain coherent and correctable across time.

Agent Epistemic Integrity is offered as a framework for naming that requirement.

The framework is intentionally stronger than a memory taxonomy and intentionally weaker than a complete formal system. Its immediate value is architectural: it highlights why stale beliefs, partial side effects, and goal drift cannot be treated as independent edge cases once agents persist across sessions. Its longer-term value, if the framing proves useful, would be to guide schemas, benchmarks, and runtime interfaces that make long-running agents easier to inspect, interrupt, and trust.

The stable what / volatile how principle is the organizing commitment underneath the framework. The requirement to maintain coherent, correctable epistemic state across Knowing, Doing, and Deciding does not change as models improve or deployment contexts diversify. The mechanisms — retrieval strategies, uncertainty scoring, capability logs, prospective memory schemas — will keep evolving. Systems built around the stable requirement will remain architecturally coherent across that evolution. Systems built around today's implementation patterns will require reinvention each time those patterns change.

The long-running agent is not a future concern. It is a present deployment reality, and the systems running in that regime today are, in the main, session-bounded architectures operating beyond their design envelope. The failure modes this paper describes — stale beliefs acted upon as current, side effects not tracked, goals pursued without surfaces for correction — are happening now, without the architectural vocabulary to name them clearly. This paper is an attempt to supply that vocabulary.

References