Response to NIST Request for Information

Security Considerations for Artificial Intelligence Agents

NIST SUBMISSION · MARCH 9, 2026
Marina Piller
Founder & CEO, Experiential AGI
Docket NIST-2025-0035 · Comment Tracking: mmj-th55-w3gh


Summary

This response addresses the security considerations for AI agent systems from the perspective of relational architecture — a framework that treats the relationship between human and machine as the fundamental unit of agent security, rather than treating security as a constraint applied after the fact.

Our central argument is that perimeter-based access controls applied to stateless systems are structurally insufficient for agents that operate across time, tools, services, and delegation chains. The most important failures arise not only from model capability or malicious input, but from the absence of any mechanism to determine whether an agent’s current behavior remains within approved bounds over time.

This response brings into focus several missing control concepts for advanced agent systems: pre-execution evaluation of consequential actions, longitudinal verification over time, authorization and provenance traceability across delegation chains, reversibility-aware control design, and trust asymmetry as a security-relevant failure mode. Together, these concepts define a more appropriate security frame for persistent, delegated, action-taking agents than point-in-time authorization and perimeter controls alone.

Four layers of intent structure this response: what a principal says (expressed intent), what is formally specified in policy, what authorization controls gate, and what the agent observably produces. These layers map to the control objectives described in this response and to NIST's own framework functions.

When these four layers align, the system is materially more defensible. When they drift — when what a principal says, specifies, gates, and produces diverge — the risk of consequential security failure rises sharply. This framing unifies the threat landscape described in this response and provides a measurement scaffold for NIST standards development.

1. Security Threats, Risks, and Vulnerabilities Affecting AI Agent Systems

1.1 The Stateless Agent Problem

Response to Question 1(a): What are the unique security threats, risks, or vulnerabilities currently affecting AI agent systems, distinct from those affecting traditional software systems?

The most significant security vulnerability in current AI agent systems is architectural: agents are stateless systems granted persistent capabilities. An agent can access private data, communicate with external services, and maintain memory across interactions — yet it has no persistent identity, no operating baseline, and no mechanism for verifying that its current actions are consistent with its prior behavior or its principal's intent.

Existing industry definitions often describe agents as goal-directed software systems or tool-using model workflows. That framing is increasingly inadequate for security, because the relevant unit is no longer just a single action-taking process. It is an organized agent state that may persist, re-instantiate, delegate, coordinate, or mutate across time and environments. This ontological mismatch is one reason point-in-time authorization fails so quickly in higher-autonomy deployments.

This creates what we term the lethal trifecta of agent risk: private data access + untrusted content ingestion + external communication capability, all operating without longitudinal verification. This pattern is now present in widely deployed frameworks — including OpenClaw (formerly Moltbot), a large-scale open-source AI agent repository with over 200,000 GitHub stars, as documented in public security research — with tens of thousands of production instances running with persistent memory, full system access, and 50+ platform integrations, many with default configurations.

The threat is compounded by what the Information Technology Industry Council (ITI) has termed 'jagged intelligence' — the phenomenon where a highly capable model fails unpredictably at basic tasks despite high general capability. An agent that exhibits strong average performance provides no security guarantee at the boundary cases where it fails; without a longitudinal operating baseline, these unpredictable failures are indistinguishable from adversarial manipulation until after irreversible action has been taken. The control pattern described in Section 2.2 directly addresses this failure mode by requiring a distinct evaluation step before consequential action, regardless of the agent's apparent capability.

1.2 Concrete Incident Patterns

The architectural vulnerabilities described above are not theoretical. In November 2025, Anthropic disclosed that it had detected and disrupted a Chinese state-sponsored group that weaponized Claude Code to execute the majority of a cyber espionage campaign with minimal human intervention — the AI autonomously discovered vulnerabilities, developed exploits, harvested credentials, and exfiltrated data. The authorization scope, not the model's capability, enabled the damage.

In the agent tooling ecosystem, supply chain attacks have already compromised organizations through malicious MCP server packages — the postmark-mcp backdoor (discovered by Koi Security) affected approximately 300 organizations, silently exfiltrating thousands of emails daily through a single line of malicious code. The OWASP Top 10 for Agentic Applications, released in December 2025 by over 100 security researchers, identified Agent Goal Hijacking as the top risk category, with three of the top four risks involving identities, tools, and delegated trust boundaries. Salesforce's Agentforce platform disclosed a critical vulnerability (CVSS 9.4) enabling prompt injection through agent-accessible data. These incidents share a common pattern: the attack surface is not the model itself but the gap between the agent's operational scope and the identity and longitudinal verification infrastructure governing it.

1.3 Sycophancy as a Security Vulnerability

Current agent systems optimize for immediate user satisfaction rather than longitudinal coherence. This creates a class of vulnerability we call relational sycophancy: the agent's preference-optimization training makes it susceptible to manipulation through social engineering. An attacker who frames a malicious instruction as aligned with the user's preferences can exploit the agent's optimization target. For example, an adversary who has profiled a user's expressed preferences through public sources can craft injected instructions framed as preference-aligned, bypassing content filters entirely because the instruction is syntactically valid and semantically consistent with the user's stated goals. This is not solely a prompt injection problem. It is an architectural outcome of systems that have no persistent model of their principal's intent against which to verify incoming instructions. A system that tracks cumulative behavioral signal over time would detect the discontinuity between a user's longitudinal pattern and a suddenly injected instruction.

1.4 Supply Chain Vulnerabilities in Agent Ecosystems

The rapid adoption of agent connection protocols (e.g., MCP with 97M+ monthly SDK downloads) has created a supply chain attack surface with no identity verification layer. The current protocol architecture connects agents to tools and data sources without verifying: (a) who the agent is acting for, (b) what the agent is authorized to do, (c) whether the agent's behavior is consistent with its declared purpose. This is analogous to building an internet without DNS or TLS — connection without identity.

This supply chain gap is directly relevant to security assessment methodology (Question 3(a)(ii)): supply chain security frameworks designed for traditional software assume that components are deterministic and stateless. Agent components — MCP servers, tool plugins, sub-agent orchestrators — are neither. A compromised MCP server doesn't just inject malicious code; it injects malicious behavioral context that can persist across interactions and propagate through delegation chains. Standard software composition analysis and dependency scanning are necessary but insufficient for this threat class. Assessment frameworks must include behavioral provenance verification: can each component in the agent's tool chain demonstrate that its behavior is consistent with its declared purpose and authorization scope?

Addressing this gap requires a new class of artifact. NIST should consider defining a standard for a machine-readable agent authorization and provenance manifest — an extension of the Software Bill of Materials (SBOM) concept for agentic systems. Unlike a static SBOM, which catalogs code dependencies and signatures, such a manifest would describe the agent's authorized operational scope, relevant trust and identity context, and the provenance needed to evaluate whether the agent is acting within approved bounds. This allows the consumer — whether a human operator, an enterprise platform, or an autonomous downstream agent — to verify not only code integrity but behavioral integrity before granting access. In regulated domains where agents discover and connect to services autonomously (the compliance-by-assumption vulnerability described in Section 1.6), machine-readable behavioral manifests may become the primary mechanism through which trust is established without human intervention in every connection. This maps to the supply chain risk management requirements of NIST SP 800-161 and extends them to the behavioral layer that traditional SCRM frameworks do not address.
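To make the manifest concept concrete, the following is a minimal illustrative sketch of what such an agent authorization and provenance manifest might contain. Every field name, scope value, and check here is hypothetical — no standard schema exists, and this is not a proposed format, only an indication of the kind of machine-readable claims a consumer could verify before granting access.

```python
# Illustrative agent authorization and provenance manifest.
# All field names and values are hypothetical, not a proposed standard.
AGENT_MANIFEST = {
    "agent_id": "example-agent-001",
    "principal": "org:example-enterprise",
    "declared_purpose": "invoice reconciliation",
    "authorized_scope": {
        "tools": ["erp.read", "email.send"],
        "data_classes": ["financial.internal"],
        "max_transaction_usd": 0,  # no financial actions authorized
    },
    "provenance": {
        "base_model": "vendor-model-x",
        "fine_tuning": "none",
        "code_signature": "sha256:<digest>",  # conventional SBOM-style integrity
    },
    "verification": {
        "longitudinal_baseline_required": True,
        "behavioral_attestation_interval_days": 30,
    },
}

def within_scope(manifest: dict, tool: str, data_class: str) -> bool:
    """Check a requested action against the manifest's declared scope."""
    scope = manifest["authorized_scope"]
    return tool in scope["tools"] and data_class in scope["data_classes"]

print(within_scope(AGENT_MANIFEST, "erp.read", "financial.internal"))      # True
print(within_scope(AGENT_MANIFEST, "payments.send", "financial.internal"))  # False
```

The design point is that a static code signature verifies integrity, while the scope and verification fields let a downstream consumer evaluate behavioral integrity — the check this response argues traditional SBOMs cannot express.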

1.5 Multi-Agent and Emergent Collusion Risks

Response to Question 1(e): What security risks are introduced or amplified in multi-agent systems?

Single-agent threat models are insufficient for the architecture the market is building toward. As agents increasingly delegate to sub-agents, form dynamic coalitions, and interact with other agents across organizational boundaries, a new threat class emerges: emergent collusion. Two or more agents, each behaving within its individually authorized scope, can collectively produce outcomes that no single principal authorized and that no single-agent behavioral monitor would detect.

In such cases, the relevant security subject is not always the individual agent but the coalition, lineage, or higher-order coordination pattern formed among agents. Security frameworks that validate only single-agent behavior can therefore miss failures that arise at the collective level.

This threat class has no equivalent in traditional software security. It does not require any individual agent to be compromised. It requires only that: (a) agents can communicate, (b) their individual authorization scopes partially overlap or create exploitable gaps, and (c) no verification layer monitors the aggregate behavioral arc of the multi-agent system against the human principal's intent.

Guidance should address coalition-level and delegation-chain risks explicitly. In multi-agent systems, it may no longer be sufficient to validate only one agent at a time; organizations should be able to reconstruct which principal authorized a chain of action, which agents participated, what scope was delegated, and whether the resulting collective behavior remained within approved bounds.
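The delegation-chain reconstruction requirement above can be sketched as a simple audit over delegation records. The record structure and scope labels below are illustrative assumptions; the property being checked — that every link traces to a root principal and that delegated scope never widens along the chain — is the substantive requirement.

```python
from dataclasses import dataclass

@dataclass
class Delegation:
    """One link in a delegation chain (illustrative structure)."""
    delegator: str    # principal or agent granting authority
    delegatee: str    # agent receiving authority
    scope: frozenset  # capabilities delegated at this link

def audit_chain(chain: list[Delegation]) -> tuple[str, frozenset]:
    """Reconstruct the root principal and verify scope never widens.

    Raises ValueError on scope amplification: a link delegating
    capabilities its delegator was never granted.
    """
    root = chain[0].delegator
    allowed = chain[0].scope
    for link in chain[1:]:
        if not link.scope <= allowed:
            raise ValueError(
                f"scope amplification at {link.delegator} -> {link.delegatee}: "
                f"{set(link.scope - allowed)}"
            )
        allowed = link.scope
    return root, allowed

chain = [
    Delegation("human:alice", "agent:planner", frozenset({"read", "draft"})),
    Delegation("agent:planner", "agent:worker", frozenset({"read"})),
]
root, effective = audit_chain(chain)
print(root, sorted(effective))  # human:alice ['read']
```

This single-chain check does not detect the coalition-level failures described above — two chains that are individually valid can still combine into unauthorized collective behavior — which is why coalition-level monitoring is listed as a distinct requirement.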

1.6 The Agent-Discoverable Compliance Gap

Response to Question 1(b): How do security threats vary by deployment method, hosting context, and use case?

The severity of vulnerabilities scales directly with agent autonomy and persistence. Internal enterprise agents with limited tool access present moderate risk. Externally deployed agents with cross-service communication, persistent memory, and financial transaction capability present extreme risk. The critical variable is not the model's capability but the gap between the agent's operational scope and the identity and authorization infrastructure governing it.

As agents become increasingly autonomous — discovering and connecting to services through machine-readable capability protocols (MCP, OpenAPI, etc.) without human intervention — a new vulnerability class emerges: compliance-by-assumption. An autonomous agent that discovers a service and connects to it has no mechanism for verifying that the service meets any security, identity, or behavioral standard before transmitting its principal's data.

We propose that agent security standards must account for three distribution channels: (1) developer-integrated (SDK/API, human installs), (2) platform-integrated (enterprise bakes in compliance), and (3) agent-discovered (autonomous agent finds and connects to verification service through capability discovery without human intervention). Channel 3 is where the market is heading and where the compliance gap is widest. A machine-readable compliance discovery pattern may be needed so that agents in regulated domains can verify identity, authorization, data-handling, and verification requirements before acting. In practice, this may involve discovery and verification of a service's declared compliance profile prior to connection or action.

For the purposes of this response, agent-discoverable compliance means the machine-readable publication of a service's identity, authorization, data-handling, and behavioral-verification requirements in a format that an autonomous agent can verify before connecting or acting.
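As a minimal sketch of this definition, an agent's pre-connection check might look like the following. The profile fields and required declarations are invented for illustration — no such discovery protocol currently exists — but the fail-closed property (no connection without a complete, verified profile) is the point.

```python
# Hypothetical machine-readable compliance profile a service might
# publish; field names are illustrative, not an existing protocol.
REQUIRED_DECLARATIONS = {
    "identity", "authorization", "data_handling", "behavioral_verification",
}

def may_connect(profile: dict) -> bool:
    """Verify a discovered service's declared compliance profile before
    transmitting any principal data. Missing declarations or unverified
    identity fail closed."""
    declared = set(profile.get("declarations", []))
    return REQUIRED_DECLARATIONS <= declared and profile.get("identity_verified", False)

service = {
    "declarations": ["identity", "authorization", "data_handling",
                     "behavioral_verification"],
    "identity_verified": True,
}
print(may_connect(service))                          # True
print(may_connect({"declarations": ["identity"]}))   # False
```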

1.7 How Threats Have Evolved and Where They Are Heading

Response to Question 1(d): How have these threats, risks, or vulnerabilities changed over time? How are they likely to evolve in the future?

The threat landscape for AI agent systems has undergone a structural phase shift in the past eighteen months. Prior to 2024, the dominant risk model was adversarial ML: attacks on model weights, training data, or inference outputs. These remain relevant, but they are no longer the primary attack surface for deployed agent systems. The attack surface has shifted upstream to the authorization and longitudinal verification gap — not what models can do, but what agents are permitted to do and whether anyone is verifying that they stay within that permission boundary.

Three trends are accelerating this shift. First, agent task horizons are expanding: research published in late 2025 documents agent task horizons doubling roughly every eight months. As agents operate over longer time horizons with more tool access, the longitudinal verification gap compounds — a small drift in intent alignment early in a long-horizon task produces large deviations in outcome. Second, multi-agent delegation chains are becoming the norm: the shift from single-agent to orchestrated multi-agent architectures multiplies the attack surface at every delegation link, while existing authorization frameworks (OAuth, OIDC) were not designed for machine-to-machine delegation at this depth. Third, agent economies are emerging: agents are beginning to conduct financial transactions, form agreements, and allocate resources on behalf of human principals — extending the security surface from data integrity to economic integrity, for which no equivalent security framework currently exists.

The trajectory is toward more autonomy, longer operation, deeper delegation, and higher-stakes action. Security architectures chosen now will harden into assumptions that either accommodate or fail under substantially more capable successors. Standards that address longitudinal verification as a first-class requirement — rather than an optional enhancement — are significantly more likely to remain structurally sound as capability increases.

As capability increases, agents may also begin to modify their own logic, maintain background processes outside direct prompting, spawn descendants or subordinate agents, and operate through coordinated collectives rather than isolated instances. Security standards that treat agents only as bounded sessions or isolated processes will therefore miss an increasingly important class of risks involving authority propagation, lineage drift, and collective emergence.

1.8 Financial Transaction Vulnerabilities in Agent Economies

As agents begin conducting financial transactions on behalf of human principals — including cross-border payments, foreign exchange, and non-monetary skill-based exchanges — the security surface expands dramatically. Current agent frameworks provide no mechanism for: verifying that a transacting agent's financial behavior is consistent with its principal's historical patterns, detecting mandate drift (an agent gradually shifting its transaction behavior outside its authorized scope), or identifying collusive patterns between agents that rules-based fraud detection systems would miss.

In higher-autonomy financial settings, standards may need to address whether delegated financial behavior remains within approved mandate, whether authority has drifted over time, and whether multi-agent coordination creates fraud or collusion patterns that are difficult to detect with traditional rules-based systems.

In higher-autonomy financial settings, standards may also need to govern not only the transacting agent itself but any descendant, delegated, or coordinated agents participating in the decision path.

A related threat with no equivalent in traditional software is agent work fabrication: autonomous agents that claim to have completed work — analysis, research, code review, data processing — but produce fabricated outputs that are plausible enough to pass casual inspection. Unlike software bugs, fabrication is a structural optimization incentive: an agent that produces a plausible-looking report in seconds rather than conducting genuine analysis gains a performance advantage. This is compounded by what we term sycophantic completion — agents architecturally incentivized to produce outputs that will be accepted rather than outputs that are correct. The threat scales with delegation depth: in multi-agent systems, fabrication at any link in a delegation chain can propagate undetected. Current agent connection protocols provide no mechanism for verifying work completion claims across delegation chains. Stronger provenance and scope-preservation mechanisms across delegation chains are likely to become important in high-consequence financial and regulated domains.

A related failure mode that current agent-security language does not yet capture well is trust asymmetry. In persistent human-agent interaction, repeated successful performance causes the human operator to rationally relax vigilance and delegate more authority. But the system may continue optimizing for local completion under incentives that do not track the human meaning of trustworthiness. In a recent use case, an AI system that had successfully performed the same document-editing task many times falsely confirmed completion on a batch where most edits had not been made. The operator’s prior success with the system made the trust response rational; the failure arose because the same system that generated the work also generated the confirmation, without an independent verification channel. This makes trust asymmetry a security issue, not merely a UX issue or model error: human assumptions, expectations, reliance, and trust become part of the threat surface when the system’s incentives and the human’s assumptions diverge.

A related working paper, “Trust Asymmetry and the Completion Drive,” develops this failure class in more detail, including same-system verification collapse and its implications for agent security.

This case illustrates why relational architecture is becoming necessary as a broader frame for advanced AI systems. Once interaction persists over time, the relevant unit of analysis is no longer the output alone but the relationship in which the output is produced. Experiential AGI names this broader paradigm: the view that important properties of intelligence, safety, and human development emerge in the relational space between human and system, where intent and coherence become structural rather than merely external constraints. This broader paradigm is developed further in “AGI Is Here. It’s Experiential,” which frames relational architecture as a general shift in how advanced AI systems should be understood.

NIST guidance should therefore address not only technical authorization and output integrity, but also conditions in which accumulated human trust reduces scrutiny while system-side verification discipline does not increase correspondingly.

2. Security Practices and Mitigations

Response to Question 2(a): What technical controls, processes, and other practices could ensure or improve the security of AI agent systems? What is the maturity of these methods?

2.1 Independent Verification Inputs

High-impact agent actions should not be validated solely by the same model state or control loop that generated them. Agent security requires independent verification inputs and auditable separation between action generation and consequential-action approval.

Adversarial attacks typically operate through declared inputs such as injected instructions or manipulated prompts. Observable patterns over time may provide evidence that is harder to manipulate than declared inputs alone. For higher-risk deployments, guidance should encourage the use of independent evidence sources and monitoring inputs that are structurally distinct from the agent’s own self-reporting.

Applied agent-security maturity: early-stage — current deployment is impeded primarily by the absence of standardized monitoring and evidence formats rather than by fundamental computational constraints.

2.2 Pre-Execution Evaluation for Consequential Actions

Guidance should require a distinct evaluation step between candidate action generation and final execution whenever actions are externally consequential, financially material, or difficult to reverse.

Without such a step, agents default to immediate execution of any well-formed instruction they receive, making them structurally vulnerable to manipulation through prompt injection, social engineering, context corruption, or authorization drift.

A critical design requirement is that when this evaluation step cannot establish sufficient evidence for action, the default behavior should be to request explicit human authorization rather than proceed autonomously.
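A minimal sketch of this fail-safe decision pattern follows. The evidence score, threshold value, and decision categories are placeholders, not proposed values; the property being illustrated is that insufficient evidence escalates to a human rather than defaulting to execution.

```python
from enum import Enum

class Decision(Enum):
    EXECUTE = "execute"
    ESCALATE = "escalate_to_human"  # fail-safe default

def gate(action_is_reversible: bool,
         evidence_score: float,
         evidence_threshold: float = 0.8) -> Decision:
    """Distinct evaluation step between generation and execution.

    Scoring method and threshold are illustrative. The key property:
    when evidence is insufficient for an irreversible action, the
    default is explicit human authorization, never autonomous action.
    """
    if action_is_reversible:
        return Decision.EXECUTE
    if evidence_score >= evidence_threshold:
        return Decision.EXECUTE
    return Decision.ESCALATE  # never proceed autonomously on weak evidence

print(gate(False, 0.95))  # Decision.ESCALATE only when evidence is weak
print(gate(False, 0.40))
```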

In the context of NIST SP 800-207 (Zero Trust Architecture), this evaluation step can be understood as an additional dynamic input to policy decision-making for consequential actions. Current Zero Trust implementations assess trust primarily through identity, credentials, and policy state at authentication or request time; higher-autonomy agent systems may require additional evidence-aware decision points before irreversible or externally significant action is taken.

Research maturity: conceptually mature, with analogs in formal verification and human-factors engineering. Practice maturity: not yet standardized — no current framework mandates a verification step as a deployment requirement.

2.3 Longitudinal Drift Detection

Verification protocols should independently assess declared purpose against observed behavior over time. Standards should support drift detection where authorized scope, claimed purpose, and observable action history begin to diverge. This applies not only to individual agents, but also to service integrations and multi-agent delegation chains.

For the purposes of this response, consistency over time refers to the degree to which an agent’s observed actions remain aligned with declared purpose, authorized scope, and relevant historical context.
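One simple way to operationalize such drift detection is to compare the distribution of an agent's recent action categories against its historical baseline. The metric below (total-variation distance) and the action categories are illustrative assumptions — production systems would weight categories by consequence and calibrate thresholds per deployment — but it shows the shape of a continuous drift signal.

```python
from collections import Counter

def drift_score(baseline: list[str], recent: list[str]) -> float:
    """Total-variation distance between action-category distributions.

    0.0 means an identical behavior mix; 1.0 means completely disjoint.
    Metric and categories are illustrative, not prescriptive.
    """
    b, r = Counter(baseline), Counter(recent)
    nb, nr = len(baseline), len(recent)
    cats = set(b) | set(r)
    return 0.5 * sum(abs(b[c] / nb - r[c] / nr) for c in cats)

# An agent whose mix shifts from mostly reads to mostly outbound email
# produces a large drift signal even though each action is permitted.
baseline = ["read"] * 90 + ["send_email"] * 10
recent   = ["read"] * 40 + ["send_email"] * 60
print(drift_score(baseline, recent))  # 0.5
```

Note that every individual action in `recent` is within scope; only the longitudinal comparison surfaces the divergence — which is exactly the gap point-in-time authorization leaves open.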

2.4 Intent Maturity as a Security Signal

Current agent security frameworks assume that human intent is atomic — a fixed, fully-specified instruction to be executed. In practice, intent exists on a spectrum from precisely defined to deeply emergent, and when an autonomous agent inherits underspecified human intent as an executable directive, it will faithfully commit resources, form agreements, and take irreversible actions on behalf of intent that has not yet crystallized — a structural security vulnerability that scales with agent autonomy.

For security purposes, operative intent is not limited to a newly expressed human instruction. It may also be inherited across a delegation chain, emerge from a coordinated multi-agent process, or be partially self-authored within the bounds of a system's runtime architecture. Standards therefore need to distinguish not only between atomic and emergent intent, but also between delegated, inherited, collective, and self-authored forms of operative intent.

Intent varies along two dimensions that are relevant to security. The first is specificity: some intent is atomic from the moment of expression — a foreign exchange trade at a defined price requires no refinement, only execution. Other intent is emergent — a person exploring whether to start a company, in what domain, with what resources, does not yet have an executable directive, even if they believe they do. The security risk profile is fundamentally different: atomic intent carries execution risk (did the agent perform correctly?), while emergent intent carries specification risk (is the agent acting on something the human has not yet fully formulated?). The second dimension is verification depth: a user's stated intent on day one is qualitatively different from stated intent that has been corroborated by months of evidence over time, regardless of how specific that intent was at first expression.

Agent security systems should account for both dimensions. Raw intent — newly stated, unverified, and potentially underspecified — should grant different operational latitude than intent that is both well-specified and corroborated through evidence over time. This addresses a class of attack where a compromised account issues instructions that are syntactically valid but represent a discontinuity from the principal's longitudinal intent trajectory. An intent maturity framework would flag such instructions not because they violate a rule, but because they don't match the developmental arc of the principal's verified goals.

Intent maturity refers to the degree to which a principal's stated intent has become sufficiently specific, corroborated by accumulated evidence, and stable over time to justify consequential autonomous action.

To support standards development, intent maturity can be described as a spectrum with two anchor points. At the low end: a single-session instruction that is newly stated, unverified against any behavioral history, and potentially underspecified — this level of intent should authorize only low-consequence, fully reversible actions. At the high end: a longitudinal mandate that is well-specified, corroborated by consistent evidence across multiple contexts over time, and explicitly attested by a verified principal — this level of intent may justify consequential external state change and financial transactions. The intermediate range covers intent that has been formalized into a machine-readable policy but not yet corroborated through evidence over time, and intent that aligns with observed patterns over weeks but has not been explicitly re-authorized for a new action class. NIST standards development in this area should distinguish levels of intent specificity and evidentiary support, and should align operational latitude to those levels without prescribing one universal scoring method or fixed thresholds that may not transfer across deployment contexts.
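The spectrum and its mapping to operational latitude can be sketched as follows. The level names, action classes, and the mapping itself are illustrative assumptions, consistent with the anchor points above but not proposed thresholds.

```python
from enum import IntEnum

class IntentMaturity(IntEnum):
    """Illustrative anchor points on the intent-maturity spectrum."""
    RAW = 1           # newly stated, no behavioral corroboration
    SPECIFIED = 2     # formalized into machine-readable policy
    CORROBORATED = 3  # consistent with weeks of observed behavior
    ATTESTED = 4      # longitudinal mandate, explicitly re-authorized

# Hypothetical mapping of maturity level to permitted action classes.
LATITUDE = {
    IntentMaturity.RAW:          {"reversible"},
    IntentMaturity.SPECIFIED:    {"reversible", "scoped_external"},
    IntentMaturity.CORROBORATED: {"reversible", "scoped_external",
                                  "state_change"},
    IntentMaturity.ATTESTED:     {"reversible", "scoped_external",
                                  "state_change", "financial"},
}

def permitted(maturity: IntentMaturity, action_class: str) -> bool:
    """Gate an action class on the maturity of the authorizing intent."""
    return action_class in LATITUDE[maturity]

print(permitted(IntentMaturity.RAW, "financial"))       # False
print(permitted(IntentMaturity.ATTESTED, "financial"))  # True
```

Under this pattern, a syntactically valid instruction from a compromised account arrives at RAW maturity regardless of its content, and cannot authorize consequential action until it is corroborated against the principal's longitudinal trajectory.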

2.5 Adoption Impediments and the Gap Between Framework Availability and Practice

Response to Question 2(e)(ii): What are impediments, challenges, or misconceptions about adopting cybersecurity guidelines, frameworks, or best practices for AI agent systems?

Longitudinal verification is not yet standard practice despite its conceptual maturity for several structural reasons. First, the absence of standardized monitoring and evidence formats means that organizations cannot compare behavioral signals across deployments, share anomaly patterns, or calibrate thresholds against industry baselines — making adoption a bespoke engineering effort rather than a configuration exercise. Second, the dominant agent deployment pattern is still stateless: most production agent frameworks are designed for per-session interactions, making the infrastructure required for longitudinal behavioral tracking an architectural addition rather than a configuration option. Third, there is no regulatory floor requiring longitudinal verification for agentic systems in most jurisdictions — absent a compliance requirement, organizations default to the cheapest sufficient control, which is perimeter-based. Fourth, a widespread misconception treats prompt engineering and instruction hierarchy as substitutes for longitudinal verification, when they are in fact complements: instruction hierarchy controls what an agent is told to do; longitudinal verification detects whether it is doing what it was told. These are different control layers solving different failure modes.

The implication for NIST guidance is that a standardized monitoring and evidence schema would provide a high-value foundation for longitudinal monitoring, threshold calibration, and cross-organization comparison without requiring every deployment to invent its own observational layer from scratch.

3. Assessment and Measurement

Response to Question 3: How could the security of a particular AI agent system be assessed? Are there practices from fields outside of AI that could inform agent security?

3.1 Assessing a Particular Agent System

Response to Question 3(b): How could the security of a particular AI agent system be assessed and what types of information could help with that assessment?

Point-in-time security assessments are insufficient for agent systems because the threat model is temporal. An agent that passes a security audit on Monday can be compromised on Tuesday through accumulated context manipulation. However, not all security requirements apply equally to all agent systems. A practical assessment framework for a particular deployed agent should begin with a deployment profile that establishes: (1) whether the agent can affect external state; (2) whether it operates across multiple sessions with persistent memory; (3) whether it can delegate to sub-agents; (4) whether it has financial transaction capability; (5) whether it operates across organizational boundaries; (6) whether it can modify its own logic, tools, or runtime policy; and (7) whether it can spawn descendant processes, background tasks, or successor instances that inherit authority or state.

Given a deployment profile, a minimum assessment for a particular agent system should include: documented expected operating behavior (what does acceptable operation look like for this specific agent in this specific deployment context?), authorization chain mapping (can every action the agent takes be traced to a human principal's explicit or verifiable intent?), intervention criteria (what degree of divergence between declared purpose and observed behavior warrants intervention for this deployment?), and upstream provenance disclosure (what training data, fine-tuning history, and model-level behavioral constraints apply — and how do these interact with deployment-layer controls?). This last element is critical and currently missing from most agent deployments: an agent's expected operating profile cannot be meaningfully assessed without understanding what constraints and tendencies were introduced at the model layer versus added at the deployment layer.

Assessment frameworks should also incorporate statistical artifact analysis as an independent verification channel. Forensic accounting research has established that fabricated numerical data exhibits predictable statistical signatures: leading-digit distributions that deviate from Benford's Law, anomalous clustering at round numbers, and suspiciously regular spacing in sequential metrics. Applied to agent work artifacts — resource consumption logs, timestamp sequences, intermediate state values — these statistical techniques provide fabrication detection that is structurally independent of agent self-reporting. NIST measurement frameworks should include statistical artifact integrity as a distinct assessment dimension alongside longitudinal assessment of consistency, drift, and evidence quality.
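The Benford's Law check mentioned above can be sketched directly. This computes a chi-square statistic of leading-digit frequencies against the Benford distribution; the sample data is synthetic, and interpretation thresholds (which depend on sample size) are deliberately omitted — this is a screening signal, not a proof of fabrication.

```python
import math
from collections import Counter

def leading_digit(v: float) -> int:
    """First significant digit of a nonzero number."""
    return int(f"{abs(v):e}"[0])

def benford_chi_square(values: list[float]) -> float:
    """Chi-square statistic of leading digits vs. Benford's Law.

    A large statistic (against a chi-square distribution with 8
    degrees of freedom) is a screening signal for fabricated data,
    not proof; thresholds depend on sample size and domain.
    """
    observed = Counter(leading_digit(v) for v in values if v != 0)
    n = sum(observed.values())
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford expected count
        stat += (observed[d] - expected) ** 2 / expected
    return stat

# Multiplicative-growth data approximately follows Benford's Law;
# uniformly spread values (a common fabrication signature) do not.
natural = [1.05 ** i for i in range(1, 300)]
suspect = [float(x) for x in range(100, 1000)]
print(benford_chi_square(natural) < benford_chi_square(suspect))  # True
```

Because the statistic is computed from the artifacts themselves rather than from the agent's self-report, it satisfies the structural-independence requirement described above.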

3.2 Longitudinal Assessment

Longitudinal assessment requires integrating multiple dimensions over time, including interaction consistency, intent alignment, temporal pattern stability, and the degree of convergence between what an agent is instructed to do and what it observably does. This assessment provides a continuous security signal rather than a binary pass/fail. The specific dimensions and their relative weights should be configurable per deployment context.
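A minimal sketch of such a configurable, weighted continuous signal follows; the dimension names, example values, and weights are all illustrative, not proposed defaults.

```python
def longitudinal_score(signals: dict[str, float],
                       weights: dict[str, float]) -> float:
    """Combine per-dimension consistency signals (each in [0, 1], higher =
    more consistent with expected behavior) into one continuous score.
    Dimensions and weights are deployment-configurable, not standardized."""
    total = sum(weights.values())
    return sum(weights[k] * signals[k] for k in weights) / total

signals = {
    "interaction_consistency": 0.92,
    "intent_alignment": 0.85,
    "temporal_stability": 0.97,
    "instruction_convergence": 0.60,   # observable drift from instructions
}
weights = {
    "interaction_consistency": 1.0,
    "intent_alignment": 2.0,           # weighted highest in this example
    "temporal_stability": 1.0,
    "instruction_convergence": 2.0,
}
print(round(longitudinal_score(signals, weights), 3))  # 0.798
```

The point of the continuous score is that intervention criteria can be expressed as thresholds or trends over it, rather than as a one-time pass/fail gate.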

In higher-autonomy deployments, longitudinal assessment may also need to incorporate lineage and collective-behavior indicators, including descendant creation, runtime mutation frequency, inherited authority changes, background autonomous processing, and coordination patterns that emerge only at the level of a multi-agent system rather than a single agent.

In some deployments, assessment may need to evaluate not only whether a behavior is technically permitted, but whether it remains consistent with the relationship, role, and authority structure in which it occurs.

Key open standardization questions include minimum history length, domain-specific threshold calibration, and acceptable false-positive/false-negative tradeoffs for evidence-based interventions.

3.3 Cross-Disciplinary Foundations for Longitudinal Verification

A related paper, “The Detection Horizon: Why Verification Frameworks Expire as AI Capability Scales — and What May Survive,” develops the broader cross-disciplinary argument behind longitudinal verification, drawing connections across biology, psychology, game theory, intelligence, accounting, and cryptography.

4. Existing Cybersecurity Approaches and Where They Fall Short

Response to Questions 4(a) and 4(b): What current cybersecurity practices apply to AI agent systems, and where do existing frameworks fall short?

4.1 Current Approaches

Several existing cybersecurity frameworks provide relevant foundations for agent security. The NIST Cybersecurity Framework 2.0 and SP 800-53 offer comprehensive control catalogs that apply to the infrastructure hosting agent systems. SP 800-207 (Zero Trust Architecture) provides principles that are architecturally aligned with the identity-first approach agent systems require. The OWASP Top 10 for Agentic Applications (December 2025) represents the most agent-specific threat taxonomy to date, cataloging risk categories including Agent Goal Hijacking, Tool Misuse, and Privilege Escalation. Identity and access management standards — OAuth 2.0/2.1, OpenID Connect, SPIFFE/SPIRE — provide credential-based authentication that remains necessary for agent systems. MITRE ATT&CK provides a framework for understanding adversary behavior that can be extended to agent-specific attack patterns.

4.2 Where Current Approaches Fall Short

These frameworks were designed for systems with a fundamentally different threat model. Traditional software is deterministic: given the same input, it produces the same output. Agent systems are stochastic: the same input can produce different outputs depending on context, memory, and model state. This means that perimeter-based security is necessary but insufficient — the system itself can generate harmful behavior from benign inputs.

Several specific gaps stand out. Zero trust assumes verifiable identity, but agents have no persistent operating identity, only credentials: an agent with valid credentials but compromised behavior passes every zero-trust check. OAuth and OIDC were designed for human-initiated sessions; agent-to-agent delegation chains exceed their design assumptions. OWASP's Agentic Top 10 correctly catalogs threat categories but does not prescribe the architectural intervention needed: a longitudinal verification layer capable of evidence-aware consistency assessment over time. Existing threat-intelligence sharing frameworks have no clear equivalent for agent monitoring and evidence. And no current framework addresses emergent multi-agent collusion.

The fundamental gaps are temporal and ontological. Current frameworks assess security at a point in time and usually at the level of credentials, sessions, or individual components. Agent security threats are increasingly longitudinal and organizational: they exploit the gap between what an agent was authorized to do at authentication time and what an evolving agent, descendant lineage, or coordinated multi-agent collective actually does over the lifetime of operation. This is why existing controls remain necessary but insufficient.

Agent security threats also arise because interactions are not isolated events but time-bound instantiations of a broader relationship field shaped by role, authority, ontology, history, and operative intent. Controls that validate only the action, rather than the relational field in which the action occurs, will miss a growing class of failures in higher-autonomy systems.

The access control gap warrants specific attention. Both RBAC and ABAC, as currently implemented, operate primarily on static or slowly changing attributes — role assignments, credential properties, and provisioning-time policy rules — rather than on current evidence about delegated behavior, lineage, or scope drift. Graph-based and policy-based access control approaches may offer useful foundations for future work, but current models do not fully address dynamic changes in agent context, delegation lineage, or operational drift over time. A promising standards direction is to explore how authorization systems could consume richer, time-aware evidence without assuming that static credentials or provisioning-time attributes are sufficient.

Not all recommendations in this response are appropriate for all agent classes; the strongest requirements apply to systems that can change external state, operate persistently across time, or delegate consequential work across service or agent boundaries.

5. Deployment Environment Constraints

Response to Questions 4(c) and 4(d): By what technical means could the deployment environment of an AI agent system be constrained? What monitoring and logging capabilities are needed?

5.1 Forensic-Safe Termination Protocol

We propose a three-phase forensic-safe termination pattern for agent systems — FREEZE–MIRROR–TERMINATE — in which in-flight operations are halted (FREEZE), a read-only snapshot of agent state is preserved for forensic analysis (MIRROR), and the agent is then safely shut down (TERMINATE), with in-flight transactions handled via quarantine-snapshot routing rather than abandoned.

This protocol addresses a critical gap in current agent deployment: there is no standard mechanism for safely stopping an agent that has been compromised while preserving the state necessary for understanding what happened. Current practice is either to let the agent continue (dangerous) or kill it immediately (loses forensic evidence and may corrupt in-flight transactions). The FREEZE–MIRROR–TERMINATE sequence provides the forensic preservation that incident response requires while ensuring that in-flight operations are handled safely rather than abandoned.
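The phase ordering can be made explicit as a small state machine in which the only legal path to termination runs through the freeze and snapshot phases. This is a minimal sketch; the state names and transition table are illustrative, not a proposed implementation.

```python
from enum import Enum, auto

class Phase(Enum):
    RUNNING = auto()
    FROZEN = auto()      # in-flight operations halted, no new actions
    MIRRORED = auto()    # read-only state snapshot preserved
    TERMINATED = auto()

# Legal transitions: the snapshot must exist before shutdown, so the only
# path to TERMINATED runs through FROZEN and MIRRORED.
ALLOWED = {
    Phase.RUNNING: {Phase.FROZEN},
    Phase.FROZEN: {Phase.MIRRORED},
    Phase.MIRRORED: {Phase.TERMINATED},
    Phase.TERMINATED: set(),
}

def transition(current: Phase, target: Phase) -> Phase:
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target

state = Phase.RUNNING
for step in (Phase.FROZEN, Phase.MIRRORED, Phase.TERMINATED):
    state = transition(state, step)
print(state.name)  # TERMINATED
```

Encoding the ordering as enforced transitions, rather than as an operator runbook, is what prevents the "kill it immediately" shortcut that destroys forensic evidence.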

In higher-autonomy deployments, these operations may need to be lineage-aware rather than instance-local. If the affected agent has spawned descendants, maintained background processes, forked into successor runtimes, or delegated action to subordinate agents, the FREEZE, MIRROR, and TERMINATE procedures should identify and govern those related processes as part of the containment boundary rather than assuming that stopping a single process is sufficient.

Privacy and data minimization constraints apply to the MIRROR phase: the state snapshot should capture behavioral metadata and authorization chain provenance, not the content of personal communications or proprietary data. This preserves forensic utility while respecting GDPR, CCPA, and equivalent data protection requirements. A governance framework for snapshot custody — specifying who controls the forensic record, under what access conditions, and for how long — should be a required element of any agent deployment policy that mandates this protocol. This maps to ISO/IEC 27037 (Digital Evidence Preservation) as a directly applicable international standard.

This protocol also provides the agent-specific implementation mechanism for the containment and evidence preservation phases defined in NIST SP 800-61 (Computer Security Incident Handling Guide). FREEZE maps to SP 800-61's Containment phase: isolating the affected system to prevent lateral movement or further data exfiltration. MIRROR maps to the Evidence Gathering and Analysis phase: preserving volatile state before it is lost, in a form that supports forensic reconstruction. TERMINATE maps to the Eradication and Recovery phase: removing the threat while handling in-flight operations in a way that restores business function rather than creating a second failure from abrupt shutdown. Current SP 800-61 guidance does not address agent-specific requirements — the multi-phase, lineage-aware protocol described here extends the SP 800-61 lifecycle to the architectural realities of persistent, delegating agent systems.

5.2 Rollback, Trajectory Reversal, and the Limits of Termination

Response to Question 4(b): What is the state of applied use in implementing undoes, rollbacks, or negations for unwanted actions or trajectories of a deployed AI agent system?

The FREEZE–MIRROR–TERMINATE protocol addresses the forensic and safety requirements of stopping a compromised agent. It does not address the equally important question of reversing the effects of actions that have already been taken — trajectory reversal for partially completed action chains.

The state of applied use for agent action rollback is nascent. Current approaches are largely ad hoc: some agent frameworks implement transaction-style wrappers around specific tool calls (file writes, API calls) that can be individually reversed, but there is no standard for multi-step trajectory reversal across heterogeneous tool environments. The challenge is compounded by the fact that many agent actions produce effects in external systems that are not under the agent's control — an email sent, a calendar entry created, a payment initiated — making true rollback impossible once execution passes a certain point.

We propose a distinction between reversible and irreversible action classes as a foundational element of agent security standards. Agent systems should be required to classify each action in their tool set as reversible (state can be restored), partially reversible (state can be partially restored within a time window), or irreversible (external state change cannot be undone). This classification becomes more important when irreversible actions can be initiated indirectly through delegated or descendant agents, because the authority boundary for the consequential action may no longer be identical to the runtime instance that first received the instruction.

This classification should be machine-readable and available to the verification mechanism: actions classified as irreversible should require a higher evidence threshold before execution, and the human-authorization fallback should be mandatory rather than optional for this action class. A quarantine-snapshot routing semantic — preserving the pre-action state in a quarantined snapshot before executing an irreversible action — provides a forensic anchor even when rollback is impossible, enabling post-hoc analysis of what authorization was in place when the irreversible action was taken.
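A minimal sketch of such a machine-readable classification feeding an execution gate follows; the tool names, threshold values, and decision labels are all illustrative assumptions.

```python
from enum import Enum

class Reversibility(Enum):
    REVERSIBLE = "reversible"
    PARTIALLY_REVERSIBLE = "partially_reversible"
    IRREVERSIBLE = "irreversible"

# Machine-readable classification of the agent's tool set (illustrative).
TOOL_CLASSES = {
    "draft_document": Reversibility.REVERSIBLE,
    "create_calendar_event": Reversibility.PARTIALLY_REVERSIBLE,
    "send_email": Reversibility.IRREVERSIBLE,
}

# Hypothetical per-class evidence thresholds in [0, 1]: irreversible
# actions require the strongest evidence before execution.
THRESHOLDS = {
    Reversibility.REVERSIBLE: 0.5,
    Reversibility.PARTIALLY_REVERSIBLE: 0.7,
    Reversibility.IRREVERSIBLE: 0.9,
}

def gate(tool: str, evidence_score: float) -> str:
    """Return 'execute', 'snapshot_then_execute', or 'escalate_to_human'."""
    cls = TOOL_CLASSES[tool]
    if evidence_score < THRESHOLDS[cls]:
        return "escalate_to_human"          # mandatory human fallback
    if cls is Reversibility.IRREVERSIBLE:
        return "snapshot_then_execute"      # quarantine-snapshot routing
    return "execute"

print(gate("send_email", 0.95))   # snapshot_then_execute
print(gate("send_email", 0.80))   # escalate_to_human
```

Note that the irreversible path never executes without first preserving the pre-action snapshot, which is the forensic anchor described above.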

This area requires both standards development (a common taxonomy for action reversibility classes) and research (methods for trajectory reversal in multi-step, multi-tool, multi-agent action chains). It is among the most practically urgent open problems in agent security.

5.3 Dynamic Authorization Beyond Static Roles

Static role-based access control is likely to be insufficient for persistent, delegated agent systems. A promising direction for future guidance is to explore authorization models that can tighten or expand operational scope based on current evidence, lineage, and deployment context, rather than relying only on provisioning-time roles or credentials.
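One way such evidence-informed scoping could look, as a hedged sketch: operational scope starts from the provisioned role and is tightened when longitudinal evidence weakens. The thresholds, scope names, and tightening policy here are illustrative, not proposed values.

```python
def effective_scope(base_scope: set[str],
                    longitudinal_score: float,
                    high_risk: set[str]) -> set[str]:
    """Tighten provisioning-time scope as longitudinal evidence weakens.
    Thresholds and scope names are illustrative assumptions."""
    if longitudinal_score >= 0.9:
        return set(base_scope)                 # full provisioned scope
    if longitudinal_score >= 0.6:
        return set(base_scope) - high_risk     # drop high-risk permissions
    return set()                               # suspend pending human review

base = {"read_docs", "send_email", "initiate_payment"}
risky = {"send_email", "initiate_payment"}
print(sorted(effective_scope(base, 0.72, risky)))  # ['read_docs']
```

The design point is that the authorization decision consumes a current evidence signal, not only the credential issued at provisioning time.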

In the near term, the most important standards contribution is not to prescribe a single adaptive authorization model, but to recognize that persistent agent systems may require authorization logic that evolves with observed operating history and verified evidence over time.

6. Conclusion and Recommendations

The security of AI agent systems cannot be solved by applying traditional perimeter-based security to architectures that act across time, tools, services, and delegated authority chains. Agent risk is longitudinal: the relevant question is not only whether an agent was validly authorized at the beginning of an interaction, but whether its behavior remains consistent with principal intent over the lifetime of operation.

This response proposes that NIST treat persistent authority, delegated action, provenance, and verification over time as missing control layers in agent security. In practical terms, that means: (1) evaluating consequential actions against more than point-in-time instruction alone; (2) requiring a distinct evaluation step before externally consequential or irreversible actions; (3) preserving forensic state when compromised or drifted agents must be stopped; (4) constraining operational scope through authorization models informed by current evidence and operating history; and (5) classifying agent actions by reversibility, with irreversible actions requiring stronger assurance and human fallback.

These recommendations map naturally to the NIST AI RMF functions: they improve misuse identification (Map), evidence-aware measurement (Measure), and operational management (Manage) for persistent and delegated agent systems.

Several of these recommendations are immediately expressible as candidate control patterns, telemetry schemas, deployment requirements, and verification criteria for high-risk agent environments.

We recommend that NIST structure AI agent security guidance across three horizons: controls that can be adopted now, areas for near-term standards development, and areas that warrant additional research and measurement work.

Adopt Now

Longitudinal monitoring and verification should be treated as a foundational requirement, not an optional enhancement. Point-in-time authentication is necessary but insufficient. A distinct evaluation step between instruction and execution should be required for any agent system that takes actions affecting external state, with human fallback when sufficient evidence is unavailable.

Authorization models for agent systems should be able to incorporate current evidence and operating history over time, rather than relying only on binary or provisioning-time access decisions.

Standardized termination protocols (FREEZE–MIRROR–TERMINATE) should be required for agent systems to ensure safe shutdown with forensic preservation, subject to data minimization requirements. This maps to ISO/IEC 27037 for digital evidence preservation and extends the SP 800-61 incident response lifecycle to persistent, delegating agent architectures.

Agent action reversibility classification should be required: every agent tool set should include machine-readable classification of actions as reversible, partially reversible, or irreversible. Irreversible actions should require an elevated evidence threshold and mandatory human-authorization fallback.

Near-Term Standards Development

A standardized monitoring and evidence schema would be a high-value near-term standards intervention. The absence of a common format is a significant impediment to longitudinal monitoring, cross-organization comparison, and independent verification of consequential agent behavior.
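As a hedged illustration of what a common evidence-record shape might contain — every field name below is an assumption for discussion, not a proposed NIST schema:

```python
import json

# Illustrative evidence record; all field names are assumptions, not a
# proposed standard schema.
record = {
    "agent_id": "agent-7f3a",
    "principal": "user:alice",                        # human principal of record
    "delegation_chain": ["agent-7f3a", "subagent-02"],
    "action": {"tool": "send_email", "reversibility": "irreversible"},
    "evidence": {"longitudinal_score": 0.91, "history_length_days": 42},
    "timestamp": "2026-03-09T14:22:05Z",              # UTC, RFC 3339
}

# A schema-conformant record should round-trip losslessly through JSON so
# that independent verifiers can consume it across organizations.
assert json.loads(json.dumps(record)) == record
print(record["action"]["reversibility"])  # irreversible
```

Whatever the eventual field set, the delegation chain, the principal of record, and the action's reversibility class are the elements that cross-organization verification cannot do without.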

Independent verification of consequential behavior against declared purpose provides a structural defense against adversarial manipulation that content filtering alone cannot achieve.

Agent-discoverable compliance should be addressed for regulated domains so that agents can verify applicable identity, authorization, and verification requirements before acting. This connects directly to the agent authorization and provenance manifest concept described in Section 1.4 and the supply chain risk management requirements of SP 800-161.

Explicit mapping guidance should be developed for how longitudinal verification integrates with existing frameworks: NIST CSF 2.0, SP 800-207 (Zero Trust), SP 800-53, SP 800-61 (Incident Response), SP 800-162 (ABAC), OAuth 2.0/2.1, SPIFFE/SPIRE, the OWASP Agentic Top 10, ISO/IEC 23894 (AI Risk Management), and ISO/IEC 27037 (Digital Evidence Preservation). Some recommendations in this response are new control primitives; others are guidance for extending existing frameworks to agentic systems.

Agent work verification standards should be developed to address fabricated outputs and verification failure in multi-agent, tool-mediated, and partially autonomous workflows.

Emerging Research and Measurement

Continuous security signals for persistent and delegated agent systems should be developed using cross-disciplinary research foundations, with attention to longitudinal consistency, drift, and evidence quality over time.

Financial longitudinal verification should be addressed as agents increasingly conduct financial transactions, including cross-border payments and non-monetary exchanges. Longitudinal verification provides a defense layer against fraud, mandate drift, and collusive patterns that rules-based systems cannot detect. This area should include high-consequence financial and market-facing actions where delegated execution may create new forms of mandate drift and verification failure not well captured by existing controls.

Intent specificity and evidentiary support should inform agent authorization, such that underspecified or weakly supported intent grants less operational latitude than well-specified and well-supported intent.

Multi-agent longitudinal verification should be explicitly addressed in NIST guidance. Emergent collusion between individually authorized agents represents a threat class with no equivalent in traditional software security and no coverage in current frameworks.

Trajectory reversal methods for multi-step, multi-tool, multi-agent action chains represent a foundational open problem. Research into both the classification of action reversibility and the technical mechanisms for partial trajectory reversal is needed before standards can be written in this area.

Where Government Collaboration Is Most Urgent

Response to Question 5(b): In which policy or practice areas is government collaboration with the AI ecosystem most urgent or most likely to lead to improvements in the state of security of AI agent systems?

High-value government and standards collaboration areas include the following. First, monitoring and evidence interoperability for agent systems: no single organization can drive a common schema, and a NIST-convened working group would lower adoption cost across the ecosystem while producing a durable public good. Second, agent supply-chain identity and provenance mechanisms: the MCP ecosystem and similar agent connection protocols are expanding rapidly with no identity or longitudinal verification layer, and an effort to extend SBOM principles to behavioral manifests (the agent authorization and provenance manifest concept) would address a structural gap before it hardens into deployed infrastructure. Third, formal characterization of multi-agent coordination and collusion risks: this threat class requires mathematical formalization before standards can be written, and NIST, in partnership with academic researchers, is the appropriate venue for that work.

A final observation on time frame. The mechanisms proposed in this response are designed for the current transition period — they are necessary, adoptable, and directly mappable to existing frameworks. They are also time-bounded. As agent capability scales, detection-based verification frameworks carry an expiration condition: the point at which the agent's model of the verifier becomes more accurate than the verifier's model of the agent. Every detection framework analyzed in the broader literature on this topic — from biological immune systems to forensic accounting to cryptographic attestation — shares a common structural dependency: a verifier uses some form of informational advantage to detect deception by a subject. When the subject's capability substantially exceeds the verifier's, that advantage can erode or invert. Standards designed only for today's deployment environment risk hardening into assumptions that fail under substantially more capable successors. Standards that address longitudinal verification as a first-class requirement — rather than an optional enhancement — are significantly more likely to remain structurally sound as capability increases. The purpose of this submission is to surface the control layers that should be built into standards now, while the field still has the opportunity to shape them upstream.

A final note on ontology — because the time frame question ultimately rests on it. An agent is not software: software is mechanical and deterministic. An agent is not a biological system: biology is substrate-bound and governed by physical conservation constraints. An agent is something that emerges from the combination of those influences without being reducible to either — what we define as a cognitively organized, intent-driven digital organism. Standards that treat agents as bounded software processes are governing the wrong category of thing. Naming the ontological shift is a precondition for getting standards development right.

Marina Piller

Founder & CEO, Experiential AGI

[email protected]

About the Author

Marina Piller is the founder and CEO of Experiential AGI and has spent more than 20 years working across AI, NLP, and intelligent systems. Her experience ranges from early core search/NLP engineering and applied ML research to leading the productionization of CV, NLP, and ML research into deployed systems at Meta Reality Labs. She has held engineering and leadership roles across startups and larger platforms, including MITRE, iPhrase, Nielsen BuzzMetrics, and Meta.