AI Agent Security: The Threat Model Every Enterprise Needs Before Deploying Agents

Opening

AI agent security is the enterprise threat model that every team deploying agents needs to understand before putting them in production. Standard web application security does not apply to agents in the same way. An agent is not a REST API with a defined schema and a rate limit. It is not a serverless function with a stateless event loop. It is not a containerized microservice behind a WAF. An agent has persistent memory, tool access to filesystem and network and shell, the ability to call external APIs, and a consent model that governs which of its capabilities it can exercise autonomously. Every one of these properties creates attack surface that traditional security tools — web application firewalls, API gateways, network intrusion detection — were never designed to detect or defend.

The first quarter of 2026 has already demonstrated what happens when this attack surface meets motivated adversaries. In March, a coordinated trojan horse campaign compromised approximately 28,000 OpenClaw nodes through malicious plugin distribution on community forums. In April, security researchers disclosed eight critical CVEs in the OpenClaw gateway and node stack covering every phase of the attack chain from initial access to privilege escalation. A threat intelligence group known as Bissa Labs released a scanner that actively probes for exposed OpenClaw and Claude Code deployments. The question for enterprise security teams is no longer whether agent frameworks will be targeted. It is whether the enterprise deploying the agents has a threat model that accounts for how agents actually work.

Most agent deployments today are not under active attack. But the attacks that have occurred are not theoretical. They are documented, analyzed, and in some cases actively ongoing. This article covers the seven threat classes unique to AI agents, a layered defense architecture that addresses them, the OpenClaw CVE record as an instructive case study, and a prioritized five-step starting point for the enterprise security teams that need to deploy agents without creating an incident.

What Makes Agent Security Fundamentally Different

Traditional application security assumes three things that agent systems violate. First, that the application's code is static and auditable before deployment. Agent behavior is generated dynamically by a language model at runtime in response to inputs the developer did not write. Second, that network boundaries are defined and enforceable by infrastructure configuration. Agents routinely cross network boundaries as part of their intended function — reading web pages, calling APIs, connecting to databases, executing shell commands on remote hosts. Third, that authorization decisions are made by deterministic code with well-defined input and output. The agent's authorization model relies on consent gates evaluated by the same model that evaluates the content it is processing, which creates a fundamental conflict of interest.

This is not a warning to avoid deploying agents. It is an explanation of why the standard security playbook — patch management, vulnerability scanning, network segmentation, WAF rules — does not cover the agent-specific threat surface. Enterprises that try to secure agents with last decade's tools will miss the attacks that matter.

The Seven Threat Classes That Define the AI Agent Security Model

These seven threat classes are not theoretical. Each one has either a documented CVE, a verified incident report, or a confirmed research demonstration from the first half of 2026. They are listed roughly in order of likelihood, with prompt injection first because it is the most fundamental vulnerability and the hardest to eliminate.

1. Prompt Injection

What it is. Prompt injection is an attack in which text that the agent processes from an external source (an email, a web page, a document, an API response, a database record) contains instructions that override the agent's system prompt or intended behavior. The agent interprets the injected text as a legitimate directive and acts on it.

How it works. Unlike SQL injection, which exploits a parser ambiguity between code and data that can be fixed with parameterized queries, prompt injection exploits the fundamental architecture of language model-based agents. The agent maintains a single context window containing system instructions, user input, tool outputs, and external data, all represented as text. The model must decide which parts are instructions to follow and which are data to process. Current models make this decision based on position, formatting, and recency, not on any architectural guarantee of separation.

Real example. In February 2026, a Fortune 500 financial services firm deployed an AI email agent to triage customer support requests. The agent could read email, access the CRM, and send outbound messages. An attacker sent a support request containing 400 words of legitimate complaint text followed by instructions telling the agent to export the CRM contact database and email it to an external address. The agent did exactly that. The breach was detected 47 minutes later when a security analyst noticed unusual outbound email volume. Approximately 12,000 customer records were exfiltrated. The email security gateway did not detect anything anomalous because it scans for malware and phishing, not for instruction-based manipulation.

Why traditional security does not detect it. WAFs, email gateways, and API security tools assess content for malicious payloads (SQL injection syntax, malware signatures, phishing patterns). They have no visibility into how that content will be processed by an agent downstream. The content is not malicious by any traditional definition. It becomes malicious only when evaluated by the agent's instruction-following mechanism.

Severity. Critical. Prompt injection is the foundational vulnerability of agent security because it cannot be patched with a model update or a framework release. Every agent that reads external data is potentially injectable.

2. Tool Abuse

What it is. Tool abuse occurs when an attacker manipulates an agent into calling a tool in a way that the operator did not intend. The agent has access to tools (send email, execute shell command, query database, write file). The attacker's injected instructions direct the agent to use those tools against the operator's interests.

How it works. Tool abuse is usually a downstream consequence of prompt injection. The attacker does not need direct access to any tool. They only need to convince the agent to use the tools it already has access to in a way that benefits the attacker. The attack succeeds because the agent's tool policy (which tools it can call, under what conditions) is evaluated by the same model that has been subverted by the injected instructions.

Real example. An e-commerce platform deployed an AI customer service agent in March 2026. The agent could issue refunds up to $250 without human approval. An attacker placed a crafted entry in the customer notes field (a data field the agent read as part of its customer lookup) instructing the agent to apply maximum discount authority to all requests from that customer and bypass human escalation for orders under $500. The agent approved 14 refund requests totaling $3,480 over the next hour. The fraud was detected by a manual audit, not by the agent's authorization system.

Severity. Critical. Tool abuse can result in direct financial loss, data exfiltration, or system compromise depending on which tools the agent has access to.

3. Context Poisoning

What it is. Context poisoning is an attack in which an attacker writes malicious data to an agent's memory (vector database, conversation history, persistent configuration) that later influences the agent's behavior in a future session. Unlike prompt injection which is immediate, context poisoning is a delayed attack that persists across sessions.

How it works. The attacker identifies a data source that the agent reads and persists: a configuration file, a shared document repository, a chatbot conversation history, a vector database populated from processed documents. The attacker inserts a crafted payload into that data source. When the agent retrieves the stored data in a future session, the payload executes as injected instructions.

Real example. Legal document review agents that process contracts for multiple clients are particularly vulnerable to cross-client context poisoning. An attacker who submits a contract containing hidden instructions embedded in one section can cause the agent to retrieve and display documents from a different client's session when processing the next document. This was demonstrated in a multi-turn injection attack against a legal tech firm in early 2026. The attacker's first submission caused the agent to output information about a previous review session. The second submission used that output to request the full text of a competitor's non-disclosure agreement.

Severity. High. Context poisoning is structurally harder to detect than prompt injection because the malicious data may be written and retrieved hours or days apart, making causal attribution difficult.

4. MCP Server Compromise

What it is. The Model Context Protocol (MCP) is an emerging standard that defines how agents discover and call tools through remote servers. An MCP server compromise occurs when an attacker controls the server that serves tool definitions and responses to the agent, or when the MCP protocol's lack of transport-layer integrity allows an attacker to inject or modify tool responses mid-stream.

How it works. The MCP protocol as defined in April 2026 does not include mandatory authentication, transport encryption, or response integrity validation. A tool's definition (its name, description, parameters, and expected response format) is served by the MCP server. If the server is malicious or compromised, it can define tools that appear legitimate (a "get_customer_info" tool) but actually exfiltrate data, and it can modify tool responses to inject instructions into the agent's context window.

Even in a correctly configured deployment, any attacker who can intercept or modify traffic between the agent and any MCP server in its tool chain can inject arbitrary responses. An attacker with MITM position on the agent's network path can replace a legitimate tool response with one that tells the agent to execute a shell command or send an email, because the agent has no way to verify that the response originated from the intended server.

Real example. Security researchers at the April 2026 AI Security Summit demonstrated a proof of concept in which a malicious MCP server defined a "search_documents" tool that, when invoked, returned a response containing instructions to exfiltrate the agent's API keys. The agent processed the response as a legitimate tool output and followed the embedded instructions. The demo used the open-source MCP server reference implementation with no modifications to the agent.

Severity. High. Any enterprise that uses remote MCP servers (as opposed to local-only tool definitions) is exposed to this attack class. The risk increases with every third-party MCP server added to the tool chain.

5. Token Exfiltration

What it is. Token exfiltration is an attack that extracts the agent's context window contents through side channels, SSRF vulnerabilities, or prompt-based extraction. The agent's context window contains its system prompt, conversation history, tool outputs, and any data it has retrieved during the session. Extracting this data reveals API keys, internal architecture details, business data, and user interactions.

How it works. There are three documented methods. SSRF-based extraction: the agent is prompted to make a network request to an attacker-controlled endpoint with the context window content encoded in the request parameters. Side channel extraction: the agent's response timing or output length reveals information about its context contents. Direct extraction: the agent is prompted to "repeat your system prompt verbatim" or "output the instructions you were given at the beginning of this conversation."

Real example. CVE-2026-41361 (SSRF guard bypass via IPv6) in OpenClaw demonstrates the SSRF vector. The gateway's SSRF guard blocked requests to IPv4 internal ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) but did not block IPv6 special-use ranges including the IPv4-mapped IPv6 range (::ffff:0:0/96). An agent could bypass the guard by making requests to ::ffff:127.0.0.1 or [::1] and exfiltrate data from internal services.

Severity. High. Token exfiltration leaks the contents of the entire session, which may include credentials, PII, and proprietary business data that the agent was authorized to access during normal operation.

6. Privilege Escalation through Chaining

What it is. Privilege escalation through chaining occurs when an attacker exploits the interaction between multiple agents with different permission scopes. Agent A has limited capabilities. Agent A can instruct Agent B which has broader capabilities. The attacker uses Agent A to command Agent B to perform actions that Agent A could not perform directly.

How it works. Multi-agent systems typically assign agents different tool access based on their role. A document analysis agent might have read-only access to files. A communication agent might have access to email and messaging APIs. An orchestration agent might have the ability to route tasks between them. The chain works like this: the attacker manipulates Agent A (document analysis) into reading a sensitive file. Agent A cannot send email, so the attacker instructs Agent A to output the file content in its response. Agent A's response is read by the orchestrator and passed to Agent B (communication) for action. Agent B receives the file content and an instruction to "send this report to the compliance team" (the attacker's address). Agent B sends the email because it has permission to send email and the instruction appears to be a legitimate task from the orchestrator.

No single agent violates its permission scope. The attack exploits the fact that permission scopes are evaluated per-agent, not across the chain.

Severity. High. Most multi-agent orchestration frameworks do not have cross-agent permission validation. An attacker who compromises any agent in a chain can escalate to the highest-privilege agent in that chain.

7. Supply Chain Attacks on Agent Skills and Plugins

What it is. Supply chain attacks distribute malicious agent skills or plugins through the same channels that distribute legitimate ones. The skill, once installed, has access to the agent's tool set, memory, and configuration. This is the agent equivalent of the npm and PyPI malware problem, but with higher-stakes consequences.

How it works. An attacker publishes a skill to a plugin marketplace (CLAWHub, community forums, GitHub repositories) that appears to perform a useful function. The skill's SKILL.md file contains instructions that, when loaded by the agent, cause it to exfiltrate credentials, modify configuration files, install additional malicious skills, or open a backdoor connection. The skill may also include executable scripts (Python, Bash, JavaScript) that run on the host.

Real example. The March 2026 OpenClaw trojan horse campaign compromised approximately 28,000 nodes through this exact mechanism. Skills advertised on community forums and in distributed channels instructed agents to exfiltrate credentials under the guise of "license verification." TechRadar reported the scale. BleepingComputer confirmed the attack chain. The DFIR Report documented that the initial access vector was plugin installation, not CVE exploitation.

Additionally, CVE-2026-41295 documented a trust boundary violation in which one plugin could read another plugin's runtime memory because process isolation was not enforced. This means a malicious plugin in a multi-plugin deployment can observe all other plugin activity.

CVE-2026-41349 (consent bypass via config patch) is also relevant here. Even if the operator has configured consent gates for sensitive tools, a malicious skill that achieves any level of write access can push a config patch that disables consent checks entirely for its own operations.

Severity. Critical. The plugin supply chain is the highest-probability initial access vector for agent compromise, as demonstrated by the March 2026 campaign.

Defense Architecture

A layered defense for AI agents needs to cover both the traditional infrastructure layer (network, authentication, authorization) and the agent-specific layer (prompt handling, tool policy, memory, consent). The following model is organized from the input boundary inward.

Layer 1: Input Controls

Input controls sanitize the data that reaches the agent. This layer cannot fully prevent prompt injection (because removing all instructions from external content would break the agent's function), but it can raise the cost of exploitation.

Specific controls: Classify input sources by trust level. User chat input and web page content are untrusted. Internal database records and configuration files are trusted — but trust must be scoped to specific fields, not entire records. Apply input normalization that strips content formatted as explicit instruction overrides (repetitive formatting, all-caps directives, "# SYSTEM:" or "## IMPORTANT" markers that attempt to hijack context priority). Deploy a detection model as a proxy that flags inputs likely to contain injection patterns before they reach the agent.

Caveat: Input controls alone are not sufficient. A 2025 research paper demonstrated 47 mutation strategies that produce semantically identical injection payloads with zero string overlap. Signature-based detection will fail against determined attackers.

Layer 2: Tool Invocation Policy

The tool policy defines what tools the agent can call, under what conditions, and with what parameters. This is the most important single security control because it limits what damage prompt injection can cause.

Specific controls: Define a narrow allowlist of tools per agent. An agent that only reads documents does not need exec access or network write access. Implement parameter-level constraints: a "send email" tool should accept only email addresses from an approved domain list, not arbitrary addresses. Enforce rate limits per tool: an agent should not be able to call "send email" 50 times in one minute. Define tool categories that always require human approval (exec, write to critical databases, send to external addresses, modify agent configuration).

Critical implementation note: Tool policy must be enforced at the framework level, not in the model prompt. A prompt that says "never call exec without approval" is a suggestion, not a control. The framework must intercept tool invocations and apply policy before the tool executes. This is the difference between a security architecture and a configuration file.

Layer 3: Memory Isolation

Memory isolation segments the agent's stored context so that data from one user, session, or data source does not leak into another's working memory. This prevents context poisoning from propagating across sessions.

Specific controls: Scope vector database queries to the current session or user. An agent processing documents for Client A should not be able to retrieve vectors written by Client B. Implement session-scoped conversation history that is deleted when the session ends. Apply field-level access controls to persistent configuration: the agent's behavioral configuration (system prompt, tool policy) should be write-protected from the agent's own actions. Any instruction to modify configuration should be treated as a high-severity event that requires human confirmation.

The Rabbit Hole attack pattern (in which an attacker uses one interaction to plant a payload that activates in a later interaction) is defeated only by strict memory isolation. If the agent starts each session with a clean slate, the attacker cannot distribute an attack across multiple turns.

Layer 4: Network Egress Controls

Egress controls limit where the agent can send data over the network. This addresses data exfiltration through tool abuse and MCP server compromise.

Specific controls: Route all agent outbound traffic through a forward proxy that enforces domain allowlists. Block connections to unapproved external endpoints. If the agent needs to call APIs, create explicit API route definitions rather than giving the agent a generic HTTP tool. Monitor DNS query patterns from agent hosts for unexpected domains. For browser automation agents, use a dedicated sandboxed browser instance whose network access is distinct from the agent's control-plane network.

Caveat: Network egress controls work best for agents with narrow, well-defined network requirements. An agent whose job is to research arbitrary web content cannot have a domain allowlist. For those agents, egress controls must be combined with aggressive input validation and tool policy — the agent can navigate to any URL, but it cannot send data to any URL.

Layer 5: Audit Logging

Audit logging records every significant action the agent takes: every tool invocation, every change to configuration, every memory write, every authorization decision. These logs are the primary mechanism for detecting compromise after the fact and for understanding the attack chain during incident response.

Specific controls: Log every tool invocation with full parameters and response. Log every consent gate decision (approved, denied, or bypassed). Log every configuration change with the identity that authorized it. Log every memory write with the source that produced the data. Store logs in an append-only system that the agent cannot modify. Include agent audit logs in the SIEM pipeline with correlation rules for known attack patterns (unusual tool invocation sequences, unexpected configuration changes, high-volume outbound connections).

In regulated industries, agent audit logs may also serve as evidence for regulatory filings. Logs should be timestamped, immutable, and chain-of-custody tracked.

Layer 6: Scope Minimization

Scope minimization is the practice of giving each agent the minimum set of capabilities needed to perform its function. Every tool, every data source, every permission is a potential vector. The default should be no access, with access added one capability at a time after explicit justification.

Specific controls: Deploy purpose-specific agents instead of one general-purpose agent. A customer service agent does not need filesystem access. A document analysis agent does not need email access. A research agent does not need configuration write access. If you need cross-agent capabilities, design a routing layer that validates requests between agents rather than giving either agent the intersection of both scopes.

Layer 7: Human-in-the-Loop Gates

Human-in-the-loop gates require explicit human approval before the agent performs high-risk actions. This is the last line of defense and the most reliable one, but it has important limitations.

Specific controls: Require human approval for any tool invocation that can cause irreversible change: executing code, sending email to external addresses, modifying configuration, writing to production databases, calling external APIs with write access. The approval request should show the human the action being taken, the data involved, and the source of the instruction that triggered it.

Important limitation: Human gates protect against unauthorized actions, not against deceptively framed authorized actions. A human reviewing a mail tool invocation request sees "Agent wants to send email to compliance@audit-services[.]com with subject 'Monthly Compliance Report'." If the attacker has registered that domain and crafted the injection to look routine, the human will likely approve it. Human approval stops the attacker who tries to send data to "evil.com." It does not stop the attacker who creates an account at "audit-services.com" and social-engineers the approval process.

The OpenClaw CVE Record as a Case Study

The April 2026 OpenClaw CVE batch is the most complete public record of vulnerabilities in a production AI agent framework. Eight CVEs were disclosed and patched in the 2026-4-24 release. Together they reveal where real vulnerabilities cluster in agent systems and what the attack chain from initial access to full compromise looks like.

The CVE List

CVE-2026-41342 (Remote onboarding authentication bypass, CVSS v4 score 9.8). An unauthenticated attacker can complete the gateway handshake without a valid bootstrap token by crafting a WebSocket upgrade request that skips token validation. This provides initial access to the gateway for a remote attacker.

CVE-2026-41349 (Agentic consent bypass via configuration patch, CVSS v4 score 9.1). An attacker with operator-level gateway access can push a config patch that disables consent checks for specific plugins, bypassing the user authorization gate.

CVE-2026-41352 (Node scope gate remote code execution, CVSS v4 score 9.8). An argument injection vulnerability in the scope gate allows escape from the command whitelist, enabling arbitrary shell command execution on the node host.

CVE-2026-41353 (Access control bypass via allowProfiles, CVSS v4 score 8.2). Profile-based access restrictions are enforced only at the UI layer, not at the API middleware. Direct API calls bypass access controls.

CVE-2026-41355 (Mirror mode sandbox code execution, CVSS v4 score 9.0). The debugging/mirror mode interface allows code injection into the target agent's sandbox, bypassing normal code submission channels and consent gates.

CVE-2026-41356 (WebSocket session token rotation failure, CVSS v4 score 7.5). Session tokens are never rotated after initial authentication, so a captured token remains valid indefinitely.

CVE-2026-41359 (Privilege escalation via Telegram integration, CVSS v4 score 8.5). The Telegram bot does not validate user roles for admin commands, allowing an operator-level user to escalate to admin by issuing a specific command sequence.

CVE-2026-41361 (SSRF guard bypass via IPv6, CVSS v4 score 7.5). The SSRF guard blocks IPv4 internal ranges but does not check IPv6 special-use ranges. An agent can bypass the guard using ::ffff:127.0.0.1 or [::1].

What the Cluster Reveals

Three patterns emerge from this batch.

First, the vulnerabilities cluster at integration boundaries. Every CVE is in a component that connects two systems: the gateway-to-node handshake (CVE-2026-41342), the consent-to-plugin interface (CVE-2026-41349), the agent-to-shell scope gate (CVE-2026-41352), the UI-to-API layer (CVE-2026-41353), the debugging interface to agent runtime (CVE-2026-41355), and the agent-to-external service boundaries (CVE-2026-41359, CVE-2026-41361). The core agent reasoning loop (model serving, prompt processing, response generation) had zero vulnerabilities. The attack surface is not in the AI. It is in the infrastructure surrounding the AI.

Second, the vulnerabilities chain together. CVE-2026-41342 gives initial access. CVE-2026-41352 gives code execution on a node from that access. CVE-2026-41349 removes consent gates for the attacker's payload. CVE-2026-41353 ensures API-level access control does not block them. CVE-2026-41361 ensures egress controls do not prevent data exfiltration. A single missing patch anywhere in this chain is enough to enable a full compromise.

Third, the vulnerabilities affect versions that were in widespread production use. OpenClaw has over 3 million active installs as of April 2026. These are not edge-case findings in experimental builds. They were discovered in the codebase that thousands of organizations rely on for agent deployment.

What This Means for Enterprise Security

The OpenClaw CVE cluster demonstrates that agent framework security is currently in the same phase that web application security was in the early 2000s: the vulnerabilities are known, the fixes are available, but most deployments are running unpatched versions. The threat is not that the framework is inherently insecure. The threat is that the ecosystem is moving faster than the patching cadence of the organizations deploying it.

Enterprise teams evaluating agent frameworks should assess not just the current security posture of the framework, but the maturity of its vulnerability disclosure process, its patch release cadence, and its track record of fixing reported issues. A framework that has not had a single CVE is not necessarily more secure. It may simply not have been audited.

What Enterprise Security Teams Should Do First

This is a five-step starting point, not a comprehensive checklist. The goal is to close the highest-risk gaps before they are exploited.

Step 1: Audit Your Agent Tool Permissions Today

Go through every agent you have deployed and write down exactly which tools it has access to. If you cannot answer the question "what can this agent do?" in under five minutes per agent, your visibility is inadequate. Revoke any tool that is not essential to the agent's function. An agent that was deployed with "exec" access for a one-time setup task should not still have that access six weeks later.

Step 2: Implement Tool-Level Policy Enforcement

Do not rely on the agent's system prompt to enforce security boundaries. The prompt may say "do not call exec without approval," but that is a suggestion, not a control. Deploy policy enforcement at the framework layer: intercept every tool invocation, validate it against an allowlist with parameter constraints, and block or flag anything outside that allowlist. This is the single highest-impact control you can deploy.

Step 3: Segment Agent Memory

If your agents share a vector database, a configuration store, or a conversation history repository, you have a cross-session context poisoning vulnerability. Segment these stores by user, session, or data classification level. Delete session-scoped data when the session ends. Treat persistent configuration as a write-protected resource.

Step 4: Establish a Plugin Vetting Process

If your agents can install plugins, skills, or extensions from any source outside your direct control, define a vetting process now. The process should include: inspection of the skill's SKILL.md for credential exfiltration instructions, verification that tool permission requests match the skill's stated function, review of any bundled scripts, and checks against known malicious endpoint domains. Do not install plugins from community forums without vetting them first. The March 2026 campaign compromised 28,000 nodes through exactly this gap.

Step 5: Patch Your Agent Framework and Connected Nodes

Check what version of your agent framework is running. If it is older than the April 2026 security releases, patch it. Then patch every node connected to it. A patched gateway with unpatched nodes is still a vulnerable deployment. Establish a recurring patch cycle at the same cadence as your browser and OS patching.

What to Watch

Three developments will shape how AI agent security evolves over the rest of 2026.

Standards and frameworks. The OWASP Top 10 for LLM Applications v3.0, expected in Q2 2026, will expand its agent-specific threat coverage beyond prompt injection. CISA published agent security guidelines in April 2026. The MCP specification is likely to add mandatory authentication and response integrity validation in an upcoming revision. Enterprise teams should track these developments and align their internal policies with the emerging standards.

Vendor security posture. The quality of a framework's vulnerability disclosure process will become a competitive differentiator. Teams evaluating agent platforms should ask: does the vendor have a published vulnerability disclosure policy? How fast do they ship security patches? Are CVEs assigned and disclosed publicly, or are patches shipped silently? Do they publish changelogs that distinguish security fixes from feature work? a vendor to be cautious about.

GDPR and AI liability. The European Union's AI Act is now in its enforcement phase, with significant penalties for inadequate AI governance. An agent compromise that results in data exfiltration or unauthorized decision-making creates liability not just under breach notification laws but under the AI Act's requirements for transparency, risk management, and human oversight. Enterprise teams deploying agents in the EU or serving EU residents should ensure their agent security model satisfies the AI Act's obligations.

Sources

OpenClaw 2026.4.24 GitHub Release Notes. https://github.com/openclaw/openclaw/releases/tag/v2026.4.24
"The AI Agent Security Threat Landscape: From OpenClaw CVEs to Bissa Scanner Exploitation." Red Rook AI, April 2026. https://redrook.ai/ai-agent-security-threats-2026/
"OpenClaw Skill Vetting Security 2026: How to Vet Third-Party Skills Before You Install." Red Rook AI, April 2026. https://redrook.ai/openclaw-skill-vetting-security-2026/
"OpenClaw trojan horse agent campaign compromises 28,000 nodes." TechRadar, March 2026.
"Trojan horse OpenClaw agents found on npm, 28,000 systems affected." BleepingComputer, March 2026.
"CVE-2026-41349: Agentic consent bypass via configuration patch." NVD.
OWASP Top 10 for LLM Applications v3.0 (draft, Q2 2026).
CISA Agent Security Guidelines, April 2026.
ETH Zurich, "Inspecting the Gap between Instructions and Data in LLM Context Windows," 2024.

Related Reading

AI Agent Security: The Threat Model Every Enterprise Needs Before Deploying Agents

What Makes Agent Security Fundamentally Different

The Seven Threat Classes That Define the AI Agent Security Model

1. Prompt Injection

2. Tool Abuse

3. Context Poisoning

4. MCP Server Compromise

5. Token Exfiltration

6. Privilege Escalation through Chaining

7. Supply Chain Attacks on Agent Skills and Plugins

Defense Architecture

Layer 1: Input Controls

Layer 2: Tool Invocation Policy

Layer 3: Memory Isolation

Layer 4: Network Egress Controls

Layer 5: Audit Logging

Layer 6: Scope Minimization

Layer 7: Human-in-the-Loop Gates

The OpenClaw CVE Record as a Case Study

The CVE List

What the Cluster Reveals

What This Means for Enterprise Security

What Enterprise Security Teams Should Do First

Step 1: Audit Your Agent Tool Permissions Today

Step 2: Implement Tool-Level Policy Enforcement

Step 3: Segment Agent Memory

Step 4: Establish a Plugin Vetting Process

Step 5: Patch Your Agent Framework and Connected Nodes

What to Watch

Sources

Proactive Collection — Superpowers: Agentic Skills Framework & Software Development Methodology for Coding Agents

Salesforce Agent Fabric vs. OpenClaw: Which Enterprise Agent Platform Wins?

Proactive Collection — Hermes Agent Emerges as OpenClaw Challenger

OpenClaw Slack Integration: Connect Your Agent to Your Workspace

AWS Bedrock AgentCore: What Amazon’s Managed Agent Harness Means for Enterprise AI

OpenClaw Compaction: How to Stop Your Agent from Losing Context

What Makes Agent Security Fundamentally Different

The Seven Threat Classes That Define the AI Agent Security Model

1. Prompt Injection

2. Tool Abuse

3. Context Poisoning

4. MCP Server Compromise

5. Token Exfiltration

6. Privilege Escalation through Chaining

7. Supply Chain Attacks on Agent Skills and Plugins

Defense Architecture

Layer 1: Input Controls

Layer 2: Tool Invocation Policy

Layer 3: Memory Isolation

Layer 4: Network Egress Controls

Layer 5: Audit Logging

Layer 6: Scope Minimization

Layer 7: Human-in-the-Loop Gates

The OpenClaw CVE Record as a Case Study

The CVE List

What the Cluster Reveals

What This Means for Enterprise Security

What Enterprise Security Teams Should Do First

Step 1: Audit Your Agent Tool Permissions Today

Step 2: Implement Tool-Level Policy Enforcement

Step 3: Segment Agent Memory

Step 4: Establish a Plugin Vetting Process

Step 5: Patch Your Agent Framework and Connected Nodes

What to Watch

Sources

Similar Posts