AI Agents: Security Compromised by the Addition of Tools and Memory

⚡

Key Takeaways

1A survey reveals that 88% of organizations experienced security incidents related to AI agents in 2026.

2Only 14.4% of agent-based systems are deployed with full security and IT approval.

3AI agents expose four distinct attack surfaces, requiring specific threat models.

💡Why it matters — The rapid adoption of AI agents without adequate security measures exposes businesses to increased risks of cyberattacks.

The Evolution of the Threat Model for AI Agents

Traditionally, AI security focused on the model itself: its responses, refusals, and management of malicious prompts. This framework was suitable for an AI operating as a text interface, where the user sends a message and the AI responds. The attack surface was then limited and well-defined.

However, the introduction of AI agents has radically changed the game. These agents do not merely generate text; they plan, use tools, store information between sessions, and often collaborate with other agents to accomplish complex tasks. This significantly alters the risk model.

The numbers confirm that this is no longer a theoretical concern. According to the 2026 State of AI Agent Security report by Gravitee, based on a survey of over 900 executives and practitioners, 88% of organizations reported security incidents related to AI agents in the past year. Furthermore, only 14.4% of agent systems were deployed with full IT security and approval.

This pattern extends across the industry. A 2026 report from Apono revealed that 98% of cybersecurity leaders report friction between the acceleration of AI agent adoption and compliance with security requirements, leading to slowdowns or constrained deployments. This gap between deployment speed and security readiness is precisely where incidents occur.

The Four Attack Surfaces of AI Agents

An autonomous LLM presents a single attack surface: the prompt. In contrast, an AI agent exposes four distinct surfaces:

The prompt surface: This concerns the reading of external inputs.
The tool surface: This involves executing actions in the background.
The memory surface: This pertains to remembering past sessions.
The planning loop surface: This involves deciding the next steps.

Each surface presents specific attack patterns, and defenses designed for one do not apply to the others.

Examples of Attacks and Vulnerabilities

In 2025, Pomerium reported an incident where an AI support agent executed a hidden SQL payload, leaking database secrets in a public ticket. This example illustrates how adding tools, memory, and autonomous planning to an LLM creates distinct attack surfaces requiring entirely new threat models.

The Prompt Surface

User input is often clean, but the vulnerability lies in everything the agent consumes in the background. When an agent retrieves a webpage or document, these inputs arrive without a trust boundary. Attackers do not compromise the user interface; they plant payloads where the agent will eventually look. This is known as indirect prompt injection.

The Tool Surface

Every tool an agent can call represents a permission boundary, making it a prime target for exploitation. The main attack is parameter injection: manipulating the agent to pass attacker-controlled values to tools that trigger real-world consequences, such as database writes or signed API requests.

The Memory Surface

Imagine a shared whiteboard in an office that is relied upon for daily decision-making. If a stranger silently rewrites an entry overnight, the entire team's output changes based on corrupted data. Persistent memory in an autonomous agent works exactly the same way.

The Planning Loop

A GPS powered by false mapping data always gives accurate directions. The routing logic works perfectly, but the destination is wrong. The driver has no idea they are arriving somewhere they never intended to go.

Defense Strategies

To secure AI agents, several strategies are recommended:

Boundary sanitization: Treat all external data as untrusted at every retrieval point.
Instruction separation: Use structured formats to isolate system prompts from retrieved content.
Pre-execution filtering: Scan for exfiltration patterns before any tool is executed.

These measures aim to secure the agent's data ingestion. However, once the agent takes action, the attack may shift to the tool surface.

The Trade-off Between Security and Autonomy

Every security measure applied to the prompt, tool, memory, and planning surfaces comes at a cost. Ignoring these trade-offs can create a false sense of security. For example, restricting the tool environment limits the agent's capabilities but also reduces risks. Similarly, adding human controls on irreversible actions prevents unauthorized writes but may introduce latency.

The optimal security tuning for a deployment depends on three factors:

Capability profile: Controls should be proportional to the agent's capabilities.
Task environment: Agents operating in critical environments require stricter security measures.
Blast radius: Decisions should be based on the worst possible outcome of an exploitation.