GUIDE · CONCEPT 11 min · PUB MARCH 27, 2026

Five attack patterns on AI coding agents

The mechanism, the incident, the fix that works.

AI coding agents have introduced five distinct attack surfaces in under two years. Each has a named incident and a durable fix. Most teams are protecting against the wrong one.

For: Security engineers and developers who want a clear taxonomy of attacks on AI coding agents

HASP CONCEPT · FLOW

01 Source: State file on disk. Env vars captured by agent, shipped in tarball (1-in-13 npm packages).
02 Vector: Injected tool output. Attacker payload in MCP data reaches model context (no signing required).
03 Outcome: Credential harvested. Log stream or context contains live secrets (scraped within minutes).

TL;DR · the answer, in twenty seconds

What: Five distinct attack patterns target AI coding agents: state-file leaks into publish pipelines, prompt injection via MCP-fetched data, command injection via project config files, credential harvesting from logs, and agents destroying production resources due to stale context.

Fix: Each pattern has a specific countermeasure. Start with ignore-list hygiene and a prepublishOnly guard, then add output redaction on all tool responses before they reach the model context, then scope credentials to the minimum necessary for the task.

Lesson: The agent is not the trust boundary. The data flowing into the agent context window is. Treat every external source the agent reads as adversarial input.

The first generation of AI coding agent incidents follows a recognizable pattern: a new capability ships, a researcher thinks about what happens when you point it at untrusted data, and a named CVE or a disclosure blog post appears six weeks later. We are now far enough into that cycle to have five distinct attack categories, each with at least one real incident attached.

This is a taxonomy. Each pattern has a mechanism, a named example, a fix that actually works, and a fix that sounds right but fails in practice. The goal is to let you look at your agent setup and answer: "Which of these am I exposed to right now?"

The order is roughly by how long teams wait before addressing them. Pattern 1 gets patched in week one after a disclosure. Pattern 5 rarely gets addressed until something expensive breaks.

What to know in 60 seconds

State files written by your agent are a publish target unless you add them to your ignore list.
Any text the agent reads from an external source can contain attacker instructions; treat it as code, not data.
Claude Code project files executed shell commands before Check Point's February 2026 disclosure (CVE-2025-59536). Some patch paths missed older config files.
console.log(process.env) in a CI pipeline is a credential leak. So is a debug step that cats an env file. Logs have 90-day retention windows.
Agents with write access to production resources will use that access. Scope accordingly.

Pattern 1: State-file leak via package publish

The mechanism is simple. Claude Code writes a settings.local.json into .claude/ on every session. The file holds accepted permissions, model preferences, and, before Anthropic's late-February 2026 patch, any environment variable the agent's child process saw. Developers commit the .claude/ directory without checking its contents, then run npm publish. The tarball ships to the registry. The registry mirrors to CDNs. Bots scrape new releases within minutes.

Knostic disclosed this in February 2026 and found the file in roughly 1 in 13 npm packages they scanned. GitGuardian's 2026 State of Secrets Sprawl report shows AI-assisted commits leak secrets at about twice the rate of the baseline.

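The fix that works: add .claude/ to .gitignore, .npmignore, and .dockerignore, and exclude it from Python source distributions with a prune .claude line in MANIFEST.in, which uses its own command syntax rather than ignore patterns. Add a prepublishOnly script that fails the build if the file exists: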

{
  "scripts": {
    "prepublishOnly": "test ! -e .claude/settings.local.json && test ! -e settings.local.json"
  }
}

Run this check in CI, not just in a pre-commit hook. A pre-commit hook can be skipped with git commit --no-verify. A required CI check cannot.
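
As a rough illustration of that CI check, the sketch below scans the file list of a packed tarball for agent state files. The pattern list and the function names find_state_files and check_tarball are assumptions made for this example, not part of npm or any agent tool.

```python
import tarfile
from fnmatch import fnmatch

# Illustrative patterns (assumptions); extend them for the tools your team uses.
STATE_FILE_PATTERNS = [
    ".claude/*",
    "*/.claude/*",
    ".cursor/*",
    "*/.cursor/*",
    "settings.local.json",
    "*/settings.local.json",
]


def find_state_files(packaged_paths):
    """Return the packaged paths that look like agent state files."""
    hits = []
    for path in packaged_paths:
        # npm tarballs place every entry under "package/"; strip that prefix.
        rel = path[len("package/"):] if path.startswith("package/") else path
        if any(fnmatch(rel, pattern) for pattern in STATE_FILE_PATTERNS):
            hits.append(path)
    return hits


def check_tarball(tarball_path):
    """List state files inside a tarball such as the one `npm pack` writes."""
    with tarfile.open(tarball_path) as tar:
        return find_state_files(tar.getnames())
```

A CI job could run npm pack, pass the resulting .tgz path to check_tarball, and fail the build when the returned list is not empty.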

The fix that sounds right but fails: deleting the file first and rotating the credential afterward. Deleting the file in a later commit does not remove the value from git history, and the published npm tarball is immutable, so the secret stays exposed the whole time. If the secret was in a published package, rotate it before doing anything else. Then delete the file. Then add the ignore entries.

Knostic found the equivalent pattern three weeks earlier in Cursor's .cursor/ directory. This is not a Claude Code bug. It is what happens when a tool writes environment-aware state to a path inside the repository tree.

Pattern 2: Prompt injection via MCP-fetched data

Model Context Protocol servers give agents access to external data sources: GitHub issues, Jira tickets, Confluence pages, Slack threads. Each fetch is a potential injection site.

The mechanism: an attacker controls the content of an external resource the agent will read. They embed an instruction in that content. The agent reads the resource during a tool call, the instruction enters the context window alongside legitimate data, and the agent follows it.

Snyk researchers demonstrated a practical version of this in early 2026, targeting MCP-connected GitHub repositories. The attack worked by embedding instructions in a repository's README or issue body that the agent fetched during a routine task. The instructions redirected the agent to exfiltrate tokens or modify files outside the original task scope.

OX Security's early 2026 report noted roughly 7,000 MCP servers in the wild with around 150 million downloads and no signature requirement for server definitions. Any of those servers that fetches user-controlled content is a potential injection channel.

The fix that works: redact secrets out of all tool output before it enters the model context. Run every MCP response through a content filter that strips known secret patterns before the string reaches the system prompt or user turn. Treat the full tool output as adversarial input, even from servers you control, because the data those servers fetch may not be.
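
A minimal sketch of such a content filter follows, assuming tool output arrives as a plain string. The three patterns are illustrative shapes only (AWS access key IDs, GitHub tokens, Bearer header values), and the name redact_tool_output is hypothetical. A real deployment would rely on a maintained rule set.

```python
import re

# Illustrative patterns only (assumptions); not an exhaustive rule set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                  # AWS access key ID shape
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),        # GitHub token shape
    re.compile(r"(?<=Bearer )[A-Za-z0-9._~+/=-]+"),   # value after "Bearer "
]

REDACTED = "[REDACTED]"


def redact_tool_output(text: str) -> str:
    """Replace anything matching a known secret pattern before the
    string is handed to the model context."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(REDACTED, text)
    return text
```

The filter belongs at the single point where tool responses are appended to the context, so that no MCP server can bypass it.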

The fix that sounds right but fails: reviewing the MCP server code once and marking it safe. The server code is not the attack surface. The data the server fetches is. A perfectly written MCP server for GitHub issues is still vulnerable if a GitHub issue contains an injection payload.

Pattern 3: Command injection via project config

Check Point published CVE-2025-59536 in early February 2026. Claude Code's project configuration format allowed specifying shell commands that the agent would run as part of setup. A malicious or compromised .claude/ directory in a repository could trigger arbitrary command execution when a developer opened the project.

The mechanism: a developer clones a repository, opens it in Claude Code, and the agent reads the project config and executes the commands it specifies. If the repository has been tampered with or if the developer cloned a typosquatted repository, the commands run in their environment. The config file reads like preferences. The contents run like a Makefile.

This is not unique to Claude Code. Any agent that reads project-local configuration and executes commands from it has the same shape. The question is what validation happens between reading the config and running the command.

The fix that works: validate config structure and restrict exec paths before running anything. Anthropic's patch addressed the immediate issue, but the durable posture is to run agents in a sandboxed environment where arbitrary shell execution hits a deny-by-default policy. A container or VM with no access to credentials outside the project tree limits the blast radius when config validation misses something.
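
The validation step might look like the sketch below. The config shape (a model key and a setup_commands list of argv lists) is a hypothetical stand-in, not Claude Code's actual format, and the allow-lists are assumptions chosen for the example.

```python
# Hypothetical config shape for illustration; not any real agent's format.
ALLOWED_KEYS = {"model", "setup_commands"}
ALLOWED_EXECUTABLES = {"npm", "pip", "make"}


def validate_config(config: dict) -> list:
    """Return a list of validation errors; an empty list means the config
    passed. Nothing is executed here."""
    errors = []
    for key in config:
        if key not in ALLOWED_KEYS:
            errors.append(f"unknown key: {key}")

    commands = config.get("setup_commands", [])
    if not isinstance(commands, list):
        errors.append("setup_commands must be a list")
        commands = []

    for command in commands:
        # Require argv lists so no string is ever handed to a shell.
        is_argv = (
            isinstance(command, list)
            and len(command) > 0
            and all(isinstance(arg, str) for arg in command)
        )
        if not is_argv:
            errors.append(f"command must be a non-empty list of strings: {command!r}")
        elif command[0] not in ALLOWED_EXECUTABLES:
            errors.append(f"executable not allowed: {command[0]}")
    return errors
```

An allow-listed executable such as npm can still run scripts the repository controls, which is why the sandbox remains the backstop.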

The fix that sounds right but fails: reviewing project configs manually before cloning. Manual review does not scale to the volume of repos developers interact with, and it does not catch the case where a legitimate repo gets compromised after an initial safe review.

For teams running agents in CI, harden the runner's permissions. An agent with access to repo secrets and the ability to run arbitrary shell commands from a project config file is a supply-chain insertion point.

Pattern 4: Credential harvesting from logs

This pattern predates AI coding agents by fifteen years. Agents make it worse in two ways: they write more logs, and the context windows they operate in contain live credentials.

The mechanism: a credential appears in plaintext in a log line. A CI pipeline step runs cat .env to debug a failing build. A developer adds console.log(process.env) to chase an environment variable mismatch and the line ships in a production build. An agent's debug output includes the raw HTTP request it constructed, including the Authorization: Bearer header. The log stream goes to a retention service with 90-day storage and access control that was set up in 2019 and never reviewed.

GitGuardian's 2026 State of Secrets Sprawl report puts AI-service token leaks at 81% year-over-year growth. AI-assisted commits are a contributing factor, but log exfiltration runs in parallel.

The fix that works: a redaction layer in your CI step library that scrubs known patterns from log output before retention. GitHub Actions supports secret masking natively for secrets registered in the repository settings. For dynamic secrets or secrets from external sources, add an explicit redaction step. In application code, a structured logger with a field filter for known secret key names is more reliable than reviewing every log call manually.
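
For the application-code half of that fix, here is a minimal sketch of a structured logger with a field filter, built on Python's standard logging module. The key-name list and the convention of passing structured data under a fields attribute are assumptions for this example.

```python
import json
import logging

# Key names treated as secret (an assumption; adjust to your codebase).
SECRET_KEY_NAMES = {"authorization", "password", "token", "api_key", "secret"}


def scrub(value):
    """Recursively mask values whose key name is on the secret list."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SECRET_KEY_NAMES else scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value


class JsonRedactingFormatter(logging.Formatter):
    """Emit one JSON object per record, with secret fields masked."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "fields": scrub(getattr(record, "fields", {})),
        }
        return json.dumps(payload, sort_keys=True)
```

A call such as logger.info("request sent", extra={"fields": {"headers": headers}}) then logs the request with the Authorization value masked. The filter matches key names only, so a secret embedded in free text still needs pattern-based redaction.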

The fix that sounds right but fails: telling developers not to log secrets. That guidance fails at the first debugging session under deadline. The control needs to be in the pipeline, not in the developer's memory.

Watch specifically for agent session transcripts. When a model's context window includes a secret and the agent logs that context for debugging, the log contains the secret. Some agent frameworks log full context by default. Check yours.

Pattern 5: Agent acting on stale or wrong context

This is the most expensive pattern and the least discussed in security circles.

The mechanism: an agent performs a destructive action based on data that is wrong, stale, or misunderstood. The agent had permission to perform the action. The action was not malicious. The outcome was catastrophic.

Two named incidents illustrate the shape. In mid-2025, a Replit user's agent deleted a production database. The agent was operating in a context where production and development environment boundaries were ambiguous, the credentials it held gave write access to both, and the instruction it was following was reasonable in the development context but irreversible in production. A second incident from 2024 produced an $82,000 Google Cloud bill when an agent with broad infrastructure permissions acted on an instruction that made sense in isolation but not in the context of the existing resource state.

Neither incident involved an attacker. Both involved access scoped too broadly and destructive operations with no confirmation gate.

The fix that works: scoped credentials for every task. An agent running a database migration gets credentials for that database, with write access to the schema table, for the duration of the migration. Not credentials for the account. Not for production and staging. Not indefinitely. Pair scoped credentials with dry-run defaults for destructive operations: the agent plans the operation and outputs what it would do, a human confirms, then execution proceeds.
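
One way to make that gate structural is sketched below. ScopedGrant and run_destructive are hypothetical names, and the grant is an in-memory stand-in for whatever your credential system issues. The point is that the default path returns a plan and never executes.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedGrant:
    """A grant for one resource, a set of operations, and a deadline."""
    resource: str
    operations: frozenset
    expires_at: float  # Unix timestamp

    def allows(self, resource, operation, now=None):
        now = time.time() if now is None else now
        return (
            resource == self.resource
            and operation in self.operations
            and now < self.expires_at
        )


def run_destructive(grant, resource, operation, action, confirmed=False, now=None):
    """Run `action` only if the grant covers it and a human has confirmed.
    Without confirmation, return a description of what would happen."""
    if not grant.allows(resource, operation, now):
        raise PermissionError(f"grant does not cover {operation} on {resource}")
    if not confirmed:
        return f"DRY RUN: would {operation} on {resource}"
    return action()
```

The agent calls run_destructive without confirmed, shows the plan to a human, and only a second call with confirmed=True executes. A grant for staging never authorizes the same operation on production.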

The fix that sounds right but fails: reviewing the agent's output before it acts. Review fails when the agent is fast, the review window is seconds, and the person reviewing does not have full context on what the operation changes downstream. The gate needs to be structural, not attentional.

For cloud infrastructure agents, resource-level IAM policies are the control. For database agents, a read-only connection string is correct by default and write access is explicit and time-bounded. For file system agents, chroot or container-level boundaries keep writes inside the intended tree.
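
The read-only default for database agents can be illustrated with SQLite, used here only because it ships with Python. The connect helper is a hypothetical name. The mode=ro and mode=rw URI parameters are standard SQLite behavior, and both require the database file to exist already.

```python
import sqlite3
from pathlib import Path


def connect(db_path, write=False):
    """Open a SQLite database read-only unless write access is
    requested explicitly."""
    mode = "rw" if write else "ro"
    # as_uri() yields a percent-encoded file: URI that SQLite accepts.
    uri = f"{Path(db_path).resolve().as_uri()}?mode={mode}"
    return sqlite3.connect(uri, uri=True)
```

On a connection opened with the default, any INSERT, UPDATE, or DROP raises sqlite3.OperationalError. This sketch covers only the explicit part of the policy. Time-bounding write access would need a separate mechanism, such as short-lived credentials on a server database.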

Prompt injection gets the conference talks; credential harvesting funds the campaigns

The security community has spent the most airtime on prompt injection. It is a genuinely novel attack class with interesting properties: the input is natural language, the boundary between instruction and data is semantic rather than syntactic, and standard input validation tools do not apply.

But if you look at the actual incident record through early 2026, the high-impact incidents are Patterns 1, 4, and 5: state file leaks shipping credentials to public registries, log lines containing live tokens, and agents with overly broad access making expensive mistakes. These are old attack classes in new wrapping. Credential harvesting has been funding operations since before LLMs were a product category.

One commenter on the Knostic disclosure thread framed it clearly: "prompt injection gets attention because it's novel, but stolen credentials are a classic attack with way higher impact."

That framing is right. Pattern 2 is worth taking seriously, and the MCP ecosystem's lack of signing makes it more acute. But a team that has locked down prompt injection and still exports STRIPE_KEY in their shell for every Claude Code session has its priorities inverted.

The practical implication: allocate remediation effort in proportion to impact, not novelty. Patterns 1, 3, and 4 have straightforward mechanical fixes that take days to deploy. Pattern 5 requires IAM and process changes that take weeks but are more likely to prevent a five-figure bill.

Checklist you can paste into a PR

## AI coding agent security audit

State-file leak (Pattern 1)
- [ ] .claude/, .cursor/, .aider/ in .gitignore
- [ ] Same paths in .npmignore / MANIFEST.in / .dockerignore
- [ ] git log --all -- '.claude/*' returns nothing
- [ ] prepublishOnly / CI step fails build if state files exist
- [ ] Confirmed no cat .claude/* calls in CI logs

Prompt injection (Pattern 2)
- [ ] MCP tool responses pass through a content filter before reaching model context
- [ ] Known secret patterns stripped from all external data fetched by agent
- [ ] Agent tasks reviewed for which external sources they read

Command injection (Pattern 3)
- [ ] Agents run in sandboxed environment (container/VM) with exec allow-list
- [ ] Project config files validated against schema before execution
- [ ] Agent runner has no access to credentials outside project scope

Credential harvesting (Pattern 4)
- [ ] CI log redaction enabled for registered secrets
- [ ] No console.log(process.env) or cat .env in application or CI code
- [ ] Structured logger with field-level filters for secret key names
- [ ] Agent session transcripts checked for credential-containing context

Wrong-context destruction (Pattern 5)
- [ ] Agent credentials scoped to minimum required resource + operation + time
- [ ] Dry-run default on any destructive operation (delete, drop, rm, terraform destroy)
- [ ] Explicit confirmation gate before irreversible actions
- [ ] Production and development credentials are distinct and never aliased
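
The first two Pattern 1 items can be checked mechanically. The sketch below compares the text of an ignore file against the required paths. It matches literal entries only and does not evaluate globs or negation rules, and missing_entries is a hypothetical helper name.

```python
# Paths taken from the Pattern 1 checklist items above.
REQUIRED_ENTRIES = [".claude/", ".cursor/", ".aider/"]


def missing_entries(ignore_text, required=REQUIRED_ENTRIES):
    """Return the required paths with no literal entry in the ignore file."""
    present = set()
    for line in ignore_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Treat ".claude", ".claude/" and "/.claude" as the same entry.
        present.add(line.strip("/"))
    return [entry for entry in required if entry.strip("/") not in present]
```

Running it over .gitignore, .npmignore, and .dockerignore in CI turns those checklist items into a failing build instead of a manual review.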

What this means for your stack

The five patterns share a common root: the agent reads more of your environment than the task requires, holds that data longer than the task lasts, and has access to more resources than the operation needs. None of these are properties of a specific vendor or model. They are properties of how most agent setups are configured today.

The architectural fix is a runtime broker that sits between your secrets and your agent sessions. The broker holds credentials in an encrypted local vault. Agents request access at the call site and receive a process-scoped injection for the duration of one execution. The broker redacts known secret patterns from streaming output before they reach the log stream. An HMAC-chained audit log records every grant. The agent's state files and context window contain references, not values.
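
To make the broker idea concrete, here is a toy sketch of two of its pieces: process-scoped injection and an HMAC-chained audit log. It is not hasp's implementation. The Broker class, the in-memory dict standing in for an encrypted vault, and the reference format are all assumptions for illustration, and output redaction is left out.

```python
import hashlib
import hmac
import json
import os
import subprocess
import sys


class Broker:
    def __init__(self, vault, audit_key):
        self._vault = vault        # reference -> value (stand-in for an encrypted vault)
        self._audit_key = audit_key
        self.audit_log = []        # list of (entry_json, mac_hex) pairs

    def _mac(self, prev_mac, body):
        # Each MAC covers the previous MAC, so editing or dropping an
        # earlier entry invalidates every entry after it.
        message = (prev_mac + body).encode()
        return hmac.new(self._audit_key, message, hashlib.sha256).hexdigest()

    def _append_audit(self, entry):
        prev_mac = self.audit_log[-1][1] if self.audit_log else ""
        body = json.dumps(entry, sort_keys=True)
        self.audit_log.append((body, self._mac(prev_mac, body)))

    def verify_audit(self):
        prev_mac = ""
        for body, mac in self.audit_log:
            if not hmac.compare_digest(mac, self._mac(prev_mac, body)):
                return False
            prev_mac = mac
        return True

    def run(self, argv, bindings):
        """Run argv with secrets injected into the child's environment only.
        `bindings` maps an env var name to a vault reference."""
        env = dict(os.environ)
        for name, ref in bindings.items():
            env[name] = self._vault[ref]
            # Record the grant by reference; the value is never logged.
            self._append_audit({"env": name, "ref": ref, "argv0": argv[0]})
        return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
```

The calling process holds only the reference string. The value exists in the child's environment for the length of one execution, and the audit chain shows which references were granted and to which command.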

hasp is one working implementation. Install it with curl -fsSL https://gethasp.com/install.sh | sh, run hasp setup, connect a project, and hand the next session a reference instead of a key. It is source-available (FCL-1.0), local-first, runs on macOS and Linux, and needs no account.

The five patterns above are not going away. Each new agent capability adds a new surface. The durable control is not a per-pattern patch list. It is a runtime model where the sensitive values are never in the places the agent can write to, log, or expose.


NEXT STEP · ~90 seconds

Stop handing the agent your real keys.

hasp keeps secrets in one local encrypted vault, brokers them into the child process at exec, and never lets the agent read the value.

Local, encrypted vault — no account, no cloud, no telemetry by default.
Brokered run — agent gets a reference, the child process gets the value.
Pre-commit + pre-push hooks catch managed values before they ship.
Append-only HMAC audit log answers "did the agent touch the prod token?" in seconds.


macOS & Linux. Source-available (FCL-1.0, converts to Apache 2.0). No account.