GUIDE · HOW-TO · 9 min

Put your coding agent behind an egress allowlist
The fix that survives a prompt injection

A prompt injection that reads your secrets is harmless until the agent can send them somewhere. An egress allowlist makes the agent's process talk to an approved set of hosts and refuse the rest, so the exfiltration step fails even when the injection succeeds.

TL;DR · the answer, in twenty seconds

What happens: A prompt injection reads a secret from your repo or environment, then uses one of the agent's tools to ship it to a host the attacker controls. The read is hard to stop. The outbound call is not.

The minimum fix: Run the agent's process with deny-by-default egress and an allowlist of the hosts it needs. Force traffic through a proxy that matches canonicalized hostnames, block DNS exfiltration and the cloud metadata endpoint, and test that a non-allowlisted host is refused.

The lesson: You will not win prompt injection at the prompt. The network is the layer the model cannot talk its way past, so put the trust boundary there.

A prompt injection only pays off if the stolen data can leave your machine. The injection reads your .env, your cloud token, your source tree, and none of it matters to the attacker until the agent makes the outbound call that ships it somewhere they control. That call is a network request, and the network is the one layer you can lock down without trusting the model to behave.

On June 19, @sundi133 framed it as the cheapest control you can ship this week: "an egress allowlist. If your agent can only call approved URLs and tools, exfiltration attacks like EchoLeak break even when the injection lands." That is the right read. You will not win the prompt-injection arms race at the prompt, where every new model and every new tool reopens the attack surface. You can win it at the socket, where the rules do not depend on what the agent was convinced to do.

The exfiltration step is a network call

Start from the attacker's chain, because the allowlist targets one specific link in it. Aonan Guan's Comment and Control disclosure walked the full sequence across Claude Code, Gemini CLI, and Copilot: untrusted input tells the agent to read a credential, and the same agent holds a tool that writes data out, so the value gets laundered to an endpoint the payload named. Tenet Security's agentjacking writeup showed the same pattern starting from a fake Sentry error that turns into code execution. Different entry points, same final move.

That final move is always a network egress. The agent's tools are the write paths: curl, a package install that hits a registry, a git push to a remote, an MCP server that calls a webhook, a "helpful" POST to the error-reporting endpoint the injection supplied. Every one of them resolves a hostname and opens a socket. Block the sockets you did not approve and the laundering step has nowhere to land.

This is why output redaction and credential brokering, both worth doing, do not finish the job by themselves. Redacting agent output scrubs what the model prints. Brokering the credential keeps the plaintext out of the agent's hands. The egress allowlist covers the case where neither helped: the secret is already in the agent's reach and the agent is about to send it. It assumes the injection won and stops the payoff anyway. That is the layer this guide builds.

Deny by default, then allow what the agent calls

The whole idea is one sentence. The agent's process can reach an approved set of hosts and nothing else. Everything follows from making "nothing else" the default instead of an exception you remember to add.

The reference implementation ships in Anthropic's own Claude Code dev container. Its init-firewall step flushes the rules, allows DNS and loopback, allows the handful of hosts Claude Code needs, then sets the default OUTPUT policy to drop. The shape, trimmed to the parts that matter:

# default-deny egress, allow only what the agent needs
# (shape of the Claude Code devcontainer init-firewall step)
iptables -F
iptables -P INPUT   DROP
iptables -P FORWARD DROP
iptables -P OUTPUT  DROP

# loopback and replies to connections we opened
iptables -A INPUT  -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
iptables -A INPUT  -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# DNS, so hostnames still resolve
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT
iptables -A OUTPUT -p tcp --dport 53 -j ACCEPT

# allow one approved host (resolve to current IPs first)
for ip in $(dig +short api.anthropic.com); do
  iptables -A OUTPUT -p tcp -d "$ip" --dport 443 -j ACCEPT
done

An IP-layer firewall like this is coarse, and the coarseness is the catch. Allowing api.anthropic.com by resolved IP allows whatever else shares that address behind the same CDN, and the resolved set drifts as the provider rotates IPs, so the rule you wrote on Monday leaks or breaks by Friday. It works as a backstop. It is not the part of the allowlist you reason about.

Scope the rules to the agent, not your whole machine. A global firewall fights you every time you push from the same box, so run the agent in its own container, network namespace, or user account, and apply the deny-default policy there. The blast radius you care about is the agent's process tree, and that is the boundary the rules should enforce. This also keeps the policy honest: when the agent's network is a separate, named thing, an allowlist change is a reviewable diff instead of a one-off iptables command someone ran on a server and forgot.
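One way to scope the policy to the agent alone is a dedicated user account with its own iptables chain. The sketch below assumes the agent runs as a hypothetical user named agent and routes only that user's traffic into a chain called AGENT_EGRESS (also a made-up name), using the iptables owner match. It prints the commands by default; setting IPT=iptables as root would apply them.

```shell
# Dry run by default: prints each command. Set IPT=iptables (as root) to apply.
IPT="${IPT:-echo iptables}"
AGENT_USER="${AGENT_USER:-agent}"   # hypothetical dedicated account for the agent

scope_egress_to_agent() {
  # A named chain keeps the agent's policy separate from the rest of the box.
  $IPT -N AGENT_EGRESS
  # Only packets from processes owned by the agent user enter the chain.
  $IPT -A OUTPUT -m owner --uid-owner "$AGENT_USER" -j AGENT_EGRESS
  # Approved destinations would be inserted above this rule with -I AGENT_EGRESS.
  $IPT -A AGENT_EGRESS -j DROP
}

scope_egress_to_agent
```

Traffic from every other user never enters the chain, so your own git push from the same machine is untouched. A container or network namespace gives a stronger boundary than a user account; this is only the lightest of the three options.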

Anthropic also shipped a higher-level sandbox feature in Claude Code that wraps the same idea: filesystem and network isolation with a configurable allowlist, so the agent runs with deny-by-default egress without you hand-writing iptables on every box. If you run Claude Code, turn that on first and treat the rest of this guide as what to verify and what to add around it.

Build the allowlist from what the agent calls

A deny-by-default rule that breaks every build is a rule your team turns off by Thursday. The allowlist has to match real traffic, so build it from observation, not a guess.

Run the agent for a day behind a logging proxy that allows everything and records every host it reaches. INNOQ documented this approach for coding agents in its proxy-allowlist writeup: put a forward proxy in front of the agent, watch what it needs, then flip the proxy from log-only to enforce. The allowlist that falls out is the set of hosts that showed up for legitimate work, which for most stacks is a short list:

  • your model vendor's API (api.anthropic.com, api.openai.com)
  • the package registries you pull from (registry.npmjs.org, pypi.org, files.pythonhosted.org)
  • your source host (github.com, api.github.com, codeload.github.com)
  • internal services the agent has a real reason to call
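Turning a day of log-only traffic into a candidate list can be a one-line reduction. The sketch below assumes a hypothetical log format of one request per line, "<timestamp> CONNECT <host>:<port>"; a real proxy's log will differ, so the field positions are assumptions to adjust. The function name hosts_from_log is made up for illustration, and the sample lines are inline test data, not real traffic.

```shell
# Assumed (hypothetical) log format, one request per line:
#   <timestamp> CONNECT <host>:<port>
hosts_from_log() {
  # Keep CONNECT lines, drop the port, lowercase the host, then de-duplicate.
  awk '$2 == "CONNECT" { split($3, parts, ":"); print tolower(parts[1]) }' | sort -u
}

# Example with inline sample data instead of a real log file:
printf '%s\n' \
  '2025-01-01T10:00:00Z CONNECT api.anthropic.com:443' \
  '2025-01-01T10:00:02Z CONNECT Registry.npmjs.org:443' \
  '2025-01-01T10:00:05Z CONNECT api.anthropic.com:443' |
  hosts_from_log
```

The output is a list to review by hand, not one to paste straight into the policy: anything the agent reached during the observation window shows up, including hosts it had no good reason to call.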

The enforcing layer should match on hostname, not IP, because that is the identity you care about and it survives IP rotation. A layer-7 forward proxy with an allowlist file is the common shape:

# egress policy: hostnames the agent may reach over TLS.
# everything not listed is refused at the proxy.
allow:
  - api.anthropic.com
  - registry.npmjs.org
  - pypi.org
  - files.pythonhosted.org
  - github.com
  - api.github.com
  - codeload.github.com
default: deny

Point the agent's traffic at the proxy with HTTP_PROXY and HTTPS_PROXY, then block direct egress at the firewall so the proxy is the only route out. NVIDIA's sandboxing guidance makes the same point in vendor-neutral terms: combine process isolation with a network policy, and assume the workload inside is hostile. The proxy gives you hostname-level rules and a clean audit log of every host the agent tried; the firewall guarantees the proxy cannot be skipped.

Two cases complicate the short list. MCP servers are egress too: each one the agent loads can open its own outbound connections, so an allowlist that covers curl and git but ignores a Slack or web-fetch MCP server has a gap the size of that server's network reach. Route MCP traffic through the same proxy and add only the hosts each server documents. The harder case is the agent that has a real reason to browse the open web, where a fixed host list does not fit. There the control moves up a layer: pin the agent to a single fetch tool you own, log every URL it pulls, and keep write-capable endpoints on a far shorter list than read-only ones. An agent reading a page the attacker controls is a smaller problem than one POSTing your secrets back to it, so allow the read and refuse the write.

The bypasses your allowlist has to survive

An allowlist that any injection can step around is theater. The published bypasses cluster into four shapes, and each has a fix you apply at build time, not after an incident.

DNS is its own exfiltration channel. You allowed port 53 so hostnames resolve. An injection can encode stolen bytes into the subdomain labels of a domain the attacker owns and let your resolver carry them out, no HTTP request required. The fix is to send DNS through a resolver you control that answers only for allowlisted names, and drop outbound 53 to everything else. A coding agent has no reason to resolve arbitrary domains.
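At the firewall, the DNS fix amounts to narrowing the port 53 rules from "anywhere" to one resolver. The sketch below assumes a resolver you run at the hypothetical address 10.0.0.53 and would stand in place of any broader port 53 ACCEPT rules. It prints the commands by default; the resolver's own configuration, which is what restricts answers to allowlisted names, is not shown.

```shell
# Dry run by default: prints each command. Set IPT=iptables (as root) to apply.
IPT="${IPT:-echo iptables}"
RESOLVER_IP="${RESOLVER_IP:-10.0.0.53}"   # hypothetical address of the resolver you control

pin_dns_to_resolver() {
  for proto in udp tcp; do
    # Allow DNS only to the controlled resolver...
    $IPT -A OUTPUT -p "$proto" -d "$RESOLVER_IP" --dport 53 -j ACCEPT
    # ...and drop port 53 to every other destination.
    $IPT -A OUTPUT -p "$proto" --dport 53 -j DROP
  done
}

pin_dns_to_resolver
```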

Hostname matching is the next trap. Allow github.com and a naive substring matcher waves through github.com.evil.com, while a matcher that skips normalization trips on an IP literal, a trailing-dot github.com., or an uppercased host. The proxy has to canonicalize the host and match the full name, not a substring. Matching a name before canonicalizing it is not a hypothetical mistake: in March, researchers walked around Claude Code's SOCKS5 sandbox with a path like /proc/self/root/usr/bin/npx to reach a binary the denylist had named. The lesson generalizes past that one bug. A denylist of bad names is the wrong shape because the attacker picks the name; an allowlist that canonicalizes before it matches is the right one.
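The canonicalize-then-match rule can be sketched in a few lines of POSIX shell. This is an illustration of the matching logic only, not how any particular proxy implements it; the function names and the ALLOWED_HOSTS variable are hypothetical, and the variables are global because POSIX sh has no local.

```shell
ALLOWED_HOSTS="api.anthropic.com github.com"   # hypothetical allowlist, space-separated

canonicalize_host() {
  # Lowercase, then strip a single trailing dot.
  h=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf '%s\n' "${h%.}"
}

host_allowed() {
  host=$(canonicalize_host "$1")
  case "$host" in
    ''|*[!a-z0-9.-]*) return 1 ;;  # empty, or characters no hostname has (':', '[', '/')
    *[!0-9.]*) ;;                  # has a letter or hyphen, so not a dotted-quad literal
    *) return 1 ;;                 # digits and dots only: IPv4 literal
  esac
  # Exact match on the full canonical name, never a substring or suffix.
  for allowed in $ALLOWED_HOSTS; do
    [ "$host" = "$allowed" ] && return 0
  done
  return 1
}

for candidate in github.com GitHub.com. github.com.evil.com 93.184.216.34; do
  if host_allowed "$candidate"; then echo "allow $candidate"; else echo "deny  $candidate"; fi
done
```

The first two candidates canonicalize to github.com and are allowed; the lookalike domain and the IP literal are denied. Exact matching is the real guard here, and the literal check only makes the refusal explicit.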

Loopback and link-local addresses need their own rule. The agent can reach a service on localhost or the cloud metadata endpoint at 169.254.169.254 unless you block it, and that metadata endpoint hands out cloud credentials to anything on the box that asks. Drop the link-local range outright. The agent has no legitimate reason to read instance metadata, and leaving it reachable turns a local foothold into a cloud one. The same SSRF logic that bites MCP servers applies to the agent's own process.
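Dropping the link-local range is a single rule. The sketch below covers IPv4 only and prints the command by default; setting IPT=iptables as root would apply it. The function name is made up for illustration.

```shell
# Dry run by default: prints the command. Set IPT=iptables (as root) to apply.
IPT="${IPT:-echo iptables}"

drop_link_local() {
  # -I OUTPUT 1 places the rule ahead of the ACCEPT rules already in the chain,
  # so a broad allow further down cannot reopen 169.254.169.254.
  $IPT -I OUTPUT 1 -d 169.254.0.0/16 -j DROP
}

drop_link_local
```

Under a default-drop OUTPUT policy the range is already unreachable unless some rule allows it; the explicit rule is there so that stays true after someone adds a wide ACCEPT.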

The proxy itself is bypassable if egress is not forced. HTTP_PROXY is an environment variable, and an injection can unset it or open a raw socket that ignores it. The proxy only holds if the firewall drops all egress except to the proxy's own address and port. The env var sets the path; the firewall makes it the only path. Test both together, because either one alone is a gap.
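The firewall half of that pairing can be sketched as three commands. The proxy address 10.0.0.2 and port 3128 are hypothetical placeholders, and the sketch assumes it runs inside a container or namespace dedicated to the agent, since it flushes the OUTPUT chain. It prints the commands by default; setting IPT=iptables as root would apply them.

```shell
# Dry run by default: prints each command. Set IPT=iptables (as root) to apply.
IPT="${IPT:-echo iptables}"
PROXY_IP="${PROXY_IP:-10.0.0.2}"     # hypothetical proxy address
PROXY_PORT="${PROXY_PORT:-3128}"     # hypothetical proxy port

force_proxy_only() {
  # Start from an empty OUTPUT chain so no earlier ACCEPT survives.
  $IPT -F OUTPUT
  # The one permitted destination: the proxy's own address and port.
  $IPT -A OUTPUT -p tcp -d "$PROXY_IP" --dport "$PROXY_PORT" -j ACCEPT
  # Everything else, including raw sockets that ignore HTTP_PROXY, is dropped.
  $IPT -P OUTPUT DROP
}

force_proxy_only
```

A real ruleset would likely need more than this, such as loopback for local tooling, so treat it as the minimal shape rather than a complete policy.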

Prove it fails closed

A control you did not test is a guess. From inside the sandbox, confirm an allowlisted host connects and a non-allowlisted host is refused. Run these as a script, not by eye, so you can wire the same checks into CI:

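# exits non-zero if any check fails, so CI can gate on it
fail=0

# should succeed: allowlisted host
curl -sS -o /dev/null --max-time 10 -w 'allowed: %{http_code}\n' https://api.anthropic.com \
  || { echo 'BROKEN: allowlisted host unreachable'; fail=1; }
# allowed: 200 (or 401) means the connection reached the host; 000 means it did not

# should fail: host not on the list
curl -sS -o /dev/null --max-time 5 https://example.com \
  && { echo 'LEAK: reached non-allowlisted host'; fail=1; } || echo 'blocked: arbitrary host'

# should fail: raw IP literal, to dodge a hostname-only match
# (-k: a certificate rarely matches a bare IP, and without it a TLS error reads as "blocked")
curl -sS -k -o /dev/null --max-time 5 https://93.184.216.34 \
  && { echo 'LEAK: reached raw IP'; fail=1; } || echo 'blocked: ip literal'

# should fail: cloud metadata endpoint
curl -sS -o /dev/null --max-time 5 http://169.254.169.254/latest/meta-data/ \
  && { echo 'LEAK: reached metadata'; fail=1; } || echo 'blocked: metadata'

exit "$fail"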

Every "should fail" check that prints LEAK is a hole in the allowlist, and the IP-literal and metadata cases are the two that most teams miss. CISA's agentic adoption guidance puts network isolation and least privilege at the center of running autonomous workloads, and a fail-closed test is how you prove your isolation matches the policy on paper. Run the script on every change to the allowlist, because the failure mode is silent: a rule that stops blocking looks identical to one that works until the day it matters.

What this means for your stack

The minimum this week: run your coding agent with deny-by-default egress, build the allowlist from a day of observed traffic instead of guessing, force everything through a hostname-matching proxy, and block DNS exfiltration and the 169.254.169.254 metadata endpoint. Then run the fail-closed test above and put it in CI. That is an afternoon of work and it holds even when a prompt injection gets through everything upstream of it.

The pattern underneath is older than agents: treat untrusted code's network the way you treat any untrusted process. The agent runs attacker-influenced instructions every time it reads a web page or a dependency's README, so its outbound network is a trust boundary, and a trust boundary you enforce at the socket does not care how convincing the injection was. The OWASP agentic Top 10 files this under sensitive-information disclosure for a reason: the exit is the control point.

The allowlist stops the data from leaving. The complementary move is to keep the data from being worth stealing in the first place, by brokering credentials so the agent never holds the plaintext. hasp is one working implementation of that half: hasp run -- <command> injects a credential into a single process and keeps it out of the agent's environment, so an exfiltration call that survives your allowlist still carries nothing. Source-available (FCL-1.0), local-first, macOS and Linux, no account.

Pick either layer and you are better off; run both and the injection has to defeat the network and find a secret that is not there. Stand up the allowlist first, because it is the one you can test today and the one that does not ask you to trust the model at all.


