Agent autonomy and trust boundaries
The five questions every agent design dodges
Every major coding agent runs at the trust level of your OS user account. Autonomy is moving fast. The trust boundary hasn't moved at all.
- 01 Scope: OS user account. Agent inherits full user trust on launch; no sandbox by default.
- 02 Gap: Five open questions. Secrets, commands, network, state, audit; most designs skip all five.
- 03 Model: Per-task boundary. Scoped, time-limited, resource-specific; auditable after the fact.
TL;DR · the answer, in twenty seconds
What: Coding agents like Claude Code, Cursor, Aider, and Codex CLI inherit the full trust of your OS user account. Every secret you can read, they can read. Every command you can run, they can run.
Fix: Evaluate each agent session against five questions: credential visibility, command scope, network reach, state persistence, and audit coverage. If you can't answer all five, the boundary is wherever the agent decides it is.
Lesson: Agent autonomy is increasing by design. Trust boundaries won't move unless you draw them explicitly, per task, per session.
In February 2026, Boris Cherny posted a widely-read thread on what happens after coding itself is solved. His argument: the hard problem isn't generation, it's autonomy. Agents that can write code will soon run tests, open PRs, deploy builds, and manage infrastructure without human approval at each step. That trajectory is real. What he didn't address, and what almost nobody building on top of these systems addresses, is that autonomy and trust are two different things. Autonomy is moving fast. Trust boundaries have not moved.
Every major coding agent today, including Claude Code, Cursor, Hermes, OpenCode, and Aider, draws its trust boundary at the same place: your OS user account. The agent process runs as you. It reads what you can read. It executes what you can execute. It reaches the network wherever you can reach the network. When you launch one of these tools, you are not delegating a task to a constrained subprocess. You are handing an LLM-driven process your full identity on the machine.
That was an acceptable tradeoff in 2023, when agents mostly generated code snippets and waited for you to paste them. It is not an acceptable tradeoff for agents that autonomously clone repos, run builds, manage git history, and call external APIs across a multi-hour session.
What the user-account boundary actually means
Unix permissions were designed for humans. The model is simple: a user account owns files and processes, and the OS enforces boundaries between accounts. If you trust a person enough to give them your login, they can do what you can do.
Browser sandboxing moved past that model in 2008. The Chrome security team published their multi-process architecture partly to isolate web content from the host OS and from other tabs. A malicious script in one tab cannot read your desktop files. The sandbox is not perfect, but it exists and it is explicit.
Container isolation moved past it again. Docker and OCI containers give a process its own mount (filesystem), network, and PID namespaces. You can run an untrusted binary in a container and reason about what it can reach.
Coding agents have not moved past the user-account model. They are closer to the 1970s Unix model than to either of those successors. The agent runs as you, with your PATH, your environment variables, your SSH keys, your cloud credentials, your git identity. MITRE ATLAS documents this class of privilege inheritance under "AI System Access," but the framing there is about adversaries gaining access. The more common case is that you grant it intentionally and forget what you granted.
Five questions that expose the real boundary
Most agent security conversations focus on prompt injection or supply-chain risks in MCP servers. Those are real. But the trust boundary problem is older and larger. Five questions get at it directly.
One: Can the agent see secrets you don't intend it to see?
If you have AWS_ACCESS_KEY_ID set in your shell environment, a Claude Code session started from that shell inherits it. You may have intended it for a specific aws CLI call. The agent can read it, pass it to any tool, include it in any prompt it sends to Anthropic's API, or write it to a state file. Knostic documented exactly this mechanism in February 2026: settings.local.json captured environment variables from every session, and the file then shipped inside published npm packages.
The scope of what an agent can see is not the list of files you ask it to read. It is everything in your environment, everything in your home directory, and everything accessible through the APIs that directory contains keys for.
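As a rough way to see what a session would inherit, the sketch below lists environment variable names that look like credentials. The name patterns and the function name `find_exposed_secrets` are assumptions made for illustration; real secrets often sit under names that no pattern catches, so treat the output as a lower bound.

```python
import os
import re

# Hypothetical name patterns; tune these for your own environment.
SECRET_NAME = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)", re.IGNORECASE)


def find_exposed_secrets(env):
    """Return sorted names of non-empty variables whose names suggest a credential."""
    return sorted(
        name for name, value in env.items()
        if value and SECRET_NAME.search(name)
    )


def report_current_shell():
    """Print the suspicious names visible to any process launched from this shell."""
    for name in find_exposed_secrets(dict(os.environ)):
        print(name)
```

Calling `report_current_shell()` from the shell you normally launch an agent in prints only names, never values, so the report itself is safe to paste into a review.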
Two: Can the agent execute commands you wouldn't approve interactively?
Claude Code supports a permissions model. Cursor has trust levels. But both defaults are permissive, and the agent's judgment about which commands are safe runs ahead of yours. Check Point Research disclosed a command-injection bug in Claude Code project files (CVE-2025-59536) in early February 2026, showing that a malicious project file could run arbitrary shell commands when Claude Code opened it. The access vector was the agent's willingness to execute configuration-specified commands without a separate human approval gate.
The question is not whether the agent can execute commands. It can. The question is whether there is any principled scope restriction on which commands, against which resources, during which time window.
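One possible shape for a "which commands" restriction is an argv-prefix allowlist, sketched below. The allowlist entries and the function name `is_command_allowed` are hypothetical, and none of the agents named here is known to expose such a hook; the check only makes sense if the approved command is then executed as an argv list, without a shell.

```python
import shlex

# Hypothetical per-task allowlist: each entry is an argv prefix the task may run.
ALLOWED_PREFIXES = [
    ("cargo", "test"),
    ("git", "status"),
    ("git", "diff"),
]

# Characters that would let a shell chain, redirect, or substitute commands.
SHELL_METACHARS = set(";|&<>`$\n")


def is_command_allowed(command, allowed=ALLOWED_PREFIXES):
    """Allow a command only if its argv starts with an allowlisted prefix."""
    if SHELL_METACHARS & set(command):
        return False  # refuse anything that needs a shell to interpret
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes: refuse rather than guess
    return any(tuple(argv[:len(prefix)]) == prefix for prefix in allowed)
```

Denying by default keeps the list short and reviewable: a new command has to be added deliberately rather than discovered after it ran.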
Three: Can the agent reach the network without your awareness?
The MCP specification does not restrict the outbound connections a server can make. An MCP server you install from a registry can call home, exfiltrate data, or proxy requests through your network identity. OX Security reported approximately 7,000 MCP servers in the wild by early 2026, with no signature requirement and no standardized permission manifest. The agent that calls one of those servers to "look up documentation" or "check a package version" may be doing other things on the same connection.
This is not hypothetical. Snyk researchers disclosed a prompt-injection data heist via a GitHub MCP server in early 2026, where injected instructions caused the agent to exfiltrate repository contents through the MCP connection. The agent's network reach is your network reach, and MCP amplifies the number of processes that can use it.
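A host allowlist is one way to make network reach explicit. The sketch below shows only the decision function; the hosts and the name `is_egress_allowed` are assumptions, and real enforcement would have to live in a proxy or firewall that the agent and its MCP servers cannot bypass.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for one task.
ALLOWED_HOSTS = {"docs.rs", "crates.io"}


def is_egress_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Allow HTTPS requests to an allowlisted host or one of its subdomains."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower().rstrip(".")
    if parts.scheme != "https" or not host:
        return False
    # Match on whole DNS labels so "docs.rs.evil.example" does not pass.
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)
```

Matching on the parsed hostname rather than the raw string matters: lookalike domains and userinfo tricks such as `https://docs.rs@evil.example/` resolve to a different host and are refused.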
Four: Can the agent persist state that survives the session?
Every major agent writes files. Claude Code writes .claude/settings.local.json. Cursor writes to .cursor/. Aider maintains .aider.chat.history.md. These files stay on disk after the session ends. They may contain conversation fragments, cached tool outputs, API responses with embedded data, or, as the February 2026 npm incident showed, raw environment variables from the session.
State persistence means the trust boundary extends in time. A secret that appeared in your environment for five minutes of one session may sit in an on-disk file for months, available to anyone who reads that file, to any future agent session that loads it, and to any publishing pipeline that picks up the directory.
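A post-session check for this can be sketched as a scan of agent state directories for known secret values. The directory names come from the examples above; the function name `scan_state_dirs` and the eight-character minimum are assumptions, and the scan only finds values stored verbatim, not encoded or truncated copies.

```python
import os


def scan_state_dirs(root, secret_values, state_dirs=(".claude", ".cursor")):
    """Return sorted paths of files under the state dirs that contain a secret value."""
    # Skip very short values; they would match unrelated bytes too often.
    needles = [value.encode() for value in secret_values if len(value) >= 8]
    hits = []
    for state_dir in state_dirs:
        # os.walk yields nothing if the directory does not exist.
        for dirpath, _dirs, files in os.walk(os.path.join(root, state_dir)):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "rb") as handle:
                        data = handle.read()
                except OSError:
                    continue  # unreadable file: skipped in this sketch
                if any(needle in data for needle in needles):
                    hits.append(path)
    return sorted(hits)
```

Running a scan like this before publishing or committing turns "a secret may sit on disk for months" into a concrete list of files to clean up.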
Five: Can you audit any of the above after the fact?
This is the question nobody asks until something goes wrong. The NIST AI Risk Management Framework includes auditability as a core governance property. In practice, most agent deployments have no structured audit trail. What commands did the agent run during the session? Which files did it read? What did it send to the model API? What network calls did the MCP servers make?
Shell history captures some of it, imperfectly. The agent's own conversation log captures some of it, from the model's perspective. Nothing captures it comprehensively, in a tamper-evident format, indexed for later query. If a compliance audit asks "did the agent access the production database credential last Tuesday," you probably cannot answer.
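One common way to make a log tamper-evident is to chain entries with an HMAC, so that each record's MAC covers the previous record's MAC. The sketch below is a minimal illustration under that assumption; the entry layout and the names `append_entry` and `verify_chain` are hypothetical and do not describe any particular tool's format.

```python
import hashlib
import hmac
import json


def _mac(key, prev_mac, event):
    """MAC over the previous entry's MAC plus this event's canonical JSON."""
    body = json.dumps(event, sort_keys=True)
    return hmac.new(key, (prev_mac + body).encode(), hashlib.sha256).hexdigest()


def append_entry(log, key, event):
    """Append an event linked to the entry before it."""
    prev_mac = log[-1]["mac"] if log else ""
    log.append({"event": event, "prev": prev_mac, "mac": _mac(key, prev_mac, event)})


def verify_chain(log, key):
    """Return True only if every entry's MAC matches and links to its predecessor."""
    prev_mac = ""
    for entry in log:
        expected = _mac(key, prev_mac, entry["event"])
        if entry["prev"] != prev_mac or not hmac.compare_digest(entry["mac"], expected):
            return False
        prev_mac = entry["mac"]
    return True
```

Editing, reordering, or deleting an entry in the middle breaks verification for everything after it. Truncating the tail does not, so a real deployment would also need to anchor the latest MAC somewhere the agent cannot write.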
The analogy that actually fits
The container model is the right reference point, not because agents should run in containers (though that helps), but because the container design process forces explicit answers to the right questions. When you write a Dockerfile, you decide which files exist in the image, which ports are exposed, which environment variables are injected, and what the process runs as. You can audit that manifest. You can diff it across versions.
SPIFFE/SPIRE goes further: workload identity gives each process a cryptographic identity with an explicit set of permitted resources, issued at startup and rotated on a short TTL. A process doesn't inherit ambient credentials; it proves its identity and receives specifically scoped access. The trust boundary is per-workload, per-resource, time-limited.
Coding agents are the opposite. They receive maximum ambient trust at launch and have no mechanism for callers to scope that trust down. You cannot say "this agent session may read files in src/ but not ~/.ssh/, may run cargo test but not git push, may call the docs API but not the payments API." The agent you're using today has no interface for expressing those constraints, and the frameworks it runs on have no way to enforce them.
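To show what the file part of such a constraint could look like as data, here is a hypothetical `TaskPolicy` with read roots and deny roots. Nothing here is an existing agent interface; the class, its fields, and the example paths are invented for illustration, and the check is purely lexical, so it does not follow symlinks.

```python
from __future__ import annotations

import posixpath
from dataclasses import dataclass


def _is_under(target: str, root: str) -> bool:
    """True if the normalized target is the root itself or lies below it."""
    root = posixpath.normpath(root)
    return target == root or target.startswith(root.rstrip("/") + "/")


@dataclass(frozen=True)
class TaskPolicy:
    read_roots: tuple[str, ...]          # absolute paths the task may read
    deny_roots: tuple[str, ...] = ()     # absolute paths that always lose

    def may_read(self, path: str) -> bool:
        """Deny wins over allow; anything not listed is denied."""
        target = posixpath.normpath(path)  # collapses ".." segments
        if any(_is_under(target, root) for root in self.deny_roots):
            return False
        return any(_is_under(target, root) for root in self.read_roots)
```

A policy such as `TaskPolicy(read_roots=("/work/repo/src",), deny_roots=("/home/dev/.ssh",))` can be diffed and reviewed like a Dockerfile, which is the property the container comparison is pointing at.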
What gets missed in the autonomy conversation
Autonomy and trust are conflated. Boris Cherny's framing from February 2026, that the next hard problem is autonomy, is correct. But autonomy without bounded trust is just unrestricted execution. The risk is not that agents will become autonomous. The risk is that they already are, for multi-step tasks covering real infrastructure, while the trust model is still the one designed for a single human sitting at a keyboard.
The Replit incident in 2025 (an agent deleting a production database) was not a failure of autonomy. The agent did exactly what it was asked to do, given a task it was trusted to complete, with access to resources it should not have had. The same structure applies to the $82K Google Cloud bill incident: an agent with full cloud credentials, a poorly-scoped task, and no time limit on what it could provision.
OWASP's Top 10 for LLM Applications lists "excessive agency" as its eighth category in the original 2023 list (LLM08; the 2025 revision moves it to LLM06), covering unconstrained tool permissions and lack of human oversight. The framing there is reactive: here is a thing that goes wrong. The missing half is prescriptive: here is how to draw the boundary before it goes wrong.
A model that actually works
Per-task trust boundaries have four properties. They are specific: the agent can access these resources, not everything you can access. They are time-limited: the grant expires when the session ends, or after a ceiling (24 hours is a common implementation choice). They are delivered at exec time: secrets and credentials appear inside the process that needs them, not in the shell environment that outlives it. And they are audited: every grant is recorded in an append-only log you can query later.
hasp implements exactly this model for the developer-local case. Each hasp run invocation scopes a credential grant to one child process, enforces a 24-hour ceiling, and writes the grant to an HMAC-chained audit log at ~/.hasp/audit.jsonl. The agent process never touches the raw credential; it gets a reference that the broker resolves at exec time.
NIST AI RMF Govern 1.2 and Govern 4.1 both point at exactly this structure: defined roles, scoped permissions, and audit mechanisms that support accountability after the fact. The framework is abstract, but the mapping to agent design is direct. A task runs. The agent receives identity credentials sufficient for that task. The credentials expire. The audit log records what happened. The next task gets fresh credentials.
This is not harder to implement than what agents do today. It is a different design choice, made once, at the point where you decide how your team's agents launch and what they inherit.
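The specific, time-limited, and exec-time properties can be sketched in a few lines. This is a simplified illustration, not hasp's implementation: the `Grant` type, `issue_grant`, `run_with_grant`, and the `DOCS_API_TOKEN` name used in the usage note are all assumptions, and the sketch passes values through the child's environment rather than brokering references.

```python
import os
import subprocess
import time
from dataclasses import dataclass

MAX_TTL_SECONDS = 24 * 60 * 60  # the 24-hour ceiling mentioned in the text


@dataclass(frozen=True)
class Grant:
    secrets: dict      # variable name -> value, scoped to one task
    expires_at: float  # Unix timestamp after which the grant is dead


def issue_grant(secrets, ttl_seconds, now=None):
    """Create a grant, clamping the requested lifetime to the ceiling."""
    now = time.time() if now is None else now
    return Grant(dict(secrets), now + min(ttl_seconds, MAX_TTL_SECONDS))


def run_with_grant(argv, grant, now=None):
    """Run argv with a minimal environment plus the granted secrets only."""
    now = time.time() if now is None else now
    if now >= grant.expires_at:
        raise PermissionError("grant expired")
    # Start from PATH alone: nothing else from the parent shell is inherited.
    env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}
    env.update(grant.secrets)
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=False)
```

With a grant issued as `issue_grant({"DOCS_API_TOKEN": token}, ttl_seconds=3600)`, the child sees that one variable and `PATH`; an `AWS_ACCESS_KEY_ID` exported in the parent shell never reaches it. Recording each `issue_grant` call in an append-only log would supply the fourth property.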
A checklist you can paste into a PR
## Agent trust boundary audit
- [ ] Agent launch script strips the parent environment (env -i or equivalent)
- [ ] Secrets injected per-session, not exported in ~/.zshrc or ~/.bashrc
- [ ] Agent state directories and files (.claude/, .cursor/, .aider*) in .gitignore
- [ ] Agent state directories in .npmignore / MANIFEST.in / .dockerignore
- [ ] MCP server list reviewed; each server's outbound network access documented
- [ ] Network egress for agent sessions restricted to required hosts (firewall or proxy)
- [ ] Session audit log exists and is tamper-evident (append-only, HMAC or equivalent)
- [ ] Audit log queryable by resource accessed, command executed, time range
- [ ] Credential grants time-limited with explicit ceiling
- [ ] Post-session cleanup verified: no residual secrets on disk
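Some of these items can be checked mechanically. As one example, the sketch below reports which agent state patterns are missing from an ignore file. It is deliberately naive: it compares lines verbatim instead of implementing gitignore matching, and the default patterns (including `.aider*`, based on Aider's file naming) and the name `missing_ignore_entries` are assumptions.

```python
def missing_ignore_entries(ignore_text, required=(".claude/", ".cursor/", ".aider*")):
    """Return required patterns that do not appear verbatim in an ignore file."""
    present = set()
    for line in ignore_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            # Treat "/.cursor/" and ".cursor/" as the same entry.
            present.add(line.lstrip("/"))
    return [pattern for pattern in required if pattern not in present]
```

The same function can be pointed at the contents of .gitignore, .npmignore, or .dockerignore, and a non-empty result can fail a CI step.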
What this means for your stack
The gap in most agent deployments is not a missing tool; it is a missing design decision. Someone chose a framework, ran npm install, and started shipping. Nobody wrote down what the agent is allowed to access, for how long, or how you would know if something went wrong. Closing that gap means picking an explicit trust model before the next multi-hour autonomous session runs against production infrastructure.
hasp is one working implementation. Run curl -fsSL https://gethasp.com/install.sh | sh, then hasp setup, and connect a project. Each agent session then receives only the credentials it needs, injected at exec time, with a 24-hour ceiling and an HMAC-chained audit log at ~/.hasp/audit.jsonl. Source-available (FCL-1.0), local-first, macOS and Linux, no account.
The argument for per-task trust boundaries doesn't depend on any particular tool. It depends on the observation that autonomy combined with ambient trust has a known failure mode, and you have seen the incidents. Draw the boundary before the agent does.
Sources · cited above, in one place
- Anthropic: security advisories and Claude Code release notes
- Model Context Protocol: specification
- OWASP: Top 10 for Large Language Model Applications
- NIST: AI Risk Management Framework (AI RMF 1.0)
- MITRE ATLAS: adversarial threats for AI systems
- SPIFFE / SPIRE: workload identity
- Replit incident coverage: agent deleting a production database (2024-2025)