GUIDE · REFERENCE 13 min · PUB MAY 21, 2026

AI coding agent terms of serviceSix vendors, line by line

Every AI coding agent has a data use policy. Most teams sign up without reading it, then discover the details when procurement asks. This table does the reading for you.

ForEngineering leaders and legal reviewing AI agent ToS

HASP REFERENCE · FLOW

01
Input
Prompts + code

Sent to vendor inference endpoint or local model
every session
02
Policy
ToS / DPA clauses

Training, retention, access, indemnity vary by tier
May 2026 snapshot
03
Risk
Exposure surface

IP, compliance, contractual liability if unreviewed
6 vendors compared

TL;DR· the answer, in twenty seconds

What: Six popular AI coding agents have materially different data use policies. Free tiers on at least three of them allow prompt and code content to train future models unless you actively opt out or upgrade.

Fix: Confirm your tier, locate the opt-out (vendor settings, API header, or enterprise DPA), and add indemnification scope to any enterprise negotiation checklist.

Lesson: "We don't train on your data" and "we use your data for service improvement" can both appear in the same policy and mean different things. Read the definitions section before the bullet points.

Engineering teams adopted AI coding agents fast. Legal teams are catching up. The typical sequence: a developer installs Cursor or Claude Code, uses it on production code for three months, then someone from legal or procurement asks "do they train on our code?" and nobody knows the answer.

The gap is not usually bad faith. The policies exist and are public. They are also long, inconsistently structured across vendors, and written to satisfy multiple regulatory frameworks at once. Reading the Anthropic privacy notice, the GitHub privacy statement, and the OpenAI API terms in the same afternoon requires tracing different definitions sections, different tier distinctions, and different DPA addenda. Most engineering teams do not do this. This article does it for them.

The six tools below cover the majority of enterprise usage as of May 2026: Claude Code (Anthropic), Cursor, GitHub Copilot, Codex CLI (OpenAI), Aider (bring-your-own-key), and the fully OSS path of OpenCode with a local model. The OSS row is included because it is the only option where the answer to every question is "not applicable" and that answer is worth understanding.

Ten questions matter most when reviewing these policies. The table below answers all ten for each vendor based on the published terms, privacy notices, and DPA documents current as of May 21, 2026. Notes after the table explain the ones where the answer is 'it depends'.

The 10 questions

Does my prompt content get used to train the model?
Does my code content get used to train?
What is the default data retention period?
Who can read my data (vendor employees, sub-processors)?
Where is data processed geographically?
Is there a zero-data-retention (ZDR) option, and at what tier?
What happens to data on account deletion?
Does the vendor disclose security incidents per a defined timeline?
What are the indemnification terms if the agent generates infringing code?
Can I opt out of telemetry?

The matrix

The table uses four shorthand values: Yes, No, Tier (depends on plan), and Vague (the policy text does not resolve the question). Vendor names link to their primary policy documents.

Question	Claude Code (Anthropic)	Cursor	GitHub Copilot	Codex CLI (OpenAI)	Aider (BYO key)	OSS + local model
1. Prompts train model	No (API / paid) / Yes (Claude.ai Free)	No (Privacy Mode on) / Tier	Tier (Individual Yes, Business/Enterprise No)	No (API, ZDR enrolled) / Tier	Depends on upstream provider	No
2. Code trains model	Same as above	Same as above	Same as above	Same as above	Depends on upstream provider	No
3. Default retention	30 days (API); indefinite (Claude.ai Free)	30 days (product); prompt logs per tier	28 days (suggestion telemetry)	30 days (API); ZDR = 0	Per upstream provider (e.g., OpenAI 30 days)	0 (no network call)
4. Who can read data	Employees + vetted sub-processors	Employees; Anthropic and Azure OpenAI as sub-processors	MS employees + GitHub staff; GitHub sub-processors	OpenAI employees + sub-processors	Upstream provider's policy applies	Nobody (local only)
5. Geographic processing	US (primary); EU SCCs available	US; Privacy Mode routes through Anthropic (US)	US and EU; region selector on Enterprise	US (primary); EU SOC 2 available	Wherever upstream routes	Local machine only
6. Zero-data-retention	Yes (API + Enterprise tier)	Yes (Privacy Mode, all paid tiers)	Yes (GitHub Copilot Enterprise)	Yes (API with ZDR header)	Yes if upstream supports it	N/A (no data sent)
7. On account deletion	Deleted "within a reasonable period" (Vague)	Deleted within 30 days (stated)	Deleted per GitHub account deletion flow (Vague on timeline)	Deleted within 30 days	Upstream controls	N/A
8. Security incident disclosure	72-hour GDPR timeline in DPA; no public SLA stated	Not specified in published terms (Vague)	72-hour per Microsoft DPA; public MSRC advisories	72-hour in OpenAI DPA for enterprise	Depends on upstream	N/A
9. IP indemnification	Anthropic indemnifies Enterprise customers; scope limited to Anthropic-generated output	Not offered in standard terms; Vague on Enterprise	Microsoft Copilot Copyright Commitment (CCC): covers Enterprise/Business; capped, conditions apply	OpenAI IP indemnity available on API Enterprise terms; similar conditions	Upstream provider's IP terms apply	None needed (you own output)
10. Opt out of telemetry	Yes (API: no telemetry by default; Claude.ai: account settings)	Yes (Privacy Mode in settings)	Yes (Copilot settings > editor telemetry off)	Yes (API default: no telemetry; CLI flag `--no-telemetry`)	Yes (Aider itself sends nothing; upstream may)	Yes (local model sends nothing)

Where the answers get complicated

"Training" vs "service improvement"

Every vendor has learned to write "we do not train on your data" in the headline and then define "service improvement" or "model evaluation" as a separate category buried in the definitions. Anthropic's API terms and OpenAI's API terms both use explicit opt-in language for model training at the API tier. The Claude.ai free product uses different language. Read the product-level privacy notice, not the API terms, when evaluating what happens to code pasted into a chat session.

The practical test: go to the definitions section and find what the policy means by "service improvement," "product enhancement," or "model performance." If the definition includes fine-tuning, retraining, or creating new training datasets, that is training by another name. If the definition is limited to bug detection, latency optimization, and abuse prevention, it is more defensible. Most policies land somewhere between those two descriptions.

Cursor's Privacy Mode routes requests through Anthropic rather than storing them on Cursor's infrastructure. That is a meaningful distinction. It does not automatically make Anthropic's API terms apply if the user is on a plan that routes through Cursor's own inference layer. The Privacy Mode guarantee is that Cursor itself does not retain the content. The guarantee does not extend to what Anthropic's receiving endpoint does under Anthropic's own API terms, which is why confirming your Anthropic API tier matters separately.

Free vs paid is the actual split

For Claude Code, Copilot, and Codex CLI, the free tier and the paid or enterprise tier have materially different terms. A developer who created a Claude.ai account on the free plan and uses that credential with Claude Code gets free-plan terms, not API terms, unless they explicitly use their own API key. GitHub Copilot Individual includes training opt-out. Copilot Business and Enterprise exclude it by default. Three different products, same brand, three different policies.

This is the gap most teams miss. An enterprise organization may have Copilot Business licenses, but individual developers who installed Copilot before the org license existed may still be running under their personal free or Individual plan with older terms attached. The same pattern applies with Claude Code: the tool does not force you onto API-tier terms. A developer whose authentication comes from a personal Claude.ai session is operating under consumer terms regardless of what their employer's procurement contract says.

The right audit step is not checking the contract. It is checking each developer's active credential source and confirming it matches the license tier your legal team reviewed.

Aider is a pass-through

Aider itself stores nothing. It sends your prompts and code context to whichever API you configure, and the retention, training, and sub-processor questions all flow to that upstream provider. If you configure Aider with OpenAI's API and you have not enrolled in zero-data-retention, OpenAI's default API retention applies (30 days, no model training, per their current terms). If you configure it with a local Ollama instance, nothing leaves the machine. Aider's own codebase is MIT-licensed and you can audit it, which puts it in a different category from the vendor-controlled tools.

The OSS + local model path

Running OpenCode or a similar open-source shell against a locally hosted model (Ollama, llama.cpp, anything that speaks OpenAI-compatible API at localhost) means zero data leaves the machine. There are no sub-processors, no geographic transfer questions, no retention periods, and no vendor indemnification. The tradeoff: you maintain the model infrastructure, you don't get frontier model quality for complex reasoning tasks, and indemnification on the output becomes your own legal department's problem. For high-security codebases where confidentiality outweighs capability, this is a serious option, not a fallback.

What is vague, and why it matters

Three vendors gave materially vague answers on the questions that matter most in enterprise procurement.

Account deletion timelines are the most consistent gap. "Within a reasonable period" appears in multiple policies without defining what reasonable means. For GDPR Article 32 compliance, reasonable needs a number. Backup and log retention often extend beyond the primary data deletion window: a vendor that deletes your account in 30 days may retain anonymized training derivatives, system logs, and backup snapshots for 90 or 180 more. The policy language rarely addresses these secondary stores explicitly. In enterprise negotiations, deletion SLAs belong in the DPA as separate commitments for primary data, backups, and derived data.

Security incident disclosure is explicit in Anthropic's and Microsoft's DPAs (72-hour notification). It is entirely absent from Cursor's published terms. If Cursor processes code from a regulated industry codebase and a breach occurs, there is no contractual guarantee on when your organization hears about it. Cursor likely addresses this in enterprise contracts, but the standard published terms do not. SOC 2 Type II certification, which Anthropic, Microsoft, and OpenAI hold, requires defined incident response processes, but the SOC 2 report does not automatically translate into a notification SLA in your vendor contract. Those are separate things.

IP indemnification scope is the most consequential gap for legal. Microsoft's Copilot Copyright Commitment is the most specific commitment in the market. It covers defended claims, not just settlement costs, and it applies to Business and Enterprise customers who followed the usage policies. Anthropic's Enterprise indemnification is similar in structure but narrower in the publicly described scope. OpenAI's API Enterprise terms include IP indemnification but the exact cap and conditions require a direct contract review. None of these commitments cover output generated in violation of the acceptable use policy, meaning a developer who uses the tool to produce something the AUP prohibits loses indemnification coverage on that output.

The pattern across all three gap areas: the published policy gets you 70% of the answer. The other 30% lives in the enterprise DPA addendum, which you can request before signing.

What to look for in the "service improvement" clause

Before signing any vendor DPA, find and read the service improvement or quality improvement definition. The questions to answer:

Does "service improvement" include using your content to fine-tune or retrain any model, including evaluation models?
Does it include human review of your prompts or code by vendor employees or contractors?
Is it opt-in or opt-out, and at what tier does the opt-out become available?
Does the definition distinguish between your content and aggregated, de-identified data derived from your content?

Several vendors are clear on the first point but vague on the second. Human review for quality assurance is common. It is also the most sensitive question for code containing trade secrets or pre-release product logic. The EU AI Act pushes toward transparency on human oversight of AI outputs, but the overlap with vendor quality review of inputs is not yet addressed in published guidance from regulators.

Which vendors are clearest

Anthropic and Microsoft publish the most granular policy documents with the clearest tier distinctions. Both have explicit DPA language, defined incident disclosure timelines, and available IP indemnification for enterprise customers. The Anthropic Trust Center at trust.anthropic.com consolidates SOC 2, DPA templates, and security documentation in one place. GitHub's privacy statement for Copilot is long but searchable, and the tier-specific documentation is linked directly from the product page. For legal teams doing a first pass, both are readable without a vendor call.

OpenAI's API policies are similarly precise for API customers, with zero-data-retention available via request header (OpenAI-Organization + ZDR enrollment) or by account-level configuration for enterprise orgs. The ambiguity shows up when users mix Claude.ai-type consumer products with API-tier tooling, a configuration Claude Code makes easy to arrive at accidentally. Codex CLI, which is an OpenAI API client, inherits exactly the API terms that apply to the credential it runs under. Get the credential right and the terms follow.

Cursor's privacy documentation has improved over 2025 and early 2026, but it still leaves the security incident disclosure and sub-processor list questions underspecified compared to the enterprise vendors. The Privacy Mode toggle is a legitimate control, but the policy around non-Privacy-Mode data storage is written at a level of generality that requires follow-up. Specifically: what sub-processors receive inference requests outside Privacy Mode, and for how long are those forwarded requests retained by the sub-processor? The current published terms do not answer this.

Aider's documentation is honest about being a pass-through: it tells you exactly which upstream controls apply and links to them. That transparency is more useful than vague vendor assurances. For teams that want to know what happens to their code, "go read OpenAI's API terms" is a complete answer. For teams using Aider with multiple upstream providers depending on the task, each provider needs a separate review.

What to negotiate in enterprise contracts

If your organization is in procurement discussions with any of these vendors, five clauses matter:

Deletion SLA. Define account and data deletion as a specific number of days, with written confirmation. "Reasonable period" is not a number.

Sub-processor list. Request the full list, not a category description. Enterprise DPAs typically include this. Know which cloud providers and third-party ML infrastructure vendors touch your code.

IP indemnification scope. Get the commitment in writing, including the cap, the conditions (usage policy compliance), and whether it covers defense costs or only final judgments.

Security incident notification SLA. 72 hours is the GDPR-standard target. Match whatever your internal incident response policy requires. Shorter is achievable; negotiate it.

ZDR enrollment confirmation. For API-tier vendors, zero-data-retention is often a configuration choice, not an automatic enterprise default. Confirm in writing that your organization's API credentials are enrolled and that the vendor has records showing this.

A checklist you can paste into a vendor review or PR:

## AI coding agent ToS review checklist

- [ ] Identified which tier/plan the tool runs under (free / paid / enterprise)
- [ ] Located and read the product-level privacy notice (not just the API terms)
- [ ] Found the "service improvement" definition and confirmed whether it includes model training
- [ ] Confirmed training opt-out status for all developer accounts in the org
- [ ] Confirmed Privacy Mode or ZDR is enabled and documented
- [ ] Requested full sub-processor list
- [ ] Confirmed deletion SLA is a defined number of days in the DPA
- [ ] Confirmed security incident notification SLA matches internal IR policy
- [ ] Confirmed IP indemnification scope, cap, and conditions
- [ ] Confirmed telemetry opt-out is applied at account and editor level
- [ ] Flagged any free-plan accounts that should migrate to org license

What this means for your stack

The policies above are not static. Each vendor updates terms as products evolve, and the gap between what the published policy says and what an enterprise contract delivers is meaningful. Any organization running AI coding agents on production or pre-release code should treat vendor data policy review as a recurring task, not a one-time procurement checkbox.

The credential and code exposure surface does not end at the vendor policy. GitGuardian's 2026 State of Secrets Sprawl report found AI-service token leaks up 81% year over year, and AI-assisted commits leak secrets at roughly double the baseline rate. The terms of service question and the secrets-at-rest question are connected: if prompts containing credentials are retained by a vendor and a breach occurs, the ToS disclosure timeline determines how fast your team knows.

hasp is one working implementation of a local secret broker that keeps credentials out of the prompts entirely. curl -fsSL https://gethasp.com/install.sh | sh, hasp setup, connect a project, and the agent gets a reference instead of a value. Source-available (FCL-1.0), local-first, macOS and Linux, no account.

Even without a broker, the ToS review above is worth the two hours it takes. The vendor that is clearest about what it does with your data is the one you want to be on when legal asks the question.

Sources· cited above, in one place

NEXT STEP~90 seconds

Stop handing the agent your real keys.

hasp keeps secrets in one local encrypted vault, brokers them into the child process at exec, and never lets the agent read the value.

Local, encrypted vault — no account, no cloud, no telemetry by default.
Brokered run — agent gets a reference, the child process gets the value.
Pre-commit + pre-push hooks catch managed values before they ship.
Append-only HMAC audit log answers "did the agent touch the prod token?" in seconds.

Install hasp Read the docs View on GitHub

→ okvault unlocked · binding ./api

→ okgrant once · pid 88421

→ okagent never read

macOS & Linux. Source-available (FCL-1.0, converts to Apache 2.0). No account.