The leaked-secret incident runbook for AI agents
Revoke, rotate, purge, audit, in order.
The gap between a leaked key and an abused key is measured in seconds, not hours. This runbook gives you the order to work in: revoke first, rotate second, purge git history third, audit last. Most teams get the order wrong and leave a live credential sitting in every existing clone.
01 · Minute 0 · Revoke (stop the bleeding)
De-authorize the key first. A code-only cleanup leaves a live credential in every clone.
02 · Minute 5 · Rotate (no grace period)
Generate a replacement, update every environment, then disable the old key. Order keeps prod up.
03 · Hour 1 · Audit (blast radius)
Purge history, then read the access logs for the exposure window. Assume reuse until proven otherwise.
TL;DR · the answer, in twenty seconds
What this is: A sequenced runbook for the moment an AI coding agent commits a live secret. Four phases, in a fixed order: revoke, rotate, purge, audit. Plus the cloud caveat that breaks naive revocation.
The minimum fix: Revoke the credential in the provider console before you touch the code. Deleting the file or rewriting git history first does nothing to a key that scanners may have already pulled.
The lesson: The order is the deliverable. A documented, drilled runbook beats a clever tool, because the only variable you control under pressure is how fast you contain the blast radius.
This is the sequence to run the moment an AI coding agent commits a live secret: revoke, rotate, purge, audit, in that fixed order.
The order is the whole point. Most teams reach for git first, rewrite the offending commit, force-push, and feel safe. The credential is still valid. It is still sitting in every existing clone, in GitHub's commit cache, in any fork, and quite possibly in an attacker's scanner queue. Code cleanup is the last 20% of the job dressed up as the first.
The clock you're actually racing
The threat timeline compressed in 2026, and not because the leak rate changed. Automated scanners pull newly committed secrets off public GitHub within seconds of the push. Credential abuse now starts measurably before a human notices anything is wrong. If your mental model is "I have a few hours to triage and rotate," you have already lost the race that matters.
Two numbers from GitGuardian's reporting set the stakes. The average time to remediate a leaked secret runs to roughly 27 days, which is weeks of open exposure for a credential a scanner found in seconds. And of the credentials leaked in 2022, GitGuardian found that 64% were still valid at the start of 2026. The blocker is almost never technical. It is the absence of a repeatable path, so the leak sits in a backlog while everyone assumes someone else rotated it.
AI coding agents make this worse by default. Tools like Lovable, Bolt, v0, Cursor, Claude Code, and Replit wire client SDKs straight into the frontend, which ships the key to every visitor, and they paste secrets into example code without a second thought. The leak surface grew. The response window shrank. A runbook is how you close the gap that opened between them.
Minute 0: revoke before you touch the code
Revoke first. Not rotate, not git filter-repo, not a panicked Slack message. Revoke.
Revocation de-authorizes the credential so that the copy in every clone and cache becomes worthless. This is the one action that actually stops the bleeding, and it is the one most runbooks bury under three steps of git surgery. The OWASP playbook for a leaked secret is four ordered steps for a reason: revoke, rotate, delete, log. Reorder them and you spend your first ten minutes scrubbing history while the live key earns an attacker money.
Know your provider's revocation behavior before the incident, because it varies and some of it is unforgiving. Anthropic console keys, for example, revoke instantly with no grace period. The moment you disable the key, in-flight requests fail on their next call. That is the behavior you want during an incident and the behavior that takes down production if you revoke before you have a replacement wired in. Which is exactly why revoke and rotate are two phases, not one.
If more than one secret shared the leaked file or commit, treat every secret in that blast zone as compromised. An attacker who pulled the commit got all of them, not the one you noticed.
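A minimal sketch of how you might enumerate that blast zone, assuming the leaked file uses `NAME=value` or `name: value` lines. The function name `list_secret_names` and the keyword list (KEY, TOKEN, SECRET, PASSWORD) are illustrative assumptions, not a complete detector; connection strings with embedded passwords, for instance, will slip past it.

```sh
# list_secret_names FILE
# Prints the NAMES (never the values) of secret-looking assignments in FILE.
# The keyword list is an illustrative assumption; extend it for your stack.
list_secret_names() {
  grep -Eio '^[[:space:]]*(export[[:space:]]+)?[A-Za-z0-9_]*(KEY|TOKEN|SECRET|PASSWORD)[A-Za-z0-9_]*[[:space:]]*[=:]' "$1" \
    | sed -E 's/^[[:space:]]*(export[[:space:]]+)?//; s/[[:space:]]*[=:]$//' \
    | LC_ALL=C sort -u
}

# Usage: every name printed goes on the revoke list.
#   list_secret_names .env
```

To cover a whole commit rather than one file, you could run it over each path printed by `git diff-tree --no-commit-id --name-only -r <commit>`.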
Minute 5: rotate a replacement without taking down prod
Rotation is three steps, and the order keeps production alive:
# 1. Generate the replacement in the provider console or via API.
#    Do NOT disable the old key yet.
# 2. Update every environment that holds the old value.
#    Miss one and that service hard-fails the instant you disable the old key.
# --hidden and --no-ignore matter: by default rg skips dotfiles and gitignored
# files, which is exactly where .env files live.
rg -l --hidden --no-ignore --glob '!.git' 'OPENAI_API_KEY' .   # find every reference, every env file
# update: local .env, CI/CD secrets, staging, prod, edge functions, k8s secrets
# 3. Validate nothing still depends on the old key, THEN disable it.
#    Watch error rates for a few minutes before you pull the trigger.
The failure mode is breaking prod, and three mechanisms make unattended rotation safe enough to run under pressure. Stage the new credential alongside the old one so consumers move over gradually instead of all at once. Gate the disable of the old key on health signals from those consumers. And keep a rollback ready, so a failed health check flips you back to the old credential and pages a human instead of taking the service down. Aembit's remediation guide walks the same staged-rollout, health-gated, auto-rollback pattern in more depth.
This reframes what rotation is for. Manual rotation optimizes for the wrong variable: it minimizes how often you rotate. The number you want to minimize is the blast radius of any single leak. A key on a 90-day manual cycle has a 90-day worst-case exposure window. A key rotated daily by automation has a 24-hour window, and the muscle memory to rotate in minutes when something leaks. Short-lived credentials, such as AWS STS tokens, Vault dynamic secrets, or OIDC-federated access, close that window structurally. A 15-minute credential that leaks is a non-event. (For the architecture behind that, see runtime secret injection.)
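The health-gated disable with rollback described in this section can be sketched in a few lines of shell. This is a sketch under stated assumptions, not a production rotator: `gated_disable` and the three hook commands it takes (a health check, a disable action, a rollback action) are hypothetical names you would map onto your own provider and monitoring tooling.

```sh
# gated_disable HEALTH_CMD DISABLE_CMD ROLLBACK_CMD [CHECKS] [INTERVAL_SECONDS]
# HEALTH_CMD   exits 0 when consumers on the NEW key look healthy (hypothetical hook)
# DISABLE_CMD  disables the old key at the provider (hypothetical hook)
# ROLLBACK_CMD restores the previous state and pages a human (hypothetical hook)
gated_disable() {
  gd_health=$1 gd_disable=$2 gd_rollback=$3
  gd_checks=${4:-3} gd_interval=${5:-10}

  # Gate: refuse to disable while consumers already look unhealthy.
  if ! "$gd_health"; then
    echo "pre-check failed: old key left enabled" >&2
    return 1
  fi

  "$gd_disable" || return 1

  # Watch health for a while after the disable; roll back on the first failure.
  gd_i=0
  while [ "$gd_i" -lt "$gd_checks" ]; do
    sleep "$gd_interval"
    if ! "$gd_health"; then
      echo "health check failed after disable: rolling back" >&2
      "$gd_rollback"
      return 2
    fi
    gd_i=$((gd_i + 1))
  done
  echo "old key disabled; consumers healthy"
}
```

The distinct exit codes (1 for "never disabled", 2 for "disabled, then rolled back") are a design choice so that a wrapper script or pager rule can tell the two outcomes apart.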
The hour after: purge git history, then assume it failed
Now you can touch the code, with the right expectation: purging git history is hygiene, not remediation.
Truffle Security's position is blunt: deleting leaked keys is not a solution. Deleting the file leaves the secret in history. Even after you rewrite history with git filter-repo or BFG, treat the secret as permanently compromised, because GitHub caches commits, forks preserve them, and web archives may have snapshotted the page before you got there.
# Private repo: rewrite history to remove the blob.
git filter-repo --invert-paths --path config/secrets.yml
# Public repo: the bots already have it. History rewrite is optional cleanup;
# rotation (which you did in Minute 5) was the part that mattered.
The decision splits on visibility. Private repo: rewrite history, because limiting who can pull the old blob still buys you something. Public repo: the key is already crawled, so rotation was mandatory and the rewrite is optional tidying. Either way, clear the secret out of the places people forget: Terraform state, Ansible playbooks, Helm charts, CI logs, build artifacts, internal wikis, and any backup or snapshot that captured the old value. (Scanning the repo for what else leaked is worth doing here, not just at the start.)
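For the places people forget, a recursive fixed-string search is a reasonable first pass. The sketch below assumes the old value has been written to a local file with one value per line and no blank lines (a blank line would match every file); `find_leftovers` and the example paths are hypothetical. It will not see inside compressed backups or encrypted stores.

```sh
# find_leftovers PATTERN_FILE PATH...
# Prints every file under the given paths that still contains the old value.
# Reading the value from a file keeps it out of the process list and shell history.
# Exit status follows grep: 0 if anything was found, 1 if nothing was.
find_leftovers() {
  fl_patterns=$1
  shift
  grep -rlF -f "$fl_patterns" -- "$@" 2>/dev/null
}

# Usage (paths are examples only):
#   find_leftovers /tmp/old-key.txt ./terraform ./ansible ./helm ./ci-logs ./artifacts
```

An empty result here is evidence of a clean working tree, not proof: it says nothing about forks, caches, or snapshots you do not control.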
Then audit the blast radius. Every major provider logs API usage, so pull the logs for the exposure window and look for activity you did not authorize. Check for resource creation in regions you never use, since attackers spin up crypto miners in every available region at once. If the key reached a database, check whether records changed. If it was a payment key, review every charge and refund in the window and check the webhook configuration for tampering. Work the git history backward to find when the secret was first committed, because that timestamp, not when you noticed, is the real start of the exposure window. This is also where an append-only audit log earns its keep: you cannot reconstruct a blast radius from logs an attacker could edit.
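Finding the first commit that introduced the secret can be done with git's pickaxe search. A minimal sketch, assuming you search for a distinctive fragment of the (already revoked) value; `first_exposure` is a hypothetical helper name. Two caveats: the commit time is a lower bound, since the push may have come later, and the search only works on history that still contains the commit, so run it against a clone taken before any rewrite.

```sh
# first_exposure REPO_DIR NEEDLE
# Prints "<commit-sha> <ISO-8601 committer date>" for the oldest commit, on any
# ref, whose diff changes the number of occurrences of NEEDLE (git's -S pickaxe).
# Prints nothing if NEEDLE never appears in the reachable history.
first_exposure() {
  git -C "$1" log --all --reverse --format='%H %cI' -S"$2" | head -n 1
}

# Usage: the printed timestamp is the start of the window to pull provider logs for.
#   first_exposure . 'sk-proj-abc123'
```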
The cloud caveat: revocation that doesn't take
One edge case breaks the naive version of this runbook, and cloud responders need to know it before the incident, not during.
AWS IAM is eventually consistent, which means there is a window where a key you just revoked still works. Following a responsible disclosure, AWS published a Credential Cleanup Procedure that sequences the containment: attach a deny policy scoped to credential-management actions, nullify active sessions with a time-conditioned policy, rotate the access keys, then clean up once you have confirmation. The researchers who reported it note that the official procedure does not fully close the consistency gap. Account-level quarantine through OU isolation does close it, at a real operational cost. The point for your runbook: on AWS, "I clicked revoke" is the start of containment, not the end. Verify the deny actually took effect.
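One way to "verify the deny actually took effect" under eventual consistency is to probe with the old credential until it has been rejected several times in a row, resetting the count whenever it is accepted again. The sketch below assumes you supply the probe yourself: `confirm_revoked` and its probe argument are hypothetical, and the probe must exit 0 only while the old credential is still accepted. A probe that fails for unrelated reasons (a network outage, say) would count as a denial, so pick one whose failure mode you understand.

```sh
# confirm_revoked PROBE_CMD [REQUIRED_DENIALS] [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# PROBE_CMD must exit 0 while the OLD credential is still accepted (hypothetical hook).
# Returns 0 once the probe has been denied REQUIRED_DENIALS times consecutively,
# 1 if that never happens within MAX_ATTEMPTS.
confirm_revoked() {
  cr_probe=$1 cr_need=${2:-5} cr_max=${3:-60} cr_interval=${4:-2}
  cr_streak=0 cr_try=0
  while [ "$cr_try" -lt "$cr_max" ]; do
    cr_try=$((cr_try + 1))
    if "$cr_probe" >/dev/null 2>&1; then
      cr_streak=0   # still accepted somewhere: start counting again
    else
      cr_streak=$((cr_streak + 1))
      if [ "$cr_streak" -ge "$cr_need" ]; then
        echo "denied $cr_streak times in a row after $cr_try attempts"
        return 0
      fi
    fi
    sleep "$cr_interval"
  done
  echo "old credential still accepted intermittently after $cr_max attempts" >&2
  return 1
}
```

A streak of denials is evidence that the change has propagated, not a guarantee; it simply replaces "I clicked revoke" with something you observed.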
The runbook, on one page
Print this. Tape it next to the monitor. The whole value of an incident runbook is that you run it without thinking when your pulse is up.
LEAKED-SECRET RUNBOOK
=====================
[ ] MIN 0   REVOKE
    - Identify the provider + key. Note every other secret in the
      same file/commit; treat them all as compromised.
    - Revoke in the console NOW. Confirm it took (esp. AWS: deny
      policy + session nullification, verify the deny applied).
[ ] MIN 5   ROTATE
    - Generate replacement. Do NOT disable old yet.
    - Update EVERY environment (local, CI, staging, prod, edge, k8s).
    - Validate, watch error rates, THEN disable the old key.
[ ] HOUR 1  PURGE
    - Private repo: git filter-repo / BFG.
    - Public repo: rotation already covered it; rewrite is optional.
    - Clear Terraform, Ansible, Helm, CI logs, artifacts, backups.
[ ] HOUR 1  AUDIT
    - Pull provider usage logs for the exposure window.
    - Unfamiliar regions? DB writes? Payment activity? Webhooks?
    - Find first-commit timestamp = true start of exposure.
DECISION: managed vs unmanaged secret
    - Short-lived / brokered -> revoke = expire the lease, blast radius ~minutes
    - Static long-lived      -> full runbook above, assume worst case
If a step in here makes you stop and look something up, that is a gap to close on a calm day, not during the next leak. The way to find those gaps is to drill it: leak a known credential into a staging environment and time yourself from detection to confirmed revocation. That number is your real mean time to remediate. It is almost always slower than you think.
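Timing the drill needs nothing more than two timestamps. A minimal sketch, assuming a `date` that supports `+%s` (GNU and BSD both do); `format_elapsed` is a hypothetical helper name.

```sh
# format_elapsed START_EPOCH END_EPOCH
# Prints the gap between two Unix timestamps as "<minutes>m <seconds>s".
format_elapsed() {
  fe_total=$(( $2 - $1 ))
  echo "$((fe_total / 60))m $((fe_total % 60))s"
}

# Usage during a drill, in one shell session:
#   detected=$(date +%s)     # the moment the alert fires
#   ... run the runbook ...
#   confirmed=$(date +%s)    # the moment revocation is verified, not just clicked
#   format_elapsed "$detected" "$confirmed"
```

Recording the result of each drill somewhere durable lets you see whether the number is actually coming down.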
What this means for your stack
Write the runbook down before you need it, then drill it once so the order is reflex: revoke, rotate, purge, audit. The single most common mistake is leading with git, which scrubs the evidence while the live key keeps working. Revocation is the only step that stops the bleeding, so it goes first, every time.
The deeper fix is to make revocation cheap and the blast radius small by default. Static, long-lived credentials force the full four-phase scramble. Short-lived, brokered credentials turn a leak into a non-event, because the leaked value expires on its own and revocation is just expiring a lease. The architecture that gets you there is runtime brokering with an append-only audit log: the agent never holds a durable secret, and you can reconstruct exactly what a leaked reference touched.
hasp is one working implementation of that pattern. It brokers scoped, short-lived references to your coding agent instead of exporting raw keys, and logs every access to an append-only record you can read during exactly this kind of incident. Source-available (FCL-1.0), local-first, macOS and Linux, no account.
Whatever you run, the test is the same. Pick a credential, pretend it leaked, and time your team from detection to confirmed revocation. If that number is measured in days, the next real leak will be too, and the attacker's scanner already knows it works in seconds.