Self-hosted MCP gateway pattern
One door for every server
By early 2026, OX Security counted roughly 7,000 MCP servers in the wild. Running five or more of them without a gateway means your security posture is only as strong as the weakest one.
- 01 Fleet: MCP server fleet. 5+ servers, each with its own credentials and tools (~7,000 in the wild, per OX Security).
- 02 Gateway: MCP gateway process. Brokers auth, allowlists, audit log, injection scrub (one endpoint per team).
- 03 Agent: AI agent session. Talks to one trusted URL, sees scoped tool list (no per-server auth).
TL;DR · the answer, in twenty seconds
What: Teams running multiple MCP servers face an audit-and-credential problem that grows with every server they add. Reviewing each server individually does not scale past five.
Fix: Run one MCP gateway process in front of all your servers. The gateway injects credentials, enforces per-tool allowlists, scrubs prompt-injection payloads, and writes an append-only audit log. The agent connects to one trusted endpoint.
Lesson: The gateway pattern is not MCP-specific. Any system where a single client talks to many back-end services benefits from a single brokering layer at the boundary. The agent does not need to know how many servers exist.
OX Security's early-2026 analysis of the MCP ecosystem counted roughly 7,000 publicly reachable MCP servers and 150 million downloads with no signature requirement. Most teams running MCP in production are not running one or two servers. They are running five, eight, twelve: a GitHub server, a Slack server, a database server, internal tools built on FastMCP, a vendor integration someone bolted on last sprint.
Each server has its own auth flow, its own tool list, and its own update cadence. An agent connecting to all of them directly inherits every one of their credential surfaces. When someone leaves the team, you revoke their access in twelve places. When a tool you have not used in three months gets a prompt-injection payload in its response, you find out during the postmortem.
The gateway pattern is the obvious response, and it is underbuilt. Most teams either run each server individually and accept the overhead, or they script around the problem in ways that do not survive the third person touching the config. This article describes what a real gateway looks like, compares the existing implementations, and gives you enough to stand up a minimum-viable version in a weekend.
What to know before you build anything
Five things matter before architecture decisions:
- The MCP specification defines a JSON-RPC transport layer, not a security model. Security is your problem.
- A gateway does not need to modify the MCP protocol. It speaks MCP inbound and MCP outbound. The agent never knows there is an indirection.
- Credential injection at the gateway boundary is simpler than per-server credential management everywhere. The gateway knows the real credentials. The agent knows nothing.
- Tool-call audit logs are only useful if they are tamper-evident. A flat log file the agent's host process can write to is not an audit log.
- The biggest operational risk is not the servers themselves. It is the update story: how do you update a server without dropping an active agent session?
The core architecture
The gateway is a reverse proxy for MCP traffic. One process, one address your agents connect to, multiple upstream MCP servers behind it. The agent sends a tool call. The gateway receives it, checks the per-tool allowlist, injects the upstream credentials, forwards the call, receives the response, runs the prompt-injection scrubber, and returns the result. The agent sees a response.
That description sounds like a standard reverse proxy, and in some ways it is. The difference is that MCP traffic carries structured tool calls, not HTTP requests. The gateway can inspect the semantic content of every call: which tool, which arguments, which values. An nginx or Caddy proxy can forward the traffic, but it cannot enforce "deny calls to the execute_sql tool where the query contains a DROP statement." The gateway can.
Five components:
The routing layer maps tool names to upstream servers. The agent calls github_create_pr. The gateway knows that tool lives on the GitHub MCP server at localhost:8001. It routes and returns. The agent does not hold a list of server addresses.
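The routing layer can be sketched as a lookup table. This is a minimal illustration, not a reference implementation: the Upstream type, the ROUTES table, the resolve function, and the slack_post_message tool name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upstream:
    name: str
    address: str


# Hypothetical routing table: flat tool name -> upstream server that owns it.
ROUTES = {
    "github_create_pr": Upstream("github", "localhost:8001"),
    "slack_post_message": Upstream("slack", "localhost:8002"),
}


def resolve(tool_name: str) -> Upstream:
    """Return the upstream that owns a tool, or raise if the tool is unknown."""
    try:
        return ROUTES[tool_name]
    except KeyError:
        raise LookupError(f"no upstream registered for tool {tool_name!r}") from None
```

Failing closed on an unknown tool name keeps the agent from probing for servers the gateway does not advertise.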
The credential injector adds the upstream server's credentials to each forwarded request. The gateway holds the real tokens. When it forwards a call to the GitHub server, it injects the GitHub token. When it forwards to the Slack server, it injects the Slack token. The agent's context window sees neither.
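A sketch of the injection step, under one loud assumption: the upstream accepts its token as an HTTP Authorization bearer header. Your servers may expect an environment variable or a different header instead. TOKEN_ENV and inject_credentials are hypothetical names.

```python
import os

# Hypothetical mapping: upstream name -> environment variable holding its token.
TOKEN_ENV = {"github": "GITHUB_TOKEN", "slack": "SLACK_BOT_TOKEN"}


def inject_credentials(upstream: str, headers: dict, env=os.environ) -> dict:
    """Return a copy of the headers carrying the upstream's token.

    The agent's own Authorization header (its gateway token) is dropped so it
    is never forwarded upstream. Assumes bearer-header auth on the upstream.
    """
    token = env.get(TOKEN_ENV[upstream])
    if not token:
        raise RuntimeError(f"missing credential for upstream {upstream!r}")
    forwarded = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    forwarded["Authorization"] = f"Bearer {token}"
    return forwarded
```

The function returns a new dict rather than mutating the inbound headers, so the real token never lands in an object that is echoed back to the agent.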
The allowlist engine filters which tools each agent identity can call. A read-only analyst gets only the read-only subset of the database server's tools. An engineering agent gets the write tools too. You configure this per team, per role, or per session token, depending on how you scope your auth.
The audit writer appends a structured record for every call: timestamp, agent identity, tool name, arguments (with secrets redacted), upstream server, response status, latency. The record is HMAC-chained so you can verify nothing was deleted or reordered.
The prompt-injection scrubber sits at ingress on the response path. Before the upstream response reaches the agent, the gateway scans it for patterns that look like embedded instructions: "ignore previous instructions," base64-encoded payloads, suspicious markdown formatting designed to hijack the agent's next step. Snyk researchers documented this class of attack in early 2026 as the MCP GitHub prompt-injection data heist. A scrubber does not catch everything. It does catch the obvious patterns and forces attackers to work harder.
Existing implementations
Four categories of existing gateway code:
Anthropic's reference tooling. The MCP documentation includes a reference client and transport implementations, but no gateway. Anthropic's reference setup assumes a single agent connecting to a small number of servers directly. It is a starting point for understanding the protocol, not a production gateway.
mcp-proxy. A community project that aggregates multiple MCP servers behind one stdio or HTTP endpoint. It handles routing and basic server management. It does not handle credential injection, per-tool allowlists, or audit logging. Worth understanding as a pattern reference. Not enough on its own for a team with security requirements.
FastMCP. A Python framework for building MCP servers that also supports middleware. You can write a gateway as a FastMCP server that proxies upstream calls. This gets you pythonic middleware for scrubbing and logging with relatively little boilerplate. The operational overhead is higher than a purpose-built binary: Python runtime, package management, a dependency tree that needs auditing.
Custom builds. Most teams with serious MCP deployments end up writing their own. The advantage is that you control every layer. The disadvantage is that you own the security surface. The typical shape: a Go or Rust binary, an HTTP/SSE transport, a YAML config file mapping tools to upstreams, a middleware stack for allowlists and audit logging. Two weeks of engineering time to get it right.
The gap between mcp-proxy and a custom build is where most teams get stuck. mcp-proxy is not enough. A custom build feels like too much for an initial deployment. The weekend implementation at the end of this article closes most of that gap.
Scoping per-team auth
Agent identity at the gateway boundary is where most designs underinvest. The gateway needs to know who is calling before it can enforce allowlists or write a meaningful audit record.
The simplest model: each team gets a short-lived bearer token scoped to a set of allowed tools. The token encodes the scope. The gateway validates it. When someone leaves the team, you revoke their token. No per-server credential rotation needed.
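One way to build such a token with only the standard library is an HMAC-signed payload that carries the team, the allowed tools, and an expiry. This is a sketch under assumptions: the token format, field names, and the issue_token and validate_token functions are hypothetical, and a production deployment might prefer an established format such as JWT.

```python
import base64
import hashlib
import hmac
import json
import time


def issue_token(key: bytes, team: str, tools: list, ttl_s: int, now=None) -> str:
    """Sign a payload naming the team, its allowed tools, and an expiry time."""
    issued_at = time.time() if now is None else now
    payload = {"team": team, "tools": tools, "exp": int(issued_at + ttl_s)}
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    sig = hmac.new(key, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"


def validate_token(key: bytes, token: str, now=None) -> dict:
    """Return the payload if the signature is valid and the token is unexpired."""
    body, _, sig = token.rpartition(".")
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        raise PermissionError("bad signature")
    payload = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    if payload["exp"] <= current:
        raise PermissionError("token expired")
    return payload
```

Expiry handles routine turnover. Revoking a token before it expires would additionally need a deny list checked in validate_token, which this sketch omits.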
A stronger model uses SPIFFE workload identity to issue cryptographic identities to agent processes. Each agent process gets a SPIFFE Verifiable Identity Document (SVID) that proves "this process is the CI pipeline agent for team-payments." The gateway validates the SVID against a SPIRE server. No static token to rotate, no credential to leak.
The practical middle ground for most teams: mTLS between the agent and the gateway, with client certificates issued per team by your internal CA. More work to set up than a bearer token, less work to maintain than a SPIFFE deployment. Revocation is a CRL push rather than a SPIRE policy change.
What you should not do: use a single shared API key for all agents connecting to the gateway. You lose the ability to attribute tool calls to a specific team. You cannot revoke access for one team without disrupting all of them. The audit log becomes useless for incident response.
Where to host the gateway
Three options, each with real tradeoffs:
On-prem or co-located with the MCP servers. The gateway and the upstream servers live in the same network. Traffic between the gateway and the servers is on a private network. The exposed surface is one address: the gateway. This is the right answer for teams with on-prem infrastructure or air-gapped environments. Latency is low. The operational model is familiar. The drawback is that the gateway becomes a single point of failure. You need to plan for high availability from day one.
Cloud-hosted gateway, cloud-hosted servers. Everything runs in your cloud account. The gateway is a container in the same VPC as the MCP servers. No public exposure between gateway and servers. The agent's host (a developer machine, a CI runner, an IDE extension) connects to the gateway over TLS. This is the easiest model to scale. It is also the model where credential exposure risk is highest: if the cloud account is compromised, the gateway's credentials to all upstream servers are in scope.
Gateway on the developer machine, servers remote. Each developer runs a local gateway process. The gateway talks to remote MCP servers over the public internet (or a VPN). This is the laptop-native model and the right starting point for teams exploring the pattern. It does not solve the per-developer credential problem, but it does give you the audit log and allowlist enforcement without any cloud infrastructure.
For most teams: start with the developer-local model to validate the pattern, then move to cloud-hosted once you have confidence in the config format and allowlist design.
Rolling updates without breaking sessions
MCP sessions are stateful. An agent that is mid-conversation with an upstream server through your gateway has an active JSON-RPC session. If you restart the gateway to update it, you drop the session. The agent either errors or silently loses context.
Three mitigations:
Versioned configs with hot reload. The gateway watches a config file. When the file changes, it applies the new allowlists and credential references to new connections. Existing connections use the old config until they close. You update without restart. This works for allowlist and credential changes, not for gateway binary updates.
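The key property is that a reload swaps in a new config object instead of mutating the one existing connections hold. A minimal sketch, with file watching left out; ConfigStore and Connection are hypothetical names.

```python
import threading


class ConfigStore:
    """Holds the current config and a version number that increments per reload."""

    def __init__(self, initial: dict):
        self._lock = threading.Lock()
        self._version = 1
        self._current = dict(initial)

    def reload(self, new_config: dict) -> int:
        # Replace the config object; never mutate the old one in place.
        with self._lock:
            self._version += 1
            self._current = dict(new_config)
            return self._version

    def snapshot(self):
        with self._lock:
            return self._version, self._current


class Connection:
    """Pins the config that was current when the connection opened."""

    def __init__(self, store: ConfigStore):
        self.version, self.config = store.snapshot()
```

Recording the pinned version on each connection also gives the audit log a way to say which allowlist was in force for a given call.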
Blue-green gateway deployment. You run two gateway instances. You shift new connections to the new instance. Old connections drain on the old instance. When they close, you shut down the old instance. This is standard reverse-proxy practice. It adds operational overhead but eliminates dropped sessions.
Session resumption at the agent. Some agent runtimes support MCP session resumption: they hold enough state to reconnect and replay the current context if the gateway returns a 503. Claude Code does not currently implement this. If your agent supports it, you get a free retry layer that makes gateway restarts transparent.
The practical answer for weekend deployments: accept that gateway restarts drop active sessions, document it in your runbook, and schedule updates outside working hours. Implement blue-green when the dropped-session cost becomes real.
What gets missed in most gateway designs
Allowlists without argument validation are incomplete. You can allow the execute_sql tool and still get a DROP TABLE through it. Tool-level allowlists are the first layer. Argument-level validation is the second. A gateway that parses the SQL in an execute_sql call and rejects DDL statements is doing more work, but the work is measurable. The OWASP Top 10 for LLMs lists excessive agency as a top risk. Argument validation reduces the blast radius when an agent is manipulated into calling a tool it is allowed to call with inputs it should not supply.
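A deliberately naive sketch of argument-level validation for execute_sql. It checks the leading keyword of each statement rather than parsing SQL, so treat it as a starting point: a real deployment would use a proper SQL parser. The "query" argument name, the keyword list, and the function names are assumptions.

```python
import re

# Statement keywords treated as DDL in this sketch; the list is an assumption.
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP", "TRUNCATE", "RENAME"}

# Strips "-- line" and "/* block */" comments before looking at keywords.
_COMMENT = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)


def first_keywords(sql: str) -> list:
    """Return the leading keyword of each semicolon-separated statement."""
    cleaned = _COMMENT.sub(" ", sql)
    keywords = []
    for statement in cleaned.split(";"):
        words = statement.split()
        if words:
            keywords.append(words[0].upper())
    return keywords


def validate_execute_sql(arguments: dict) -> None:
    """Raise PermissionError if any statement in the query starts with DDL."""
    blocked = [kw for kw in first_keywords(arguments.get("query", "")) if kw in DDL_KEYWORDS]
    if blocked:
        raise PermissionError(f"DDL statements are not allowed: {blocked}")
```

Splitting on semicolons can misread a semicolon inside a string literal as a statement boundary. That produces false rejections rather than false passes, which is the safer direction for a deny rule.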
Audit logs nobody checks are security theater. An HMAC-chained log is tamper-evident, but tamper-evident does not mean reviewed. Attach an alert to the log stream. If a single agent calls more than N tools in a minute, something is wrong. If a tool that has never been called in production gets called at 2am, that is worth a page. Wire the audit log to your existing SIEM or write a small alert process that watches the stream.
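The "more than N tools in a minute" rule fits in a few lines as a sliding window per agent. A sketch, assuming the alert process receives an agent identity and a timestamp for each audit record; CallRateAlert is a hypothetical name.

```python
from collections import defaultdict, deque


class CallRateAlert:
    """Flags an agent that makes more than max_calls within window_s seconds."""

    def __init__(self, max_calls: int, window_s: float = 60.0):
        self.max_calls = max_calls
        self.window_s = window_s
        self._calls = defaultdict(deque)

    def observe(self, agent: str, timestamp: float) -> bool:
        """Record one call; return True when the agent is over the limit."""
        calls = self._calls[agent]
        calls.append(timestamp)
        # Drop calls that have aged out of the window.
        while calls and calls[0] <= timestamp - self.window_s:
            calls.popleft()
        return len(calls) > self.max_calls
```

Feed it timestamps from the audit records rather than wall-clock time at the alerting process, so replaying an old log yields the same alerts.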
A credential broker fits naturally one hop before the gateway's credential injector. The gateway authorizes the MCP server by role; the broker authorizes the secret by grant. The gateway audit record tells you which tool ran; the broker audit record tells you which credential touched it. During incident response, correlating the two logs answers "what did session 47 actually access" with a real chain of evidence rather than inference from tool call names.
The scrubber is not a trust boundary. Treating the prompt-injection scrubber as the security boundary is the same mistake as treating a WAF as a trust boundary. The scrubber catches known patterns. It does not catch novel ones. The correct posture: the scrubber buys you time and visibility. The trust boundary is the allowlist and the credential injector, which constrain what a compromised agent can do even after a successful injection.
Upstream server updates happen on their own schedule. You control the gateway. You do not control when the GitHub MCP server publishes a new release. Set up a process that reviews upstream server changelogs before updating. OX Security found that a significant fraction of the 7,000 MCP servers they analyzed had no code-signing requirement on updates. A server you auto-update could receive a supply-chain compromise the same way an npm package can.
A minimum-viable gateway for the weekend
The fastest path to a working gateway uses an existing MCP proxy as the routing core and adds the security layers as middleware. The architecture:
A Go binary reads a YAML config at startup. The config lists upstream servers by name, their addresses, and the credential environment variables to inject. It lists per-role tool allowlists. It specifies an audit log path. The binary listens on a local port and proxies MCP traffic.
The config shape looks like this:
gateway:
  listen: "127.0.0.1:9000"
  audit_log: "~/.local/share/mcp-gateway/audit.jsonl"
  hmac_key_env: "MCP_AUDIT_HMAC_KEY"
servers:
  github:
    address: "localhost:8001"
    inject:
      GITHUB_TOKEN: "${GITHUB_TOKEN}"
  slack:
    address: "localhost:8002"
    inject:
      SLACK_BOT_TOKEN: "${SLACK_BOT_TOKEN}"
  database:
    address: "localhost:8003"
    inject:
      DB_URL: "${DB_URL}"
roles:
  analyst:
    allowed_tools:
      - github_list_repos
      - github_get_pr
      - slack_list_channels
      - database_query
  engineer:
    allowed_tools:
      - "*"
    denied_tools:
      - database_drop_table
      - database_execute_raw_ddl
The agent connects to 127.0.0.1:9000 with a bearer token that encodes its role. The gateway validates the token, looks up the role's allowlist, injects the upstream credentials per server, and appends each call to the audit log. The agent sees a flat tool namespace: it calls github_list_repos rather than "call the GitHub server's list_repos tool."
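Evaluating the roles block could look like the sketch below. Two assumptions are baked in and are not stated by the config itself: denied_tools takes precedence over allowed_tools, and entries are glob patterns so that "*" matches every tool. ROLES mirrors the config above; is_allowed is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Mirrors the roles block of the config; loading it from YAML is left out.
ROLES = {
    "analyst": {
        "allowed_tools": [
            "github_list_repos",
            "github_get_pr",
            "slack_list_channels",
            "database_query",
        ],
    },
    "engineer": {
        "allowed_tools": ["*"],
        "denied_tools": ["database_drop_table", "database_execute_raw_ddl"],
    },
}


def is_allowed(role: str, tool: str) -> bool:
    """Deny wins over allow; unknown roles get nothing (assumed semantics)."""
    policy = ROLES.get(role)
    if policy is None:
        return False
    if any(fnmatchcase(tool, p) for p in policy.get("denied_tools", [])):
        return False
    return any(fnmatchcase(tool, p) for p in policy.get("allowed_tools", []))
```

Checking the deny list first means a wildcard allow can never override an explicit deny, whatever order the config lists them in.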
For the scrubber: a simple string-match pass over the upstream response text, checking for common injection patterns. Keep a list of patterns in a YAML file so you can update it without recompiling. Log every match even if you pass it through. Patterns that trigger the scrubber but are not attacks tell you something about the quality of your scrubber.
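A sketch of that pass. It uses regular expressions rather than plain string matching, keeps the patterns inline instead of in a YAML file to stay within the standard library, and replaces matches with a placeholder. The two patterns, the 80-character base64 threshold, and the function names are illustrative assumptions, not a vetted rule set.

```python
import re

# Illustrative patterns only; a real list would live in a reloadable file.
PATTERNS = {
    "ignore_previous": re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    "long_base64": re.compile(r"[A-Za-z0-9+/]{80,}={0,2}"),
}


def scan(text: str) -> list:
    """Return the names of every pattern found in the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def scrub(text: str, match_log: list) -> str:
    """Log every match, then replace matched spans with a placeholder."""
    matches = scan(text)
    for name in matches:
        match_log.append({"pattern": name})
    cleaned = text
    for name in matches:
        cleaned = PATTERNS[name].sub("[scrubbed]", cleaned)
    return cleaned
```

Logging happens before any rewriting, so switching the scrubber to pass-through mode later does not change what the match log records.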
The HMAC chain for the audit log: each record includes a prev_hash field that is the HMAC of the previous record. To verify integrity, replay the chain from the first record. Any deletion or insertion breaks the chain. The key lives in an environment variable. Rotate it quarterly and keep the old key long enough to verify historical records.
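The chain can be sketched with the standard library's hmac module. Assumptions in this sketch: records are JSON objects serialized with sorted keys, the first record points at an all-zero genesis value, and the function names are hypothetical.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # prev_hash of the first record (an assumption of this sketch)


def _mac(key: bytes, record: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the record."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def append_record(log: list, key: bytes, entry: dict) -> dict:
    """Append an entry whose prev_hash is the HMAC of the previous record."""
    prev_hash = _mac(key, log[-1]) if log else GENESIS
    record = {**entry, "prev_hash": prev_hash}
    log.append(record)
    return record


def verify_chain(log: list, key: bytes) -> bool:
    """Replay the chain from the first record; any break returns False."""
    expected = GENESIS
    for record in log:
        found = str(record.get("prev_hash", ""))
        if not hmac.compare_digest(found.encode(), expected.encode()):
            return False
        expected = _mac(key, record)
    return True
```

One limitation of this shape: nothing covers the final record, so truncating the tail of the log goes unnoticed unless the latest HMAC is also stored somewhere the gateway host cannot rewrite.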
Two engineers, one weekend. That is the realistic estimate to get this to production quality. The routing core takes an afternoon. The allowlist enforcement takes a morning. The audit log takes an afternoon. The scrubber takes a morning. Deployment and testing fill the remainder.
What this means for your stack
The gateway pattern solves a category of problem: unbounded agent authority over an expanding set of back-end services. Without a gateway, every new MCP server you add expands the surface your agents can reach, adds another credential to manage, and adds another server changelog to monitor. With a gateway, you add a config block and a credential reference. The surface stays controlled.
hasp addresses the credential side of this at the local level. Run curl -fsSL https://gethasp.com/install.sh | sh, then hasp setup, bind your project, and the gateway's upstream credentials live in an encrypted local vault instead of environment variables. It is source-available (FCL-1.0), local-first, runs on macOS and Linux, and needs no account.
The architecture principle holds regardless of what you use for credential storage. An agent that talks to one trusted gateway, receives a scoped tool list, and generates a tamper-evident audit record is categorically more auditable than twelve agents each managing their own credentials. The pattern does not require a sophisticated implementation. It requires a decision to stop connecting agents directly to every server.
Sources · cited above, in one place
- Model Context Protocol Specification
- MCP docs: Server and client implementation guides
- Anthropic: Security advisories and Claude Code release notes
- OX Security: AppSec research, including MCP ecosystem analysis
- Snyk Security Labs: MCP prompt-injection and supply-chain research
- SPIFFE / SPIRE: Workload identity
- OWASP Top 10 for Large Language Model Applications