Why Guardrails?
Once AI applications go to production, they handle real user data and, in the case of agents, call external tools on their own. Things can go wrong fast:

- A customer support chatbot leaks a user's credit card number because PII wasn't stripped from the context.
- A coding agent runs `rm -rf /` through an MCP tool after hallucinating a shell command, and nothing stops it.
- A healthcare assistant makes up drug dosage numbers, and the response reaches the patient unchecked.
- An internal Q&A bot gets jailbroken through prompt injection, leaking confidential company data.
How a TrueFoundry Guardrail Works
Each guardrail has two settings you configure: what it does with the data, and how strictly it enforces its decisions.

Operation Mode
| Mode | Behavior | Execution |
|---|---|---|
| Validate | Looks at the data, blocks the request if something is wrong. Doesn’t touch the data itself. E.g., a content moderation guardrail sees hate speech in the prompt and blocks the request outright. | Runs in parallel (faster) |
| Mutate | Looks at the data and rewrites it. Can also block. E.g., a PII guardrail rewrites "My SSN is 123-45-6789" to "My SSN is REDACTED" and lets the request through. | Runs sequentially by priority (lower = first) |
Enforcement Strategy
This decides what happens when a guardrail catches a violation, and also what happens if the guardrail itself has a problem (like a timeout or a provider outage).

| Strategy | On Violation | On Guardrail Error |
|---|---|---|
| Enforce | Block | Block |
| Enforce But Ignore On Error | Block | Let through (graceful degradation) |
| Audit | Let through (log only) | Let through |
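The table above can be read as a small decision function. Here is an illustrative sketch of that behavior (not gateway code; strategy names are lowercased for readability):

```python
def resolve(strategy: str, event: str) -> str:
    """Map an enforcement strategy and a guardrail outcome to an action.

    event is "pass", "violation", or "guardrail_error".
    Illustrative sketch of the table above, not gateway code.
    """
    if event == "pass":
        return "allow"
    on_violation, on_error = {
        "enforce": ("block", "block"),
        "enforce_but_ignore_on_error": ("block", "allow"),  # graceful degradation
        "audit": ("allow", "allow"),                        # log only
    }[strategy]
    return on_violation if event == "violation" else on_error

# A provider outage under "Enforce But Ignore On Error" lets the request through:
print(resolve("enforce_but_ignore_on_error", "guardrail_error"))  # allow
```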
How TrueFoundry AI Gateway Runs Guardrails
Where guardrails run depends on whether you're making an LLM call or invoking an MCP tool. The sections below describe the LLM request flow.
LLM requests have two hooks — Input (before the model sees the prompt) and Output (after the model responds):
Here’s the order of operations when a request hits the gateway:
LLM Input
Runs before the prompt reaches the LLM:
- PII masking and redaction
- Prompt injection detection
- Content moderation
LLM Output
Runs after the LLM responds:
- Hallucination detection
- Secrets detection
- Content filtering

1. Input Mutation guardrails run first and block until they finish (e.g., redacting PII from the prompt).
2. Input Validation kicks off in the background, checking for things like prompt injection while the model request is already in flight.
3. The model request starts with the mutated prompt.
4. If Input Validation fails while the model is still running, the gateway cancels the model request right away so you don't pay for it.
5. Once the model responds, Output Mutation guardrails process the response (e.g., stripping secrets).
6. Output Validation checks the final result. If it fails, the response is blocked, though model costs have already been incurred at this point.
7. The clean response goes back to the client.
| Hook | Execution | What Happens on Failure |
|---|---|---|
| Input Validation | Async (parallel with model request) | Model request cancelled |
| Input Mutation | Sync (before model request) | Request blocked |
| Output Mutation | Sync (after model response) | Response blocked |
| Output Validation | Sync (after output mutation) | Response rejected |
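The execution order above can be sketched as a toy asyncio flow. The coroutines here are stand-ins (sleeps simulate the model call and the checks), not gateway source:

```python
import asyncio

async def redact_pii(prompt: str) -> str:            # Input Mutation (runs first)
    return prompt.replace("123-45-6789", "REDACTED")

async def detect_injection(prompt: str) -> bool:     # Input Validation (in background)
    await asyncio.sleep(0.01)                        # stand-in for a model-based check
    return "ignore previous instructions" in prompt.lower()

async def call_model(prompt: str) -> str:            # the actual LLM request
    await asyncio.sleep(0.05)
    return f"response to: {prompt}"

async def gateway(prompt: str) -> str:
    prompt = await redact_pii(prompt)                # 1. mutation blocks until done
    model = asyncio.create_task(call_model(prompt))  # 2-3. model starts with mutated prompt
    if await detect_injection(prompt):               # 4. validation runs while model is in flight
        model.cancel()                               #    cancel so you don't pay for the response
        raise PermissionError("input validation failed")
    return await model                               # 5-7. (output hooks omitted for brevity)

print(asyncio.run(gateway("My SSN is 123-45-6789")))  # response to: My SSN is REDACTED
```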
Latency Impact of Guardrails
Guardrails add processing time, but the gateway is designed to keep that impact small. For LLM requests:
- Input Validation runs in parallel with the model request, so in the happy path, it adds no extra wait time before you see the first token.
- Input Mutation runs before the model request, so its processing time is added directly.
- When Input Validation fails, the model request gets cancelled immediately — you don’t pay for a response you were going to throw away.
All Guardrails Pass
Input validation runs in parallel with the model request. Output guardrails process the response before it’s returned.
Input Validation Failure
Input validation fails while the model is running — the model request gets cancelled immediately to save costs.
Output Validation Failure
The model finishes, but output validation fails — the response is rejected. Model costs are already incurred at this point.

How to Apply Guardrails
You can attach guardrails in two ways.

Per-Request via Headers

Pass the X-TFY-GUARDRAILS header to apply guardrails on a single request. Handy for testing or when different requests need different guardrails.
Gateway-Level via Policies
Set up guardrail rules in AI Gateway → Controls → Guardrails to apply guardrails automatically based on who's making the request, which model they're calling, or which MCP tool is being used. This is the way to go for org-wide enforcement.

For a step-by-step walkthrough, see the Getting Started guide. For the full policy reference, see Guardrails Configuration.
Supported Guardrails
The AI Gateway ships with built-in guardrails and integrates with a range of external providers, all managed through a single interface.

TrueFoundry Built-in Guardrails

Ready to use out of the box, no external credentials needed.

Secrets Detection
Catches and redacts credentials like AWS keys, API keys, JWT tokens, and private keys.
Code Safety Linter
Flags unsafe code patterns — eval, exec, os.system, subprocess calls, dangerous shell commands.
SQL Sanitizer
Catches risky SQL: DROP, TRUNCATE, DELETE/UPDATE without WHERE, string interpolation.
Regex Pattern Matching
Matches and redacts sensitive patterns (PII, payment cards, credentials) using built-in or custom regex.
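As a rough illustration of what a regex guardrail in Mutate mode does, here is a minimal sketch with deliberately simplified patterns (the built-in guardrail ships its own, richer set):

```python
import re

# Simplified example patterns; not the gateway's built-in pattern set.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> str:
    """Replace each match with a labeled placeholder (mutate-style guardrail)."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED {name.upper()}]", text)
    return text

print(redact("My SSN is 123-45-6789"))  # My SSN is [REDACTED SSN]
```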
Prompt Injection
Detects prompt injection attacks and jailbreak attempts using model-based analysis.
PII Detection
Finds and redacts personally identifiable information with configurable entity categories.
Content Moderation
Blocks harmful content across hate, self-harm, sexual, and violence categories with adjustable thresholds.
Cedar Guardrails
Fine-grained access control for MCP tools using Cedar policy language with default-deny security.
OPA Guardrails
Fine-grained access control with full policy lifecycle management using Open Policy Agent.
External Providers
We also integrate with third-party guardrail providers. Don't see yours? Reach out; we're happy to add it.

OpenAI Moderations
OpenAI’s moderation API for detecting violence, hate speech, harassment, and other policy violations.
AWS Bedrock Guardrail
AWS Bedrock’s guardrail capabilities for AI models.
Azure PII
Azure’s PII detection service for identifying and redacting personal data.
Azure Content Safety
Azure Content Safety for detecting harmful or inappropriate content.
Azure Prompt Shield
Azure Prompt Shield for blocking prompt injection and jailbreak attempts.
Enkrypt AI
Advanced moderation and compliance — toxicity, bias, and sensitive data detection.
Palo Alto Prisma AIRS
Palo Alto AI Risk for content safety and threat detection.
PromptFoo
Promptfoo integration for content moderation guardrails.
Fiddler
Fiddler-Safety and Fiddler-Response-Faithfulness guardrails.
CrowdStrike
API-based security — content moderation, prompt injection detection, toxicity analysis. (Formerly Pangea, acquired by CrowdStrike in 2025.)
Patronus AI
Hallucination detection, prompt injection, PII leakage, toxicity, and bias evaluators.
Google Model Armor
Google Cloud Model Armor for prompt injection, harmful content, PII, and malicious URI detection.
GraySwan Cygnal
Policy violation detection and content safety monitoring powered by GraySwan Cygnal.
Akto
LLM security, prompt injection detection, and policy violation monitoring with native streaming support.
Bring Your Own Guardrail / Plugin
If the built-in and provider integrations don't cover your use case, you can write your own. Build a custom guardrail with Guardrails.AI, a plain Python function, or any framework you prefer, and plug it into the gateway.

Custom Guardrails
Build and integrate your own guardrail using Guardrails.AI, a Python function, or any custom logic.
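As a rough sketch of the shape a custom guardrail can take (a plain Python function that can both mutate and block); the signature below is hypothetical, not the gateway's actual plugin contract, which is documented in Custom Guardrails:

```python
import re

# Simplified JWT pattern for illustration only.
JWT = re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\b")

def my_guardrail(text: str) -> tuple[str, bool]:
    """Hypothetical custom guardrail combining both operation modes.

    Returns (possibly rewritten text, blocked?).
    """
    if "rm -rf /" in text:
        return text, True                              # validate-style: block outright
    return JWT.sub("[REDACTED JWT]", text), False      # mutate-style: rewrite and allow
```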
FAQ
How do I control which messages guardrails evaluate?
By default, guardrails look at all messages in the conversation. If you only care about the latest message, set the X-TFY-GUARDRAILS-SCOPE header:

- all (default): checks the full conversation history
- last: checks only the most recent message

last is faster when you don't need to scan the whole conversation.
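What the two scope values mean for a chat-style message list can be sketched like this (illustrative, not gateway code):

```python
def in_scope(messages: list[dict], scope: str = "all") -> list[dict]:
    """Messages a guardrail evaluates under X-TFY-GUARDRAILS-SCOPE."""
    if scope == "last":
        return messages[-1:]   # only the most recent message
    return messages            # "all": the full conversation history

history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello!"},
    {"role": "user", "content": "my SSN is 123-45-6789"},
]
print(len(in_scope(history, "last")))  # 1
```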
What happens if a guardrail service is down?
Depends on the enforcement strategy you picked:
- Enforce: Request gets blocked.
- Enforce But Ignore On Error: Request goes through anyway. This is the safest default — you stay protected when guardrails work, but a provider outage won’t break your app.
- Audit: Request always goes through.
How can I monitor guardrail execution?
Go to AI Gateway → Monitor → Request Traces. You’ll see:
- Which guardrails ran on each hook
- Whether they passed or failed, and how long they took
- What they found (secrets, SQL issues, unsafe patterns, etc.)
- What mutations were applied
How do output guardrails work with streaming responses?
Streaming and output guardrails are fundamentally at odds: guardrails need the complete response to evaluate it, but streaming sends tokens to the client as they're generated. The gateway handles this by buffering the entire response, running output guardrails on the full text, and only then streaming it back to the client if the guardrails pass. This means you lose the time-to-first-token benefit of streaming when output guardrails are active.

There are alternative strategies, like running guardrails on batches of tokens as they arrive, but these only work for guardrails that operate on local patterns: regex matching, PII detection, or secrets scanning can catch issues in a chunk without seeing the full response. Guardrails that need the complete output as context, such as hallucination detection, factual consistency checks, or content moderation that evaluates the overall message, cannot work incrementally and still require the full response to be buffered.

In practice:
If low latency matters more than full output coverage, consider using only input guardrails with streaming, and running output guardrails asynchronously for audit purposes.
| Approach | Time to First Token | Works For |
|---|---|---|
| Full buffering (default) | Delayed until guardrails complete | All guardrails — hallucination, moderation, factual checks, etc. |
| Chunked evaluation | Near-normal streaming | Pattern-based guardrails only — regex, PII, secrets scanning |
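A chunked evaluator for pattern-based guardrails must not miss a match that straddles a chunk boundary. One illustrative sketch (not gateway code) only emits text up to the last whitespace, which the pattern cannot contain, so no match is ever split:

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # simplified example pattern

def scan_stream(chunks):
    """Yield redacted text incrementally, cutting only at whitespace so a
    pattern split across two chunks is still caught."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        # The pattern contains no whitespace, so a cut at the last
        # whitespace can never land inside a (possibly partial) match.
        cut = max(buf.rfind(" "), buf.rfind("\n")) + 1
        safe, buf = buf[:cut], buf[cut:]
        yield SSN.sub("[REDACTED]", safe)
    yield SSN.sub("[REDACTED]", buf)   # flush whatever is left

# The SSN is split across two chunks, yet still gets redacted:
out = "".join(scan_stream(["My SSN is 123-4", "5-6789, thanks"]))
print(out)  # My SSN is [REDACTED], thanks
```

Hallucination or moderation checks, by contrast, have no such "safe cut point": their verdict depends on the whole message, which is why they force full buffering.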
