Guardrail Hooks
Guardrails can be invoked at four hooks in the AI Gateway workflow:
LLM Input
Applied before the request is sent to the LLM. Use for:
- PII masking and redaction
- Prompt injection detection
- Content moderation
- Input validation
LLM Output
Applied after the response is received from the LLM. Use for:
- Hallucination detection
- Secrets detection
- Content filtering
- Output validation
MCP Pre Tool
Applied before an MCP tool is invoked. Use for:
- SQL injection prevention
- Parameter validation
- Permission checks
- Input sanitization
MCP Post Tool
Applied after an MCP tool returns results. Use for:
- Code safety validation
- Secrets detection in outputs
- PII redaction from results
- Output sanitization
Operation Modes
| Operation | Behavior | Execution |
|---|---|---|
| Validate | Checks data and blocks if rules violated | Parallel (lower latency) |
| Mutate | Validates AND modifies data (e.g., redact PII) | Sequential by Priority (lower runs first) |
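As a conceptual illustration of the two modes (not the gateway's actual implementation; all names here are hypothetical), validate-mode guardrails can fan out concurrently while mutate-mode guardrails chain sequentially by priority:

```python
import asyncio

class GuardrailViolation(Exception):
    """Raised by a guardrail when its rules are violated (illustrative)."""

async def run_validators(validators, data):
    # Validate-mode guardrails are read-only checks, so they can run
    # concurrently; any raised GuardrailViolation fails the batch.
    await asyncio.gather(*(check(data) for check in validators))

async def run_mutators(mutators, data):
    # Mutate-mode guardrails change the payload, so they run one at a time
    # in ascending priority order (a lower priority value runs first).
    for _priority, mutate in sorted(mutators, key=lambda m: m[0]):
        data = await mutate(data)
    return data
```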
Enforcing Strategy
| Strategy | On Violation | On Guardrail Error |
|---|---|---|
| Enforce | Block | Block |
| Enforce But Ignore On Error | Block | Allow (graceful degradation) |
| Audit | Allow (log only) | Allow |
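The table above reduces to a small decision function. This is an illustrative sketch, not gateway code:

```python
from enum import Enum

class Strategy(Enum):
    ENFORCE = "enforce"
    ENFORCE_BUT_IGNORE_ON_ERROR = "enforce_but_ignore_on_error"
    AUDIT = "audit"

def should_block(strategy: Strategy, violated: bool, errored: bool) -> bool:
    # Audit mode only logs; it never blocks.
    if strategy is Strategy.AUDIT:
        return False
    # A guardrail runtime error blocks only under strict Enforce.
    if errored:
        return strategy is Strategy.ENFORCE
    # Both enforcing strategies block on an actual rule violation.
    return violated
```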
Guardrail Execution Flow
LLM Request Flow

- LLM Input Validation Guardrail starts (asynchronous): Begins immediately but doesn’t block processing.
- LLM Input Mutation Guardrail executes (synchronous): Must complete before the model request starts.
- Model request starts: Proceeds with mutated messages while input validation continues in parallel.
- LLM Input Validation completes: If validation fails, the model request is cancelled immediately to prevent costs.
- LLM Output Mutation Guardrail: Processes the model response after input validation passes.
- LLM Output Validation Guardrail: Validates the response. If it fails, the response is rejected (model costs already incurred).
- Response returned: Validated and mutated response is returned to the client.
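The ordering and cancellation behavior above can be sketched with asyncio. This is a conceptual illustration only; the coroutines passed in stand for the configured guardrails and the upstream model call, and GuardrailViolation is a hypothetical exception type:

```python
import asyncio

class GuardrailViolation(Exception):
    pass

async def handle_request(messages, *, validate_input, mutate_input,
                         call_model, mutate_output, validate_output):
    # 1. Input validation starts immediately but is not awaited yet.
    validation = asyncio.create_task(validate_input(messages))
    # 2. Input mutation must finish before the model is called.
    messages = await mutate_input(messages)
    # 3. The model request runs in parallel with input validation.
    model_call = asyncio.create_task(call_model(messages))
    try:
        # 4. A validation failure cancels the in-flight model request.
        await validation
    except GuardrailViolation:
        model_call.cancel()
        raise
    # 5-6. Output mutation, then output validation, then return.
    response = await mutate_output(await model_call)
    await validate_output(response)
    return response
```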
MCP Tool Invocation Flow
When an AI agent invokes an MCP tool, guardrails execute in the following sequence:
- MCP Pre Tool Guardrails execute (synchronous): All pre-tool guardrails must pass before the tool is invoked.
- Tool invocation: If pre-tool guardrails pass, the MCP tool is executed.
- MCP Post Tool Guardrails execute (synchronous): All post-tool guardrails validate/mutate the tool output.
- Tool result returned: The validated/mutated result is returned to the AI model.
MCP guardrails are evaluated for each tool invocation. In agentic workflows where multiple tools are called, guardrails run for each tool call independently.
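Conceptually, each tool call is wrapped by its guardrails like this (an illustrative sketch; the names and signatures are hypothetical):

```python
async def invoke_tool_with_guardrails(tool, args, pre_guardrails, post_guardrails):
    # 1. Every pre-tool guardrail must pass (and may mutate args) first.
    for guardrail in pre_guardrails:
        args = await guardrail(args)
    # 2. The tool is invoked only after all pre-tool checks pass.
    result = await tool(args)
    # 3. Post-tool guardrails validate or mutate the tool output.
    for guardrail in post_guardrails:
        result = await guardrail(result)
    # 4. Only the validated/mutated result reaches the model.
    return result
```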
Optimization Strategy
The gateway optimizes time-to-first-token latency by executing guardrail checks in parallel where possible. LLM Input validation runs concurrently with the model request, and if validation fails, the model request is immediately cancelled to avoid incurring unnecessary costs.
Execution Flow Diagrams
All Guardrails Pass
Input validation runs in parallel with model request. Output guardrails process the response before returning to client.

Input Validation Failure
When input validation fails, the model request is cancelled immediately to prevent costs.

Output Validation Failure
If output validation fails after model completes, the response is rejected (costs already incurred).

Key Behaviors
| Hook | Execution | On Failure |
|---|---|---|
| LLM Input Validation | Async (parallel with model request) | Model request cancelled |
| LLM Input Mutation | Sync (before model request) | Request blocked |
| LLM Output Mutation | Sync (after model response) | Response blocked |
| LLM Output Validation | Sync (after output mutation) | Response rejected |
| MCP Pre Tool | Sync (before tool invocation) | Tool not invoked |
| MCP Post Tool | Sync (after tool returns) | Result not passed to model |
Controlling Guardrails Scope
By default, guardrails evaluate all messages in a conversation. You can control this behavior using the X-TFY-GUARDRAILS-SCOPE header:
- all (default): Evaluates all messages in the conversation history
- last: Evaluates only the most recent message
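For example, assuming an OpenAI-compatible chat completions endpoint on your gateway (the URL, API key, and model id below are placeholders):

```python
import requests

# Placeholder host, key, and model id; substitute your own deployment's values.
resp = requests.post(
    "https://<your-gateway-host>/v1/chat/completions",
    headers={
        "Authorization": "Bearer <your-api-key>",
        "X-TFY-GUARDRAILS-SCOPE": "last",  # default is "all"
    },
    json={
        "model": "<provider>/<model>",
        "messages": [{"role": "user", "content": "Summarize our chat"}],
    },
)
print(resp.json())
```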
Per-Request Guardrails Header
The X-TFY-GUARDRAILS header accepts JSON with the following fields:
| Field | Description |
|---|---|
| llm_input_guardrails | Array of guardrail selectors for LLM Input hook |
| llm_output_guardrails | Array of guardrail selectors for LLM Output hook |
| mcp_tool_pre_invoke_guardrails | Array of guardrail selectors for MCP Pre Tool hook |
| mcp_tool_post_invoke_guardrails | Array of guardrail selectors for MCP Post Tool hook |
For backward compatibility, input_guardrails maps to llm_input_guardrails and output_guardrails maps to llm_output_guardrails.
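For example, a request that selects guardrails per hook might look like the following. The guardrail selector values, endpoint URL, and model id are placeholders; use the identifiers configured in your own gateway:

```python
import json
import requests

# Placeholder selectors: replace with guardrail identifiers from your gateway.
guardrails = {
    "llm_input_guardrails": ["pii-detection"],
    "llm_output_guardrails": ["secrets-detection"],
}

resp = requests.post(
    "https://<your-gateway-host>/v1/chat/completions",  # placeholder URL
    headers={
        "Authorization": "Bearer <your-api-key>",
        "X-TFY-GUARDRAILS": json.dumps(guardrails),
    },
    json={
        "model": "<provider>/<model>",  # placeholder model id
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.status_code, resp.json())
```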
Monitoring Guardrails
View guardrail execution in AI Gateway → Monitor → Request Traces:
- Which hooks were evaluated and which guardrails were triggered
- Pass/fail status and latency per guardrail
- Detailed findings (secrets detected, SQL issues, unsafe patterns)
- Mutations applied by mutate-mode guardrails
Error Handling
If a guardrail service experiences an error (API timeout, 5xx errors, network issues), the gateway continues processing your request by default. This ensures your application remains available even if a guardrail provider has issues.
Guardrail API errors do not block requests. Your LLM calls will complete successfully even if guardrail checks fail to execute.
Guardrail Integrations
TrueFoundry AI Gateway provides both built-in guardrails and integrations with popular external guardrail providers, giving you a unified interface for guardrail management and configuration.
TrueFoundry Guardrails
TrueFoundry provides built-in guardrails that require no external credentials or setup. These guardrails are fully managed by TrueFoundry and are designed for common security, compliance, and content safety use cases.
Secrets Detection
Detect and redact sensitive credentials like AWS keys, API keys, JWT tokens, and private keys in LLM inputs and outputs.
Code Safety Linter
Detect unsafe code patterns in tool outputs including eval, exec, os.system, subprocess calls, and dangerous shell commands.
SQL Sanitizer
Detect and sanitize risky SQL patterns like DROP, TRUNCATE, DELETE/UPDATE without WHERE, and string interpolation.
Regex Pattern Matching
Detect and redact sensitive patterns using preset regex patterns across PII, payment cards, credentials, and more — with custom pattern support.
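As a simplified illustration of the approach (not the gateway's actual presets; these two patterns are deliberately minimal), regex-based redaction works roughly like this:

```python
import re

# Simplified example patterns; real presets are more robust.
PRESETS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    # Replace every match of every preset with a labeled placeholder.
    for name, pattern in PRESETS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text

print(redact("Contact me at jane@example.com, SSN 123-45-6789"))
# -> Contact me at [REDACTED:email], SSN [REDACTED:us_ssn]
```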
Prompt Injection
Detect and block prompt injection attacks and jailbreak attempts in LLM inputs using model-based analysis.
PII Detection
Detect and redact personally identifiable information using model-based named entity recognition with configurable entity categories.
Content Moderation
Detect and block harmful content across hate, self-harm, sexual, and violence categories with configurable severity thresholds.
Cedar Guardrails
Implement fine-grained access control policies for MCP tool invocations using the Cedar policy language with default deny security.
External Providers
TrueFoundry also integrates with popular external guardrail providers for additional capabilities. If you don't see the provider you are looking for, please reach out to us and we will be happy to add the integration.
OpenAI Moderations
Integrate with OpenAI’s moderation API to detect and handle content that may violate usage policies, like violence, hate speech, or harassment.
AWS Bedrock Guardrail
Integrate with AWS Bedrock’s capabilities to apply guardrails on AI models.
Azure PII
Integrate with Azure’s PII detection service to identify and redact PII data in both requests and responses.
Azure Content Safety
Leverage Azure Content Safety to detect and mitigate harmful, unsafe, or inappropriate content in model inputs and outputs.
Azure Prompt Shield
Integrate with Azure Prompt Shield to detect and block prompt injection and jailbreak attempts using your own Azure credentials.
Enkrypt AI
Integrate with Enkrypt AI for advanced moderation and compliance, detecting risks like toxicity, bias, and sensitive data exposure.
Palo Alto Prisma AIRS
Integrate with Palo Alto AI Risk to detect and mitigate harmful, unsafe, or inappropriate content in model inputs and outputs.
PromptFoo
Integrate with Promptfoo to apply guardrails like content moderation on the models.
Fiddler
Integrate with Fiddler to apply guardrails, such as Fiddler-Safety and Fiddler-Response-Faithfulness on the models.
Pangea
Integrate with Pangea for API-based security services providing real-time content moderation, prompt injection detection, and toxicity analysis.
Patronus AI
Integrate with Patronus AI to detect hallucinations, prompt injection, PII leakage, toxicity, and bias with production-ready evaluators.
Bring Your Own Guardrail
Integrate your own custom guardrail using frameworks like Guardrails.AI or a Python function, as sketched below.
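As a rough sketch of the idea (the exact function contract is defined in the Bring Your Own Guardrail documentation; the shape below, payload in and verdict/mutated payload out, is illustrative):

```python
# Hypothetical blocked terms for the sake of the example.
BLOCKED_TERMS = {"internal-codename", "secret-project"}

def my_guardrail(payload: dict) -> dict:
    """Reject requests containing blocked terms; otherwise pass them through."""
    for message in payload.get("messages", []):
        content = (message.get("content") or "").lower()
        if any(term in content for term in BLOCKED_TERMS):
            # In this sketch, raising an exception represents a violation.
            raise ValueError("blocked term detected in request")
    # Returning the (possibly mutated) payload lets the request proceed.
    return payload
```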
Configure Guardrails
Guardrails can be configured at two levels:
Per-Request Configuration
Use the X-TFY-GUARDRAILS header to apply guardrails to individual requests. This is useful for testing or when different requests need different guardrails.
Gateway-Level Configuration
Create guardrail rules in AI Gateway → Controls → Guardrails to automatically apply guardrails based on:
- Users, teams, or virtual accounts making the request
- Models being called
- Request metadata (environment, application, etc.)
- MCP servers and tools being invoked
Rules also support when conditions for precise targeting.
Read more in the Configure Guardrails section.