
Why Guardrails?

Once AI applications go to production, they handle real user data and — in the case of agents — call external tools on their own. Things can go wrong fast:
  • A customer support chatbot leaks a user’s credit card number because PII wasn’t stripped from the context.
  • A coding agent runs rm -rf / through an MCP tool after hallucinating a shell command — and nothing stopped it.
  • A healthcare assistant makes up drug dosage numbers. The response reaches the patient unchecked.
  • An internal Q&A bot gets jailbroken through prompt injection, leaking confidential company data.
Guardrails prevent these scenarios. They sit between your application and the LLM (or MCP tool), inspecting and — when needed — blocking or rewriting data before it causes damage. You can attach them to LLM requests (check the prompt going in, check the response coming out) and to MCP tool calls (check the arguments before the tool runs, check the results after it returns).

How a TrueFoundry Guardrail Works

Each guardrail has two settings you configure: what it does with the data, and how strictly it enforces its decisions.

Operation Mode

| Mode | Behavior | Execution |
| --- | --- | --- |
| Validate | Looks at the data, blocks the request if something is wrong. Doesn't touch the data itself. E.g., a content moderation guardrail sees hate speech in the prompt and blocks the request outright. | Runs in parallel (faster) |
| Mutate | Looks at the data and rewrites it. Can also block. E.g., a PII guardrail rewrites "My SSN is 123-45-6789" to "My SSN is REDACTED" and lets the request through. | Runs sequentially by priority (lower = first) |
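To make the distinction concrete, here is a minimal sketch of the two modes as plain functions. The names and the `GuardrailResult` shape are illustrative, not the gateway's actual plugin API:

```python
import re
from dataclasses import dataclass

@dataclass
class GuardrailResult:
    allowed: bool   # False => block the request
    text: str       # possibly rewritten payload (Mutate mode)

def validate_moderation(text: str) -> GuardrailResult:
    """Validate mode: inspect only, never rewrite."""
    blocked_terms = {"hate speech"}  # toy deny-list for illustration
    allowed = not any(term in text.lower() for term in blocked_terms)
    return GuardrailResult(allowed=allowed, text=text)

def mutate_pii(text: str) -> GuardrailResult:
    """Mutate mode: rewrite the payload, and let the request through."""
    redacted = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "REDACTED", text)
    return GuardrailResult(allowed=True, text=redacted)
```

A Validate guardrail's output is a verdict; a Mutate guardrail's output is the (possibly rewritten) payload itself, which is why Mutate guardrails must run one after another while Validate guardrails can fan out in parallel.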

Enforcement Strategy

This decides what happens when a guardrail catches a violation — and also what happens if the guardrail itself has a problem (like a timeout or a provider outage).
| Strategy | On Violation | On Guardrail Error |
| --- | --- | --- |
| Enforce | Block | Block |
| Enforce But Ignore On Error | Block | Let through (graceful degradation) |
| Audit | Let through (log only) | Let through |
How to roll out safely:
  • Start with Audit so you can see what guardrails would catch without affecting users.
  • Once things look right, switch to Enforce But Ignore On Error — you get protection, but a guardrail provider outage won’t take your app down.
  • Move to Enforce when you need strict compliance.
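The enforcement matrix above boils down to a small decision function. This is a sketch with illustrative strategy identifiers, not the gateway's configuration values:

```python
def outcome(strategy: str, violation: bool, guardrail_error: bool) -> str:
    """Return 'block' or 'allow' per the enforcement matrix.
    Strategy names here are illustrative identifiers."""
    if guardrail_error:
        # Only plain Enforce blocks when the guardrail itself fails.
        return "block" if strategy == "enforce" else "allow"
    if violation:
        # Audit logs the violation but lets the request through.
        return "allow" if strategy == "audit" else "block"
    return "allow"
```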

How TrueFoundry AI Gateway Runs Guardrails

Where guardrails run depends on whether you’re making an LLM call or invoking an MCP tool.
LLM requests have two hooks — Input (before the model sees the prompt) and Output (after the model responds):

LLM Input

Runs before the prompt reaches the LLM:
  • PII masking and redaction
  • Prompt injection detection
  • Content moderation

LLM Output

Runs after the LLM responds:
  • Hallucination detection
  • Secrets detection
  • Content filtering
[Diagram: flow of LLM requests through input and output guardrails]
Here’s the order of operations when a request hits the gateway:
  1. Input Mutation guardrails run first and block until they finish (e.g., redacting PII from the prompt).
  2. Input Validation kicks off in the background — it checks for things like prompt injection while the model request is already in flight.
  3. The model request starts with the mutated prompt.
  4. If Input Validation fails while the model is still running, the gateway cancels the model request right away so you don’t pay for it.
  5. Once the model responds, Output Mutation guardrails process the response (e.g., stripping secrets).
  6. Output Validation checks the final result. If it fails, the response is blocked — though model costs have already been incurred at this point.
  7. The clean response goes back to the client.
| Hook | Execution | What Happens on Failure |
| --- | --- | --- |
| Input Validation | Async (parallel with model request) | Model request cancelled |
| Input Mutation | Sync (before model request) | Request blocked |
| Output Mutation | Sync (after model response) | Response blocked |
| Output Validation | Sync (after output mutation) | Response rejected |
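The input-side ordering can be sketched with asyncio: mutation runs first and blocks, then validation races the in-flight model call and cancels it on failure. Everything here is a stand-in (simulated latencies, toy checks), not gateway code:

```python
import asyncio

async def input_mutation(prompt: str) -> str:
    # Sync step: e.g. PII redaction before anything else happens.
    return prompt.replace("123-45-6789", "REDACTED")

async def input_validation(prompt: str) -> bool:
    await asyncio.sleep(0.01)  # simulated guardrail latency
    return "ignore previous instructions" not in prompt.lower()

async def call_model(prompt: str) -> str:
    await asyncio.sleep(0.05)  # simulated model latency
    return f"response to: {prompt}"

async def handle(prompt: str) -> str:
    prompt = await input_mutation(prompt)            # 1. mutation blocks
    model = asyncio.create_task(call_model(prompt))  # 3. model starts
    if not await input_validation(prompt):           # 2. validation in flight
        model.cancel()                               # 4. cancel on failure
        try:
            await model
        except asyncio.CancelledError:
            pass
        return "blocked"
    return await model  # 5-7. output-side hooks omitted for brevity

print(asyncio.run(handle("My SSN is 123-45-6789")))
# prints: response to: My SSN is REDACTED
```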

Latency Impact of Guardrails

Guardrails add processing time — but the gateway is designed to keep that impact small.
  • Input Validation runs in parallel with the model request, so in the happy path, it adds no extra wait time before you see the first token.
  • Input Mutation runs before the model request, so its processing time is added directly.
  • When Input Validation fails, the model request gets cancelled immediately — you don’t pay for a response you were going to throw away.
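On the happy path, the input-side latency math works out as follows. This is a rough model of the behavior described above (input side only), not a measured formula:

```python
def added_first_token_latency_ms(mutation_ms: float,
                                 validation_ms: float,
                                 model_first_token_ms: float) -> float:
    """Extra wait before the first token on the happy path.
    Mutation is strictly serial; validation overlaps the in-flight model
    request, so it only adds time when it outlasts the first token."""
    return mutation_ms + max(0.0, validation_ms - model_first_token_ms)
```

For example, a 20 ms mutation plus a 100 ms validation against a model with a 400 ms first token adds only 20 ms, because the validation is fully hidden behind the model call.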
Execution Flow Examples
  • Happy path: input validation runs in parallel with the model request, and output guardrails process the response before it's returned.
  • Input validation failure: input validation fails while the model is still running, so the model request is cancelled immediately to save costs.
  • Output validation failure: the model finishes, but output validation fails and the response is rejected; model costs are already incurred at this point.
You can track the latency impact of each guardrail in AI Gateway → Monitor → Request Traces. Each guardrail span shows its execution time, result, scope, and which entity it was applied on.
[Screenshot: Request Traces view showing guardrail execution latency, result type, scope, and applied entity for each span]

How to Apply Guardrails

You can attach guardrails in two ways:
Pass the X-TFY-GUARDRAILS header to apply guardrails on a single request. Handy for testing or when different requests need different guardrails.
{
  "llm_input_guardrails": ["my-group/pii-redaction"],
  "llm_output_guardrails": ["my-group/secrets-detection"],
  "mcp_tool_pre_invoke_guardrails": ["my-group/sql-sanitizer"],
  "mcp_tool_post_invoke_guardrails": ["my-group/code-safety"]
}
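In code, the header value is just that object serialized as JSON. A minimal Python sketch (the base URL, API key, and guardrail IDs are placeholders):

```python
import json
import urllib.request

GATEWAY_BASE_URL = "https://gateway.example.com/api/llm"  # placeholder

guardrails = {
    "llm_input_guardrails": ["my-group/pii-redaction"],
    "llm_output_guardrails": ["my-group/secrets-detection"],
}

req = urllib.request.Request(
    f"{GATEWAY_BASE_URL}/chat/completions",
    data=json.dumps({
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode(),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
        # The header value is the guardrails object serialized as JSON.
        "X-TFY-GUARDRAILS": json.dumps(guardrails),
    },
)
# urllib.request.urlopen(req)  # not executed here: the URL is a placeholder
```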
Set up guardrail rules in AI Gateway → Controls → Guardrails to apply guardrails automatically based on who's making the request, which model they're calling, or which MCP tool is being used. This is the way to go for org-wide enforcement.

For a step-by-step walkthrough, see the Getting Started guide. For the full policy reference, see Guardrails Configuration.

Supported Guardrails

The AI Gateway ships with built-in guardrails and integrates with a range of external providers — all managed through a single interface.

TrueFoundry Built-in Guardrails

Ready to use out of the box — no external credentials needed.

Secrets Detection

Catches and redacts credentials like AWS keys, API keys, JWT tokens, and private keys.
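In spirit, this works like pattern matching over the payload. The patterns below are a simplified illustration of the idea, not the gateway's actual rule set (which covers many more credential formats):

```python
import re

# Two illustrative credential patterns: AWS access key IDs and JWTs.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "jwt": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+\b"),
}

def redact_secrets(text: str) -> str:
    """Replace every detected secret with a labeled placeholder."""
    for name, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[{name} REDACTED]", text)
    return text
```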

Code Safety Linter

Flags unsafe code patterns — eval, exec, os.system, subprocess calls, dangerous shell commands.

SQL Sanitizer

Catches risky SQL: DROP, TRUNCATE, DELETE/UPDATE without WHERE, string interpolation.
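A toy version of two of these checks, assuming simple pattern matching (a real sanitizer would parse the SQL, and this sketch omits the string-interpolation check):

```python
import re

def risky_sql(query: str) -> bool:
    """Flag DROP/TRUNCATE, and DELETE/UPDATE without a WHERE clause."""
    q = query.strip().rstrip(";")
    if re.match(r"(?i)(DROP|TRUNCATE)\b", q):
        return True
    if re.match(r"(?i)(DELETE|UPDATE)\b", q) and not re.search(r"(?i)\bWHERE\b", q):
        return True
    return False
```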

Regex Pattern Matching

Matches and redacts sensitive patterns (PII, payment cards, credentials) using built-in or custom regex.

Prompt Injection

Detects prompt injection attacks and jailbreak attempts using model-based analysis.

PII Detection

Finds and redacts personally identifiable information with configurable entity categories.

Content Moderation

Blocks harmful content across hate, self-harm, sexual, and violence categories with adjustable thresholds.

Cedar Guardrails

Fine-grained access control for MCP tools using Cedar policy language with default-deny security.
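As an illustration, a Cedar policy granting a single principal access to one MCP tool might look like this (the entity type names `User`, `Action`, and `MCPTool` are hypothetical; the actual schema comes from your gateway configuration):

```cedar
// Hypothetical policy: allow one user to invoke one MCP tool.
// With default-deny, anything not explicitly permitted is blocked.
permit (
  principal == User::"data-analyst",
  action == Action::"invoke",
  resource == MCPTool::"query_database"
);
```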

OPA Guardrails

Fine-grained access control with full policy lifecycle management using Open Policy Agent.

External Providers

We also integrate with third-party guardrail providers. Don’t see yours? Reach out — we’re happy to add it.

OpenAI Moderations

OpenAI’s moderation API for detecting violence, hate speech, harassment, and other policy violations.

AWS Bedrock Guardrail

AWS Bedrock’s guardrail capabilities for AI models.

Azure PII

Azure’s PII detection service for identifying and redacting personal data.

Azure Content Safety

Azure Content Safety for detecting harmful or inappropriate content.

Azure Prompt Shield

Azure Prompt Shield for blocking prompt injection and jailbreak attempts.

Enkrypt AI

Advanced moderation and compliance — toxicity, bias, and sensitive data detection.

Palo Alto Prisma AIRS

Palo Alto AI Risk for content safety and threat detection.

PromptFoo

Promptfoo integration for content moderation guardrails.

Fiddler

Fiddler-Safety and Fiddler-Response-Faithfulness guardrails.

CrowdStrike

API-based security — content moderation, prompt injection detection, toxicity analysis. (Formerly Pangea, acquired by CrowdStrike in 2025.)

Patronus AI

Hallucination detection, prompt injection, PII leakage, toxicity, and bias evaluators.

Google Model Armor

Google Cloud Model Armor for prompt injection, harmful content, PII, and malicious URI detection.

GraySwan Cygnal

Policy violation detection and content safety monitoring powered by GraySwan Cygnal.

Akto

LLM security, prompt injection detection, and policy violation monitoring with native streaming support.

Bring Your Own Guardrail / Plugin

If the built-in and provider integrations don’t cover your use case, you can write your own. Build a custom guardrail with Guardrails.AI, a plain Python function, or any framework you prefer, and plug it into the gateway.

Custom Guardrails

Build and integrate your own guardrail using Guardrails.AI, a Python function, or any custom logic.
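As a sketch, a custom guardrail can be as simple as a function that receives the payload and returns a verdict plus an optionally rewritten payload. The function shape and return keys below are hypothetical; the gateway's actual plugin contract is documented separately:

```python
def no_internal_hosts(text: str) -> dict:
    """Illustrative custom rule: block responses that leak internal hostnames.
    The hostname suffix is a made-up example."""
    if ".internal.example.com" in text:
        return {"allowed": False, "reason": "internal hostname leaked"}
    return {"allowed": True, "text": text}
```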

FAQ

Which messages do guardrails inspect?

By default, guardrails look at all messages in the conversation. If you only care about the latest message, set the X-TFY-GUARDRAILS-SCOPE header:
  • all (default): Checks the full conversation history
  • last: Checks only the most recent message
Using last is faster when you don’t need to scan the whole conversation.
curl -X POST "{GATEWAY_BASE_URL}/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H 'X-TFY-GUARDRAILS: {"llm_input_guardrails":["my-group/pii-redaction"]}' \
  -H "X-TFY-GUARDRAILS-SCOPE: last" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, how can you help me today?"}
    ]
  }'
What happens when a guardrail itself fails (timeout, provider outage)?

Depends on the enforcement strategy you picked:
  • Enforce: Request gets blocked.
  • Enforce But Ignore On Error: Request goes through anyway. This is the safest default — you stay protected when guardrails work, but a provider outage won’t break your app.
  • Audit: Request always goes through.
How do I see what guardrails did on each request?

Go to AI Gateway → Monitor → Request Traces. You'll see:
  • Which guardrails ran on each hook
  • Whether they passed or failed, and how long they took
  • What they found (secrets, SQL issues, unsafe patterns, etc.)
  • What mutations were applied
Traces are logged for both successful and blocked requests.
How do output guardrails work with streaming responses?

Streaming and output guardrails are fundamentally at odds — guardrails need the complete response to evaluate it, but streaming sends tokens to the client as they're generated. The gateway handles this by buffering the entire response, running output guardrails on the full text, and only then streaming it back to the client if the guardrails pass. This means you lose the time-to-first-token benefit of streaming when output guardrails are active.

There are alternative strategies, such as running guardrails on batches of tokens as they arrive, but these only work for guardrails that operate on local patterns — for example, regex matching, PII detection, or secrets scanning can catch issues in a chunk without seeing the full response. Guardrails that need the complete output as context — such as hallucination detection, factual consistency checks, or content moderation that evaluates the overall message — cannot work incrementally and still require the full response to be buffered.

In practice:
| Approach | Time to First Token | Works For |
| --- | --- | --- |
| Full buffering (default) | Delayed until guardrails complete | All guardrails — hallucination, moderation, factual checks, etc. |
| Chunked evaluation | Near-normal streaming | Pattern-based guardrails only — regex, PII, secrets scanning |
If low latency matters more than full output coverage, consider using only input guardrails with streaming, and running output guardrails asynchronously for audit purposes.
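Chunked evaluation for a pattern-based guardrail can be sketched as a generator that scans a buffer as tokens stream in. This is an illustration of the idea, not gateway code; the tricky part, handled below, is a pattern split across chunk boundaries:

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # example pattern, max 11 chars

def stream_with_redaction(chunks, pattern=SSN, max_len=11):
    """Yield redacted output incrementally, holding back just enough text
    that a match can never straddle an emitted boundary."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Tentatively keep the last max_len-1 chars: a partial match that
        # completes in a later chunk must lie entirely within them.
        cut = max(0, len(buffer) - (max_len - 1))
        # Never cut through a match that is already complete in the buffer.
        for m in pattern.finditer(buffer):
            if m.start() < cut < m.end():
                cut = m.start()
        yield pattern.sub("REDACTED", buffer[:cut])
        buffer = buffer[cut:]
    yield pattern.sub("REDACTED", buffer)  # flush the held-back tail
```

This only works because an SSN is a local pattern with a bounded length; a hallucination check has no such bound, which is why it needs the full buffered response.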