Guardrails in the AI Gateway provide a mechanism to ensure safety, quality, and compliance by validating and transforming data at critical points in your AI workflows. This includes both LLM interactions and MCP (Model Context Protocol) tool invocations, making guardrails essential for securing agentic AI applications.

Guardrail Hooks

Guardrails can be invoked at four hooks in the AI Gateway workflow:

LLM Input

Applied before the request is sent to the LLM. Use for:
  • PII masking and redaction
  • Prompt injection detection
  • Content moderation
  • Input validation

LLM Output

Applied after the response is received from the LLM. Use for:
  • Hallucination detection
  • Secrets detection
  • Content filtering
  • Output validation

MCP Pre Tool

Applied before an MCP tool is invoked. Use for:
  • SQL injection prevention
  • Parameter validation
  • Permission checks
  • Input sanitization

MCP Post Tool

Applied after an MCP tool returns results. Use for:
  • Code safety validation
  • Secrets detection in outputs
  • PII redaction from results
  • Output sanitization
Agentic Safety: MCP hooks are critical for agentic workflows where AI models autonomously invoke external tools. Use MCP Pre Tool to validate what the agent is about to do, and MCP Post Tool to validate what the tool returned before the data is used by the model.

Operation Modes

Operation | Behavior | Execution
Validate | Checks data and blocks if rules are violated | Parallel (lower latency)
Mutate | Validates AND modifies data (e.g., redact PII) | Sequential by priority (lower runs first)

Enforcing Strategy

Strategy | On Violation | On Guardrail Error
Enforce | Block | Block
Enforce But Ignore On Error | Block | Allow (graceful degradation)
Audit | Allow (log only) | Allow
Rollout strategy: Start with Audit → verify behavior → Enforce But Ignore On Error → optionally Enforce for strict compliance.
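
Read as a decision rule, the table above maps each strategy and outcome to allow or block. The sketch below encodes it in Python purely for illustration; the string labels are placeholders, not actual gateway configuration values:

# Illustrative only: encodes the enforcing-strategy table above as a decision function.
# The strategy labels below are placeholders, not gateway configuration values.
def enforcement_decision(strategy: str, violation: bool, guardrail_error: bool) -> str:
    """Return "block" or "allow" for a guardrail outcome under a given strategy."""
    if strategy == "enforce":
        # Violations and guardrail errors both block.
        return "block" if (violation or guardrail_error) else "allow"
    if strategy == "enforce_but_ignore_on_error":
        # Violations block; guardrail errors degrade gracefully and allow.
        return "block" if violation else "allow"
    if strategy == "audit":
        # Log-only mode: never blocks.
        return "allow"
    raise ValueError(f"unknown strategy: {strategy}")

# Example: an Audit guardrail that detects a violation still allows the request.
assert enforcement_decision("audit", violation=True, guardrail_error=False) == "allow"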

Guardrail Execution Flow

LLM Request Flow

[Diagram: flow of LLM requests through input and output guardrails]
When an LLM request arrives at the gateway, guardrails execute in the following sequence:
  1. LLM Input Validation Guardrail starts (asynchronous): Begins immediately but doesn’t block processing.
  2. LLM Input Mutation Guardrail executes (synchronous): Must complete before the model request starts.
  3. Model request starts: Proceeds with mutated messages while input validation continues in parallel.
  4. LLM Input Validation completion: If validation fails, the model request is cancelled immediately to prevent costs.
  5. LLM Output Mutation Guardrail: Processes the model response after input validation passes.
  6. LLM Output Validation Guardrail: Validates the response. If it fails, the response is rejected (model costs already incurred).
  7. Response returned: Validated and mutated response is returned to the client.

MCP Tool Invocation Flow

When an AI agent invokes an MCP tool, guardrails execute in the following sequence:
  1. MCP Pre Tool Guardrails execute (synchronous): All pre-tool guardrails must pass before the tool is invoked.
  2. Tool invocation: If pre-tool guardrails pass, the MCP tool is executed.
  3. MCP Post Tool Guardrails execute (synchronous): All post-tool guardrails validate/mutate the tool output.
  4. Tool result returned: The validated/mutated result is returned to the AI model.
MCP guardrails are evaluated for each tool invocation. In agentic workflows where multiple tools are called, guardrails run for each tool call independently.
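
Conceptually, every tool call is wrapped by the two MCP hooks. The sketch below illustrates the sequence just described; it is not the gateway's implementation, and the guardrail and tool callables are stand-ins:

# Conceptual sketch of the MCP guardrail sequence above; not gateway code.
# Each guardrail callable is a stand-in: it may return (possibly mutated) data
# or raise GuardrailViolation to block the call.
class GuardrailViolation(Exception):
    pass

def guarded_tool_call(tool_name, arguments, pre_tool_guardrails, post_tool_guardrails, invoke_tool):
    # 1. All MCP Pre Tool guardrails must pass before the tool is invoked.
    for guardrail in pre_tool_guardrails:
        arguments = guardrail(tool_name, arguments)
    # 2. The tool runs only if every pre-tool guardrail passed.
    result = invoke_tool(tool_name, arguments)
    # 3. MCP Post Tool guardrails validate/mutate the tool output.
    for guardrail in post_tool_guardrails:
        result = guardrail(tool_name, result)
    # 4. Only the validated/mutated result is returned to the model.
    return result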

Optimization Strategy

The gateway optimizes time-to-first-token latency by executing guardrail checks in parallel where possible. LLM Input validation runs concurrently with the model request, and if validation fails, the model request is immediately cancelled to avoid incurring unnecessary costs.
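
The sketch below models this scheduling with asyncio. It is a conceptual illustration of the behavior described above, not the gateway's actual code; all coroutines are stand-ins:

# Conceptual model of the parallel-validation optimization; not gateway code.
import asyncio

class GuardrailViolation(Exception):
    """Raised when a validation guardrail rejects the input."""

# Stand-in coroutines; in reality these call guardrail services and the model provider.
async def run_input_validation(messages):
    await asyncio.sleep(0.1)   # pretend to call a guardrail service

async def run_input_mutation(messages):
    await asyncio.sleep(0.05)
    return messages            # e.g. messages with PII redacted

async def call_model(messages):
    await asyncio.sleep(0.5)
    return {"role": "assistant", "content": "..."}

async def handle_llm_request(messages):
    # Input validation starts immediately but does not block further processing.
    validation_task = asyncio.create_task(run_input_validation(messages))
    # Input mutation must complete before the model request starts.
    mutated = await run_input_mutation(messages)
    # The model request runs in parallel with input validation.
    model_task = asyncio.create_task(call_model(mutated))
    try:
        await validation_task
    except GuardrailViolation:
        # Validation failed: cancel the in-flight model request to avoid cost.
        model_task.cancel()
        raise
    # Output mutation and validation would run here before returning (omitted).
    return await model_task

print(asyncio.run(handle_llm_request([{"role": "user", "content": "hi"}])))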

Execution Flow Diagrams

Input validation runs in parallel with the model request. Output guardrails process the response before it is returned to the client. [Gantt chart: successful guardrail execution flow]
When input validation fails, the model request is cancelled immediately to prevent costs. [Gantt chart: input validation failure]
If output validation fails after the model completes, the response is rejected (costs already incurred). [Gantt chart: output validation failure]

Key Behaviors

Hook | Execution | On Failure
LLM Input Validation | Async (parallel with model request) | Model request cancelled
LLM Input Mutation | Sync (before model request) | Request blocked
LLM Output Mutation | Sync (after model response) | Response blocked
LLM Output Validation | Sync (after output mutation) | Response rejected
MCP Pre Tool | Sync (before tool invocation) | Tool not invoked
MCP Post Tool | Sync (after tool returns) | Result not passed to model

Controlling Guardrails Scope

By default, guardrails evaluate all messages in a conversation. You can control this behavior using the X-TFY-GUARDRAILS-SCOPE header:
  • all (default): Evaluates all messages in the conversation history
  • last: Evaluates only the most recent message
Example:
curl -X POST "https://{controlPlaneURL}/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H 'X-TFY-GUARDRAILS: {"llm_input_guardrails":["my-group/pii-redaction"],"llm_output_guardrails":["my-group/secrets-detection"]}' \
  -H "X-TFY-GUARDRAILS-SCOPE: last" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, how can you help me today?"}
    ]
  }'
Use last for better performance when you only need to validate the latest message. Use all when you need to check the entire conversation context.

Per-Request Guardrails Header

The X-TFY-GUARDRAILS header accepts JSON with the following fields:
Field | Description
llm_input_guardrails | Array of guardrail selectors for the LLM Input hook
llm_output_guardrails | Array of guardrail selectors for the LLM Output hook
mcp_tool_pre_invoke_guardrails | Array of guardrail selectors for the MCP Pre Tool hook
mcp_tool_post_invoke_guardrails | Array of guardrail selectors for the MCP Post Tool hook
For backward compatibility, input_guardrails maps to llm_input_guardrails and output_guardrails maps to llm_output_guardrails.
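
For illustration, the sketch below mirrors the earlier curl example in Python and attaches guardrails to all four hooks. The MCP guardrail selectors are placeholders, not real guardrail groups:

# Illustrative request attaching guardrails to all four hooks via the per-request header.
# The MCP guardrail selectors below are placeholders; the URL and API key are yours.
import json
import requests

guardrails = {
    "llm_input_guardrails": ["my-group/pii-redaction"],
    "llm_output_guardrails": ["my-group/secrets-detection"],
    "mcp_tool_pre_invoke_guardrails": ["my-group/sql-injection-check"],    # placeholder
    "mcp_tool_post_invoke_guardrails": ["my-group/output-pii-redaction"],  # placeholder
}

response = requests.post(
    "https://{controlPlaneURL}/chat/completions",  # replace with your gateway URL
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
        "X-TFY-GUARDRAILS": json.dumps(guardrails),
        "X-TFY-GUARDRAILS-SCOPE": "last",
    },
    json={
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Hello, how can you help me today?"}],
    },
)
print(response.json())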

Monitoring Guardrails

View guardrail execution in AI Gateway → Monitor → Request Traces:
  • Which hooks were evaluated and guardrails triggered
  • Pass/fail status and latency per guardrail
  • Detailed findings (secrets detected, SQL issues, unsafe patterns)
  • Mutations applied by mutate-mode guardrails
Traces are available for both successful and blocked requests for full auditability.

Error Handling

If a guardrail service experiences an error (API timeout, 5xx errors, network issues), the gateway continues processing your request by default. This ensures your application remains available even if a guardrail provider has issues.
By default, guardrail API errors do not block requests: your LLM calls complete even if guardrail checks fail to execute. To block requests when a guardrail errors, use the Enforce strategy.

Guardrail Integrations

TrueFoundry AI Gateway provides both built-in guardrails and integrations with popular external guardrail providers, giving you a unified interface for guardrail management and configuration.

TrueFoundry Guardrails

TrueFoundry provides built-in guardrails that require no external credentials or setup. These guardrails are fully managed by TrueFoundry and are designed for common security, compliance, and content safety use cases.

External Providers

TrueFoundry also integrates with popular external guardrail providers for additional capabilities. In case you don’t see the provider you are looking for, please reach out to us and we will be happy to add the integration.

OpenAI Moderations

Integrate with OpenAI’s moderation API to detect and handle content that may violate usage policies, like violence, hate speech, or harassment.

AWS Bedrock Guardrail

Integrate with AWS Bedrock Guardrails to apply Bedrock's guardrail policies to your AI models.

Azure PII

Integrate with Azure’s PII detection service to identify and redact PII data in both requests and responses.

Azure Content Safety

Leverage Azure Content Safety to detect and mitigate harmful, unsafe, or inappropriate content in model inputs and outputs.

Azure Prompt Shield

Integrate with Azure Prompt Shield to detect and block prompt injection and jailbreak attempts using your own Azure credentials.

Enkrypt AI

Integrate with Enkrypt AI for advanced moderation and compliance, detecting risks like toxicity, bias, and sensitive data exposure.

Palo Alto Prisma AIRS

Integrate with Palo Alto Prisma AIRS to detect and mitigate harmful, unsafe, or inappropriate content in model inputs and outputs.

PromptFoo

Integrate with Promptfoo to apply guardrails such as content moderation to your models.

Fiddler

Integrate with Fiddler to apply guardrails such as Fiddler-Safety and Fiddler-Response-Faithfulness to your models.

Pangea

Integrate with Pangea for API-based security services providing real-time content moderation, prompt injection detection, and toxicity analysis.

Patronus AI

Integrate with Patronus AI to detect hallucinations, prompt injection, PII leakage, toxicity, and bias with production-ready evaluators.

Bring Your Own Guardrail

Integrate your own guardrail using frameworks like Guardrails.AI or a custom Python function.
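
As a purely conceptual sketch of the idea, a custom guardrail boils down to a function that validates a payload and optionally returns a mutated version. The signature and return shape below are illustrative assumptions, not the gateway's actual custom-guardrail contract:

# Illustrative only: the signature and return shape are assumptions for this sketch,
# not TrueFoundry's actual custom-guardrail interface.
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_emails_guardrail(text: str) -> dict:
    """Mutate-style guardrail: redact email addresses and report what was found."""
    redacted, count = EMAIL_RE.subn("[REDACTED_EMAIL]", text)
    return {
        "passed": True,            # mutation guardrails transform rather than block
        "transformed": redacted,
        "findings": {"emails_redacted": count},
    }

print(redact_emails_guardrail("Contact me at jane.doe@example.com"))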

Configure Guardrails

Guardrails can be configured at two levels:

Per-Request Configuration

Use the X-TFY-GUARDRAILS header to apply guardrails to individual requests. This is useful for testing or when different requests need different guardrails.

Gateway-Level Configuration

Create guardrail rules in AI Gateway → Controls → Guardrails to automatically apply guardrails based on:
  • Users, teams, or virtual accounts making the request
  • Models being called
  • Request metadata (environment, application, etc.)
  • MCP servers and tools being invoked
Gateway-level configuration supports all four hooks (LLM Input, LLM Output, MCP Pre Tool, MCP Post Tool) with flexible when conditions for precise targeting. Read more in the Configure Guardrails section.