
Guardrail Hooks

Guardrails can be invoked at four different points (hooks) in the AI Gateway workflow:
| Hook | When It Executes | Common Use Cases |
|---|---|---|
| LLM Input | Before the request is sent to the LLM | PII redaction, prompt injection detection, content moderation |
| LLM Output | After the response is received from the LLM | Hallucination detection, content filtering, secrets detection |
| MCP Pre Tool | Before an MCP tool is invoked | Validate tool parameters, check permissions, sanitize inputs |
| MCP Post Tool | After an MCP tool returns results | Validate tool outputs, detect unsafe code/SQL, redact sensitive data |
MCP hooks are particularly valuable for agentic workflows where AI models invoke external tools. Use MCP Pre Tool to validate what the agent is about to do, and MCP Post Tool to validate what the tool returned before it’s used by the model.

Guardrail Rules UI

Navigate to AI Gateway → Controls → Guardrails to view and manage guardrail rules. Click Add Rule to create a new rule or edit existing ones.

For example, a rule with ID database-safety-rule might be configured as:
  • When request goes to: MCP Servers IN database-tools, analytics-db
  • From subjects: IN Alice Chen, Bob Smith; NOT IN DB Admin
  • Apply on hooks:
    - MCP Tool Pre-Invoke → my-guardrails/sql-sanitizer
    - MCP Tool Post-Invoke → my-guardrails/secrets-detection

Create Guardrail Config

Create rules via the UI above (AI Gateway → Controls → Guardrails), or use YAML configuration via the Config tab.
[Image: TrueFoundry YAML editor for creating guardrail configuration]

Configuration Structure

The guardrails configuration contains an array of rules that are evaluated for each request. Only the first matching guardrail rule is applied to that request. Each rule can specify guardrails for any of the four hooks.

Example Configuration

name: guardrails-control
type: gateway-guardrails-config
rules:
  - id: demo-guardrail
    when:
      subjects:
        operator: and
        conditions:
          in:
            - user:john@example.com
    llm_input_guardrails:
      - my-guardrail-group/openai-moderation
    llm_output_guardrails:
      - my-guardrail-group/secrets-detection
    mcp_tool_pre_invoke_guardrails: []
    mcp_tool_post_invoke_guardrails: []

Configuration Reference

Rule Structure

| Field | Required | Description |
|---|---|---|
| id | Yes | Unique identifier for the rule |
| when | Yes | Matching criteria with target and subjects blocks |
| llm_input_guardrails | Yes | Guardrails applied before the LLM request (use [] if none) |
| llm_output_guardrails | Yes | Guardrails applied after the LLM response (use [] if none) |
| mcp_tool_pre_invoke_guardrails | Yes | Guardrails applied before MCP tool invocation (use [] if none) |
| mcp_tool_post_invoke_guardrails | Yes | Guardrails applied after the MCP tool returns (use [] if none) |

The when Block

The when block contains two main sections: target (what the request targets) and subjects (who is making the request):
| Section | Description |
|---|---|
| target | Defines conditions based on mcpServers, models, mcpTools, or metadata |
| subjects | Defines conditions based on users, teams, or virtual accounts |
If when is empty ({}), the rule matches all requests. Use this for fallback/default rules at the end of your rules list.

The when Block Structure

Match by MCP server:
when:
  target:
    operator: or
    conditions:
      mcpServers:
        values:
          - database-tools
          - code-executor
        condition: in

Match by model:
when:
  target:
    operator: or
    conditions:
      models:
        values:
          - openai/gpt-4o
          - anthropic/claude-3-5-sonnet
        condition: in

Match by request metadata:
when:
  target:
    operator: or
    conditions:
      metadata:
        environment: production
        tier: enterprise
Matching on metadata requires the request to include the header X-TFY-METADATA: {"environment": "production", "tier": "enterprise"}
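As a sketch of sending that header from a client (assuming the OpenAI Python SDK, whose extra_headers parameter forwards custom headers with a request), the metadata can be serialized and attached like this:

```python
import json

def build_metadata_headers(metadata: dict) -> dict:
    # Serialize request metadata into the X-TFY-METADATA header
    # that the gateway matches `metadata` conditions against.
    return {"X-TFY-METADATA": json.dumps(metadata)}

headers = build_metadata_headers({"environment": "production", "tier": "enterprise"})
# With the OpenAI SDK, forward it on a single request:
# client.chat.completions.create(model=..., messages=..., extra_headers=headers)
```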

Match by MCP server and tool:
when:
  target:
    operator: or
    conditions:
      mcpServers:
        values:
          - database-tools
        condition: in
      mcpTools:
        values:
          - execute_query
        condition: in

Match by subject (users and teams):
when:
  subjects:
    operator: and
    conditions:
      in:
        - user:alice@company.com
        - user:bob@company.com
        - team:data-science
      not_in:
        - user:guest@company.com

Combine target and subject conditions:
when:
  target:
    operator: or
    conditions:
      mcpServers:
        values:
          - database-tools
        condition: in
      metadata:
        environment: production
  subjects:
    operator: and
    conditions:
      in:
        - team:engineering
      not_in:
        - user:external@partner.com
Both target and subjects conditions must match for the rule to apply.
How it works:
  • Rules are evaluated in order. Only the first matching rule is applied; subsequent rules are ignored for that request.
  • Each rule can target specific users, teams, models, metadata, or MCP servers, and can enforce different guardrails on any combination of hooks.
  • Omitted fields are not used for filtering (e.g., if models is not specified, the rule matches any model).
Order your rules with the most specific at the top and the most generic at the bottom, so that specialized guardrails take priority and general rules serve as a fallback.
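For instance, this ordering can be sketched as a specific rule followed by a catch-all default (the rule IDs and guardrail names here are illustrative):

```yaml
name: guardrails-control
type: gateway-guardrails-config
rules:
  # Most specific first: strict checks for the engineering team
  - id: engineering-strict
    when:
      subjects:
        operator: and
        conditions:
          in:
            - team:engineering
    llm_input_guardrails:
      - my-guardrail-group/prompt-injection-check
    llm_output_guardrails: []
    mcp_tool_pre_invoke_guardrails: []
    mcp_tool_post_invoke_guardrails: []
  # Generic fallback last: an empty `when` matches every request
  - id: default-moderation
    when: {}
    llm_input_guardrails:
      - my-guardrail-group/openai-moderation
    llm_output_guardrails: []
    mcp_tool_pre_invoke_guardrails: []
    mcp_tool_post_invoke_guardrails: []
```

If the two rules were reversed, the catch-all would match every request first and the engineering-specific guardrails would never run.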

How to Get the Guardrail Selector

You can get the selector (FQN) of guardrail integrations by navigating to the Guardrail tab on AI Gateway and clicking on the “Copy FQN” button next to the guardrail integration.
[Image: Guardrail integration interface showing the Copy FQN button used to obtain the guardrail selector]
Once you submit the config, guardrails will be automatically applied when requests match your rules. This includes:
  • LLM chat/completion requests (LLM Input/Output hooks)
  • MCP tool invocations (MCP Pre/Post Tool hooks)

MCP Tool Guardrails Example

For agentic workflows using MCP tools, you can add guardrails that validate tool inputs before execution and sanitize outputs after:
name: guardrails-control
type: gateway-guardrails-config
rules:
  - id: database-tool-protection
    when:
      target:
        operator: or
        conditions:
          mcpServers:
            values:
              - database-tools
            condition: in
      subjects:
        operator: and
        conditions:
          in:
            - team:engineering
          not_in:
            - user:db-admin@example.com
    llm_input_guardrails: []
    llm_output_guardrails: []
    mcp_tool_pre_invoke_guardrails:
      - my-guardrail-group/sql-sanitizer      # Validate SQL before execution
    mcp_tool_post_invoke_guardrails:
      - my-guardrail-group/secrets-detection  # Check for leaked credentials
      - my-guardrail-group/pii-redaction      # Redact PII from results
  
  - id: code-executor-protection
    when:
      target:
        operator: or
        conditions:
          mcpServers:
            values:
              - code-executor
            condition: in
      subjects:
        operator: and
        conditions:
          in:
            - team:engineering
          not_in:
            - user:devops-admin@example.com
    llm_input_guardrails: []
    llm_output_guardrails: []
    mcp_tool_pre_invoke_guardrails:
      - my-guardrail-group/code-safety-linter # Block dangerous code
    mcp_tool_post_invoke_guardrails:
      - my-guardrail-group/secrets-detection  # Check output for secrets
MCP tool guardrails are critical for agentic safety. Without them, AI agents may execute dangerous operations or leak sensitive data through tool outputs.

Monitoring Guardrail Execution

You can monitor guardrail execution in real-time through the AI Gateway dashboard:
  1. Navigate to AI Gateway → Monitor → Request Traces
  2. View detailed traces showing which guardrails were triggered on each hook
  3. See findings, mutations, and execution timing for each guardrail
  4. Filter by guardrail status to quickly find blocked or flagged requests
Use the Request Traces view to debug guardrail behavior, identify false positives during audit mode rollout, and verify your configuration is working as expected.

Detecting Guardrail Violations Programmatically

When guardrails flag content, the Gateway returns a 400 status code with guardrail violation details. You can detect violations programmatically by checking the error.type field in the error response.

Error Response Format

When a guardrail violation occurs, the response includes details about which hook triggered the violation:
{
  "error": {
    "message": "Guardrail checks failed for integrations: [integration-name]",
    "type": "guardrail_checks_failed"
  },
  "guardrail_checks": {
    "llm_input_guardrails": [...],
    "llm_output_guardrails": [...],
    "mcp_tool_pre_invoke_guardrails": [...],
    "mcp_tool_post_invoke_guardrails": [...]
  }
}
The error.type field will be set to 'guardrail_checks_failed' only when there is an actual guardrail violation. The guardrail_checks object will only contain the hooks that were evaluated.
Example:
from openai import OpenAI, APIStatusError

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="https://{controlPlaneURL}/api/llm",
)

try:
    response = client.chat.completions.create(
        model="openai-main/gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that specializes in Python programming."},
            {"role": "user", "content": "How do I write a function to calculate factorial?"},
        ],
    )
    print("Response:", response.choices[0].message.content)
except APIStatusError as e:
    # The SDK raises APIStatusError for non-2xx responses; the body
    # carries the gateway's error and guardrail_checks objects.
    error_data = e.response.json()
    if error_data.get("error", {}).get("type") == "guardrail_checks_failed":
        print("Guardrail violation detected!")
        # Check which hook triggered the violation
        checks = error_data.get("guardrail_checks", {})
        if checks.get("llm_input_guardrails"):
            print("Input guardrail violation")
        if checks.get("llm_output_guardrails"):
            print("Output guardrail violation")
        if checks.get("mcp_tool_pre_invoke_guardrails"):
            print("MCP pre-tool guardrail violation")
        if checks.get("mcp_tool_post_invoke_guardrails"):
            print("MCP post-tool guardrail violation")
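The hook-by-hook checks above can also be factored into a small, SDK-independent helper (a sketch; failed_hooks is a name invented here, not part of any SDK):

```python
def failed_hooks(error_body: dict) -> list:
    """Return the guardrail hooks that reported violations in a
    gateway error body, or an empty list for non-guardrail errors."""
    if error_body.get("error", {}).get("type") != "guardrail_checks_failed":
        return []
    checks = error_body.get("guardrail_checks", {})
    # Only hooks that were evaluated appear; keep those with findings.
    return [hook for hook, findings in checks.items() if findings]

# Example body, shaped like the error response format above
sample = {
    "error": {
        "message": "Guardrail checks failed for integrations: [integration-name]",
        "type": "guardrail_checks_failed",
    },
    "guardrail_checks": {
        "llm_input_guardrails": [{"finding": "pii-detected"}],
        "llm_output_guardrails": [],
    },
}
print(failed_hooks(sample))  # ['llm_input_guardrails']
```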