
Guardrail Hooks

Guardrails can be invoked at four different points (hooks) in the AI Gateway workflow:
| Hook | When It Executes | Common Use Cases |
|---|---|---|
| LLM Input | Before the request is sent to the LLM | PII redaction, prompt injection detection, content moderation |
| LLM Output | After the response is received from the LLM | Hallucination detection, content filtering, secrets detection |
| MCP Pre Tool | Before an MCP tool is invoked | Validate tool parameters, check permissions, sanitize inputs |
| MCP Post Tool | After an MCP tool returns results | Validate tool outputs, detect unsafe code/SQL, redact sensitive data |
MCP hooks are particularly valuable for agentic workflows where AI models invoke external tools. Use MCP Pre Tool to validate what the agent is about to do, and MCP Post Tool to validate what the tool returned before it’s used by the model.

Guardrail Rules UI

Navigate to AI Gateway → Controls → Guardrails to view and manage guardrail rules. Click Add Rule to create a new rule or edit existing ones.

For example, a rule with ID database-safety-rule might be configured as:
  • When request goes to: MCP Servers IN database-tools, analytics-db
  • From subjects: IN Alice Chen, Bob Smith; NOT IN DB Admin
  • Apply on hooks:
    - MCP Tool Pre-Invoke → my-guardrails/sql-sanitizer
    - MCP Tool Post-Invoke → my-guardrails/secrets-detection

Create Guardrail Config

Create rules via the UI above (AI Gateway → Controls → Guardrails), or use YAML configuration via the Config tab.
[Image: TrueFoundry YAML editor for creating guardrail configuration]

Configuration Structure

The guardrails configuration contains an array of rules that are evaluated for each request. Only the first matching guardrail rule is applied to that request. Each rule can specify guardrails for any of the four hooks.

Example Configuration

name: guardrails-control
type: gateway-guardrails-config
rules:
  - id: demo-guardrail
    when:
      subjects:
        operator: and
        conditions:
          in:
            - user:john@example.com
    llm_input_guardrails:
      - my-guardrail-group/openai-moderation
    llm_output_guardrails:
      - my-guardrail-group/secrets-detection
    mcp_tool_pre_invoke_guardrails: []
    mcp_tool_post_invoke_guardrails: []

Configuration Reference

Rule Structure

| Field | Required | Description |
|---|---|---|
| id | Yes | Unique identifier for the rule |
| when | Yes | Matching criteria with target and subjects blocks |
| llm_input_guardrails | Yes | Guardrails applied before the LLM request (use [] if none) |
| llm_output_guardrails | Yes | Guardrails applied after the LLM response (use [] if none) |
| mcp_tool_pre_invoke_guardrails | Yes | Guardrails applied before MCP tool invocation (use [] if none) |
| mcp_tool_post_invoke_guardrails | Yes | Guardrails applied after the MCP tool returns (use [] if none) |

The when Block

The when block contains two main sections: target (what the request targets) and subjects (who is making the request):
| Section | Description |
|---|---|
| target | Defines conditions based on mcpServers, models, mcpTools, or metadata |
| subjects | Defines conditions based on users, teams, or virtual accounts |
If when is empty ({}), the rule matches all requests. Use this for fallback/default rules at the end of your rules list.

The when Block Structure

Match by MCP server:
when:
  target:
    operator: or
    conditions:
      mcpServers:
        values:
          - database-tools
          - code-executor
        condition: in

Match by model:
when:
  target:
    operator: or
    conditions:
      models:
        values:
          - openai/gpt-4o
          - anthropic/claude-3-5-sonnet
        condition: in

Match by request metadata:
when:
  target:
    operator: or
    conditions:
      metadata:
        environment: production
        tier: enterprise
Matching on metadata requires the request to include the header X-TFY-METADATA: {"environment": "production", "tier": "enterprise"}
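As a sketch of sending that header from a client (assuming the OpenAI Python SDK, whose extra_headers parameter forwards custom headers with a request), the metadata can be serialized and attached like this:

```python
import json

def build_metadata_headers(metadata: dict) -> dict:
    # Serialize request metadata into the X-TFY-METADATA header
    # that the gateway matches `metadata` conditions against.
    return {"X-TFY-METADATA": json.dumps(metadata)}

headers = build_metadata_headers({"environment": "production", "tier": "enterprise"})
# With the OpenAI SDK, forward it on a single request:
# client.chat.completions.create(model=..., messages=..., extra_headers=headers)
```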

Match by MCP server and tool:
when:
  target:
    operator: or
    conditions:
      mcpServers:
        values:
          - database-tools
        condition: in
      mcpTools:
        values:
          - execute_query
        condition: in

Match by subject (users and teams):
when:
  subjects:
    operator: and
    conditions:
      in:
        - user:alice@company.com
        - user:bob@company.com
        - team:data-science
      not_in:
        - user:guest@company.com

Combine target and subject conditions:
when:
  target:
    operator: or
    conditions:
      mcpServers:
        values:
          - database-tools
        condition: in
      metadata:
        environment: production
  subjects:
    operator: and
    conditions:
      in:
        - team:engineering
      not_in:
        - user:external@partner.com
Both target and subjects conditions must match for the rule to apply.
How it works:
  • Rules are evaluated in order. Only the first matching rule is applied; subsequent rules are ignored for that request.
  • Each rule can target specific users, teams, models, metadata, or MCP servers, and can enforce different guardrails on any combination of hooks.
  • Omitted fields are not used for filtering (e.g., if models is not specified, the rule matches any model).
Order your rules with the most specific at the top and the most generic at the bottom, so that specialized guardrails take priority and general rules serve as a fallback.
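For instance, this ordering can be sketched as a specific rule followed by a catch-all default (the rule IDs and guardrail names here are illustrative):

```yaml
name: guardrails-control
type: gateway-guardrails-config
rules:
  # Most specific first: strict checks for the engineering team
  - id: engineering-strict
    when:
      subjects:
        operator: and
        conditions:
          in:
            - team:engineering
    llm_input_guardrails:
      - my-guardrail-group/prompt-injection-check
    llm_output_guardrails: []
    mcp_tool_pre_invoke_guardrails: []
    mcp_tool_post_invoke_guardrails: []
  # Generic fallback last: an empty `when` matches every request
  - id: default-moderation
    when: {}
    llm_input_guardrails:
      - my-guardrail-group/openai-moderation
    llm_output_guardrails: []
    mcp_tool_pre_invoke_guardrails: []
    mcp_tool_post_invoke_guardrails: []
```

If the two rules were reversed, the catch-all would match every request first and the engineering-specific guardrails would never run.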

How to Get the Guardrail Selector

You can get the selector (FQN) of guardrail integrations by navigating to the Guardrail tab on AI Gateway and clicking on the “Copy FQN” button next to the guardrail integration.
[Image: Guardrail integration interface showing the Copy FQN button used to obtain the guardrail selector]
Once you submit the config, guardrails will be automatically applied when requests match your rules. This includes:
  • LLM chat/completion requests (LLM Input/Output hooks)
  • MCP tool invocations (MCP Pre/Post Tool hooks)

MCP Tool Guardrails Example

For agentic workflows using MCP tools, you can add guardrails that validate tool inputs before execution and sanitize outputs after:
name: guardrails-control
type: gateway-guardrails-config
rules:
  - id: database-tool-protection
    when:
      target:
        operator: or
        conditions:
          mcpServers:
            values:
              - database-tools
            condition: in
      subjects:
        operator: and
        conditions:
          in:
            - team:engineering
          not_in:
            - user:db-admin@example.com
    llm_input_guardrails: []
    llm_output_guardrails: []
    mcp_tool_pre_invoke_guardrails:
      - my-guardrail-group/sql-sanitizer      # Validate SQL before execution
    mcp_tool_post_invoke_guardrails:
      - my-guardrail-group/secrets-detection  # Check for leaked credentials
      - my-guardrail-group/pii-redaction      # Redact PII from results
  
  - id: code-executor-protection
    when:
      target:
        operator: or
        conditions:
          mcpServers:
            values:
              - code-executor
            condition: in
      subjects:
        operator: and
        conditions:
          in:
            - team:engineering
          not_in:
            - user:devops-admin@example.com
    llm_input_guardrails: []
    llm_output_guardrails: []
    mcp_tool_pre_invoke_guardrails:
      - my-guardrail-group/code-safety-linter # Block dangerous code
    mcp_tool_post_invoke_guardrails:
      - my-guardrail-group/secrets-detection  # Check output for secrets
MCP tool guardrails are critical for agentic safety. Without them, AI agents may execute dangerous operations or leak sensitive data through tool outputs.

Monitoring Guardrail Execution

You can monitor guardrail execution in real-time through the AI Gateway dashboard:
  1. Navigate to AI Gateway → Monitor → Request Traces
  2. View detailed traces showing which guardrails were triggered on each hook
  3. See findings, mutations, and execution timing for each guardrail
  4. Filter by guardrail status to quickly find blocked or flagged requests
Use the Request Traces view to debug guardrail behavior, identify false positives during audit mode rollout, and verify your configuration is working as expected.

Detecting Guardrail Violations Programmatically

When guardrails flag content, the Gateway returns a 400 status code with guardrail violation details. You can detect violations programmatically by checking the error.type field in the error response.

Error Response Format

When a guardrail violation occurs, the response includes details about which hook triggered the violation:
{
  "error": {
    "message": "Guardrail checks failed for integrations: [integration-name]",
    "type": "guardrail_checks_failed"
  },
  "guardrail_checks": {
    "llm_input_guardrails": [...],
    "llm_output_guardrails": [...],
    "mcp_tool_pre_invoke_guardrails": [...],
    "mcp_tool_post_invoke_guardrails": [...]
  }
}
The error.type field will be set to 'guardrail_checks_failed' only when there is an actual guardrail violation. The guardrail_checks object will only contain the hooks that were evaluated.
Example:
from openai import OpenAI, APIStatusError

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="https://{controlPlaneURL}/api/llm",
)

try:
    response = client.chat.completions.create(
        model="openai-main/gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that specializes in Python programming."},
            {"role": "user", "content": "How do I write a function to calculate factorial?"},
        ],
    )
    print("Response:", response.choices[0].message.content)
except APIStatusError as e:
    # The SDK raises APIStatusError for non-2xx responses; the body
    # carries the gateway's error and guardrail_checks objects.
    error_data = e.response.json()
    if error_data.get("error", {}).get("type") == "guardrail_checks_failed":
        print("Guardrail violation detected!")
        # Check which hook triggered the violation
        checks = error_data.get("guardrail_checks", {})
        if checks.get("llm_input_guardrails"):
            print("Input guardrail violation")
        if checks.get("llm_output_guardrails"):
            print("Output guardrail violation")
        if checks.get("mcp_tool_pre_invoke_guardrails"):
            print("MCP pre-tool guardrail violation")
        if checks.get("mcp_tool_post_invoke_guardrails"):
            print("MCP post-tool guardrail violation")
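The hook-by-hook checks above can also be factored into a small, SDK-independent helper (a sketch; failed_hooks is a name invented here, not part of any SDK):

```python
def failed_hooks(error_body: dict) -> list:
    """Return the guardrail hooks that reported violations in a
    gateway error body, or an empty list for non-guardrail errors."""
    if error_body.get("error", {}).get("type") != "guardrail_checks_failed":
        return []
    checks = error_body.get("guardrail_checks", {})
    # Only hooks that were evaluated appear; keep those with findings.
    return [hook for hook, findings in checks.items() if findings]

# Example body, shaped like the error response format above
sample = {
    "error": {
        "message": "Guardrail checks failed for integrations: [integration-name]",
        "type": "guardrail_checks_failed",
    },
    "guardrail_checks": {
        "llm_input_guardrails": [{"finding": "pii-detected"}],
        "llm_output_guardrails": [],
    },
}
print(failed_hooks(sample))  # ['llm_input_guardrails']
```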