Global YAML-based routing configuration for TrueFoundry AI Gateway
For new setups, we recommend using Virtual Models to configure routing. Virtual models provide the same routing strategies, retries, and fallbacks, with clearer per-model ownership, access control, and a simpler configuration experience. The global routing configuration described on this page remains functional for existing deployments.
The global routing configuration lets you define load balancing, fallback, and retry rules as a YAML file applied at the tenant level. Rules are evaluated in order for each incoming request — the first matching rule wins and subsequent rules are ignored.
when — Defines which requests a rule applies to. The subjects, models, and metadata fields are combined with AND logic. If a request doesn’t match one rule’s when block, the next rule is evaluated.
subjects — Filter by user, team, or virtual account (for example user:john-doe, team:engineering, virtualaccount:acct_123).
models — Rule matches if the request model name is in this list.
metadata — Rule matches if the request’s X-TFY-METADATA header contains these key-value pairs.
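Putting the three filters together, a single when block might look like the sketch below. The model name and metadata values are hypothetical, and all listed conditions must match (AND logic) for the rule to apply:

```yaml
# Hypothetical rule condition: subjects, models, and metadata
# are combined with AND logic — all three must match.
when:
  subjects:
    - team:engineering        # request made by the engineering team
  models:
    - openai-main/gpt-4o      # hypothetical model name
  metadata:
    environment: production   # matched against the X-TFY-METADATA header
```

A request from team:engineering for openai-main/gpt-4o carrying environment: production in its X-TFY-METADATA header would match; any mismatch moves evaluation to the next rule.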
type — The routing strategy for this rule:
weight-based-routing — Distribute traffic by assigned weights that sum to 100.
latency-based-routing — Automatically route to the target with the lowest recent latency (time per output token).
priority-based-routing — Route to the highest priority (lowest number) healthy target, falling back to the next on failure.
For details on how each strategy behaves (latency algorithm, SLA cutoff, unhealthy detection), see Virtual Models — Routing Strategies. The strategies work identically whether configured here or on a virtual model.
load_balance_targets — The list of models eligible for routing in this rule. Per-target options:
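A minimal weight-based rule combining the fields above might look like this sketch. The rule name, model identifiers, and weights are illustrative, and field names such as name and target are assumptions; check the config reference in the gateway UI for the exact schema:

```yaml
# Hypothetical weight-based rule: a 90/10 traffic split.
# Weights must sum to 100.
- name: gpt4-canary-split           # illustrative rule name
  type: weight-based-routing
  when:
    models:
      - gpt-4o                      # the model name clients send
  load_balance_targets:
    - target: openai-main/gpt-4o
      weight: 90                    # 90% of matching traffic
    - target: azure-backup/gpt-4o
      weight: 10                    # 10% canary traffic
```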
Retry configuration — attempts, delay, and on_status_codes for retries on the same target.
Fallback configuration — fallback_status_codes to trigger trying another target, and fallback_candidate to control whether a target can receive fallback traffic.
Override parameters — Per-target request parameters like temperature, max_tokens, or prompt_version_fqn for model-specific prompts.
The prompt_version_fqn override is not supported for agent requests (those using MCP/tools); it works only for standard chat completion requests.
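The per-target options can be combined in a single rule. The sketch below shows a priority-based rule with retries, fallback control, and an override parameter; target names, status codes, and the exact nesting of the retry fields are assumptions for illustration, not a definitive schema:

```yaml
# Hypothetical priority-based rule with per-target options.
- name: gpt4-priority-failover
  type: priority-based-routing
  when:
    models:
      - gpt-4o
  load_balance_targets:
    - target: openai-main/gpt-4o
      priority: 1                              # tried first (lowest number wins)
      retry:
        attempts: 2
        delay: 500                             # assumed to be milliseconds
        on_status_codes: [429, 503]            # retry the same target on these
      fallback_status_codes: [500, 502, 503]   # then try the next target
      override_params:
        temperature: 0.2                       # per-target request override
    - target: azure-backup/gpt-4o
      priority: 2
      fallback_candidate: true                 # may receive fallback traffic
```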
The configuration is managed under AI Gateway → Configs → Routing Config in the UI. You can also store the YAML in your Git repository and apply it with the tfy apply command to enforce a PR review process.
To move from global routing config to virtual models:
Identify each distinct model your apps send that is backed by rules here.
Create a virtual model with the same targets, strategy, weights/priorities, retries, fallbacks, and override_params.
Point clients at the virtual model using its full path or a slug.
Remove or narrow rules here once traffic uses the virtual model.
For rules that matched metadata or subjects, use different virtual model names per team or environment (for example booking-app/gpt-prod vs booking-app/gpt-dev). See Virtual Models for the full guide.
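As a concrete sketch of the client-side change, the migration is just a change of model name. Assuming an OpenAI-compatible request body, the relevant field before and after might look like:

```yaml
# Before: clients send a raw model name and the tenant-level
# routing rules decide where the request goes.
model: gpt-4o                  # routed by global routing config
---
# After: clients name the virtual model directly by its slug,
# so routing ownership moves to the virtual model itself.
model: booking-app/gpt-prod    # hypothetical virtual model slug
```

Once all traffic references the virtual model, the corresponding global rule can be narrowed or removed.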