Introduction to AI Gateway - TrueFoundry Docs

TrueFoundry AI Gateway is the proxy layer that sits between your applications and the LLM providers and MCP Servers. It is an enterprise-grade platform that enables users to access 1000+ LLMs using a unified interface while taking care of observability and governance.

TrueFoundry AI Gateway architecture diagram showing the gateway as a proxy between applications and multiple LLM providers

Key Features

Unified API for 1000+ LLMs

One endpoint with an OpenAI-compatible schema for every provider.

Multimodal & Audio APIs

Chat, embeddings, images, audio, rerank, and realtime APIs.

Native SDK Compatibility

Drop-in support for OpenAI, Anthropic, and other provider SDKs.

Load Balancing & Fallbacks

Route across models by weight, latency, or priority with automatic retries.

Semantic Caching

Cut cost and latency on repeat requests.

Batch APIs

Run large workloads asynchronously at batch pricing.

Access Control & API Keys

RBAC and scoped keys for users, teams, and applications.

Rate Limiting

Per-user, per-model, and per-application throttles.

Budgets & Cost Tracking

Enforce spend limits and attribute cost across teams.

Guardrails

PII, prompt injection, content moderation, and custom policies.

Observability & Logs

OpenTelemetry-compliant metrics, traces, and request logs.

Prompt Management

Versioned prompts with a built-in playground.

MCP Registry

Host, publish, and discover MCP servers in one place.

Centralized MCP Auth

One API key to reach every MCP server and tool.

Virtual MCP Servers

Combine tools from multiple MCP servers into one.

Agent Registry

Build, publish, and share AI agents natively on TrueFoundry.

Skills Registry

Versioned, reusable SKILL.md instructions for agents and IDEs.

Flexible Deployment

SaaS, hybrid, or fully self-hosted in your own VPC.

Supported Model Providers

We integrate with 1000+ LLMs through the following providers.

If you don’t see the provider you need, there is a high change it will just work as self hosted models or OpenAI provider. Please reach out to us at support@truefoundry.com and we will be happy to guide you.

Gemini & Vertex AI

Google Gemini

AWS Bedrock

AWS SageMaker

Azure OpenAI

Azure AI Foundry

OpenAI

Cohere

Databricks

AI21

Anthropic

Together AI

xAI

DeepInfra

Perplexity AI

Mistral AI

Cloudera

Groq

ElevenLabs

Deepgram

Cartesia

Smallest AI

Snowflake Cortex

Self Hosted

OpenRouter

SambaNova

Cerebras

Supported APIs

The following accordions summarize provider support for each gateway endpoint. Each section links to the full guide for that API (same order as Supported APIs in the sidebar).

Legend:

✅ Supported by Provider and Truefoundry
Provided by provider, but not by Truefoundry
Provider does not support this feature

Chat Completion (/chat/completions)

Documentation: Chat Completions API · API Reference

Provider	Stream	Non Stream	Tools	JSON Mode	Schema Mode	Prompt Caching	Reasoning	Structured Output
OpenAI	✅	✅	✅	✅	✅	✅	✅
Azure OpenAI	✅	✅	✅	✅	✅	✅	✅
Anthropic	✅	✅	✅		✅	✅	✅
Bedrock	✅	✅	✅		✅	✅	✅
Vertex	✅	✅	✅		✅	✅	✅
Cohere	✅	✅	✅	✅	✅		✅
Gemini	✅	✅	✅	✅	✅	✅	✅
Groq	✅	✅	✅	✅	✅	✅	✅
AI21	✅	✅		✅
Cerebras	✅	✅		✅			✅
SambaNova	✅	✅		✅			✅
Perplexity-AI	✅	✅		✅			✅	✅
Together-AI	✅	✅	✅	✅		✅	✅	✅
xAI	✅	✅	✅	✅	✅	✅	✅	✅
DeepInfra	✅	✅	✅	✅		✅	✅

Embedding (/embeddings)

Documentation: Embeddings API · API Reference

Provider	String	List of String
OpenAI	✅	✅
Azure OpenAI	✅	✅
Anthropic
Bedrock	✅	✅
Vertex	✅	✅
Cohere	✅	✅
Gemini
Groq
SambaNova
Together-AI	✅	✅
xAI
DeepInfra

Batch (/batches)

Documentation: Batch API · API Reference

Provider	Batch
OpenAI	✅
Azure OpenAI	✅
Anthropic
Bedrock	✅
Vertex	✅
Cohere
Gemini
Groq
Cerebras
Together-AI
xAI
DeepInfra

Fine Tune

Documentation: Finetune API · API Reference

Provider	Fine Tune
OpenAI	✅
Azure OpenAI
Anthropic
Bedrock
Vertex	✅
Cohere
Gemini
Groq
Cerebras
Together-AI
xAI
DeepInfra

Model Response (/responses)

Documentation: Responses API · API Reference

Provider	Model Response
OpenAI	✅
Azure OpenAI	✅
Anthropic
Bedrock
Vertex
Cohere
Gemini
Groq
Cerebras
Together-AI
xAI
DeepInfra

Image Generation (/images/generations)

Documentation: Image Generation API · API Reference

Provider	Generate
OpenAI	✅
Azure OpenAI	✅
Bedrock	✅
Vertex	✅
Anthropic
Cohere
Gemini
Groq
Together-AI
xAI
DeepInfra

Image Edit (/images/edits)

Documentation: Image Edit API · API Reference

Provider	Edit
OpenAI	✅
Azure OpenAI	✅
Bedrock	✅
Vertex	✅
Anthropic
Cohere
Gemini
Groq
Together-AI
xAI
DeepInfra

Image Variation (/images/variations)

Documentation: Image Variation API · API Reference

Provider	Variation
OpenAI	✅
Azure OpenAI
Bedrock	✅
Vertex
Anthropic
Cohere
Gemini
Groq
Together-AI
xAI
DeepInfra

Text To Speech

Documentation: Text to Speech API · API Reference

Provider	Text To Speech
OpenAI	✅
Azure OpenAI	✅
Azure AI Foundry	✅
Anthropic
Bedrock
Vertex	✅
Cohere
Gemini	✅
Groq	✅
Together-AI
xAI
DeepInfra
DeepGram	✅
Cartesia	✅
ElevenLabs	✅
Resemble AI	✅
Smallest AI	✅

Audio Translation

Documentation: Audio Translation API · API Reference

Provider	Translation
OpenAI	✅
Azure OpenAI	✅
Azure AI Foundry	✅
Anthropic
Bedrock
Vertex
Cohere
Gemini
Groq	✅
Together-AI
xAI
DeepInfra

Speech to Text

Documentation: Speech to Text API · API Reference

Provider	Transcription
OpenAI	✅
Azure OpenAI	✅
Azure AI Foundry	✅
Anthropic
Bedrock
Vertex
Cohere
Gemini
Groq	✅
Together-AI
xAI
DeepInfra
DeepGram	✅
Cartesia	✅
ElevenLabs	✅
Smallest AI	✅

Live / Realtime API

Documentation: Live / Realtime API

Provider	Live / Realtime API
Gemini	✅
Vertex	✅
OpenAI	✅
Azure AI Foundry	✅

Files (/files)

Documentation: Files API · API Reference

Provider	Files
OpenAI	✅
Azure OpenAI
Anthropic	✅
Bedrock	✅
Vertex	✅
Cohere
Gemini
Groq	✅
Cerebras
Together-AI
xAI
DeepInfra

Rerank (/rerank)

Documentation: Rerank API · API Reference

Provider	Rerank
OpenAI
Azure OpenAI
Anthropic
Bedrock	✅
Vertex
Cohere	✅
Gemini
Groq
Together-AI
xAI
DeepInfra

Moderation (/moderations)

Documentation: Moderation API · API Reference

Provider	Moderation
OpenAI	✅
Azure OpenAI
Anthropic
Bedrock
Vertex
Cohere
Gemini
Groq
Cerebras
Together-AI
xAI
DeepInfra

Compaction API

Documentation: Compaction API · API Reference

Provider	Compaction API
OpenAI	✅

Messages API

Documentation: Messages API · API Reference

Provider	Messages API
Anthropic	✅

Proxy API (/proxy)

Documentation: Proxy APIForward provider-native requests through the gateway while keeping logging, rate limiting, and budget controls. See the guide for setup, headers, and examples by provider.

Deployment Options

You can run the AI Gateway as fully managed SaaS, keep LLM request–response data in your own object storage while Truefoundry operates the gateway, or host the gateway plane (and optionally more of the stack) in your cloud or on-prem for stricter data residency and control. Each option differs in who hosts infrastructure, where traffic flows, and pricing tier. Read the full comparison—including a scenario table, diagrams, and operational notes—in AI Gateway deployment options. For background on how the gateway fits the platform, see gateway plane architecture. To start on managed SaaS, follow the quick start.

Frequently Asked Questions

What's the performance impact of using the gateway?

The latency overhead is minimal, typically less than 5ms. Our benchmarks show enterprise-grade performance that scales with your needs. Our SaaS offering is hosted in multiple regions across the world to ensure low latency and high availability. You can also deploy the gateway on-premise or on any cloud provider in your region which
is closer to your users.

Can I deploy the gateway on-premise?

Yes, the AI Gateway supports on-premise deployments on any infrastructure or cloud provider, giving you complete control over your AI operations.

How do I integrate my self-hosted models?

You can easily integrate any OpenAI-compatible self-hosted model. Check our self-hosted models guide for detailed instructions.

Can I use the gateway without the full MLOps platform?

Yes, The AI Gateway can be used as a standalone solution. You can use the full MLOps platform if you’re using features like model deployment(traditional models and LLMs), model training, llm fine-tuning or training/data-processing workflows.

​Key Features

Unified API for 1000+ LLMs

Multimodal & Audio APIs

Native SDK Compatibility

Load Balancing & Fallbacks

Semantic Caching

Batch APIs

Access Control & API Keys

Rate Limiting

Budgets & Cost Tracking

Guardrails

Observability & Logs

Prompt Management

MCP Registry

Centralized MCP Auth

Virtual MCP Servers

Agent Registry

Skills Registry

Flexible Deployment

​Supported Model Providers

​Supported APIs

​Deployment Options

​Frequently Asked Questions

Key Features

Supported Model Providers

Supported APIs

Deployment Options

Frequently Asked Questions