

TrueFoundry AI Gateway

TrueFoundry AI Gateway is the proxy layer that sits between your applications and your LLM providers and MCP servers. It is an enterprise-grade platform that gives you access to 1000+ LLMs through a unified interface while handling observability and governance.

[Figure: TrueFoundry AI Gateway architecture, showing the gateway as a proxy between applications and multiple LLM providers]

Key Features

Unified API for 1000+ LLMs

One endpoint with an OpenAI-compatible schema for every provider; see the example sketch after this feature list.

Multimodal & Audio APIs

Chat, embeddings, images, audio, rerank, and realtime APIs.

Native SDK Compatibility

Drop-in support for OpenAI, Anthropic, and other provider SDKs.

Load Balancing & Fallbacks

Route across models by weight, latency, or priority with automatic retries.

Semantic Caching

Cut cost and latency on repeat requests.

Batch APIs

Run large workloads asynchronously at batch pricing.

Access Control & API Keys

RBAC and scoped keys for users, teams, and applications.

Rate Limiting

Per-user, per-model, and per-application throttles.

Budgets & Cost Tracking

Enforce spend limits and attribute cost across teams.

Guardrails

PII, prompt injection, content moderation, and custom policies.

Observability & Logs

OpenTelemetry-compliant metrics, traces, and request logs.

Prompt Management

Versioned prompts with a built-in playground.

MCP Registry

Host, publish, and discover MCP servers in one place.

Centralized MCP Auth

One API key to reach every MCP server and tool.

Virtual MCP Servers

Combine tools from multiple MCP servers into one.

Agent Registry

Build, publish, and share AI agents natively on TrueFoundry.

Skills Registry

Versioned, reusable SKILL.md instructions for agents and IDEs.

Flexible Deployment

SaaS, hybrid, or fully self-hosted in your own VPC.
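
To make the unified API concrete, here is a minimal sketch using the stock OpenAI Python SDK pointed at the gateway. The base URL and the model ID are placeholders: the real gateway host comes from your TrueFoundry account, and the model naming shown here (a provider-account/model pair) may differ in your setup.

```python
# Minimal sketch: reaching any provider through the gateway's
# OpenAI-compatible endpoint with the stock OpenAI SDK.
from openai import OpenAI

client = OpenAI(
    api_key="<your-truefoundry-api-key>",            # gateway key, not a provider key
    base_url="https://<your-gateway-host>/api/llm",  # placeholder gateway URL
)

response = client.chat.completions.create(
    model="openai-main/gpt-4o",  # placeholder model ID; naming varies per setup
    messages=[{"role": "user", "content": "Say hello from the gateway."}],
)
print(response.choices[0].message.content)
```

Because the schema is OpenAI-compatible, switching providers is just a change of the model string; the client code stays the same.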

Supported Model Providers

We integrate with 1000+ LLMs through the following providers.
If you don’t see the provider you need, there is a high chance it will still work through the self-hosted models or OpenAI provider integration. Please reach out to us at support@truefoundry.com and we will be happy to guide you.

Gemini & Vertex AI

Google Gemini

AWS Bedrock

AWS SageMaker

Azure OpenAI

Azure AI Foundry

OpenAI

Cohere

Databricks

AI21

Anthropic

Together AI

xAI

DeepInfra

Perplexity AI

Mistral AI

Cloudera

Groq

ElevenLabs

Deepgram

Cartesia

Smallest AI

Snowflake Cortex

Self Hosted

OpenRouter

SambaNova

Cerebras

Supported APIs

The following sections summarize provider support for each gateway endpoint, in the same order as Supported APIs in the sidebar; each links to the full guide for that API. For every provider, the full documentation marks each capability as one of:
  • Supported by the provider and by TrueFoundry
  • Provided by the provider, but not by TrueFoundry
  • Not supported by the provider

Chat Completion (/chat/completions)

Documentation: Chat Completions API
Capabilities tracked per provider: Stream, Non-Stream, Tools, JSON Mode, Schema Mode, Prompt Caching, Reasoning, Structured Output.
Providers: OpenAI, Azure OpenAI, Anthropic, Bedrock, Vertex, Cohere, Gemini, Groq, AI21, Cerebras, SambaNova, Perplexity-AI, Together-AI, xAI, DeepInfra
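
As an illustrative sketch of the Stream capability above, reusing the gateway-pointed client from the earlier example (the model ID remains a placeholder):

```python
# Illustrative streaming chat completion through the gateway.
stream = client.chat.completions.create(
    model="openai-main/gpt-4o",  # placeholder model ID
    messages=[{"role": "user", "content": "Stream a two-line haiku."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk carries no content
        print(delta, end="", flush=True)
```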

Documentation: Embeddings API
Input types tracked per provider: single string, list of strings.
Providers: OpenAI, Azure OpenAI, Anthropic, Bedrock, Vertex, Cohere, Gemini, Groq, SambaNova, Together-AI, xAI, DeepInfra
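
A sketch of the two input shapes tracked above, again reusing the placeholder client and a placeholder model ID:

```python
# Illustrative embeddings calls: a single string and a list of strings.
single = client.embeddings.create(
    model="openai-main/text-embedding-3-small",  # placeholder
    input="A single string to embed.",
)
batch = client.embeddings.create(
    model="openai-main/text-embedding-3-small",  # placeholder
    input=["first document", "second document"],
)
print(len(single.data[0].embedding), len(batch.data))
```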

Documentation: Batch API
Providers: OpenAI, Azure OpenAI, Anthropic, Bedrock, Vertex, Cohere, Gemini, Groq, Cerebras, Together-AI, xAI, DeepInfra

Documentation: Finetune API
Providers: OpenAI, Azure OpenAI, Anthropic, Bedrock, Vertex, Cohere, Gemini, Groq, Cerebras, Together-AI, xAI, DeepInfra

Documentation: Responses API
Providers: OpenAI, Azure OpenAI, Anthropic, Bedrock, Vertex, Cohere, Gemini, Groq, Cerebras, Together-AI, xAI, DeepInfra

Documentation: Image Generation API
Providers: OpenAI, Azure OpenAI, Bedrock, Vertex, Anthropic, Cohere, Gemini, Groq, Together-AI, xAI, DeepInfra
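
An illustrative image generation call through the same placeholder client; the model ID is again an assumption:

```python
# Illustrative image generation through the gateway.
image = client.images.generate(
    model="openai-main/gpt-image-1",  # placeholder model ID
    prompt="A lighthouse at dawn, watercolor style",
)
print(image.data[0].url or "(image returned as base64)")
```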

Documentation: Image Edit API
Providers: OpenAI, Azure OpenAI, Bedrock, Vertex, Anthropic, Cohere, Gemini, Groq, Together-AI, xAI, DeepInfra

Documentation: Image Variation API
Providers: OpenAI, Azure OpenAI, Bedrock, Vertex, Anthropic, Cohere, Gemini, Groq, Together-AI, xAI, DeepInfra

Documentation: Text to Speech API
Providers: OpenAI, Azure OpenAI, Azure AI Foundry, Anthropic, Bedrock, Vertex, Cohere, Gemini, Groq, Together-AI, xAI, DeepInfra, Deepgram, Cartesia, ElevenLabs, Resemble AI, Smallest AI

Documentation: Audio Translation API
Providers: OpenAI, Azure OpenAI, Azure AI Foundry, Anthropic, Bedrock, Vertex, Cohere, Gemini, Groq, Together-AI, xAI, DeepInfra

Documentation: Speech to Text API
Providers: OpenAI, Azure OpenAI, Azure AI Foundry, Anthropic, Bedrock, Vertex, Cohere, Gemini, Groq, Together-AI, xAI, DeepInfra, Deepgram, Cartesia, ElevenLabs, Smallest AI
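
An illustrative transcription call with the same placeholder client; the model ID and file name are assumptions:

```python
# Illustrative speech-to-text call through the gateway.
with open("meeting.wav", "rb") as audio_file:  # placeholder audio file
    transcript = client.audio.transcriptions.create(
        model="openai-main/whisper-1",  # placeholder model ID
        file=audio_file,
    )
print(transcript.text)
```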

Documentation: Live / Realtime API
Providers: Gemini, Vertex, OpenAI, Azure AI Foundry

Documentation: Files API
Providers: OpenAI, Azure OpenAI, Anthropic, Bedrock, Vertex, Cohere, Gemini, Groq, Cerebras, Together-AI, xAI, DeepInfra

Documentation: Rerank API
Providers: OpenAI, Azure OpenAI, Anthropic, Bedrock, Vertex, Cohere, Gemini, Groq, Together-AI, xAI, DeepInfra

Documentation: Moderation API
Providers: OpenAI, Azure OpenAI, Anthropic, Bedrock, Vertex, Cohere, Gemini, Groq, Cerebras, Together-AI, xAI, DeepInfra

Documentation: Compaction API
Providers: OpenAI

Documentation: Messages API
Providers: Anthropic

Documentation: Proxy API

Forward provider-native requests through the gateway while keeping logging, rate limiting, and budget controls. See the guide for setup, headers, and examples by provider.
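
Purely as a hypothetical sketch (the proxy route and header names below are illustrative assumptions, not TrueFoundry's documented API; the linked guide has the real setup), forwarding a provider-native Anthropic request might look like:

```python
# Hypothetical only: the URL path and headers are placeholders, not the
# documented Proxy API. The idea being shown: send an unmodified
# provider-native body to a gateway URL so logging, rate limits, and
# budget controls still apply to the request.
import requests

resp = requests.post(
    "https://<your-gateway-host>/proxy/anthropic/v1/messages",  # hypothetical path
    headers={
        "Authorization": "Bearer <your-truefoundry-api-key>",
        "Content-Type": "application/json",
    },
    json={  # provider-native (Anthropic-shaped) request body
        "model": "<anthropic-model-id>",  # placeholder
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.json())
```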

Deployment Options

You can run the AI Gateway as fully managed SaaS, keep LLM request and response data in your own object storage while TrueFoundry operates the gateway, or host the gateway plane (and optionally more of the stack) in your cloud or on-prem for stricter data residency and control. The options differ in who hosts the infrastructure, where traffic flows, and pricing tier. Read the full comparison, including a scenario table, diagrams, and operational notes, in AI Gateway deployment options. For background on how the gateway fits into the platform, see gateway plane architecture. To start on managed SaaS, follow the quick start.

Frequently Asked Questions

What latency overhead does the gateway add?
The latency overhead is minimal, typically less than 5 ms. Our benchmarks show enterprise-grade performance that scales with your needs. Our SaaS offering is hosted in multiple regions across the world to ensure low latency and high availability. You can also deploy the gateway on-premise or on any cloud provider in a region closer to your users.

Can the gateway be deployed on-premise?
Yes, the AI Gateway supports on-premise deployments on any infrastructure or cloud provider, giving you complete control over your AI operations.

Can I use my own self-hosted models?
You can easily integrate any OpenAI-compatible self-hosted model. Check our self-hosted models guide for detailed instructions; a minimal sketch appears at the end of this FAQ.

Can the AI Gateway be used without the rest of the TrueFoundry platform?
Yes, the AI Gateway can be used as a standalone solution. You can adopt the full MLOps platform if you need features like model deployment (traditional models and LLMs), model training, LLM fine-tuning, or training/data-processing workflows.
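
As a minimal sketch of the self-hosted answer above, assuming a model served behind an OpenAI-compatible endpoint (for example via vLLM) has been registered with the gateway under a placeholder ID, the same client code from the earlier examples works unchanged:

```python
# Sketch: a self-hosted, OpenAI-compatible model called through the
# gateway. Only the model ID (a placeholder here) differs from the
# hosted-provider examples above.
response = client.chat.completions.create(
    model="self-hosted/llama-3-8b-instruct",  # placeholder registered model ID
    messages=[{"role": "user", "content": "Hello from a self-hosted model."}],
)
print(response.choices[0].message.content)
```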