Highlights
- TrueFoundry LLM Gateway provides a unified OpenAI compatible interface to various LLM providers like Anthropic, OpenAI, Bedrock, Gemini and many others
- TrueFoundry LLM Gateway scales seamlessly to 350 RPS on a single replica with 1 unit CPU, using only 270 MB of memory. We compared it with another gateway product, LiteLLM, on a similar setup, and LiteLLM failed to scale beyond 50 RPS
- TrueFoundry LLM Gateway adds only 3-5 ms of extra latency, while LiteLLM adds 15-30 ms per request.
Why does your org need an LLM Gateway?
An LLM Gateway provides a unified interface to manage your organisation's LLM usage:
- Unified API: Access multiple LLM providers through a single OpenAI compatible interface, no code changes needed
- API Key Security: Secure, centralised credential management
- Governance & Control: Set limits, access controls, and content filtering
- Rate Limiting: Prevent abuse and ensure fair usage
- Observability: Track usage, costs, latency and performance
- Load Balancing: Route requests across providers automatically
- Cost Management: Monitor spending and set budget alerts
- Audit Trails: Log all LLM interactions for compliance
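The "Unified API" point above is worth making concrete. The sketch below shows what OpenAI compatibility means in practice: the same request body works whether it is sent to OpenAI directly or to a gateway fronting Anthropic, Bedrock, Gemini, etc. The model names here are illustrative, not a statement of any gateway's exact naming scheme.

```python
# Minimal sketch of an OpenAI-compatible chat-completion request body.
# Only the model identifier changes per provider; the payload shape and
# client code stay the same, which is why no code changes are needed.
import json

def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

# Illustrative model names; a gateway routes each to the right provider.
for model in ("gpt-4o", "claude-3-5-sonnet", "gemini-1.5-pro"):
    print(json.dumps(build_chat_request(model, "Hello")))
```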
How fast is TrueFoundry LLM Gateway?
Load Test Setup
For our load testing experiment, we deployed this fake OpenAI endpoint service using TrueFoundry. The service simulates the OpenAI request and response format without actually producing tokens.
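To illustrate the idea behind the fake endpoint, here is a hedged sketch (not the actual service linked above) of a handler that returns a canned response in OpenAI's chat-completion shape without running a model, so the benchmark measures gateway overhead rather than inference time. Field values are illustrative.

```python
# Sketch of a fake OpenAI endpoint's response builder: it returns a
# response in the chat-completion format without generating any tokens.
import time
import uuid

def fake_chat_completion(model: str) -> dict:
    """Return a canned response in OpenAI's chat-completion format."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": "This is a fake response."},
            "finish_reason": "stop",
        }],
        "usage": {"prompt_tokens": 5, "completion_tokens": 6, "total_tokens": 11},
    }

print(fake_chat_completion("gpt-4o")["object"])
```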
We also deployed the TrueFoundry LLM Gateway and LiteLLM Proxy Server, both running on a single replica with 1 unit CPU and 1 GB memory.
We added our fake OpenAI provider into both TrueFoundry and LiteLLM gateways. While load testing, we made requests to the fake OpenAI server in 3 different ways:
- Setup 1: Directly without using any proxy or gateway
- Setup 2: Through the TrueFoundry LLM Gateway deployed on 1 unit CPU and 1 GB memory
- Setup 3: Through the LiteLLM Proxy Server deployed on 1 unit CPU and 1 GB memory
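The per-setup latency numbers in the next section come from comparing timings across these three paths. A minimal sketch of that measurement, assuming `request_fn` stands in for an HTTP call to the direct endpoint, the gateway, or the proxy:

```python
# Sketch: time the same call via each setup and compare percentiles.
# A gateway's overhead is its latency minus the direct-call latency.
import statistics
import time

def measure_latencies_ms(request_fn, n: int = 1000) -> dict:
    """Time n calls and report approximate p50/p99 latency in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        request_fn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "p50": statistics.median(samples),
        "p99": samples[int(0.99 * len(samples)) - 1],
    }

# Stand-ins for Setup 1 (direct) and Setup 2 (gateway); real runs would
# issue HTTP requests here instead of calling a no-op.
direct = measure_latencies_ms(lambda: None)
gateway = measure_latencies_ms(lambda: None)
print(round(gateway["p50"] - direct["p50"], 3))
```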
Observations
- TrueFoundry Gateway adds only 3 ms of extra latency up to 250 RPS, and 4 ms at RPS > 300
- TrueFoundry LLM Gateway was able to scale without any degradation in performance until about 350 RPS (1 vCPU, 1 GB machine) before the CPU utilisation reached 100% and latencies started getting affected. With more CPU or more replicas, the LLM Gateway can scale to tens of thousands of requests per second.
- LiteLLM on the same machine was not able to scale beyond 40-50 RPS before hitting its CPU limit
More metrics
Setup 1: Direct OpenAI endpoint calling
Setup 2: TrueFoundry LLM Gateway
Setup 3: LiteLLM
Speed features of LLM Gateway
- Near-Zero Overhead: Just 3-5 ms added latency
- Optimised Backend: Built on a performant Node.js framework
- Config Caching: Config is stored in memory for quick look up
- Smart Routing: Minimal processing overhead
- Edge Ready: Deploy close to your apps
- High Capacity: A t2.2xlarge AWS instance (~$43 per month on spot) can scale up to ~3000 RPS with no issues.
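The "Config Caching" point above is a common pattern worth sketching. This is an illustrative example of the general idea (not TrueFoundry's actual implementation): keep routing/provider config in process memory so the request hot path avoids a database or network lookup, refreshing it only periodically.

```python
# Illustrative in-memory config cache with a TTL: the request hot path
# reads from memory; the loader runs only when the cached copy expires.
import time

class ConfigCache:
    def __init__(self, loader, ttl_seconds: float = 30.0):
        self._loader = loader      # callable that fetches fresh config
        self._ttl = ttl_seconds
        self._config = None
        self._loaded_at = 0.0

    def get(self) -> dict:
        now = time.monotonic()
        if self._config is None or now - self._loaded_at > self._ttl:
            self._config = self._loader()
            self._loaded_at = now
        return self._config

# The loader runs once; the second get() is a pure in-memory lookup.
calls = []
cache = ConfigCache(lambda: calls.append(1) or {"providers": ["openai"]})
cache.get()
cache.get()
print(len(calls))  # → 1
```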
Supported Providers
Below is a comprehensive list of popular LLM providers that are supported by the TrueFoundry LLM Gateway: