An LLM gateway acts as a centralized interface that simplifies the complexities associated with accessing multiple LLM providers. By providing a unified API, it allows developers to interact with various models without needing to navigate the intricacies of each provider's specific requirements.
Moreover, LLM gateways play a critical role in enhancing security and compliance. They manage authentication, rate limiting, and data governance, ensuring that sensitive information is protected and that interactions adhere to regulatory standards. This is particularly important for industries dealing with personal data or operating under strict compliance frameworks, as it mitigates risks associated with data breaches and misuse.
In addition to security, LLM gateways optimize performance through features like load balancing and caching, which help manage traffic and reduce latency. By distributing requests across multiple models and caching frequent responses, these gateways ensure that applications remain responsive even under high demand.
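Response caching of the kind described above can be sketched in a few lines. This is a minimal, illustrative in-memory cache keyed on the full request payload; it is not TrueFoundry's implementation, and a production gateway would add TTLs, eviction, and shared storage such as Redis:

```python
import hashlib
import json

class ResponseCache:
    """Toy in-memory cache keyed by the serialized request payload."""

    def __init__(self):
        self._store = {}

    def _key(self, payload: dict) -> str:
        # Serialize deterministically so identical requests hash identically.
        return hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()

    def get(self, payload: dict):
        return self._store.get(self._key(payload))

    def put(self, payload: dict, response: str):
        self._store[self._key(payload)] = response

cache = ResponseCache()
request = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]}
cache.put(request, "Hello!")
# A repeat of the same request is now served from the cache instead of the model.
cached = cache.get(request)
```

Keying on the whole payload means any change to the model, messages, or parameters correctly misses the cache.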
Learn how to evaluate an LLM Gateway by assessing its features for authentication, model selection, usage analytics, cost management, and more.
The LLM Gateway should include centralized key management to securely store and manage API keys, assigning individual keys to each developer or product for accountability while safeguarding root keys.
It should also integrate with secret managers such as AWS SSM Parameter Store, Google Secret Manager, or Azure Key Vault for enhanced security and streamlined key management.
TrueFoundry’s LLM Gateway provides fine-grained access control over both third-party models (called through their respective APIs) and self-hosted models, all through a single admin interface. Admins never have to share third-party (e.g. OpenAI) API keys with users, safeguarding against leaks.
We have a concept of ProviderAccounts through which any self-hosted or third-party LLM can be integrated. Once set up, the admins can provide or restrict access to any user or application to any of the integrated models.
The authorization configuration is saved as YAML, which can also be tracked in Git for auditing.
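As an illustration, such an authorization file might look like the following. The field names here are hypothetical, chosen only to show the idea; TrueFoundry's actual schema may differ:

```yaml
# Illustrative shape only; the real TrueFoundry schema may differ.
providers:
  - name: openai-main
    models: [gpt-4o, gpt-4o-mini]
    allowed:
      - user: alice@example.com
      - team: ml-platform
  - name: self-hosted-llama
    models: [llama-3-70b]
    allowed:
      - application: support-bot
```

Because the file is plain YAML, every access grant or revocation shows up as a reviewable Git diff.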
The Unified API in the Gateway should offer a standardized interface for accessing and interacting with language models from various providers. It allows for seamless switching between models and providers without requiring changes to your application's code structure. By abstracting underlying complexities, the Unified API simplifies multi-model integration and maintains consistency in access and usage. Additionally, it adheres to the OpenAI request-response format, ensuring compatibility with popular Python libraries like OpenAI and LangChain.
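Because the Unified API follows the OpenAI request-response format, switching providers reduces to changing the model identifier while the payload shape stays fixed. A small sketch (the "provider/model" naming below is illustrative, not TrueFoundry's exact scheme):

```python
def build_chat_request(model: str, user_message: str) -> dict:
    """Build a chat completion request in the OpenAI wire format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

# The same payload shape works regardless of the underlying provider:
openai_req = build_chat_request("openai/gpt-4o", "Summarize this ticket.")
bedrock_req = build_chat_request("bedrock/claude-3-sonnet", "Summarize this ticket.")
```

Application code built against this shape needs no changes when the gateway routes to a different provider.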
Key Features of the Unified API:
TrueFoundry’s Gateway provides automated code generation for integrating language models using various languages and libraries. You can call any model from any provider using standardized code through the Gateway.
LLM gateways facilitate seamless connections to third-party models hosted on platforms like AWS Bedrock, Azure OpenAI, and others. In addition to third-party models, LLM gateways should support the integration of self-hosted models that organizations may develop or fine-tune for specific applications.
By design, TrueFoundry can provide access to any open-source or commercially available LLM and is not restricted to any particular model or provider. Documentation for adding any model to the gateway is available. This can be done through the following three routes:
Integrate with any model provider to access commercial LLMs. The cost is the same as that charged by the model provider. Available integrations include (but are not limited to):
Users can make use of any LLM offered by these providers, as well as by providers not on this list.
Users can deploy any open-source LLM on their own cloud; this is not restricted to any particular set of models. We provide a direct HuggingFace integration, so any model from the HuggingFace Model Hub can be deployed and added to the gateway in a few clicks, ready for use.
Additional documentation about deploying any self-hosted model can be found here. Any open-source model can be deployed through this route; a catalog is available at https://huggingface.co/models
In addition, custom-built, pre-trained, or fine-tuned models can also be deployed and served through the LLM gateway via this route.
TrueFoundry provides a one-click ‘Add to Gateway’ feature for all self-hosted models, including fine-tuned ones.
TrueFoundry also provides access to popular open-source LLMs through models it hosts and shares across multiple clients.
Most of the latest and most popular open-source models are available through this route, such as:
And 100+ more models.
LLM gateways should collect a wide range of performance metrics, including:
TrueFoundry captures various performance monitoring metrics, such as the ones mentioned below, and provides intuitive dashboards and reporting tools to visualize performance data.
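Latency percentiles are a typical example of such metrics. A minimal nearest-rank percentile sketch over per-request latencies (the sample values are invented for illustration):

```python
def percentile(samples, p):
    """Nearest-rank percentile; sufficient for a monitoring sketch."""
    ordered = sorted(samples)
    # Clamp the rank into the valid index range.
    rank = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[rank]

# Illustrative per-request latencies in milliseconds.
latencies_ms = [120, 95, 340, 110, 105, 980, 130, 125, 115, 100]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

Tracking p95/p99 rather than the mean surfaces tail latency, which is what users actually experience under load.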
An LLM Gateway should provide comprehensive usage analytics to monitor and manage interactions with LLMs effectively. This ensures that organizations can track performance, optimize resource allocation, and maintain control over model usage.
TrueFoundry captures various usage analytics metrics, such as the ones mentioned below:
TrueFoundry logs the cost of all self-hosted and third-party models used by its users. The platform offers the ability to rate-limit access to these models at a granular level, including by model, user, provider, project, and team.
Metrics can be exported to any preferred dashboard, with an integrated dashboard available for monitoring within the platform. Alerts can be configured based on this data, and administrators can receive notifications through their chosen channel (such as email or Slack) depending on the alerting tool in use.
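The kind of per-dimension cost aggregation described above can be sketched as follows. The log records here are invented for illustration; a real gateway log would carry more fields:

```python
from collections import defaultdict

# Hypothetical usage log records for illustration.
usage_log = [
    {"user": "alice", "model": "gpt-4o",      "tokens": 1200, "cost_usd": 0.012},
    {"user": "bob",   "model": "llama-3-70b", "tokens": 800,  "cost_usd": 0.002},
    {"user": "alice", "model": "gpt-4o",      "tokens": 3000, "cost_usd": 0.030},
]

def cost_by(records, key):
    """Aggregate spend along any dimension (user, model, team, ...)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost_usd"]
    return dict(totals)

per_user = cost_by(usage_log, "user")
per_model = cost_by(usage_log, "model")
```

The same aggregation keyed by team or project is what makes granular rate limits and budget alerts possible.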
Beyond caching, an LLM Gateway should ensure reliable and efficient request handling through the following features:
Fallback: LLM Gateways need fallback capabilities to maintain uninterrupted service and application performance. When the primary model encounters issues or fails, the gateway can automatically switch to backup models, ensuring that users continue to receive responses without experiencing downtime or degradation in service quality.
Automatic retries: Automatic retries are crucial for improving request success rates by addressing temporary disruptions or errors. If a request fails due to transient issues, the gateway will automatically attempt to resend it, minimizing the impact of brief service interruptions and enhancing the reliability of the system.
Rate Limiting Support: Rate limiting helps manage the volume of requests sent to an LLM service, preventing overload and maintaining service stability. By restricting the number of requests within a specified timeframe, the gateway ensures fair usage, prevents abuse, and controls costs associated with high usage, thereby contributing to better resource management.
Load Balancing: Load balancing is essential for optimizing the distribution of requests across multiple LLM providers or models. By evenly distributing the load, the gateway enhances performance, increases availability, and helps manage costs effectively. It ensures that no single provider or model is overwhelmed, leading to more reliable and efficient service.
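Fallback and automatic retries can be combined in a single routing loop. This is a minimal sketch, not TrueFoundry's implementation: `call_model` is a stand-in for the gateway's provider call, and any exception is treated as retryable, where a real gateway would distinguish transient errors (timeouts, 429s) from permanent ones:

```python
import time

def call_with_fallback(models, prompt, call_model, retries=2, backoff_s=0.0):
    """Try each model in order; retry transient failures before falling back."""
    last_error = None
    for model in models:
        for attempt in range(retries + 1):
            try:
                return model, call_model(model, prompt)
            except Exception as err:
                last_error = err
                # Exponential backoff between attempts (0s here to keep the demo fast).
                time.sleep(backoff_s * (2 ** attempt))
    raise RuntimeError(f"all models failed: {last_error}")

# Simulated providers: the primary always fails, the backup succeeds.
def fake_call(model, prompt):
    if model == "primary":
        raise TimeoutError("primary unavailable")
    return f"{model} answered: {prompt}"

used, reply = call_with_fallback(["primary", "backup"], "hello", fake_call)
```

From the caller's perspective the request simply succeeds; the retries and the provider switch are invisible, which is exactly the point of doing this in the gateway.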
Tool calling allows LLMs to perform specific tasks beyond their core natural language processing abilities. By integrating with external tools and APIs, LLMs can access real-time data, execute custom functions, and extend their utility to a wide range of applications.
Tool calling within the TrueFoundry LLM Gateway allows language models to simulate interactions with external functions. While the gateway does not execute calls to external tools directly, it lets users describe the tools and simulate the call within the response. This simulation provides a complete representation of the request and the expected response, helping developers understand how the language model would interact with external systems.
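Tool descriptions are passed in the OpenAI function-calling format that OpenAI-compatible gateways accept. The weather tool below is hypothetical, chosen only to show the request shape:

```python
# An illustrative tool definition in the OpenAI function-calling format.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

tool_request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [get_weather_tool],
}
# The model's reply would contain a tool call with the function name and JSON
# arguments; executing the tool and returning its result stays with the application.
```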
Multimodal support in LLM Gateways is essential for applications that need to process and integrate multiple types of data simultaneously. For instance, a customer support application leveraging multimodal capabilities can handle text descriptions and images in a single support ticket, providing more accurate responses by analyzing both modalities.
By connecting with a wide range of tools, gateways can enhance the functionality, security, and performance of AI applications.
By integrating with monitoring tools like Prometheus and Grafana, LLM gateways can track key performance metrics in real-time.
Integrating with guardrail tools like Guardrails AI and Nemo Guardrails enables LLM gateways to implement safety measures around LLM interactions. These integrations help filter out inappropriate or harmful content, ensuring that model outputs align with organizational policies and user expectations.
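The filtering idea is easy to illustrate with a toy sketch. This is not the Guardrails AI or NeMo Guardrails API; real guardrail tools use classifiers and structured validators rather than a keyword list:

```python
# Illustrative policy list; a real deployment would use trained classifiers.
BLOCKED_TERMS = {"ssn", "credit card number"}

def violates_policy(text: str) -> bool:
    """Toy content filter: flag output containing any blocked term."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)

def apply_guardrail(model_output: str) -> str:
    return "[redacted by policy]" if violates_policy(model_output) else model_output
```

Placing this check in the gateway means every application behind it inherits the same output policy without per-app changes.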
Tools like Arize AI and MLflow allow LLM gateways to continuously evaluate the performance and accuracy of their models. By integrating with these frameworks, gateways can track key metrics such as response quality, relevance, and user satisfaction.
Discover more about TrueFoundry's Gateway and its advanced features by reaching out to us. We can schedule a personalized demo to showcase its capabilities.