TrueFoundry AI Gateway integration with Last9
As generative AI moves into critical user journeys (search, support, decision support, automation), the tolerance for “best-effort” reliability disappears. Platform and SRE teams now need the same level of observability for LLM traffic that they already expect from core microservices:
- What is the end-to-end latency for each request path?
- Which models, tenants, or regions are driving error budgets?
- How do we correlate LLM behavior with the rest of the stack?
The integration between TrueFoundry AI Gateway and Last9 addresses exactly this problem. By exporting OpenTelemetry (OTEL) traces from the Gateway into Last9, teams gain deep, cost-efficient observability into all LLM traffic, without rewriting applications or scattering SDKs across services.
This article explains:
- What Last9 and TrueFoundry AI Gateway provide
- How the integration works at an architectural level
- A practical, step-by-step view of the setup
- The concrete benefits for SRE, platform, and AI teams
Last9: Observability Designed for High-Cardinality Systems
Last9 is a modern observability platform focused on high-performance telemetry management across logs, metrics, and traces. It is designed specifically for environments where cardinality and scale are non-negotiable.
Key capabilities relevant to LLM workloads include:
- High-cardinality handling: Last9 can ingest and query telemetry tagged with rich dimensions such as user, tenant, route, provider, model, and prompt version, without prohibitive performance or cost penalties.
- Unified telemetry: Logs, metrics, and traces live in a single platform, enabling teams to move seamlessly from an SLO breach or latency spike to the exact trace and span that caused it.
- OpenTelemetry-native design: Last9 is built around OTEL, making it straightforward to integrate any OTEL-speaking component.
This makes Last9 a natural fit for enterprises that are standardizing on OTEL across their infrastructure and want LLM observability to plug into that same strategy.
TrueFoundry AI Gateway: Unified Control Plane for LLM Traffic
TrueFoundry AI Gateway acts as a proxy layer between applications and LLM providers or MCP servers. It provides a unified, OpenAI-compatible interface to hundreds of models while centralizing governance, security, routing, and observability.
Core capabilities include:
- Unified API access across 250+ models and providers
- Low-latency routing and sophisticated load balancing
- Enterprise security: RBAC, audit logging, quota and cost controls
- Native observability with request/response logging, metrics, and traces
Crucially, AI Gateway can export OTEL traces to external systems, so your LLM telemetry becomes part of the same observability fabric as the rest of your infrastructure.
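Because the Gateway exposes an OpenAI-compatible interface, existing applications typically only need to point their client at the Gateway. The sketch below assumes a hypothetical Gateway URL, virtual key, and model identifier (none of these values come from this article) and uses the openai Python package:

```python
# Minimal sketch: route existing OpenAI-style calls through TrueFoundry AI Gateway.
# The base_url, api_key, and model name are placeholders -- substitute the values
# from your own TrueFoundry setup.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-gateway.example.com/api/llm/v1",  # hypothetical Gateway URL
    api_key="tfy-virtual-key-...",                           # hypothetical Gateway key
)

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",  # provider/model naming is illustrative
    messages=[{"role": "user", "content": "Summarize today's incident report."}],
)
print(response.choices[0].message.content)
```

The application code stays unchanged otherwise; the Gateway handles routing, governance, and trace emission behind this single endpoint.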
Integration Overview: How TrueFoundry and Last9 Work Together
At a high level, the integration is straightforward:
- Applications send all LLM traffic to TrueFoundry AI Gateway instead of directly to model providers.
- AI Gateway routes the request to the configured model (OpenAI, Claude, Gemini, self-hosted, etc.), applying routing, rate limits, and guardrails as needed.
- For each request, AI Gateway emits OpenTelemetry traces that capture spans for gateway handling, outbound model calls, MCP operations, and more.
- These OTEL traces are exported over HTTP to Last9’s OTLP endpoint.
- Inside Last9, traces are visualized in the Traces UI, with duration heatmaps, detailed trace lists, and span-level data for the tfy-llm-gateway service.
There are no code changes to application logic. Once the Gateway’s OTEL exporter is configured, every LLM request automatically becomes observable in Last9.

Prerequisites
To enable the integration, you’ll need:
- TrueFoundry account with AI Gateway configured and at least one model provider set up. You can follow the Gateway Quick Start Guide in the TrueFoundry docs.
- Last9 account with access to the Last9 dashboard.
With these in place, the rest of the configuration happens entirely through the respective UIs.
Step-by-Step Integration Guide
1. Retrieve the Last9 Authorization Header
From the Last9 dashboard:
- Log in to Last9.
- Navigate to Integrations in the left sidebar.
- Click Connect on the OpenTelemetry integration card.
- In the integration guide, locate “Authentication with Authorization Header.”
- Copy the provided Auth Header value, which is already formatted, for example:
Basic dHJ1ZWZvdW5kcnk6...

This header will be passed directly from TrueFoundry to Last9 for OTEL authentication.
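For context, a Basic auth header is simply "Basic " followed by the base64 encoding of "username:token". Last9 hands you the finished value, so the sketch below is purely illustrative; the credentials in it are placeholders, not real Last9 values:

```python
# Illustrative only: Last9 already provides this header pre-formatted in its UI.
import base64

username = "truefoundry"          # hypothetical Last9 OTLP username
token = "your-last9-write-token"  # hypothetical token

encoded = base64.b64encode(f"{username}:{token}".encode()).decode()
auth_header = f"Basic {encoded}"
print(auth_header)  # e.g. "Basic dHJ1ZWZvdW5kcnk6..."
```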
2. Configure OTEL Export in TrueFoundry AI Gateway
In the TrueFoundry console:
- Go to AI Gateway → Controls → OTEL Config.
- Enable the Otel Traces Exporter Configuration toggle.
- Select the HTTP Configuration tab.

3. Set the Last9 OTLP Endpoint
Under HTTP configuration, provide the following values:
- Traces endpoint: https://otlp.last9.io/v1/traces
- Encoding: Proto
This is Last9’s OTLP ingestion endpoint for traces.
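If you want to sanity-check the endpoint and credentials independently of the Gateway, a minimal OTEL SDK script can export a test span over the same OTLP/HTTP path. This is an optional verification sketch, assuming the opentelemetry-sdk and opentelemetry-exporter-otlp-proto-http packages are installed; the Authorization value is the header copied in step 1:

```python
# Optional sanity check: send one test span to Last9 over OTLP/HTTP (protobuf encoding),
# using the same endpoint and Authorization header the Gateway will use.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    endpoint="https://otlp.last9.io/v1/traces",
    headers={"Authorization": "Basic dHJ1ZWZvdW5kcnk6..."},  # value copied from Last9
)

provider = TracerProvider(resource=Resource.create({"service.name": "otlp-smoke-test"}))
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

with trace.get_tracer(__name__).start_as_current_span("last9-connectivity-check"):
    pass  # the span ends immediately; it should appear in Last9's Traces UI

provider.force_flush()
```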
4. Add the Required Authorization Header
In the same configuration screen, click “+ Add Headers” and add an Authorization header. Paste the Auth Header value exactly as copied from the Last9 UI (for example, Basic dHJ1ZWZvdW5kcnk6...). No additional formatting is required.
5. Save the Configuration
Click Save to apply the OTEL export settings. From this point onward, all LLM traces from the TrueFoundry AI Gateway will be exported to Last9.
6. View LLM Traces in Last9
Once LLM traffic flows through the Gateway, open the Last9 dashboard:
- Navigate to the Traces section.
- Filter by service name: tfy-llm-gateway
- Explore:
- Duration heatmap – visualize latency trends and outliers over time.
- Trace details – see individual traces with operation names, durations, and status codes.
- Span information – inspect spans for HTTP calls, MCP operations, and underlying LLM requests.
This gives you an end-to-end view of how the Gateway and downstream providers behave under real production conditions.

Advanced Configuration: Enriching Traces with Resource Attributes
TrueFoundry’s OTEL configuration supports Additional Resource Attributes, enabling you to attach custom metadata to every exported trace. This is particularly powerful when combined with Last9’s high-cardinality capabilities.
Typical attributes you may want to add include:
- env=prod, env=staging
- region=us-east-1, region=eu-west-1
- team=platform, team=search
- tenant_id=enterprise-customer-a
In Last9, these attributes can be used to:
- Compare latency or error rates across regions and environments
- Isolate incidents impacting a specific tenant or product surface
- Build dashboards per team or business unit without duplicating telemetry
By planning your attribute strategy upfront, you enable richer queries and faster root-cause analysis later.
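Conceptually, these key-value pairs become OTEL resource attributes attached to every exported span, alongside the service name, which is what makes them queryable dimensions in Last9. The sketch below shows the equivalent in the OTEL SDK, reusing the attribute names above for illustration; in practice you set them once in the Gateway’s OTEL Config rather than in application code:

```python
# Illustration of how Additional Resource Attributes surface on exported traces:
# they become OTEL resource attributes attached to every span.
from opentelemetry.sdk.resources import Resource

resource = Resource.create({
    "service.name": "tfy-llm-gateway",
    "env": "prod",                         # environment dimension
    "region": "us-east-1",                 # region dimension
    "team": "platform",                    # owning team
    "tenant_id": "enterprise-customer-a",  # tenant for per-customer slicing
})
print(resource.attributes)
```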
What This Integration Delivers for Your Teams
For SRE and Platform Engineering
- Production-grade visibility into LLM traffic: Identify latency spikes, error hotspots, and saturation in real time, with full trace context behind each event.
- Faster incident response: Move from a failing SLO to the precise trace and span causing it—whether that’s an upstream service, a specific model provider, or a misconfigured route.
- Consistent tooling: Keep LLM observability within the same OTEL-based workflows and dashboards you use for the rest of your microservices.
For AI and Application Teams
- Safe experimentation with models and prompts: Roll out new model versions, routing rules, or prompt strategies via TrueFoundry, and observe the impact directly in Last9’s traces and heatmaps.
- Performance and cost awareness: Correlate slow or failing interactions with specific routes, tenants, or models, and feed those insights back into routing and caching policies in the Gateway.
- Cleaner separation of concerns: Developers focus on application logic and agent behavior; the Gateway and Last9 jointly handle routing, governance, and observability.
Built for Speed: Low Latency, Even Under Load
TrueFoundry AI Gateway adds roughly 3–4 ms of latency, handles 350+ RPS on a single vCPU, scales horizontally with ease, and is production-ready. LiteLLM, by comparison, suffers from higher latency, struggles beyond moderate RPS, lacks built-in scaling, and is best suited to light or prototype workloads.