The Gateway Model Metrics Query API lets you query gateway model metrics covering usage, performance, cost, and user activity. A query returns either distribution (aggregated) or timeseries metrics, and both query types support filtering and grouping.
This page covers datasource: "modelMetrics". For querying MCP server / tool metrics, see API Access to MCP Metrics.

Access control

  • Tenant admins: Can query metrics for the entire organization (tenant-wide).
  • Users: Can query their own data and their teams’ data.
  • Virtual accounts: Can query their own data and their teams’ data; with tenant-admin permissions, they can access tenant-wide data.

Contents

| Section | Description |
| --- | --- |
| Overview | Authentication, quick start, and API reference |
| Filtering | Filter operators, fields, and combinations |
| Distribution examples | Aggregated (distribution) query examples |
| Timeseries examples | Time-bucketed (timeseries) query examples |
| Response format | Response JSON structure |

Authentication

You need to authenticate with your TrueFoundry API key. You can use either a Personal Access Token (PAT) or Virtual Account Token (VAT).
To generate an API key:
  1. Personal Access Token (PAT): Go to Access → Personal Access Tokens in your TrueFoundry dashboard
  2. Virtual Account Token (VAT): Go to Access → Virtual Account Tokens (requires admin permissions)
For detailed authentication setup, see our Authentication guide.

Quick Start

By default, the API returns metrics for both models and virtual models. To get metrics for only models or only virtual models, add a filter that applies the IS_NULL operator to the virtualModelName field.
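The note above can be sketched as a small helper that builds the filter object. The field name `virtualModelName` and the `IS_NULL` operator are taken from the examples on this page; the helper name `virtual_model_filter` is hypothetical, and the meaning of `"value": False` (select only virtual models) is an assumption that this page does not confirm.

```python
def virtual_model_filter(only_virtual: bool) -> dict:
    """Build an IS_NULL filter on the virtualModelName field.

    Assumption: "value": True keeps rows without a virtual model name
    (regular models), and "value": False keeps rows that have one
    (virtual models). Verify against the Filtering section before relying on it.
    """
    return {
        "fieldName": "virtualModelName",
        "operator": "IS_NULL",
        "value": not only_virtual,
    }


# Filter objects that can be placed in the "filters" array of a request.
models_only = virtual_model_filter(only_virtual=False)
virtual_only = virtual_model_filter(only_virtual=True)
```

Omitting the filter entirely keeps the default behavior of returning both.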

Distribution Query

Get aggregated model metrics distribution with multiple aggregations including count, sum, and percentiles:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "count", "column": "modelName"},
            {"type": "sum", "column": "inputTokens"},
            {"type": "sum", "column": "outputTokens"},
            {"type": "p99", "column": "latencyMs"},
            {"type": "sum", "column": "costInUSD"}
        ],
        "groupBy": ["modelName"],
        "filters": [
            {
                "fieldName": "virtualModelName",
                "operator": "IS_NULL",
                "value": True  # Python boolean literal (serialized as JSON true)
            }
        ]
    }
)

print(response.json())
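The call above sets no timeout and does not check the HTTP status. As a minimal sketch, the request can be built separately from sending it, which makes the payload easy to inspect before anything goes over the network. This version uses only the Python standard library instead of `requests`; the helper names `build_metrics_request` and `send`, the 30-second timeout, and the `example.invalid` host are illustrative assumptions.

```python
import json
import urllib.request


def build_metrics_request(control_plane_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the metrics query endpoint."""
    url = f"https://{control_plane_url}/api/svc/v1/llm-gateway/metrics/query"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout_seconds: float = 30.0) -> dict:
    """Send the request; urlopen raises urllib.error.HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.load(response)


# Placeholder host and key; nothing is sent until send() is called.
example_request = build_metrics_request(
    "example.invalid",
    "<your_api_key>",
    {
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [{"type": "sum", "column": "costInUSD"}],
        "groupBy": ["modelName"],
    },
)
```

With `requests`, the equivalent safeguards would be passing `timeout=` to `requests.post` and calling `response.raise_for_status()` before reading the body.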

Timeseries Query

Get model metrics over time with hourly intervals, including latency percentiles:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"},
            {"type": "sum", "column": "inputTokens"},
            {"type": "p99", "column": "latencyMs"}
        ],
        "groupBy": ["modelName"],
        "intervalInSeconds": 3600,
        "filters": [
            {
                "fieldName": "virtualModelName",
                "operator": "IS_NULL",
                "value": True  # Python boolean literal (serialized as JSON true)
            }
        ],
    }
)

print(response.json())
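To get a feel for how much data a timeseries query like the one above might return, the number of time buckets can be estimated from the range and the interval. This is a rough sketch: it assumes the API emits at most one bucket per interval per group, which this page does not state, and the function names are hypothetical.

```python
import math
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2025-01-21T00:00:00.000Z"."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def bucket_count(start_ts: str, end_ts: str, interval_in_seconds: int) -> int:
    """Estimate how many intervals fit between start_ts and end_ts."""
    span_seconds = (parse_ts(end_ts) - parse_ts(start_ts)).total_seconds()
    if span_seconds <= 0:
        raise ValueError("endTs must be later than startTs")
    if interval_in_seconds <= 0:
        raise ValueError("intervalInSeconds must be positive")
    return math.ceil(span_seconds / interval_in_seconds)


# One day at hourly intervals, matching the example request above.
buckets = bucket_count("2025-01-21T00:00:00.000Z", "2025-01-22T00:00:00.000Z", 3600)
print(buckets)  # 24
```

Grouping multiplies this estimate by the number of distinct group values, so a small interval combined with a high-cardinality `groupBy` can produce a large response.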

API Reference

Endpoint

POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query

Request Parameters

startTs (string, required)
ISO 8601 timestamp for the start of the data range (e.g., "2025-01-21T00:00:00.000Z")
endTs (string, required)
ISO 8601 timestamp for the end of the data range (e.g., "2025-01-22T00:00:00.000Z")
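Timestamps in the format shown above can be produced from Python `datetime` objects rather than written by hand. The sketch below formats a timezone-aware datetime as UTC with millisecond precision and a `Z` suffix, matching the examples on this page; the helper name `to_api_timestamp` is hypothetical, and whether the API accepts other ISO 8601 variants is not covered here.

```python
from datetime import datetime, timedelta, timezone


def to_api_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime like "2025-01-21T00:00:00.000Z"."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time ambiguity")
    utc_moment = moment.astimezone(timezone.utc)
    return utc_moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")


# A fixed 24-hour window; in practice the end is often datetime.now(timezone.utc).
end = datetime(2025, 1, 22, tzinfo=timezone.utc)
start = end - timedelta(days=1)

start_ts = to_api_timestamp(start)
end_ts = to_api_timestamp(end)
print(start_ts, end_ts)  # 2025-01-21T00:00:00.000Z 2025-01-22T00:00:00.000Z
```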
datasource (string, required)
The data source to query. Use "modelMetrics" for gateway model metrics.
type (string, required)
The type of query to execute:
  • "distribution" - Returns aggregated metrics
  • "timeseries" - Returns metrics over time intervals
aggregations (array)
Array of aggregation objects. Each aggregation specifies:
  • type - The aggregation type
  • column - The column to aggregate on
Supported aggregation types:
| Type | Description |
| --- | --- |
| count | Count of records |
| sum | Sum of values |
| p50 | 50th percentile (median) |
| p75 | 75th percentile |
| p90 | 90th percentile |
| p99 | 99th percentile |
"aggregations": [
    {"type": "count", "column": "modelName"},
    {"type": "sum", "column": "inputTokens"},
    {"type": "p90", "column": "latencyMs"}
]
Supported columns for aggregation:
  • costInUSD - Cost incurred in USD
  • inputTokens - Number of input tokens
  • outputTokens - Number of output tokens
  • latencyMs - Total request latency (ms)
  • interTokenLatencyMs - Latency between the generation of consecutive tokens (ms)
  • timeToFirstTokenMs - Time to first token (ms)
  • timePerOutputTokenLatencyMs - Latency per output token (ms)
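The supported types and columns listed above can be checked on the client before sending a request, so that typos surface early. This is a minimal sketch with hypothetical names. It treats `count` as valid on any column, because the quick start examples count `modelName`, which is not in the column list; that leniency is an assumption, and the server remains the authority on what is accepted.

```python
AGGREGATION_TYPES = {"count", "sum", "p50", "p75", "p90", "p99"}
AGGREGATION_COLUMNS = {
    "costInUSD",
    "inputTokens",
    "outputTokens",
    "latencyMs",
    "interTokenLatencyMs",
    "timeToFirstTokenMs",
    "timePerOutputTokenLatencyMs",
}


def check_aggregations(aggregations: list) -> list:
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    for index, aggregation in enumerate(aggregations):
        agg_type = aggregation.get("type")
        column = aggregation.get("column")
        if agg_type not in AGGREGATION_TYPES:
            problems.append(f"aggregations[{index}]: unsupported type {agg_type!r}")
        # Assumption: count may target any column, while sum and the
        # percentiles need one of the numeric columns listed above.
        elif agg_type != "count" and column not in AGGREGATION_COLUMNS:
            problems.append(f"aggregations[{index}]: unsupported column {column!r} for {agg_type}")
    return problems


problems = check_aggregations([
    {"type": "count", "column": "modelName"},
    {"type": "avg", "column": "latencyMs"},   # "avg" is not a listed type
    {"type": "p90", "column": "latency"},     # misspelled column
])
```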
groupBy (array)
Array of fields to group the metrics by. Available options:
  • modelName - Group by model name
  • userEmail - Group by user email
  • virtualaccount - Group by virtual account
  • team - Group by team (unnests the Teams array)
  • virtualModel - Group by virtual model
  • errorCode - Group by the HTTP error code returned
  • requestType - Group by the type of model request (e.g., ChatCompletion, Embedding)
  • providerAccountType - Group by the provider account type (e.g., model, mcp-server, guardrail-config)
  • providerModelName - Group by the underlying provider model name
  • createdBySubjectType - Group by subject type (e.g., user, virtualaccount)
  • metadata.<key> - Group by a custom metadata key (e.g., metadata.environment)
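The options above mix fixed field names with the open-ended `metadata.<key>` form, so a client-side check needs to handle both. The sketch below is one way to do that under stated assumptions: the set mirrors the list on this page, the function names are hypothetical, and the rule that a metadata key must be non-empty is a guess rather than documented behavior.

```python
GROUP_BY_FIELDS = {
    "modelName",
    "userEmail",
    "virtualaccount",
    "team",
    "virtualModel",
    "errorCode",
    "requestType",
    "providerAccountType",
    "providerModelName",
    "createdBySubjectType",
}
METADATA_PREFIX = "metadata."


def is_valid_group_by(field: str) -> bool:
    """Accept a listed field name or metadata.<key> with a non-empty key."""
    if field.startswith(METADATA_PREFIX):
        return len(field) > len(METADATA_PREFIX)
    return field in GROUP_BY_FIELDS


def invalid_group_by_fields(group_by: list) -> list:
    """Return the entries that would not pass the check above."""
    return [field for field in group_by if not is_valid_group_by(field)]


rejected = invalid_group_by_fields(["modelName", "team", "metadata.environment", "model", "metadata."])
print(rejected)  # ['model', 'metadata.']
```

A valid combination of fixed and metadata fields looks like this: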
"groupBy": ["modelName", "team", "metadata.environment"]
filters (array)
Array of filter objects to narrow down the results. See Filtering for details.
intervalInSeconds (number)
Required for timeseries queries. The time interval in seconds for grouping data points.
Common values:
  • 60 - 1 minute intervals
  • 300 - 5 minute intervals
  • 1800 - 30 minute intervals
  • 3600 - 1 hour intervals
  • 86400 - 1 day intervals
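One way to choose among the common values above is to pick the smallest interval that keeps the number of data points under a chosen cap. This is a sketch of that heuristic only; the function name `pick_interval` and the default cap of 200 points are arbitrary assumptions, not limits imposed by the API.

```python
COMMON_INTERVALS = (60, 300, 1800, 3600, 86400)  # seconds, as listed above


def pick_interval(range_seconds: int, max_points: int = 200) -> int:
    """Return the smallest common interval yielding at most max_points buckets."""
    for interval in COMMON_INTERVALS:
        if range_seconds / interval <= max_points:
            return interval
    # Fall back to the coarsest interval for very long ranges.
    return COMMON_INTERVALS[-1]


one_day = pick_interval(24 * 3600)       # 48 buckets at 30-minute intervals
one_week = pick_interval(7 * 24 * 3600)  # 168 buckets at 1-hour intervals
print(one_day, one_week)  # 1800 3600
```

The returned value can be passed directly as `intervalInSeconds` in a timeseries request.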