API Access to Model Metrics

The Gateway Model Metrics Query API lets you query gateway metrics on model usage, performance, cost, and user activity. You can retrieve either distribution (aggregated) or timeseries metrics, and filter and group them by fields such as model, user, team, and custom metadata.

This page covers datasource: "modelMetrics". For querying MCP server / tool metrics, see API Access to MCP Metrics.

Access control

Tenant admins: Can query metrics for the entire organization (tenant-wide).
Users: Can query their own data and their teams’ data.
Virtual accounts: Can query their own data and their teams’ data; with tenant-admin permissions, they can access tenant-wide data.

Section	Description
Overview	Authentication, quick start, and API reference
Filtering	Filter operators, fields, and combinations
Distribution examples	Aggregated (distribution) query examples
Timeseries examples	Time-bucketed (timeseries) query examples
Response format	Response JSON structure

Authentication

You need to authenticate with your TrueFoundry API key. You can use either a Personal Access Token (PAT) or Virtual Account Token (VAT).

Get your API key

To generate an API key:

Personal Access Token (PAT): Go to Access → Personal Access Tokens in your TrueFoundry dashboard
Virtual Account Token (VAT): Go to Access → Virtual Account Tokens (requires admin permissions)

For detailed authentication setup, see our Authentication guide.
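The examples on this page paste the key directly into the Authorization header. To keep tokens out of source code, you could read the key from an environment variable instead. A minimal sketch; the variable name `TFY_API_KEY` and the `build_headers` helper are hypothetical, so substitute whatever name your environment defines:

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return headers for the metrics query API.

    Uses `api_key` if given, otherwise the hypothetical TFY_API_KEY
    environment variable. PATs and VATs are both sent as Bearer tokens.
    """
    key = api_key or os.environ.get("TFY_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass api_key or set TFY_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```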

Quick Start

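By default, the API returns metrics for requests to both models and virtual models. To restrict results to one or the other, add a filter on the virtualModelName field with the IS_NULL operator. The examples below set the filter value to true (True in Python), which keeps only requests whose virtual model name is null, that is, requests made directly to models.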

Distribution Query

Get aggregated metrics per model, combining several aggregations (count, sum, and a latency percentile):

import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "count", "column": "modelName"},
            {"type": "sum", "column": "inputTokens"},
            {"type": "sum", "column": "outputTokens"},
            {"type": "p99", "column": "latencyMs"},
            {"type": "sum", "column": "costInUSD"}
        ],
        "groupBy": ["modelName"],
        "filters": [
            {
                "fieldName": "virtualModelName",
                "operator": "IS_NULL",
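                "value": True  # Python boolean; serialized as JSON true
            }
        ]
    }
)

response.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(response.json())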

Timeseries Query

Get model metrics over time with hourly intervals, including latency percentiles:

import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"},
            {"type": "sum", "column": "inputTokens"},
            {"type": "p99", "column": "latencyMs"}
        ],
        "groupBy": ["modelName"],
        "intervalInSeconds": 3600,
        "filters": [
            {
                "fieldName": "virtualModelName",
                "operator": "IS_NULL",
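                "value": True  # Python boolean; serialized as JSON true
            }
        ]
    }
)

response.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(response.json())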

API Reference

Endpoint

POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query
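If you issue many queries, the endpoint can be wrapped in a small reusable helper instead of repeating the request boilerplate. A sketch, assuming `control_plane_url` is a host name like the `{your_control_plane_url}` placeholder above; the `query_metrics` name, the `session` parameter, and the 30-second timeout are illustrative choices, not part of any TrueFoundry SDK:

```python
from typing import Any, Dict, Optional


def query_metrics(
    control_plane_url: str,
    api_key: str,
    payload: Dict[str, Any],
    session: Optional[Any] = None,
    timeout: float = 30.0,
) -> Any:
    """POST `payload` to the metrics query endpoint and return the parsed JSON.

    `session` may be a requests.Session (or any object with a compatible
    .post method); when omitted, the requests module is used directly.
    """
    host = control_plane_url.rstrip("/")
    if not host.startswith(("http://", "https://")):
        host = "https://" + host  # the documented endpoint uses HTTPS
    url = f"{host}/api/svc/v1/llm-gateway/metrics/query"

    if session is None:
        import requests  # imported lazily so callers can inject another client
        session = requests

    response = session.post(
        url,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json=payload,
        timeout=timeout,  # arbitrary default; raise it for long time ranges
    )
    response.raise_for_status()
    return response.json()
```

Passing a `requests.Session` reuses the underlying connection across calls, which helps when you run several queries back to back.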

Request Parameters

startTs (string, required)

ISO 8601 timestamp for the start of the data range (e.g., "2025-01-21T00:00:00.000Z").

endTs (string, required)

ISO 8601 timestamp for the end of the data range (e.g., "2025-01-22T00:00:00.000Z").
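Both timestamps in the examples share one shape: UTC, millisecond precision, and a trailing Z. A small sketch for producing that shape from Python datetime objects; the helper names `to_iso_utc` and `last_n_hours` are illustrative:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def to_iso_utc(dt: datetime) -> str:
    """Format an aware datetime like 2025-01-21T00:00:00.000Z."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    utc = dt.astimezone(timezone.utc)
    millis = utc.microsecond // 1000  # truncate to milliseconds
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


def last_n_hours(hours: int, now: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (startTs, endTs) for the `hours` hours ending at `now` (default: current UTC time)."""
    end = now or datetime.now(timezone.utc)
    return to_iso_utc(end - timedelta(hours=hours)), to_iso_utc(end)
```

For example, `last_n_hours(24, now=datetime(2025, 1, 22, tzinfo=timezone.utc))` returns `("2025-01-21T00:00:00.000Z", "2025-01-22T00:00:00.000Z")`, the range used in the Quick Start examples.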

datasource (string, required)

The data source to query. Use "modelMetrics" for gateway model metrics.

type (string, required)

The type of query to execute:

"distribution" - Returns aggregated metrics
"timeseries" - Returns metrics over time intervals

aggregations (array)

Array of aggregation objects. Each aggregation specifies:

type - The aggregation type
column - The column to aggregate on

Supported aggregation types:

Type	Description
`count`	Count of records
`sum`	Sum of values
`p50`	50th percentile (median)
`p75`	75th percentile
`p90`	90th percentile
`p99`	99th percentile

"aggregations": [
    {"type": "count", "column": "modelName"},
    {"type": "sum", "column": "inputTokens"},
    {"type": "p90", "column": "latencyMs"}
]

Supported columns for aggregation:

costInUSD - Cost incurred in USD
inputTokens - Number of input tokens
outputTokens - Number of output tokens
latencyMs - Total request latency (ms)
interTokenLatencyMs - Latency between the generation of consecutive tokens (ms)
timeToFirstTokenMs - Time to first token (ms)
timePerOutputTokenLatencyMs - Latency per output token (ms)
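A malformed aggregation is easier to catch before the request is sent. The sketch below builds aggregation objects and checks them against the types and columns listed above. It is a client-side convenience mirroring this page, not the server's validation logic, and the `aggregation` helper is hypothetical. Because the examples count modelName, which is not in the column list, the sketch assumes count may target other fields and skips the column check for it:

```python
from typing import Dict

# Types and columns as listed in this reference.
AGGREGATION_TYPES = {"count", "sum", "p50", "p75", "p90", "p99"}
AGGREGATION_COLUMNS = {
    "costInUSD",
    "inputTokens",
    "outputTokens",
    "latencyMs",
    "interTokenLatencyMs",
    "timeToFirstTokenMs",
    "timePerOutputTokenLatencyMs",
}


def aggregation(agg_type: str, column: str) -> Dict[str, str]:
    """Build one aggregation object, rejecting types and columns not listed above."""
    if agg_type not in AGGREGATION_TYPES:
        raise ValueError(f"unsupported aggregation type: {agg_type!r}")
    # Assumption: count may target non-numeric fields such as modelName.
    if agg_type != "count" and column not in AGGREGATION_COLUMNS:
        raise ValueError(f"unsupported column for {agg_type}: {column!r}")
    return {"type": agg_type, "column": column}
```

For example, `[aggregation("count", "modelName"), aggregation("p90", "latencyMs")]` builds the same objects as the first and last entries of the snippet above.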

groupBy (array)

Array of fields to group the metrics by. Available options:

modelName - Group by model name
userEmail - Group by user email
virtualaccount - Group by virtual account
team - Group by team (unnests the teams array, so a request from a user in several teams is counted under each of those teams)
virtualModel - Group by virtual model
errorCode - Group by the HTTP error code returned
requestType - Group by type of model request (e.g., ChatCompletion, Embedding)
providerAccountType - Group by provider account type (e.g., model, mcp-server, guardrail-config)
providerModelName - Group by underlying provider model name
createdBySubjectType - Group by subject type (e.g., user, virtualaccount)
metadata.<key> - Group by a custom metadata key (e.g., metadata.environment)

"groupBy": ["modelName", "team", "metadata.environment"]
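Similarly, a groupBy list can be checked against the fields above, treating any metadata.<key> entry as valid as long as the key is non-empty. A hypothetical helper that mirrors this page's list rather than any server-side check:

```python
from typing import List

# Group-by fields as listed in this reference (metadata.<key> handled separately).
GROUP_BY_FIELDS = {
    "modelName",
    "userEmail",
    "virtualaccount",
    "team",
    "virtualModel",
    "errorCode",
    "requestType",
    "providerAccountType",
    "providerModelName",
    "createdBySubjectType",
}
METADATA_PREFIX = "metadata."


def validate_group_by(fields: List[str]) -> List[str]:
    """Return `fields` unchanged if every entry is a listed field or metadata.<key>."""
    for field in fields:
        if field.startswith(METADATA_PREFIX):
            if not field[len(METADATA_PREFIX):]:
                raise ValueError("metadata group-by needs a key, e.g. metadata.environment")
        elif field not in GROUP_BY_FIELDS:
            raise ValueError(f"unknown groupBy field: {field!r}")
    return fields
```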

filters (array)

Array of filter objects to narrow down the results. See Filtering for details.

intervalInSeconds (number)

Required for timeseries queries. The width, in seconds, of each time bucket used to group data points. Common values:

60 - 1 minute intervals
300 - 5 minute intervals
1800 - 30 minute intervals
3600 - 1 hour intervals
86400 - 1 day intervals
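Because intervalInSeconds is required only for timeseries queries, a small builder can assemble either request body and catch a missing interval before the request is sent. A sketch with a hypothetical `build_query` helper; it fixes datasource to "modelMetrics" and leaves intervalInSeconds out of distribution queries, since this page does not say how the API treats it there:

```python
from typing import Any, Dict, List, Optional


def build_query(
    start_ts: str,
    end_ts: str,
    query_type: str,
    aggregations: List[Dict[str, str]],
    group_by: Optional[List[str]] = None,
    filters: Optional[List[Dict[str, Any]]] = None,
    interval_in_seconds: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a request body for the model metrics query API."""
    if query_type not in ("distribution", "timeseries"):
        raise ValueError(f"type must be 'distribution' or 'timeseries', got {query_type!r}")
    body: Dict[str, Any] = {
        "startTs": start_ts,
        "endTs": end_ts,
        "datasource": "modelMetrics",
        "type": query_type,
        "aggregations": list(aggregations),
    }
    if group_by:
        body["groupBy"] = list(group_by)
    if filters:
        body["filters"] = list(filters)
    if query_type == "timeseries":
        # Required for timeseries; must be a positive number of seconds.
        if interval_in_seconds is None or interval_in_seconds <= 0:
            raise ValueError("timeseries queries need a positive interval_in_seconds")
        body["intervalInSeconds"] = interval_in_seconds
    return body
```

Calling it with `"timeseries"`, `group_by=["modelName"]`, and `interval_in_seconds=3600` produces a body shaped like the one in the Timeseries Query example.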
