The Gateway Model Metrics Query API provides a flexible way to query gateway metrics on model usage, performance, cost, and user activity. You can retrieve either distribution (aggregated) or timeseries metrics, with powerful filtering and grouping capabilities.
Access control
Tenant admins: Can query metrics for the entire organization (tenant-wide).
Users: Can query their own data and their teams’ data.
Virtual accounts: Can query their own data and their teams’ data; with tenant-admin permissions, they can access tenant-wide data.
Authentication
You need to authenticate with your TrueFoundry API key. You can use either a Personal Access Token (PAT) or a Virtual Account Token (VAT).
To generate an API key:
Personal Access Token (PAT): Go to Access → Personal Access Tokens in your TrueFoundry dashboard
Virtual Account Token (VAT): Go to Access → Virtual Account Tokens (requires admin permissions)
For detailed authentication setup, see our Authentication guide.
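Whichever token type you use, it is sent as a Bearer token in the Authorization header. A common pattern is to read the key from an environment variable rather than hard-coding it; the variable name `TFY_API_KEY` below is just a convention used in this sketch, not something the API requires:

```python
import os

# Read the PAT or VAT from an environment variable so the key never
# appears in source code. TFY_API_KEY is a convention, not an API requirement.
api_key = os.environ.get("TFY_API_KEY", "<your_api_key>")

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```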
Quick Start
Distribution Query
Get aggregated metrics distribution with multiple aggregations including count, sum, and percentiles:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "count", "column": "modelName"},
            {"type": "sum", "column": "inputTokens"},
            {"type": "sum", "column": "outputTokens"},
            {"type": "p99", "column": "latencyMs"}
        ],
        "groupBy": ["modelName"]
    }
)
print(response.json())
Timeseries Query
Get metrics over time with hourly intervals, including latency percentiles:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"},
            {"type": "sum", "column": "inputTokens"},
            {"type": "p99", "column": "latencyMs"}
        ],
        "groupBy": ["modelName"],
        "intervalInSeconds": 3600
    }
)
print(response.json())
API Reference
Endpoint
POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query
Request Parameters
startTs - ISO 8601 timestamp for the start of the data range (e.g., "2025-01-21T00:00:00.000Z")
endTs - ISO 8601 timestamp for the end of the data range (e.g., "2025-01-22T00:00:00.000Z")
datasource - The data source to query. Use "modelMetrics" for gateway model metrics.
type - The type of query to execute:
"distribution" - Returns aggregated metrics
"timeseries" - Returns metrics over time intervals
aggregations - Array of aggregation objects. Each aggregation specifies:
type - The aggregation type
column - The column to aggregate on
Supported aggregation types:

Type    Description
count   Count of records
sum     Sum of values
p50     50th percentile (median)
p75     75th percentile
p90     90th percentile
p99     99th percentile

Example:

"aggregations": [
    {"type": "count", "column": "modelName"},
    {"type": "sum", "column": "inputTokens"},
    {"type": "p90", "column": "latencyMs"}
]
groupBy - Array of fields to group the metrics by. Available options:
modelName - Group by model name
userEmail - Group by user email
virtualaccount - Group by virtual account
team - Group by team (unnests the Teams array)
metadata.<key> - Group by a custom metadata key (e.g., metadata.environment)
"groupBy" : [ "modelName" , "team" , "metadata.environment" ]
filters - Array of filter objects to narrow down the results. See Filtering for details.
intervalInSeconds - Required for timeseries queries. The time interval in seconds for grouping data points. Common values:
60 - 1 minute intervals
300 - 5 minute intervals
1800 - 30 minute intervals
3600 - 1 hour intervals
86400 - 1 day intervals
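The number of data points returned per group is roughly the query window divided by the interval. A quick sanity check in plain Python (this helper is illustrative, not part of any SDK):

```python
from datetime import datetime

def expected_buckets(start_ts: str, end_ts: str, interval_s: int) -> int:
    """Approximate number of timeseries buckets for a query window."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
    start = datetime.strptime(start_ts, fmt)
    end = datetime.strptime(end_ts, fmt)
    return int((end - start).total_seconds() // interval_s)

# A one-day window at hourly intervals yields 24 buckets.
print(expected_buckets("2025-01-21T00:00:00.000Z",
                       "2025-01-22T00:00:00.000Z", 3600))  # 24
```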
Filtering
Filters allow you to narrow down your query results. The API supports different filter operators depending on the field type.
Filter Structure
Field Filters
Metadata Filters
For standard fields, use fieldName:

{
    "fieldName": "modelName",
    "operator": "IN",
    "value": ["gpt-4", "gpt-3.5-turbo"]
}
For metadata fields, use metadataKey:

{
    "metadataKey": "environment",
    "operator": "IN",
    "value": ["production"]
}
Filterable Fields
Field             Type     Description
modelName         string   The name of the LLM model
requestType       string   Type of request (e.g., "chat", "completion")
userEmail         string   The email of the user making requests
virtualAccount    string   The virtual account name
team              array    Teams associated with the request
latencyMs         number   Request latency in milliseconds
conversationID    string   The conversation identifier
virtualModelName  string   The virtual model name
inputTokens       number   Number of input tokens
outputTokens      number   Number of output tokens
Filter Operators
String Field Operators
Operator            Description                              Example Value
EQUAL               Exact match                              "gpt-4"
IN                  Match any value in the list              ["gpt-4", "gpt-3.5-turbo"]
NOT_IN              Exclude values in the list               ["deprecated-model"]
STRING_CONTAINS     Contains substring                       "gpt"
STRING_STARTS_WITH  Starts with prefix                       "gpt-"
STRING_ENDS_WITH    Ends with suffix                         "-turbo"
GREATER_THAN        Lexicographically greater than           "gpt-3"
LESS_THAN           Lexicographically less than              "gpt-5"
GREATER_THAN_EQUAL  Lexicographically greater than or equal  "gpt-3"
LESS_THAN_EQUAL     Lexicographically less than or equal     "gpt-5"
BETWEEN             Lexicographically between two values     ["a", "z"]
# Filter for specific models
{
    "fieldName": "modelName",
    "operator": "IN",
    "value": ["gpt-4", "gpt-3.5-turbo"]
}

# Exclude specific users
{
    "fieldName": "userEmail",
    "operator": "NOT_IN",
    "value": ["excluded@example.com"]
}

# Filter models containing "gpt"
{
    "fieldName": "modelName",
    "operator": "STRING_CONTAINS",
    "value": "gpt"
}

# Filter models starting with "claude"
{
    "fieldName": "modelName",
    "operator": "STRING_STARTS_WITH",
    "value": "claude"
}
Numeric Field Operators
Operator            Description                     Example Value
EQUAL               Exact match                     1000
GREATER_THAN        Greater than value              1000
LESS_THAN           Less than value                 5000
GREATER_THAN_EQUAL  Greater than or equal to        100
LESS_THAN_EQUAL     Less than or equal to           1000
BETWEEN             Between two values (inclusive)  [500, 5000]
IN                  Match any value in the list     [100, 200, 300]
NOT_IN              Exclude values in the list      [0]
# Filter for high latency requests
{
    "fieldName": "latencyMs",
    "operator": "GREATER_THAN",
    "value": 1000
}

# Filter for latency within a range
{
    "fieldName": "latencyMs",
    "operator": "BETWEEN",
    "value": [500, 5000]
}

# Filter for requests with significant input tokens
{
    "fieldName": "inputTokens",
    "operator": "GREATER_THAN",
    "value": 100
}
Array Field Operators (Teams)
Operator        Description                                 Example Value
ARRAY_HAS_ANY   Match if array contains any of the values   ["team-alpha", "team-beta"]
ARRAY_HAS_NONE  Match if array contains none of the values  ["excluded-team"]
# Filter for specific teams
{
    "fieldName": "team",
    "operator": "ARRAY_HAS_ANY",
    "value": ["team-alpha", "team-beta"]
}

# Exclude specific teams
{
    "fieldName": "team",
    "operator": "ARRAY_HAS_NONE",
    "value": ["excluded-team"]
}
Combining Multiple Filters
You can combine multiple filters in a single query. All filters are applied with AND logic:
{
    "startTs": "2025-01-21T00:00:00.000Z",
    "endTs": "2025-01-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [],
    "filters": [
        {
            "fieldName": "modelName",
            "operator": "IN",
            "value": ["gpt-4", "gpt-3.5-turbo"]
        },
        {
            "fieldName": "latencyMs",
            "operator": "LESS_THAN",
            "value": 5000
        },
        {
            "fieldName": "team",
            "operator": "ARRAY_HAS_ANY",
            "value": ["team-alpha"]
        },
        {
            "metadataKey": "environment",
            "operator": "IN",
            "value": ["production"]
        }
    ],
    "groupBy": ["modelName", "team"]
}
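Because every filter is a small object with the same three keys, it can help to build them with tiny helper functions. These helpers are illustrative, not part of any TrueFoundry SDK:

```python
def field_filter(field: str, operator: str, value):
    """Build a standard field filter object."""
    return {"fieldName": field, "operator": operator, "value": value}

def metadata_filter(key: str, operator: str, value):
    """Build a metadata filter object."""
    return {"metadataKey": key, "operator": operator, "value": value}

# The same filter list as the combined example above, built with the helpers.
filters = [
    field_filter("modelName", "IN", ["gpt-4", "gpt-3.5-turbo"]),
    field_filter("latencyMs", "LESS_THAN", 5000),
    field_filter("team", "ARRAY_HAS_ANY", ["team-alpha"]),
    metadata_filter("environment", "IN", ["production"]),
]
```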
Query Examples
Distribution Queries
Get request counts grouped by model:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "groupBy": ["modelName"]
    }
)
Get request counts grouped by team:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "count", "column": "team"}
        ],
        "groupBy": ["team"]
    }
)
Get total input and output tokens grouped by model:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "sum", "column": "inputTokens"},
            {"type": "sum", "column": "outputTokens"}
        ],
        "groupBy": ["modelName"]
    }
)
Latency percentiles by model
Get p50, p90, and p99 latency percentiles grouped by model:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "p50", "column": "latencyMs"},
            {"type": "p90", "column": "latencyMs"},
            {"type": "p99", "column": "latencyMs"}
        ],
        "groupBy": ["modelName"]
    }
)
Multi-dimensional grouping
Group by multiple dimensions:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [],
        "groupBy": ["modelName", "userEmail", "virtualaccount"]
    }
)
Filter results to specific models:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [],
        "filters": [
            {
                "fieldName": "modelName",
                "operator": "IN",
                "value": ["gpt-4", "gpt-3.5-turbo", "claude-2"]
            }
        ],
        "groupBy": ["modelName"]
    }
)
Filter high latency requests
Find requests with latency above a threshold:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [],
        "filters": [
            {
                "fieldName": "latencyMs",
                "operator": "GREATER_THAN",
                "value": 1000
            }
        ],
        "groupBy": ["modelName"]
    }
)
Find requests within a latency range:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [],
        "filters": [
            {
                "fieldName": "latencyMs",
                "operator": "BETWEEN",
                "value": [500, 5000]
            }
        ],
        "groupBy": ["modelName"]
    }
)
Filter by input and output token thresholds:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [],
        "filters": [
            {
                "fieldName": "inputTokens",
                "operator": "GREATER_THAN",
                "value": 100
            },
            {
                "fieldName": "outputTokens",
                "operator": "LESS_THAN_EQUAL",
                "value": 1000
            }
        ],
        "groupBy": ["modelName"]
    }
)
Filter to specific teams using array operators:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [],
        "filters": [
            {
                "fieldName": "team",
                "operator": "ARRAY_HAS_ANY",
                "value": ["team-alpha", "team-beta"]
            }
        ],
        "groupBy": ["team", "modelName"]
    }
)
Complex filter combination
Combine multiple filter types:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [],
        "filters": [
            {
                "fieldName": "modelName",
                "operator": "IN",
                "value": ["gpt-4", "gpt-3.5-turbo"]
            },
            {
                "fieldName": "latencyMs",
                "operator": "BETWEEN",
                "value": [100, 10000]
            },
            {
                "fieldName": "inputTokens",
                "operator": "GREATER_THAN",
                "value": 50
            },
            {
                "fieldName": "outputTokens",
                "operator": "LESS_THAN",
                "value": 2000
            }
        ],
        "groupBy": ["modelName"]
    }
)
Timeseries Queries
Basic timeseries (hourly)
Get hourly request counts:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "intervalInSeconds": 3600
    }
)
Get fine-grained metrics with 5-minute intervals:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-21T06:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "intervalInSeconds": 300
    }
)
Get hourly counts grouped by model:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "groupBy": ["modelName"],
        "intervalInSeconds": 3600
    }
)
Get hourly counts grouped by team:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "team"}
        ],
        "groupBy": ["team"],
        "intervalInSeconds": 3600
    }
)
Apply filters to timeseries data:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "filters": [
            {
                "fieldName": "modelName",
                "operator": "IN",
                "value": ["gpt-4", "gpt-3.5-turbo"]
            }
        ],
        "groupBy": ["modelName"],
        "intervalInSeconds": 3600
    }
)
Timeseries with latency filter
Filter timeseries by latency threshold:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "filters": [
            {
                "fieldName": "latencyMs",
                "operator": "GREATER_THAN",
                "value": 500
            }
        ],
        "groupBy": ["modelName"],
        "intervalInSeconds": 3600
    }
)
Timeseries with team filter
Filter timeseries by team:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "filters": [
            {
                "fieldName": "team",
                "operator": "ARRAY_HAS_ANY",
                "value": ["team-alpha", "team-beta"]
            }
        ],
        "groupBy": ["team"],
        "intervalInSeconds": 3600
    }
)
Get daily data for a week:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-14T00:00:00.000Z",
        "endTs": "2025-01-21T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "intervalInSeconds": 86400
    }
)
Combine filters, groupBy, and metadata:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "filters": [
            {
                "fieldName": "modelName",
                "operator": "IN",
                "value": ["gpt-4", "gpt-3.5-turbo"]
            },
            {
                "metadataKey": "environment",
                "operator": "IN",
                "value": ["production"]
            }
        ],
        "groupBy": ["modelName"],
        "intervalInSeconds": 3600
    }
)
The API returns metrics data in JSON format. Aggregation results are returned with keys in camelCase format: {aggregationType}{ColumnName} where the column name is capitalized (e.g., countModelName, sumInputTokens, p99LatencyMs).
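The result key can be derived mechanically from the aggregation object. A small sketch of that naming rule:

```python
def result_key(agg_type: str, column: str) -> str:
    """camelCase result key: aggregation type + column name with first letter capitalized."""
    return agg_type + column[0].upper() + column[1:]

print(result_key("count", "modelName"))  # countModelName
print(result_key("sum", "inputTokens"))  # sumInputTokens
print(result_key("p99", "latencyMs"))    # p99LatencyMs
```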
Distribution Response
{
    "data": [
        {
            "modelName": "gpt-4",
            "countModelName": 150,
            "sumInputTokens": 125000,
            "sumOutputTokens": 45000,
            "p99LatencyMs": 2450.5
        },
        {
            "modelName": "gpt-3.5-turbo",
            "countModelName": 320,
            "sumInputTokens": 89000,
            "sumOutputTokens": 32000,
            "p99LatencyMs": 1820.3
        }
    ]
}
Timeseries Response
{
    "data": [
        {
            "timestamp": "2025-01-21T00:00:00.000Z",
            "modelName": "gpt-4",
            "countModelName": 25,
            "sumInputTokens": 15000,
            "p99LatencyMs": 2100.5
        },
        {
            "timestamp": "2025-01-21T01:00:00.000Z",
            "modelName": "gpt-4",
            "countModelName": 30,
            "sumInputTokens": 18500,
            "p99LatencyMs": 2350.2
        }
    ]
}
If the groupBy array is empty, the API returns a summarized overview of all requests within the specified time range.
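A typical post-processing step is to roll a timeseries response up per group. A minimal sketch, assuming a response body shaped like the timeseries example above:

```python
from collections import defaultdict

# A response body shaped like the timeseries example above.
body = {
    "data": [
        {"timestamp": "2025-01-21T00:00:00.000Z", "modelName": "gpt-4",
         "countModelName": 25, "sumInputTokens": 15000},
        {"timestamp": "2025-01-21T01:00:00.000Z", "modelName": "gpt-4",
         "countModelName": 30, "sumInputTokens": 18500},
    ]
}

# Total request counts per model across all time buckets.
totals = defaultdict(int)
for point in body["data"]:
    totals[point["modelName"]] += point["countModelName"]

print(dict(totals))  # {'gpt-4': 55}
```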