The Gateway Model Metrics Query API provides a flexible way to query gateway metrics on model usage, performance, cost, and user activity. You can retrieve either distribution (aggregated) or timeseries metrics, with powerful filtering and grouping capabilities.
Access control
Tenant admins: Can query metrics for the entire organization (tenant-wide).
Users: Can query their own data and their teams’ data.
Virtual accounts: Can query their own data and their teams’ data; with tenant-admin permissions, they can access tenant-wide data.
Authentication
You need to authenticate with your TrueFoundry API key. You can use either a Personal Access Token (PAT) or a Virtual Account Token (VAT).
To generate an API key:
Personal Access Token (PAT): Go to Access → Personal Access Tokens in your TrueFoundry dashboard
Virtual Account Token (VAT): Go to Access → Virtual Account Tokens (requires admin permissions)
For detailed authentication setup, see our Authentication guide.
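Whichever token type you use, it is sent as a Bearer token in the Authorization header. A common pattern is to read the key from an environment variable rather than hard-coding it; the variable name `TFY_API_KEY` below is just a convention used in this sketch, not something the API requires:

```python
import os

# Read the PAT or VAT from an environment variable so the key never
# appears in source code. TFY_API_KEY is a convention, not an API requirement.
api_key = os.environ.get("TFY_API_KEY", "<your_api_key>")

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```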
Quick Start
Distribution Query
Get aggregated metrics distribution with multiple aggregations including count, sum, and percentiles:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "count", "column": "modelName"},
            {"type": "sum", "column": "inputTokens"},
            {"type": "sum", "column": "outputTokens"},
            {"type": "p99", "column": "latencyMs"}
        ],
        "groupBy": ["modelName"]
    }
)
print(response.json())
Timeseries Query
Get metrics over time with hourly intervals, including latency percentiles:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"},
            {"type": "sum", "column": "inputTokens"},
            {"type": "p99", "column": "latencyMs"}
        ],
        "groupBy": ["modelName"],
        "intervalInSeconds": 3600
    }
)
print(response.json())
API Reference
Endpoint
POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query
Request Parameters
startTs - ISO 8601 timestamp for the start of the data range (e.g., "2025-01-21T00:00:00.000Z")
endTs - ISO 8601 timestamp for the end of the data range (e.g., "2025-01-22T00:00:00.000Z")
datasource - The data source to query. Use "modelMetrics" for gateway model metrics.
type - The type of query to execute:
"distribution" - Returns aggregated metrics
"timeseries" - Returns metrics over time intervals
aggregations - Array of aggregation objects. Each aggregation specifies:
type - The aggregation type
column - The column to aggregate on
Supported aggregation types:

Type    Description
count   Count of records
sum     Sum of values
p50     50th percentile (median)
p75     75th percentile
p90     90th percentile
p99     99th percentile

Example:

"aggregations": [
    {"type": "count", "column": "modelName"},
    {"type": "sum", "column": "inputTokens"},
    {"type": "p90", "column": "latencyMs"}
]
groupBy - Array of fields to group the metrics by. Available options:
modelName - Group by model name
userEmail - Group by user email
virtualaccount - Group by virtual account
team - Group by team (unnests the Teams array)
metadata.<key> - Group by a custom metadata key (e.g., metadata.environment)
"groupBy" : [ "modelName" , "team" , "metadata.environment" ]
filters - Array of filter objects to narrow down the results. See Filtering for details.
intervalInSeconds - Required for timeseries queries. The time interval in seconds for grouping data points. Common values:
60 - 1 minute intervals
300 - 5 minute intervals
1800 - 30 minute intervals
3600 - 1 hour intervals
86400 - 1 day intervals
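The number of data points returned per group is roughly the query window divided by the interval. A quick sanity check in plain Python (this helper is illustrative, not part of any SDK):

```python
from datetime import datetime

def expected_buckets(start_ts: str, end_ts: str, interval_s: int) -> int:
    """Approximate number of timeseries buckets for a query window."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
    start = datetime.strptime(start_ts, fmt)
    end = datetime.strptime(end_ts, fmt)
    return int((end - start).total_seconds() // interval_s)

# A one-day window at hourly intervals yields 24 buckets.
print(expected_buckets("2025-01-21T00:00:00.000Z",
                       "2025-01-22T00:00:00.000Z", 3600))  # 24
```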
Filtering
Filters allow you to narrow down your query results. The API supports different filter operators depending on the field type.
Filter Structure
Field Filters
Metadata Filters
For standard fields, use fieldName:

{
    "fieldName": "modelName",
    "operator": "IN",
    "value": ["gpt-4", "gpt-3.5-turbo"]
}
For metadata fields, use metadataKey:

{
    "metadataKey": "environment",
    "operator": "IN",
    "value": ["production"]
}
Filterable Fields
Field             Type     Description
modelName         string   The name of the LLM model
requestType       string   Type of request (e.g., "chat", "completion")
userEmail         string   The email of the user making requests
virtualAccount    string   The virtual account name
team              array    Teams associated with the request
latencyMs         number   Request latency in milliseconds
conversationID    string   The conversation identifier
virtualModelName  string   The virtual model name
inputTokens       number   Number of input tokens
outputTokens      number   Number of output tokens
Filter Operators
String Field Operators
Operator            Description                              Example Value
EQUAL               Exact match                              "gpt-4"
IN                  Match any value in the list              ["gpt-4", "gpt-3.5-turbo"]
NOT_IN              Exclude values in the list               ["deprecated-model"]
STRING_CONTAINS     Contains substring                       "gpt"
STRING_STARTS_WITH  Starts with prefix                       "gpt-"
STRING_ENDS_WITH    Ends with suffix                         "-turbo"
GREATER_THAN        Lexicographically greater than           "gpt-3"
LESS_THAN           Lexicographically less than              "gpt-5"
GREATER_THAN_EQUAL  Lexicographically greater than or equal  "gpt-3"
LESS_THAN_EQUAL     Lexicographically less than or equal     "gpt-5"
BETWEEN             Lexicographically between two values     ["a", "z"]
# Filter for specific models
{
    "fieldName": "modelName",
    "operator": "IN",
    "value": ["gpt-4", "gpt-3.5-turbo"]
}

# Exclude specific users
{
    "fieldName": "userEmail",
    "operator": "NOT_IN",
    "value": ["excluded@example.com"]
}

# Filter models containing "gpt"
{
    "fieldName": "modelName",
    "operator": "STRING_CONTAINS",
    "value": "gpt"
}

# Filter models starting with "claude"
{
    "fieldName": "modelName",
    "operator": "STRING_STARTS_WITH",
    "value": "claude"
}
Numeric Field Operators
Operator            Description                     Example Value
EQUAL               Exact match                     1000
GREATER_THAN        Greater than value              1000
LESS_THAN           Less than value                 5000
GREATER_THAN_EQUAL  Greater than or equal to        100
LESS_THAN_EQUAL     Less than or equal to           1000
BETWEEN             Between two values (inclusive)  [500, 5000]
IN                  Match any value in the list     [100, 200, 300]
NOT_IN              Exclude values in the list      [0]
# Filter for high latency requests
{
    "fieldName": "latencyMs",
    "operator": "GREATER_THAN",
    "value": 1000
}

# Filter for latency within a range
{
    "fieldName": "latencyMs",
    "operator": "BETWEEN",
    "value": [500, 5000]
}

# Filter for requests with significant input tokens
{
    "fieldName": "inputTokens",
    "operator": "GREATER_THAN",
    "value": 100
}
Array Field Operators (Teams)
Operator        Description                                 Example Value
ARRAY_HAS_ANY   Match if array contains any of the values   ["team-alpha", "team-beta"]
ARRAY_HAS_NONE  Match if array contains none of the values  ["excluded-team"]
# Filter for specific teams
{
    "fieldName": "team",
    "operator": "ARRAY_HAS_ANY",
    "value": ["team-alpha", "team-beta"]
}

# Exclude specific teams
{
    "fieldName": "team",
    "operator": "ARRAY_HAS_NONE",
    "value": ["excluded-team"]
}
Combining Multiple Filters
You can combine multiple filters in a single query. All filters are applied with AND logic:
{
    "startTs": "2025-01-21T00:00:00.000Z",
    "endTs": "2025-01-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [],
    "filters": [
        {
            "fieldName": "modelName",
            "operator": "IN",
            "value": ["gpt-4", "gpt-3.5-turbo"]
        },
        {
            "fieldName": "latencyMs",
            "operator": "LESS_THAN",
            "value": 5000
        },
        {
            "fieldName": "team",
            "operator": "ARRAY_HAS_ANY",
            "value": ["team-alpha"]
        },
        {
            "metadataKey": "environment",
            "operator": "IN",
            "value": ["production"]
        }
    ],
    "groupBy": ["modelName", "team"]
}
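Because every filter is a small object with the same three keys, it can help to build them with tiny helper functions. These helpers are illustrative, not part of any TrueFoundry SDK:

```python
def field_filter(field: str, operator: str, value):
    """Build a standard field filter object."""
    return {"fieldName": field, "operator": operator, "value": value}

def metadata_filter(key: str, operator: str, value):
    """Build a metadata filter object."""
    return {"metadataKey": key, "operator": operator, "value": value}

# The same filter list as the combined example above, built with the helpers.
filters = [
    field_filter("modelName", "IN", ["gpt-4", "gpt-3.5-turbo"]),
    field_filter("latencyMs", "LESS_THAN", 5000),
    field_filter("team", "ARRAY_HAS_ANY", ["team-alpha"]),
    metadata_filter("environment", "IN", ["production"]),
]
```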
Query Examples
Distribution Queries
Get request counts grouped by model:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "groupBy": ["modelName"]
    }
)
Get request counts grouped by team:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "count", "column": "team"}
        ],
        "groupBy": ["team"]
    }
)
Get total input and output tokens grouped by model:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "sum", "column": "inputTokens"},
            {"type": "sum", "column": "outputTokens"}
        ],
        "groupBy": ["modelName"]
    }
)
Latency percentiles by model
Get p50, p90, and p99 latency percentiles grouped by model:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "p50", "column": "latencyMs"},
            {"type": "p90", "column": "latencyMs"},
            {"type": "p99", "column": "latencyMs"}
        ],
        "groupBy": ["modelName"]
    }
)
Multi-dimensional grouping
Group by multiple dimensions:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [],
        "groupBy": ["modelName", "userEmail", "virtualaccount"]
    }
)
Filter results to specific models:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [],
        "filters": [
            {
                "fieldName": "modelName",
                "operator": "IN",
                "value": ["gpt-4", "gpt-3.5-turbo", "claude-2"]
            }
        ],
        "groupBy": ["modelName"]
    }
)
Filter high latency requests
Find requests with latency above a threshold:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [],
        "filters": [
            {
                "fieldName": "latencyMs",
                "operator": "GREATER_THAN",
                "value": 1000
            }
        ],
        "groupBy": ["modelName"]
    }
)
Find requests within a latency range:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [],
        "filters": [
            {
                "fieldName": "latencyMs",
                "operator": "BETWEEN",
                "value": [500, 5000]
            }
        ],
        "groupBy": ["modelName"]
    }
)
Filter by input and output token thresholds:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [],
        "filters": [
            {
                "fieldName": "inputTokens",
                "operator": "GREATER_THAN",
                "value": 100
            },
            {
                "fieldName": "outputTokens",
                "operator": "LESS_THAN_EQUAL",
                "value": 1000
            }
        ],
        "groupBy": ["modelName"]
    }
)
Filter to specific teams using array operators:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [],
        "filters": [
            {
                "fieldName": "team",
                "operator": "ARRAY_HAS_ANY",
                "value": ["team-alpha", "team-beta"]
            }
        ],
        "groupBy": ["team", "modelName"]
    }
)
Complex filter combination
Combine multiple filter types:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [],
        "filters": [
            {
                "fieldName": "modelName",
                "operator": "IN",
                "value": ["gpt-4", "gpt-3.5-turbo"]
            },
            {
                "fieldName": "latencyMs",
                "operator": "BETWEEN",
                "value": [100, 10000]
            },
            {
                "fieldName": "inputTokens",
                "operator": "GREATER_THAN",
                "value": 50
            },
            {
                "fieldName": "outputTokens",
                "operator": "LESS_THAN",
                "value": 2000
            }
        ],
        "groupBy": ["modelName"]
    }
)
Timeseries Queries
Basic timeseries (hourly)
Get hourly request counts:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "intervalInSeconds": 3600
    }
)
Get fine-grained metrics with 5-minute intervals:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-21T06:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "intervalInSeconds": 300
    }
)
Get hourly counts grouped by model:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "groupBy": ["modelName"],
        "intervalInSeconds": 3600
    }
)
Get hourly counts grouped by team:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "team"}
        ],
        "groupBy": ["team"],
        "intervalInSeconds": 3600
    }
)
Apply filters to timeseries data:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "filters": [
            {
                "fieldName": "modelName",
                "operator": "IN",
                "value": ["gpt-4", "gpt-3.5-turbo"]
            }
        ],
        "groupBy": ["modelName"],
        "intervalInSeconds": 3600
    }
)
Timeseries with latency filter
Filter timeseries by latency threshold:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "filters": [
            {
                "fieldName": "latencyMs",
                "operator": "GREATER_THAN",
                "value": 500
            }
        ],
        "groupBy": ["modelName"],
        "intervalInSeconds": 3600
    }
)
Timeseries with team filter
Filter timeseries by team:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "filters": [
            {
                "fieldName": "team",
                "operator": "ARRAY_HAS_ANY",
                "value": ["team-alpha", "team-beta"]
            }
        ],
        "groupBy": ["team"],
        "intervalInSeconds": 3600
    }
)
Get daily data for a week:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-14T00:00:00.000Z",
        "endTs": "2025-01-21T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "intervalInSeconds": 86400
    }
)
Combine filters, groupBy, and metadata:

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2025-01-21T00:00:00.000Z",
        "endTs": "2025-01-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "aggregations": [
            {"type": "count", "column": "modelName"}
        ],
        "filters": [
            {
                "fieldName": "modelName",
                "operator": "IN",
                "value": ["gpt-4", "gpt-3.5-turbo"]
            },
            {
                "metadataKey": "environment",
                "operator": "IN",
                "value": ["production"]
            }
        ],
        "groupBy": ["modelName"],
        "intervalInSeconds": 3600
    }
)
The API returns metrics data in JSON format. Aggregation results are returned with keys in camelCase format: {aggregationType}{ColumnName} where the column name is capitalized (e.g., countModelName, sumInputTokens, p99LatencyMs).
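The result key can be derived mechanically from the aggregation object. A small sketch of that naming rule:

```python
def result_key(agg_type: str, column: str) -> str:
    """camelCase result key: aggregation type + column name with first letter capitalized."""
    return agg_type + column[0].upper() + column[1:]

print(result_key("count", "modelName"))  # countModelName
print(result_key("sum", "inputTokens"))  # sumInputTokens
print(result_key("p99", "latencyMs"))    # p99LatencyMs
```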
Distribution Response
{
    "data": [
        {
            "modelName": "gpt-4",
            "countModelName": 150,
            "sumInputTokens": 125000,
            "sumOutputTokens": 45000,
            "p99LatencyMs": 2450.5
        },
        {
            "modelName": "gpt-3.5-turbo",
            "countModelName": 320,
            "sumInputTokens": 89000,
            "sumOutputTokens": 32000,
            "p99LatencyMs": 1820.3
        }
    ]
}
Timeseries Response
{
    "data": [
        {
            "timestamp": "2025-01-21T00:00:00.000Z",
            "modelName": "gpt-4",
            "countModelName": 25,
            "sumInputTokens": 15000,
            "p99LatencyMs": 2100.5
        },
        {
            "timestamp": "2025-01-21T01:00:00.000Z",
            "modelName": "gpt-4",
            "countModelName": 30,
            "sumInputTokens": 18500,
            "p99LatencyMs": 2350.2
        }
    ]
}
If the groupBy array is empty, the API returns a summarized overview of all requests within the specified time range.
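A typical post-processing step is to roll a timeseries response up per group. A minimal sketch, assuming a response body shaped like the timeseries example above:

```python
from collections import defaultdict

# A response body shaped like the timeseries example above.
body = {
    "data": [
        {"timestamp": "2025-01-21T00:00:00.000Z", "modelName": "gpt-4",
         "countModelName": 25, "sumInputTokens": 15000},
        {"timestamp": "2025-01-21T01:00:00.000Z", "modelName": "gpt-4",
         "countModelName": 30, "sumInputTokens": 18500},
    ]
}

# Total request counts per model across all time buckets.
totals = defaultdict(int)
for point in body["data"]:
    totals[point["modelName"]] += point["countModelName"]

print(dict(totals))  # {'gpt-4': 55}
```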