Batch API

This guide explains how to perform batch predictions using TrueFoundry’s AI Gateway with OpenAI, Azure OpenAI, Vertex AI, or AWS Bedrock providers.

Client Setup

All providers use the OpenAI SDK with provider-specific headers. The example below targets an OpenAI provider integration; set x-tfy-provider-name to the name of your own provider integration:

from openai import OpenAI

client = OpenAI(
    api_key="your-truefoundry-api-key",
    base_url="https://{controlPlaneUrl}/api/llm",
    default_headers={
        "x-tfy-provider-name": "openai-main"  # truefoundry provider integration name
    }
)

Input File Format

Create a JSONL file with one JSON object per line. Each line represents a single request:

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4-vision-preview", "messages": [{"role": "user", "content": [{"type": "text", "text": "What's in this image?"}, {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}]}], "max_tokens": 1000}}

Requirements: Every line must be valid JSON, and each request needs a meaningful custom_id value. See the provider prerequisites below for any provider-specific limits (e.g. minimum records, URL/body format).
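Before uploading, it can help to check the file locally against the requirements above. The following is a minimal sketch, not part of the gateway or the OpenAI SDK: validate_batch_lines and its min_records parameter are hypothetical names, the required keys simply mirror the example lines above, and the duplicate check is an added assumption (results are matched back to requests by custom_id).

```python
import json


def validate_batch_lines(lines, min_records=1):
    """Return a list of problems found in JSONL batch input lines.

    min_records is a hypothetical knob for provider-specific minimums.
    """
    problems = []
    seen_ids = set()
    records = 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        records += 1
        if not isinstance(record, dict):
            problems.append(f"line {number}: expected a JSON object")
            continue
        custom_id = record.get("custom_id")
        if not isinstance(custom_id, str) or not custom_id:
            problems.append(f"line {number}: missing custom_id")
        elif custom_id in seen_ids:
            problems.append(f"line {number}: duplicate custom_id {custom_id!r}")
        else:
            seen_ids.add(custom_id)
        # Keys taken from the example lines in this guide
        for key in ("method", "url", "body"):
            if key not in record:
                problems.append(f"line {number}: missing {key}")
    if records < min_records:
        problems.append(f"only {records} record(s), expected at least {min_records}")
    return problems
```

A typical call would read the file and pass its lines, for example validate_batch_lines(open("request.jsonl").read().splitlines(), min_records=100) for a provider with a 100-record minimum. An empty list means no problems were found.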

Before using AWS Bedrock batch processing, ensure you have:

S3 Bucket: For storing input and output files
IAM Execution Role: With permissions for S3 access and Bedrock model invocation
User Permissions: Including iam:PassRole to pass the execution role to Bedrock

Before using Vertex AI batch processing, ensure you have:

Cloud Storage Bucket: The bucket must be in the same region as the model, and the service account must have read/write access to the bucket
Batch Prediction Permissions: The service account must have permission to create batch prediction jobs

Before using Azure OpenAI batch processing, ensure you have:

Deployment type: A model deployed as Global Batch or Data Zone Batch (standard/online deployments cannot be used for batch).
Headers: Set x-tfy-azure-api-version (e.g. 2024-12-01-preview) to match the API version used by your batch deployment.
Endpoint: When creating the job, use endpoint="/chat/completions" for chat (Azure API uses this form). For the Responses API, use url: "/v1/responses" and body.input in each JSONL line.

See Azure OpenAI batch documentation for details.
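The Azure-specific settings above can be sketched as two small helpers: one that builds the gateway headers, and one that builds a JSONL line for the Responses API. The helper names, the provider integration name "azure-main", and the deployment name "my-batch-deployment" are hypothetical placeholders, and passing the deployment name as body.model is an assumption to verify against your setup.

```python
import json


def azure_batch_headers(provider_name, api_version):
    """Headers for an Azure OpenAI batch client behind the gateway."""
    return {
        "x-tfy-provider-name": provider_name,
        "x-tfy-azure-api-version": api_version,
    }


def responses_api_line(custom_id, deployment, text):
    """One JSONL line targeting the Responses API (url plus body.input)."""
    record = {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/responses",
        # Assumption: the batch deployment name is passed as the model
        "body": {"model": deployment, "input": text},
    }
    return json.dumps(record)


# Hypothetical names; replace with your own integration and deployment
headers = azure_batch_headers("azure-main", "2024-12-01-preview")
print(headers)
print(responses_api_line("request-1", "my-batch-deployment", "Hello world!"))
```

The resulting dictionary is meant to be passed as default_headers when constructing the OpenAI client, in the same way as in Client Setup.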

Workflow Steps

The batch process follows these steps for all providers:

Upload: Upload JSONL file → Get file ID
Create: Create batch job → Get batch ID
Monitor: Check status until complete
Fetch: Download results

Step-by-Step Examples

1. Upload Input File

from openai import OpenAI

client = OpenAI(
    api_key="your-truefoundry-api-key",
    base_url="https://{controlPlaneUrl}/api/llm",
    default_headers={
        "x-tfy-provider-name": "openai-main"  # truefoundry provider integration name
    }
)

# Upload the input file; the context manager closes the handle afterwards
with open("request.jsonl", "rb") as input_file:
    file = client.files.create(
        file=input_file,
        purpose="batch"
    )

print(file.id)  # Example: file-PnFGrFLN5LjjcWr4eFsStK

2. Create Batch Job

from openai import OpenAI

client = OpenAI(
    api_key="your-truefoundry-api-key",
    base_url="https://{controlPlaneUrl}/api/llm",
    default_headers={
        "x-tfy-provider-name": "openai-main"  # truefoundry provider integration name
    }
)

# file.id is the ID returned by the upload in step 1
batch_job = client.batches.create(
    input_file_id=file.id,
    endpoint="/v1/chat/completions",  # for Azure OpenAI, use "/chat/completions"
    completion_window="24h"
)

print(batch_job.id)  # Example: batch_67f7bfc50b288190893f242d9fa47c52

3. Check Batch Status

from openai import OpenAI

client = OpenAI(
    api_key="your-truefoundry-api-key",
    base_url="https://{controlPlaneUrl}/api/llm",
    default_headers={
        "x-tfy-provider-name": "openai-main"  # truefoundry provider integration name
    }
)

# batch_job.id is the ID returned when the job was created in step 2
batch_status = client.batches.retrieve(batch_job.id)
print(batch_status.status)  # Example: completed, validating, in_progress, etc.
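Because a batch can take a while to finish, a single status check is rarely enough. Below is a minimal polling sketch, assuming the client exposes batches.retrieve as in the example above. The function name wait_for_batch, the default intervals, and the choice of which statuses count as terminal are assumptions for illustration, not gateway behavior.

```python
import time

# Assumption: these statuses mean the batch will not change any further
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(client, batch_id, poll_seconds=30.0,
                   timeout_seconds=24 * 3600, sleep=time.sleep):
    """Poll client.batches.retrieve until the batch reaches a terminal status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"batch {batch_id} still {batch.status!r} after {timeout_seconds}s"
            )
        sleep(poll_seconds)  # wait before asking again
```

With the client from the previous steps, this would be called as batch_status = wait_for_batch(client, batch_job.id). The sleep parameter exists only so the wait can be replaced in tests.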

4. Fetch Results

from openai import OpenAI

client = OpenAI(
    api_key="your-truefoundry-api-key",
    base_url="https://{controlPlaneUrl}/api/llm",
    default_headers={
        "x-tfy-provider-name": "openai-main"  # truefoundry provider integration name
    }
)

# batch_status is the batch object retrieved in step 3
if batch_status.status == "completed":
    output_content = client.files.content(batch_status.output_file_id)
    print(output_content.content.decode('utf-8'))
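The downloaded output is itself JSONL, so it is usually more convenient to parse it than to print it. The sketch below assumes the OpenAI-style output format, where each line carries custom_id, a response object with a chat completion in body, and an error field. Output from other providers routed through the gateway may be shaped differently, and parse_batch_output is a hypothetical helper name.

```python
import json


def parse_batch_output(jsonl_text):
    """Map custom_id -> assistant text (None when the request has no choices)."""
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Failed requests may carry a null response, hence the fallbacks
        response = record.get("response") or {}
        body = response.get("body") or {}
        choices = body.get("choices") or []
        text = choices[0]["message"]["content"] if choices else None
        results[record["custom_id"]] = text
    return results
```

With the objects from the example above, this would be called as parse_batch_output(output_content.content.decode('utf-8')). Keying by custom_id avoids relying on the order of lines in the output file.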

Batch Status Reference

validating: Initial validation of the batch
in_progress: Processing the requests
completed: All requests processed successfully
failed: Batch processing failed (use error_file_id from the batch response to fetch error details)
finalizing: Results being prepared (OpenAI / Azure OpenAI)
expired: Did not complete within the time window (OpenAI / Azure OpenAI)
cancelling / cancelled: Batch cancelled; partial results may be available via output_file_id
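The status list above also implies which file is worth downloading in each case. As a rough sketch, the mapping can be written as a small helper. files_to_fetch is a hypothetical name, and the batch object is only assumed to expose status, output_file_id, and error_file_id attributes.

```python
from types import SimpleNamespace


def files_to_fetch(batch):
    """Return the file ID worth downloading for the batch's current status."""
    if batch.status in ("completed", "cancelled"):
        # Cancelled batches may still expose partial results
        kind, file_id = "output", getattr(batch, "output_file_id", None)
    elif batch.status == "failed":
        kind, file_id = "errors", getattr(batch, "error_file_id", None)
    else:
        return {}  # nothing to download yet
    return {kind: file_id} if file_id else {}


# Stand-in object for illustration; a real batch comes from client.batches.retrieve
example = SimpleNamespace(status="failed", output_file_id=None, error_file_id="file-err")
print(files_to_fetch(example))
```

Each returned ID can then be passed to client.files.content, as in the Fetch Results step.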

Best Practices

File Format: Use meaningful custom_id values and valid JSONL format
Error Handling: Poll the batch status until it reaches a final state, and use error_file_id to fetch error details when a batch fails
Security: Store API keys securely, use minimal IAM permissions
Provider-specific: Check the prerequisites above (e.g. Azure: batch deployment type and endpoint; Bedrock: minimum 100 records, IAM and S3)

Vertex AI Permissions

The following permissions are required for Vertex AI batch prediction:

Storage and service account requirements

Cloud Storage bucket

The Cloud Storage bucket used for batch input and output must be in the same region as the Vertex AI model you are using.
The service account that runs the batch job must have read and write access to this bucket (e.g. roles/storage.objectAdmin or equivalent object-level read/write permissions on the bucket).

Batch prediction job permissions

Create batch prediction jobs

The service account must have permission to create and manage batch prediction jobs in Vertex AI (e.g. roles/aiplatform.user or the aiplatform.batchPredictionJobs.create permission).

AWS Bedrock Permissions

User Permissions (for API calls)

These are the minimum permissions required to use the Bedrock Batch APIs. For complete official guidance, see AWS Bedrock Batch Inference Permissions.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:ListFoundationModels",
        "bedrock:GetFoundationModel",
        "bedrock:ListInferenceProfiles",
        "bedrock:GetInferenceProfile",
        "bedrock:ListCustomModels",
        "bedrock:GetCustomModel",
        "bedrock:TagResource",
        "bedrock:UntagResource",
        "bedrock:ListTagsForResource",
        "bedrock:CreateModelInvocationJob",
        "bedrock:GetModelInvocationJob",
        "bedrock:ListModelInvocationJobs",
        "bedrock:StopModelInvocationJob"
      ],
      "Resource": [
        "arn:aws:bedrock:<region>:<account_id>:model-customization-job/*",
        "arn:aws:bedrock:<region>:<account_id>:custom-model/*",
        "arn:aws:bedrock:<region>::foundation-model/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:PutObject", "s3:GetObject", "s3:GetObjectAttributes"],
      "Resource": ["arn:aws:s3:::<bucket>", "arn:aws:s3:::<bucket>/*"]
    },
    {
      "Action": ["iam:PassRole"],
      "Effect": "Allow",
      "Resource": "arn:aws:iam::<account_id>:role/<service_role_name>",
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": ["bedrock.amazonaws.com"]
        }
      }
    }
  ]
}

Service Role Permissions (for batch execution)

The service role (role_arn) used for creating and executing the batch job requires:

Trust Relationship:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "bedrock.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "<account_id>"
        },
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:bedrock:<region>:<account_id>:model-invocation-job/*"
        }
      }
    }
  ]
}

Permission Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::<bucket>", "arn:aws:s3:::<bucket>/*"]
    }
  ]
}
