What is LLM Benchmarking?
LLM Benchmarking is the process of evaluating how efficiently a Large Language Model (LLM) inference server performs under load. It goes beyond traditional performance testing by focusing on real-time response characteristics that directly impact user experience and system scalability.
Here are some of the key metrics involved:
- Time to First Token (TTFT): the delay between sending a request and receiving the first token of the response. This reflects the model’s initial processing latency.
- Output Tokens per Second (tokens/s): measures how quickly the model generates response tokens, indicating generation speed and system responsiveness.
- Inter-Token Latency: the time between consecutive tokens in a streaming response. Lower values indicate smoother, more natural-feeling output in real-time applications.
- Requests per Second (RPS): the number of inference requests an LLM can handle per second, an essential measure of throughput.
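To make the first three concrete: all of them can be derived from per-token timestamps recorded on the client side. Here is a minimal sketch in Python, assuming the caller has recorded the request send time and a timestamp for each received token:

```python
def compute_request_metrics(request_sent_at: float, token_times: list[float]) -> dict:
    """Derive TTFT, output tokens/s, and inter-token latency from per-token timestamps."""
    ttft = token_times[0] - request_sent_at          # Time to First Token
    generation_window = token_times[-1] - token_times[0]
    tokens_per_s = (len(token_times) - 1) / generation_window if generation_window > 0 else 0.0
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    avg_inter_token_latency = sum(gaps) / len(gaps) if gaps else 0.0
    return {
        "ttft_s": ttft,
        "output_tokens_per_s": tokens_per_s,
        "avg_inter_token_latency_s": avg_inter_token_latency,
    }
```

RPS, by contrast, is measured across all requests over the duration of the test rather than per request.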
Tracking and analyzing these metrics is critical for:
- Comparing LLM providers
- Optimizing deployments across CPUs, GPUs, or specialized accelerators
- Fine-tuning server configurations for latency-sensitive applications
That’s where LLM Locust comes in.
Why Traditional Load Testing Tools Like Locust Fall Short for LLM Benchmarking (And How LLM Locust Fixes It)
As LLMs continue to power more real-time and interactive applications, benchmarking their performance accurately is more important than ever. While tools like Locust are excellent for traditional load testing, they fall short when it comes to the streaming, token-level granularity LLMs require.
Enter LLM Locust—a tool purpose-built to bridge this gap.
Why Locust Is Great for Traditional Load Testing
Let’s give credit where it’s due. Locust remains one of the most beloved tools for load testing due to its:
- Python-native scripting: Flexible and intuitive for test scenario creation
- Lightweight concurrency: Greenlets allow for thousands of simulated users
- Real-time Web UI: Simple and powerful for monitoring load tests live
For standard APIs or services, it’s a fantastic choice. But for LLMs? Not quite enough.
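For reference, this is all a traditional Locust scenario needs (the endpoint below is a placeholder). Notice that nothing in it knows about tokens or streaming, which is exactly the gap:

```python
from locust import HttpUser, task, between

class ApiUser(HttpUser):
    wait_time = between(1, 2)  # seconds of think time between tasks

    @task
    def call_endpoint(self):
        # A plain request/response call: Locust records status and latency,
        # but has no notion of token streams.
        self.client.get("/health")
```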
The Problem: LLMs Break the Load Testing Mold
1. No Support for LLM-Specific Metrics
Locust doesn’t natively track LLM-specific performance indicators, such as:
- Time to First Token (TTFT)
- Output tokens per second
- Inter-token latency
These streaming dynamics are fundamental to understanding how well an LLM performs, especially in real-time use cases.
2. Token Streaming Inconsistency + CPU Bottlenecks
LLM APIs often stream tokens inconsistently—some return zero tokens at first, others send one token at a time, and some deliver multiple tokens in a single chunk.
To measure output tokens accurately, responses must be re-tokenized, since the API responses can’t be trusted to follow a consistent format.
But here’s the catch: tokenization is a CPU-bound task, especially when done for every streaming response. Locust uses greenlets for lightweight concurrency, but they still operate under Python’s Global Interpreter Lock (GIL). That means CPU-heavy operations like tokenization can block the event loop, reducing throughput and skewing your benchmark results.
The combination of inconsistent streaming behavior and Python’s GIL makes this a significant performance bottleneck in traditional Locust setups.
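To see why, consider what counting output tokens actually requires. The sketch below uses a Hugging Face tokenizer (the tokenizer choice is arbitrary, purely for illustration); every call to encode is CPU-bound work that holds the GIL:

```python
from transformers import AutoTokenizer

# Load once at startup; encoding streamed text is pure CPU work that holds the GIL.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def count_output_tokens(streamed_chunks: list[str]) -> int:
    """Re-tokenize the accumulated response text to get a reliable token count,
    regardless of how the API chose to chunk its stream."""
    text = "".join(streamed_chunks)
    return len(tokenizer.encode(text, add_special_tokens=False))
```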
3. No Custom Charts
Want to plot TTFT or streaming throughput? Locust’s UI doesn’t support custom LLM metrics out of the box, leaving key data invisible during test runs.
4. Competing Tools Are Limited
Tools like genai-perf are valuable, but often provide:
- One-off benchmark snapshots
- Limited configurability
- No real-time visual feedback
They lack the iterative, exploratory flexibility needed in real-world benchmarking.
The Solution: Meet LLM Locust
LLM Locust combines the simplicity of Locust with deep support for LLM-specific benchmarking. Inspired by BentoML’s llm-bench, it introduces a modular architecture and custom frontend for real-time insights.
How LLM Locust Works
1. Asynchronous Request Generation
Simulated users send continuous asynchronous requests to your LLM API, mimicking real-world load. This runs in a separate Python process, so there are no tokenization bottlenecks.
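In spirit, each simulated user is an async loop like the one below. This is a simplified sketch using aiohttp; the OpenAI-style endpoint, payload shape, and record fields are assumptions for illustration, not LLM Locust's actual code:

```python
import time

import aiohttp

async def simulated_user(base_url: str, prompt: str, on_complete) -> None:
    """Continuously issue streaming completion requests and record raw chunk timestamps."""
    async with aiohttp.ClientSession(base_url=base_url) as session:
        while True:
            sent_at = time.perf_counter()
            chunks, chunk_times = [], []
            async with session.post(
                "/v1/completions",  # assumed OpenAI-compatible endpoint
                json={"prompt": prompt, "max_tokens": 128, "stream": True},
            ) as resp:
                async for data, _end in resp.content.iter_chunks():
                    chunk_times.append(time.perf_counter())
                    chunks.append(data)
            # Hand off raw data untouched; no parsing or tokenization in this loop.
            on_complete({"sent_at": sent_at, "chunk_times": chunk_times, "chunks": chunks})
```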
2. Streaming Response Collection
LLM responses are streamed and routed to a metrics daemon for lightweight parsing and analysis.
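Conceptually, the hand-off can be a multiprocessing.Queue whose put method serves as the on_complete callback from the previous sketch; the daemon below runs in its own process (again a sketch, with the real parsing work elided):

```python
import multiprocessing as mp

def metrics_daemon(queue: mp.Queue) -> None:
    """Runs in its own process, so CPU-heavy parsing and tokenization
    never block the request-generating event loop."""
    while True:
        record = queue.get()
        if record is None:  # sentinel used to shut the daemon down
            break
        # Placeholder for the real work: decode the chunks, re-tokenize the text,
        # compute TTFT / tokens/s / inter-token latency, and bucket the results.
        print(f"received {len(record['chunks'])} chunks for one request")

if __name__ == "__main__":
    queue: mp.Queue = mp.Queue()
    mp.Process(target=metrics_daemon, args=(queue,), daemon=True).start()
    # Each simulated user is started with on_complete=queue.put.
```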
3. Metrics Processing
The daemon tokenizes responses, calculates TTFT, tokens/s, and inter-token latency, and buckets the results.
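A minimal version of the bucketing step might group per-request results by wall-clock second and summarize each bucket (the one-second granularity here is an assumption for illustration):

```python
import time
from collections import defaultdict

# Per-second buckets of TTFT samples; tokens/s and inter-token latency
# can be bucketed the same way.
ttft_buckets: dict[int, list[float]] = defaultdict(list)

def record_ttft(request_sent_at: float, first_token_at: float) -> None:
    """File one request's TTFT under the current wall-clock second."""
    ttft_buckets[int(time.time())].append(first_token_at - request_sent_at)

def summarize(bucket: list[float]) -> dict:
    """Summarize one bucket: sample count, average, and worst case."""
    return {"count": len(bucket), "avg_s": sum(bucket) / len(bucket), "max_s": max(bucket)}
```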
4. Aggregation
Every 2 seconds, data is sent to a FastAPI backend that mimics the Locust backend and stores and aggregates metrics globally.
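On the receiving side, the aggregation endpoint can be as small as a single FastAPI route that folds incoming batches into a global store (a sketch; the route path and payload model are assumptions):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class MetricSample(BaseModel):
    ttft_s: float
    output_tokens_per_s: float
    inter_token_latency_s: float

all_samples: list[MetricSample] = []  # naive in-memory global store

@app.post("/metrics")  # assumed path; the metrics daemon would POST here every ~2 seconds
def ingest(batch: list[MetricSample]) -> dict:
    """Fold a batch of per-request samples into the global store."""
    all_samples.extend(batch)
    return {"total_samples": len(all_samples)}
```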
5. Real-Time Visualization
A customized version of Locust frontend displays:
- TTFT per request
- Token throughput over time
- RPS, latency, and other key stats
Here is the detailed architecture:

Here is a demo of what it looks like:


Conclusion
Locust is a great load testing tool—but not for LLMs out of the box.
LLM Locust brings the streaming, token-level precision required to properly benchmark today’s powerful language models.
Whether you're deploying an open-source model on your own infrastructure or comparing performance across LLM APIs, LLM Locust gives you the clarity, flexibility, and control to do it right.
Github link: https://github.com/truefoundry/llm-locust