Benchmarking Llama-2-13B

PARAMETERS	LLAMA-2-13B ON A100	LLAMA-2-13B ON A10G
Max Batch Prefill Tokens	10100	10100

Model: Llama-2-13B

Metrics to Benchmark

Use cases & Deployment Modes Benchmarked

Benchmarking Setup

Benchmarking Results Summary

Latency, RPS, and Cost

Tokens Per Second

Detailed Results
