
Case Study

No one can keep pace with the skyrocketing demand for GPUs. To increase the utilization of their GPU fleet and serve more clients, the team built a multi-agent LLM system to automate cluster optimization. The team used TrueFoundry to solve hybrid/multi-cloud management challenges, handle model switching, and develop and deploy LLM agents.

Summary

NVIDIA is the world's leading supplier of GPUs. With unprecedented global demand for GPUs, the team wanted to improve the performance and utilization of the GPU clusters in its data centers. This solution would help them provide GPUs to more clients and improve user experience by reducing the lag between GPU requests and fulfillment.

The solution devised was an AI system that processes all the GPU telemetry data (utilization, power consumption, memory usage, errors, etc.) collected in real-time from their clusters, rates the GPUs based on their utilization, and suggests steps to optimize the workloads.
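As a rough illustration of the rating step described above (not NVIDIA's actual implementation), the sketch below buckets each GPU into a coarse rating from its recent telemetry. The schema, field names, and thresholds are all assumptions for the example:

```python
from dataclasses import dataclass

@dataclass
class GpuTelemetry:
    """One telemetry sample for a single GPU (hypothetical schema)."""
    gpu_id: str
    utilization_pct: float  # SM utilization, 0-100
    power_watts: float
    memory_used_pct: float
    error_count: int

def rate_gpu(samples: list[GpuTelemetry]) -> str:
    """Bucket a GPU into a coarse rating from its recent samples."""
    if not samples:
        return "unknown"
    # Any errors take priority over the utilization rating.
    if any(s.error_count > 0 for s in samples):
        return "needs-attention"
    avg_util = sum(s.utilization_pct for s in samples) / len(samples)
    if avg_util >= 70:
        return "well-utilized"
    if avg_util >= 30:
        return "partially-utilized"
    return "underutilized"
```

GPUs rated "underutilized" would then become candidates for the optimization suggestions mentioned above.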


The team built and deployed a novel multi-agent conversational system and domain-specific LLMs on the TrueFoundry platform. The agents use the telemetry data to build ML models and optimization algorithms that improve GPU utilization.

NVIDIA is Synonymous with AI Today

NVIDIA is one of the most valuable companies in the world today amidst the AI Gold Rush. It was founded in 1993 to build accelerated computing for gaming and other workloads that general-purpose computing couldn't handle.

NVIDIA’s naming story is a fun one. Before the company had a name, the co-founders named all their files NV, as in "next version." Incorporating the company prompted them to review all words containing those two letters. At one point, they wanted to call the company NVision, but a toilet paper manufacturer had already taken that name. Huang then suggested the name NVIDIA, from "invidia," the Latin word for "envy."

Fast-forward to 2024. NVIDIA’s GPUs are the powerhouse of most research and value creation using LLMs and GenAI. In a single quarter, NVIDIA generated over $25B in revenue, and its GPUs have become so valuable that they are transported to data centers in armored cars. Demand is so high that external and internal users often have to wait for the best-in-class versions.

Motivation: Better GPU Utilization Helps Fulfill Its Enormous Demand

Given how precious a GPU is today and how its demand is increasing exponentially, NVIDIA created a team within the company with the following objectives:

  1. Increasing ROI from GPU clusters: maximizing the performance and utilization of each GPU cluster
  2. Faster fulfillment of GPU requests: improving user experience and value creation from the existing GPUs

The Traditional Approach with ML Models Has Limitations

Traditionally, this problem has been solved by looking at historical telemetry data and using domain knowledge to build machine learning models that optimize the performance and utilization of the clusters along any given axis.
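A minimal sketch of this traditional style of model, using only a least-squares trend fit on historical hourly utilization to forecast the next hour (the data shape and the choice of a linear model are assumptions for illustration, not the team's actual approach):

```python
from statistics import mean

def fit_trend(hourly_util: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of utilization (%) against hour index.

    Returns (slope, intercept) of the fitted line.
    """
    xs = list(range(len(hourly_util)))
    x_bar, y_bar = mean(xs), mean(hourly_util)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, hourly_util)) \
        / sum((x - x_bar) ** 2 for x in xs)
    return slope, y_bar - slope * x_bar

def forecast_next_hour(hourly_util: list[float]) -> float:
    """Extrapolate the fitted trend one step past the observed window."""
    slope, intercept = fit_trend(hourly_util)
    return slope * len(hourly_util) + intercept
```

For example, `forecast_next_hour([10, 20, 30, 40])` extrapolates the linear trend to 50. A real pipeline would use richer features and models, but the hand-engineered nature of this step is exactly the limitation described next.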


The problem with this approach is that it is:

  1. Influenced by Human Biases: It is limited to the axes that the dev teams could think of optimizing
  2. Non-Scalable: It does not scale with the number of workload types, problem classes, or cluster types, each of which could require its own optimization technique

This leaves many GPUs underutilized, many workloads waiting, much innovation shelved, and many human beings dissatisfied.

A Whole New Approach to Optimization Using LLM Agents

The team considered leveraging LLMs for their ability to process large datasets and deduce logical actions to improve and scale GPU optimization. A solution would require the following:


  1. Data Collection: Cluster Telemetry Data (GPU Usage, Temperature, Workloads) needs to be gathered from data centers across geographies and cloud providers.
  2. Monitoring and Analysis Dashboard: Providing a seamless way for operators to ask questions about the incoming data, monitor it in real time, and create visualizations.
  3. Automated Optimization: A continuously monitoring agent that can process the data and take actions to optimize the cluster workloads and resource utilization.
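The first two requirements above boil down to aggregating raw telemetry from many sites into per-cluster views an operator can query. A minimal sketch of that aggregation step, with a hypothetical record schema chosen for the example:

```python
from collections import defaultdict

def summarize_by_cluster(records: list[dict]) -> dict[str, dict]:
    """Roll raw per-GPU telemetry records up into per-cluster dashboard stats.

    Each record is assumed to carry: cluster, gpu_id, utilization_pct, temp_c.
    """
    grouped: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        grouped[rec["cluster"]].append(rec)

    summary = {}
    for cluster, recs in grouped.items():
        utils = [r["utilization_pct"] for r in recs]
        summary[cluster] = {
            # Distinct GPUs that reported at least one sample.
            "gpus_reporting": len({r["gpu_id"] for r in recs}),
            "avg_utilization_pct": round(sum(utils) / len(utils), 1),
            "max_temp_c": max(r["temp_c"] for r in recs),
        }
    return summary
```

In production this would run continuously over a streaming pipeline rather than a list in memory; the shape of the output is what a dashboard or an agent would query.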

Approach Devised by the NVIDIA Team for an Automated Cluster Optimization System

The Agent Should be Able to Utilize Domain Expert’s Knowledge

The NVIDIA team wanted the LLM agent system to help domain experts and operators generate actionable insights by letting them ask relevant domain-specific questions. The LLM agent should be able to do all the data wrangling, code execution, and model building needed to obtain these insights. Users could ask abstract questions in plain language.

Solution: The NVIDIA Team Came Up with a Novel, Multi-Agent-Based Approach

The Autonomous Observability Agents Team at NVIDIA came up with a unique approach to this problem: automate the optimization using a team of AI agents, where each agent can:

  1. Perform a specific set of tasks
  2. Communicate with the other agents
  3. Build analytics and ML models
  4. Run simulations
  5. Devise strategies to optimize GPU utilization
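The agent team above can be sketched as a supervisor routing a question through specialist agents that hand results to one another. Everything here is hypothetical (the agent names, the rule-based stand-ins for LLM calls, the pipeline shape); it only illustrates the message-passing pattern, not NVIDIA's system:

```python
from typing import Callable

class Agent:
    """A named agent wrapping a handler that reads and extends shared state."""
    def __init__(self, name: str, handle: Callable[[dict], dict]):
        self.name = name
        self.handle = handle

def analyst(state: dict) -> dict:
    # Stand-in for an LLM-backed analysis agent: find underutilized GPUs.
    low = sorted(g for g, util in state["telemetry"].items() if util < 30)
    return {"finding": f"underutilized: {low}"}

def optimizer(state: dict) -> dict:
    # Stand-in for an LLM-backed optimization agent: propose an action.
    return {"action": f"repack workloads onto {state['finding']}"}

def supervisor(question: str, telemetry: dict) -> dict:
    """Route the user's question through the specialist agents in sequence."""
    pipeline = [Agent("analyst", analyst), Agent("optimizer", optimizer)]
    state = {"question": question, "telemetry": telemetry}
    for agent in pipeline:
        state.update(agent.handle(state))
    return state
```

In the real system each handler would be a model call (with tool use for code execution and simulation) rather than a rule, and the supervisor would choose agents dynamically instead of running a fixed pipeline.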

These strategies are surfaced to the end user through an application named Llo11yPop, which lets them ask abstract questions and leaves the entire orchestration to the model.

Architecture of the Multi-Agent LLM System

Challenge: A Multitude of Engineering Orchestrations Were Required to Realize the Vision

This moonshot problem required the NVIDIA team to build custom foundational models, fine-tune Small Language Models (SLMs), develop specialized agents, automate distributed computing across various data sources, and run workloads both on-prem and on cloud service providers. Building such a system posed several engineering challenges.


The team decided to use the TrueFoundry platform to solve these engineering challenges and provide the necessary toolkit for model pre-training, fine-tuning, agent deployment, and more. The team wanted to focus solely on solving the business problem and developing the most performant solution.

The Stack: With the TrueFoundry Platform Solving the Engineering Challenges, the NVIDIA Team Started Shipping Within 6 Weeks

"We could easily switch models out per use case and as new ones were released. This pace of fast experimentation helped us ship a working PoC in just 6 weeks."

Aaron Erickson

Senior Engineering Manager

Autonomous Observability Team, NVIDIA

The NVIDIA team realized early on that to solve a complicated problem like this, they needed to address the challenges head-on at the beginning of the project. This would enable quick iterations and rapid support for different data sources, agents, user personas, and types of questions. They leveraged the TrueFoundry platform to build a comprehensive GenAI stack.

Generative AI Infrastructure Powered by TrueFoundry

Impact of the Project

The demand for NVIDIA GPUs is virtually limitless in the AI revolution. This solution improves the utilization of these GPU fleets and speeds up fulfillment, enabling NVIDIA to provide these resources to many more clients, much faster.

Every percentage point of improved utilization, or even a fraction of one, translates into a substantial business impact. Even minor improvements in utilization enable the team to serve new clients, resulting in net new business for the company. Team TrueFoundry has been fortunate to collaborate with the NVIDIA team on such an impactful project at such a transformative time for the domain.
