Building Compound AI Systems

September 5, 2024
https://www.truefoundry.com/blog/building-compound-ai-systems

AI systems that rely on single monolithic models are limited by their design. These models are built to handle a wide range of general tasks but often struggle to adapt to specific contexts. Generative AI models are fundamentally probabilistic in nature, which can lead to hallucinations. Additionally, large language models require significant computational power and memory, making them resource-intensive. Most companies today are optimizing for "intelligence per dollar." All of this has led to the development of complex AI systems with multiple components.

  • For example, ChatGPT Plus can use external tools like web browsing or code execution to improve its responses. It decides when it needs extra information beyond its training, such as accessing up-to-date data from the web or running a Python script.
  • Another example is Retrieval-Augmented Generation (RAG), which can include various components. These may consist of a retriever, embedding model, reranker, large language model (LLM), prompt construction module, post-processing module for filtering and answer verification, and a caching system for frequently used documents.
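A minimal RAG loop like the one described above can be sketched in a few lines. This is an illustrative toy, not a production pipeline: the "embedding model" is a bag-of-words counter, the "vector database" is a Python list, and the LLM call is left out so the sketch stays self-contained.

```python
from collections import Counter
import math

# Toy document store standing in for a vector database (illustrative only).
DOCS = [
    "TrueFoundry deploys models on Kubernetes clusters.",
    "Retrieval-Augmented Generation combines a retriever with an LLM.",
    "Caching frequently used documents reduces latency.",
]

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list:
    # Retriever: rank stored documents by similarity to the query.
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list) -> str:
    # Prompt construction module: fold retrieved context into the LLM input.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

prompt = build_prompt("What does RAG combine?", retrieve("retriever LLM generation"))
```

In a real system, `embed` would call an embedding model, `DOCS` would live in a vector database, and the resulting prompt would be sent to an LLM, but the shape of the pipeline stays the same.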

In a widely shared tweet, Matei Zaharia, co-founder and CTO at Databricks and Professor at UC Berkeley, highlights an important point: the shift to "thinking in systems". The example he gives, CoT@32 (chain-of-thought prompting with 32 sampled answers) versus 5-shot prompting, illustrates that the same base model can behave very differently depending on the system built around it. His point is that to fully understand and benchmark AI performance, you have to look at the broader system and its components, rather than isolating the model itself.


What Are Compound AI Systems?

Compound AI systems are AI architectures in which multiple models and components work together to perform tasks that a single model cannot handle efficiently. Berkeley AI Research (BAIR) coined the term in a blog post highlighting the shift from single models to compound systems.

This new paradigm leverages the strengths of various AI models, tools, and processing steps to enhance performance, versatility, and reusability.

Key Components of a Compound AI System

A typical Compound AI system can vary depending on the use case, but some common, repeatable components include:

  1. Large Language Models (LLMs): Generate and verify natural language responses based on user inputs and context.
  2. Retrievers: Fetch relevant information from databases or external sources to inform the system's responses.
  3. Databases/ VectorDBs: Store structured and unstructured data for easy querying and retrieval by the AI system.
  4. External Tools: Access APIs and services to perform specific functions, such as web browsing or executing code.
  5. Embedding Models: Convert data into vector representations for efficient similarity searches and retrieval tasks.
  6. Rerankers: Evaluate and prioritize retrieved results to ensure the most relevant information is presented.
  7. Prompt Construction Modules: Formulate effective prompts to optimize input for large language models.
  8. Post-processing Modules: Filter and verify generated outputs to ensure quality and coherence before delivery.
  9. Caching Systems: Store frequently accessed responses to improve efficiency and reduce retrieval latency.
  10. Task and Data Planners: Manage task orchestration and data flow to optimize component interactions and resource use.
  11. Evaluation Modules: Assess the system's performance and output quality to guide improvements and fine-tuning.
  12. Monitoring and Feedback Systems: Continuously track performance and gather user feedback for ongoing adaptation and enhancement.
  13. Fine-Tuning Modules: Adapt pre-trained models to specific tasks or domains by training them on targeted datasets.
  14. Agent Frameworks: Provide a structure for building agents that can autonomously perform tasks and make decisions.
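
A caching system (component 9 above) is one of the simplest of these components to illustrate. The sketch below is a minimal TTL cache that sits in front of an LLM call; class and function names are invented for illustration, and a production system would use a shared store such as Redis rather than a process-local dict.

```python
import time
from typing import Callable, Optional

class ResponseCache:
    """Minimal in-process TTL cache for frequently accessed responses."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (stored_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired entry
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._store[key] = (time.monotonic(), value)

def answer(query: str, cache: ResponseCache, llm_call: Callable) -> str:
    # Check the cache before paying for an LLM call.
    hit = cache.get(query)
    if hit is not None:
        return hit
    result = llm_call(query)
    cache.put(query, result)
    return result
```

Repeated identical queries then hit the cache instead of the model, which is where the latency and cost savings described above come from.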

Example: a sample advanced RAG pipeline used by Elastic.

Why use compound AI systems?

The Berkeley blog lays out clearly why compound systems matter:

  1. Better improvement via system design than training - Some tasks improve more through system design than through additional model training or scale. Large language models (LLMs) benefit from more computing power, but the cost often outweighs the gains.
  2. More flexibility to create dynamic systems - Since machine learning models learn from static data sets, their knowledge is fixed. Developers can enhance these models by integrating them with other components like search functions to pull in timely data.
  3. Improve control and trust - Training influences neural networks but doesn't ensure they avoid certain behaviors. Building an AI system that filters outputs can provide tighter control. For example, combining LLMs with fact-checking tools can make them more trustworthy by adding citations or verifying data.
  4. Balancing cost and quality - Developers need to design systems that use budgets effectively. Making trade-offs between cost and quality/precision is often required, depending on the use case.
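
Point 3 above, tighter control via output filtering, can be sketched as a small post-processing step. The pattern list and function names below are illustrative assumptions; a real system would use a proper PII detector and a fact-checking tool rather than regexes.

```python
import re

# Illustrative policy list -- a real filter would use a trained PII/safety model.
BLOCKED_PATTERNS = [r"\bssn\b", r"\bcredit card\b"]

def filter_output(text: str, sources: list) -> str:
    """Post-process an LLM answer: redact policy violations, attach citations."""
    for pattern in BLOCKED_PATTERNS:
        text = re.sub(pattern, "[REDACTED]", text, flags=re.IGNORECASE)
    if sources:
        citations = "".join(f"\n[{i + 1}] {s}" for i, s in enumerate(sources))
        text = text + "\n\nSources:" + citations
    return text
```

Wrapping every LLM response in a step like this gives the system a control point that training alone cannot guarantee: disallowed content is stripped and every claim can be traced back to a cited source.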

Challenges of Compound AI Systems

Compound AI systems, however, pose multiple challenges across building, optimizing, and deploying them.

Building 

Constructing a compound AI system involves managing multiple models and processing steps that must work together seamlessly. Effective coordination logic is needed to ensure smooth data flow between components, and robust metrics and logging are crucial for debugging and performance analysis. Building such systems without the right tools can require substantial engineering effort.

Platforms like TrueFoundry simplify the process by offering intuitive modules that abstract away complexity:

  • Seamless integration of open-source/proprietary models into modular workflows
  • Easy orchestration between components
  • Built-in observability for critical performance metrics across the entire workflow
  • Testing and debugging capabilities
  • Flexibility to tailor hardware (GPUs and CPUs) and scaling for distinct processing steps

Optimizing 

Optimization in compound AI systems goes beyond the individual performance of models—it extends to managing the interplay between multiple models and additional processing steps.

Properly balancing latency, throughput, and resource utilization is essential to avoid bottlenecks.

TrueFoundry helps you choose the right infrastructure, with built-in features for selecting the best model servers, compute, and more. Leveraging autoscaling and advanced cost-optimization techniques, TrueFoundry minimizes inefficiencies and eliminates unnecessary spending. It also automatically detects and fixes infrastructure inefficiencies, helping you maintain an optimal balance between performance and cost.

Deploying 

Each component of a compound AI system has specific requirements for hardware, software, and scalability. Building a system that meets these diverse needs can require a significant investment of engineering time.

TrueFoundry simplifies this process by providing scalable infrastructure for deploying each component from various sources, such as local environments, Git repositories, Docker containers, Python scripts, and Hugging Face URLs. It also offers pre-built integrations with popular applications like vector databases, integrated development environments (IDEs), and observability tools, ensuring seamless interaction between components.

Additionally, TrueFoundry supports auto scaling and auto shutdown features, along with multiple deployment strategies, such as blue-green and canary deployments. This flexibility allows developers to build compound AI systems while maintaining high performance. 

How TrueFoundry Helps Build Compound AI Systems

TrueFoundry helps build compound AI systems by offering a robust framework that streamlines model deployment, scaling, and integration. Here's how it achieves that:

TrueFoundry's Architecture

Core Abstractions

TrueFoundry simplifies the complexities of building AI systems by providing powerful abstractions:

Services: Enables the seamless deployment of AI models as scalable services, managing inference tasks with minimal infrastructure concerns. This abstraction simplifies operational aspects like auto-scaling and health monitoring.

Jobs: Facilitates the scheduling of tasks for batch processing, training, or automated workflows. These jobs can be executed on-demand or at specified intervals, offering flexibility for complex workflows.

Workflows: Helps connect multiple tasks into a cohesive AI pipeline. By building workflows, users can automate processes and link different models, tasks, or services into compound AI systems.

Open-source Helm Charts: Streamlines the packaging and deployment of AI workloads onto Kubernetes clusters, offering ease of use with industry-standard Helm charts.
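
To make the Workflows abstraction concrete, here is a toy workflow runner that chains named tasks into a pipeline, with each step receiving the previous step's output. This is an illustration of the concept only, not the TrueFoundry SDK; all class, decorator, and task names below are invented for the sketch.

```python
from typing import Callable

class Workflow:
    """Toy workflow runner: executes registered tasks in order,
    piping each task's output into the next. Illustrative only."""

    def __init__(self):
        self.tasks = []  # list of (name, callable)

    def task(self, name: str):
        # Decorator that registers a function as the next pipeline step.
        def register(fn: Callable) -> Callable:
            self.tasks.append((name, fn))
            return fn
        return register

    def run(self, payload):
        for name, fn in self.tasks:
            payload = fn(payload)  # each step sees the previous step's output
        return payload

wf = Workflow()

@wf.task("retrieve")
def retrieve_step(query: str) -> dict:
    # Stand-in for a retrieval service call.
    return {"query": query, "context": ["doc-1", "doc-2"]}

@wf.task("generate")
def generate_step(state: dict) -> str:
    # Stand-in for an LLM generation service call.
    return f"answer to {state['query']} using {len(state['context'])} docs"
```

In a platform like TrueFoundry, each step would typically be a deployed Service or Job with its own hardware and scaling configuration, and the workflow layer would handle orchestration, retries, and data flow between them.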

Modules for Building Compound AI Systems

TrueFoundry offers several pre-built modules to simplify and accelerate the development of compound AI systems:

Model as a Service: Simplifies the deployment of AI models, allowing developers to focus on building compound AI systems rather than worrying about infrastructure scalability or reliability.

No-Code Model Fine-Tuning: Allows users to fine-tune pre-trained models with minimal effort, making it easier to customize models without extensive coding knowledge.

LLM Templates for Agents & RAG Framework: Provides inbuilt templates and frameworks to kickstart projects, especially for Retrieval-Augmented Generation (RAG) systems and AI agents. These are essential components for creating compound AI systems involving multiple models or task-specific agents.

AI Gateway: Centralizes prompt management, key management, and provides a unified API for interacting with models, enabling better control and security, especially across distributed teams. The gateway serves as the hub for managing and orchestrating multiple AI components, crucial for compound systems.
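
The value of a unified API is that application code stays the same regardless of which provider serves the model. The sketch below builds a chat-completion request in the OpenAI-compatible shape many gateways expose; the endpoint URL, model identifiers, and header names here are assumptions for illustration, not TrueFoundry's actual API.

```python
import json

def gateway_request(model: str, prompt: str, api_key: str) -> dict:
    """Build a chat-completion request for a hypothetical gateway that
    exposes an OpenAI-compatible API (URL and model names are assumed)."""
    return {
        "url": "https://gateway.example.com/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            # Swapping providers means changing only this identifier,
            # e.g. "openai/gpt-4o" -> "meta/llama-3-70b".
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = gateway_request("openai/gpt-4o", "Hello", "TEST_KEY")
```

Because every component of a compound system calls the same gateway endpoint with one team-managed key, provider credentials, rate limits, and usage tracking can all be handled centrally.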

Read More About LLM Gateways

Features for Scalability and Cost Optimization

TrueFoundry provides several features to ensure scalability while optimizing for costs:

GPU Management: Efficiently manages GPU resources to optimize model training and inference. This is critical for resource-intensive tasks in compound AI systems.

Cost Optimization: Automatically manages resources, leveraging cost-saving strategies such as spot instances, fractional GPUs, and avoiding costly retraining errors.

Autoscaling: Dynamically scales resources up or down depending on workload, ensuring that the AI system always operates at peak performance without incurring unnecessary costs.

Secret Management: Safeguards sensitive information such as API keys and tokens, ensuring secure interactions across models and workflows.

CI/CD Integration: Seamlessly integrates with Continuous Integration/Continuous Deployment pipelines, accelerating the cycle of model development and deployment. This helps developers focus on building and improving models within compound AI systems.

Scale to Zero: Minimizes costs during periods of inactivity by automatically reducing resource consumption, a significant advantage for optimizing total cost of ownership in AI systems.

Underlying Infrastructure

TrueFoundry is built on top of Kubernetes, which provides a foundation for high scalability, reliability, and efficient resource management. It supports multi-cloud as well as on-premise workloads, ensuring flexibility regardless of the environment. This infrastructure is essential for the deployment of compound AI systems that need to scale across different cloud providers or physical data centers.

Developer-Centric Design

TrueFoundry's design puts developers first, offering multiple entry points for building AI systems:

Custom Code and Models: Developers can easily bring their own code and models, allowing flexibility to design and deploy customized AI systems that integrate multiple models and tasks.

Templates and GitHub Integration: To speed up deployment, TrueFoundry provides templates that can be quickly adapted, or users can integrate directly with GitHub repositories for seamless model deployment into production environments.

Discover more about TrueFoundry's Compound AI approach and its advanced features by reaching out to us. We can schedule a personalized demo to showcase its capabilities.
