
Adding a Generative AI Ready Core To Aviso AI’s Tech Stack 

About Aviso AI

Aviso AI is a revenue operating system designed to help sales teams better predict and drive revenue, optimize rep performance, and prioritize go-to-market (GTM) strategies. It combines conversational intelligence with sales apps to provide accurate forecasts, predict future pipelines, and win more deals. 

It has helped teams at companies such as Seagate, Honeywell, and GitHub close 20% more deals and achieve a 15-35% topline increase.

Objectives that the team wanted to achieve

The ML team at Aviso AI wanted to further strengthen the company's AI-first stack and amplify its impact by:

  1. Making the stack robust for new models like LLMs: The team wanted to support all the latest models within their cloud and keep the stack ready for anything new that might come in the future.
  2. Simplifying Infrastructure Handling for ML Developers: The team wanted ML/DS developers to spend minimal time handling infrastructure so they could experiment with, test, and deploy their models.
  3. Reducing Cloud Costs: The team wanted to increase the utilization of the provisioned infrastructure and package multiple services/models onto the same compute.

What we achieved together with the Aviso AI Team

Through the partnership, the teams were jointly able to achieve the following: 

  1. Save 100+ Hours of developer time per month: The machine learning team moved to a Docker-based environment, reducing build times and making local testing easier.
  2. Cloud Cost Savings: The team packaged more software/models onto the provisioned compute using Docker and used spot instances without worrying about reliability, realizing overall cloud cost savings of ~30-40%.
  3. LLM Deployment at Scale: Using the catalog of pre-configured open-source LLMs, the team could deploy any LLM from Hugging Face or other sources with optimized inference on infrastructure provisioned by the TrueFoundry platform.

Aviso AI uses AI to revolutionize Revenue Ops

Aviso AI is an integrated platform that combines various sales tools powered by AI to optimize revenue execution. Some of their main products include:

  1. Sales Forecasting: Forecasting sales outcomes, enabling teams to focus on essential deals.
  2. Performance Optimization: Helping optimize the sales strategy and providing enhancement feedback to the team.
  3. Real-time Insights: Helping leaders monitor and diagnose any disruption or market change.
  4. Actionable Recommendations: Suggesting the next best action for sales representatives using its AI-based analysis.

Aviso AI's Operating System for Revenue Teams

Their other offerings include AI-guided Deal Forecasting, Pipeline and Deal Management, Conversational Intelligence, Coaching and Enablement, NLP Analytics and Reporting, Sales Engagement, Lead Intelligence, and Customer Success Intelligence.

MIKI: World’s First LLM Chief of Staff

MIKI: World's first LLM Chief of Staff; Built by Aviso AI

The Aviso AI team has also been innovating with Generative AI; central to its approach is MIKI, the world's first Generative AI Chief of Staff for Revenue Intelligence, designed to boost GTM teams' productivity and save reps up to 15-20 hours per week. It helps with:

  1. Question Answering: Answering queries asked by customers to aid salespersons in real time.
  2. Suggesting Next Best Actions: Suggesting optimal steps to increase the probability of closing a sales conversation.
  3. Coaching Sales Representatives: Analyzing and providing feedback to salespersons.
  4. Automating mundane tasks done by Account Executives: Such as research and writing emails.

The team wanted LLMs to be deployed scalably as independent services

Aviso's AI team had been deploying their software and ML services on AMIs (Amazon Machine Images). AMIs are pre-configured machine images that include the operating system, the application server, and the application/model you want to deploy.

The team wanted to create a simpler and more efficient tech stack for training, testing, and deploying their models as the use cases expanded to newer and more demanding models like LLMs (Large Language Models) and Generative AI.

Keeping the AMIs lean

Since the software and ML services were bundled together and baked into AMIs, this could pose challenges as the models got larger, especially in the case of LLMs.

Making testing and fault diagnosis easier

Isolating issues during releases or fixes could become challenging because of dependencies between the ML and non-ML services. The team wanted to keep the two deployments and their testing separate.

Simpler Environment Management and Efficient Scaling

The LLM and software services could require different runtime environments, and their resource requirements are also vastly different. Hence, the team saw merit in managing the environments and resources for the two separately.

With an updated infrastructure stack, the team saves 100+ Dev Hours Per Month

The team moved to a scalable infrastructure stack in a Dockerized environment. We jointly decided that this would set the team up for success in the longer run while saving both developer time and cloud costs.

The New Dockerized Environment helps the team save costs and be more agile

  1. Lightweight: Docker images encapsulate only the application and its dependencies, which makes them much smaller and faster to build than full machine images.
  2. Microservices Architecture: Docker images act as building blocks for breaking a monolithic application into smaller microservices, making the application more reliable and transparent (a minimal sketch of such a model microservice follows this list).
  3. Cost-effective: Containers share the host OS kernel, making them more resource-efficient than Virtual Machines. Multiple containers can run on shared infrastructure, leading to high resource utilization.
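
To make the microservices point concrete, below is a minimal, hypothetical sketch (not Aviso AI's actual code) of a model wrapped as its own small FastAPI service so it can be packaged into a lightweight Docker image and deployed and scaled separately from the non-ML services. The endpoint names and scoring logic are illustrative placeholders.

```python
# Hypothetical sketch: a single model exposed as its own containerized microservice.
# Endpoint names and scoring logic are placeholders, not Aviso AI's actual service.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class ScoreRequest(BaseModel):
    features: list[float]  # numeric input features for the model


@app.post("/predict")
def predict(req: ScoreRequest) -> dict:
    # A real service would load a trained model at startup and call it here;
    # a simple average keeps this sketch self-contained and runnable.
    score = sum(req.features) / max(len(req.features), 1)
    return {"score": score}


@app.get("/health")
def health() -> dict:
    # Lightweight health endpoint for the orchestrator's liveness/readiness probes.
    return {"status": "ok"}
```

Because a service like this carries only its own dependencies, its image stays small, it can be run and tested locally, and it can scale (or fail) independently of the rest of the application.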

ML Team’s new workflow with TrueFoundry

Workflow of the Aviso AI team using TrueFoundry

TrueFoundry helped the team move seamlessly from their existing setup to a new Docker-based setup that ensures:

  1. Easier for DS teams to manage: Each time a service/model needs to be deployed or tested, it can be run locally first.
  2. SRE best practices auto-enforced: Autoscaling, version management, data and model lineage tracking, cost visibility, etc.
  3. ~40% Cloud Cost Savings: Through reliable usage of spot instances and higher resource utilization.

The Aviso AI team could ship LLMs from Day 1 using TrueFoundry

With the new and modular stack in place, the team was set up to seamlessly deploy and use new-age, heavier models like LLMs to power MIKI and upcoming use cases.

“The team did not have to think about how to configure and manage resources.”

- Santosh SK Madilla, Principal Data Scientist at Aviso AI

Given the scale and recency of these models, training, fine-tuning, and deploying them at scale are complex engineering problems, including:

  1. Scaling up GPU infrastructure: To support huge models like Llama 2 70B.
  2. Figuring out suitable model server configurations: New models are released every few weeks, and teams need to determine the correct parameters to serve them on model servers like vLLM, TGI, etc. Finding this config based on available resources and performance requirements can take weeks (see the sketch after this list).
  3. Fine-tuning and Pre-training: Fine-tuning and pre-training require orchestrating multi-GPU clusters, checkpointing, and continuously monitoring the training job.
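
As a rough illustration of item 2 above, the snippet below shows how an open-source model might be loaded with vLLM's Python API. This is a hedged sketch, not the team's actual configuration: the model id, prompt, and parameters such as tensor parallelism and GPU memory utilization are placeholders, and tuning those values for a given GPU and latency target is exactly the work that can take weeks.

```python
# Hypothetical sketch: serving an open-source LLM with vLLM.
# The model id and tuning parameters are illustrative placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-7b-chat-hf",  # any Hugging Face model id
    tensor_parallel_size=1,                 # number of GPUs to shard the model across
    gpu_memory_utilization=0.90,            # fraction of GPU memory vLLM may claim
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["Summarize the status of the Acme renewal deal."],  # placeholder prompt
    params,
)
print(outputs[0].outputs[0].text)
```

A platform layer can hide these choices behind a catalog of pre-configured models, which is the role the TrueFoundry catalog of pre-configured open-source LLMs played here.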

The team could just deploy their models and be assured of reliability and optimal costs by default

TrueFoundry helped the team to:

  1. Deploy any open-source LLM from the Hugging Face Hub or other sources in one click.
  2. Autoscale deployments with optimized model-server configurations for the most performant serving.
  3. Save costs by using spot instances, scaling models down during certain periods of the day, and deploying on bare Kubernetes.

TrueFoundry became the single pane of glass for admins and ML Teams

TrueFoundry became the single dashboard through which the different projects within the company deployed their ML models. This allowed easier context-sharing between teams, since everyone, especially the admins, could see what deployments and model building were being done by the different teams.
