Adding a Generative AI Ready Core to Aviso AI's Tech Stack
Aviso AI is a leading Revenue Operating System that helps teams manage and increase their revenue using conversational intelligence and forecasting models. TrueFoundry helped the team add the capability to deploy its proprietary LLM models, which power its AI Chief of Staff, MIKI.
0 Code Refactoring Required
About Aviso AI
Aviso AI is a revenue operating system designed to help sales teams better predict and drive revenue, optimize rep performance, and prioritize go-to-market (GTM) strategies. It combines conversational intelligence with sales apps to provide accurate forecasts, predict future pipelines, and win more deals.
It has helped teams from companies like Seagate, Honeywell, and GitHub close 20% more deals and achieve a 15-35% topline increase.
Objectives that the team wanted to achieve
The ML team at Aviso AI wanted to further strengthen the company’s AI-first stack and deepen its impact by:
Making the Stack Robust for New Models like LLMs: The team wanted support for all the latest models within their own cloud and a stack ready for whatever might come next.
Simplifying Infrastructure Handling for ML Developers: The team wanted ML/DS developers to spend minimal time handling infrastructure so they could focus on experimenting with, testing, and deploying their models.
Reducing Cloud Costs: The team wanted to increase the utilization of the provisioned infrastructure and package multiple services/models onto the same compute.
What we achieved together with the Aviso AI Team
Through the partnership, the teams were jointly able to achieve the following:
Save 100+ Hours of Developer Time per Month: The machine learning team moved to a Docker-based environment, thereby reducing build times and facilitating easier local testing.
Cloud Cost Savings: The team was able to package more software/models onto the provisioned compute using Docker. They were also able to use spot instances without worrying about reliability. Overall, the team realized cloud cost savings of ~30-40%.
LLM Deployment at Scale: Using the catalog of pre-configured open-source LLMs, the team could deploy any LLM from Hugging Face or other sources with optimized inference on infrastructure provisioned by the TrueFoundry platform.
Aviso AI uses AI to revolutionize Revenue Ops
Aviso AI is an integrated platform that combines various sales tools powered by AI to optimize revenue execution. Some of their main products include:
Sales Forecasting: Forecasting sales outcomes, enabling teams to focus on essential deals.
Performance Optimization: Helping teams optimize their sales strategy and providing feedback for improvement.
Real-time Insights: Helping leaders monitor and diagnose disruptions or market changes as they happen.
Actionable Recommendations: Suggesting the next best action for sales representatives using AI-based analysis.
Their other offerings include AI-guided Deal Forecasting, Pipeline and Deal Management, Conversational Intelligence, Coaching and Enablement, NLP Analytics and Reporting, Sales Engagement, Lead Intelligence, and Customer Success Intelligence.
MIKI: World’s First LLM Chief of Staff
Aviso AI team has also been innovating with Generative AI; central to its approach is MIKI, the world’s first Generative AI Chief of Staff for Revenue Intelligence, designed to boost GTM teams’ productivity and save reps up to 15-20 hours/week. It helps in:
Question Answering: Answering customer queries in real time to aid salespersons
Suggesting Next Best Actions: Suggesting optimum steps to increase the probability of closing a sales conversation.
Coaching Sales Representatives: Analyzing and providing feedback to salespersons
Automating mundane tasks done by Account Executives: Such as research and writing emails
The team wanted LLMs to be deployed scalably as independent services
Aviso's AI team had been deploying their software and ML services on AMIs (Amazon Machine Images). AMIs are pre-configured machines that include the operating system, the application server, and the application/model you want to deploy.
The team wanted to create a simpler and more efficient tech stack for training, testing, and deploying their models as the use cases expanded to newer and more demanding models like LLMs (Large Language Models) and Generative AI.
Wanted to keep the AMIs lean
Since the software and ML services were bundled together and baked into AMIs, the images would grow unwieldy as the models got larger, especially in the case of LLMs.
Making testing and fault diagnosis easier
Isolating issues during releases or fixes could become challenging because of dependencies between the ML and non-ML services. The team wanted to keep the two deployments and their testing separate.
Simpler Environment management and Efficient Scaling
The LLM and software services could require different environments for running. Their resource requirements are also vastly different. Hence, the team saw merit in managing the environment and resource handling for the two separately.
With an updated infrastructure stack, the team saves 100+ Dev Hours Per Month
The team moved to a scalable infrastructure stack in a Dockerized environment. We jointly decided that this would set the team up for success in the longer run while saving both time and cloud costs.
The New Dockerized Environment helps the team save costs and be more Agile
Lightweight: Docker images are much lighter and only encapsulate the application and its dependencies. This makes them much smaller and faster to build.
Microservices Architecture: Docker images are building blocks that break a monolithic application into smaller microservices. Microservices make the application much more reliable and transparent.
Cost-effective: Containers share the host OS kernel, making them more resource-efficient than virtual machines. Multiple containers can run on shared infrastructure, leading to high resource utilization. A minimal sketch of such a containerized inference service follows this list.
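For illustration only, the sketch below shows what a standalone, containerized inference service of this kind could look like. The framework choices (FastAPI, a Hugging Face pipeline) and the public sentiment model are assumptions made for the example; Aviso AI’s proprietary models and services would differ.

```python
# app.py - illustrative, independently deployable inference microservice
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Hypothetical model choice; a team's own model would be loaded here instead.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(req: PredictRequest):
    # The container bundles only this service and its dependencies,
    # so the ML service can be built and tested locally and scaled
    # independently of the non-ML application services.
    result = classifier(req.text)[0]
    return {"label": result["label"], "score": result["score"]}
```

Packaged into its own Docker image, a service like this can be run locally with `uvicorn app:app` for testing and then scaled on shared infrastructure without touching the rest of the application.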
ML Team’s new workflow with TrueFoundry
TrueFoundry helped the team move seamlessly from their existing setup to a new docker-based setup that ensures:
Easier for DS teams to manage: Each time a service/model needs to be deployed or tested, it can first be built and run locally.
SRE best practices auto-enforced: Autoscaling, version management, data and model lineage tracking, cost visibility, etc.
~40% Cloud Cost Savings: Through reliable use of spot instances and higher resource utilization
Aviso AI team could ship LLMs from Day 1 using TrueFoundry
With the new and modular stack in place, the team was set up to seamlessly deploy and use new-age and heavier models like LLMs to power MIKI and newer use cases coming up.
“The team did not have to think about how to configure and manage resources.”
- Santosh SK Madilla, Principal Data Scientist at Aviso AI
Given the scale and recency of these models, training, fine-tuning, and deploying these models at scale are complex engineering problems. These include:
Scaling up GPU infrastructure: To support huge models like Llama 2 70B.
Figuring out suitable model server configurations: New models are released every few weeks, and teams need to determine the correct parameters to serve them on model servers like vLLM, TGI, etc. Finding this configuration based on available resources and performance requirements can take weeks (a minimal example of such a configuration follows this list).
Fine-tuning and Pre-Training: Fine-tuning and pre-training require orchestrating multi-GPU clusters, checkpointing, and continuously monitoring the training job.
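To make the model-server configuration problem concrete, here is a minimal vLLM sketch. The model name, GPU count, and sampling parameters are assumptions chosen for the example, not the values used in Aviso AI’s deployment; they simply show the kind of knobs that must be tuned per model and per GPU budget.

```python
# Illustrative vLLM serving configuration (offline/batch style for brevity)
from vllm import LLM, SamplingParams

# Assumed model and hardware: a Llama 2 7B chat model on a single GPU.
# Larger models (e.g. 70B) would need tensor_parallel_size > 1 across multiple GPUs.
llm = LLM(
    model="meta-llama/Llama-2-7b-chat-hf",
    tensor_parallel_size=1,        # number of GPUs to shard the model across
    gpu_memory_utilization=0.90,   # fraction of GPU memory for weights + KV cache
    max_model_len=4096,            # maximum context length to allocate cache for
)

sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)
outputs = llm.generate(
    ["Summarize the key risks in this sales pipeline:"],
    sampling,
)
print(outputs[0].outputs[0].text)
```

Each new model or GPU type typically means revisiting these parameters, which is why a catalog of pre-tuned configurations shortens the path to production.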
The team could just deploy their models and be assured of reliability and optimal costs by default
TrueFoundry helped the team to:
1-Click deploy any open-source LLM from the Hugging Face Hub or other sources (see the client sketch after this list)
Autoscaling on top of model servers tuned for the best performance
Save costs through spot instances, scaling models down during certain periods of the day, and deployment on bare Kubernetes
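Once a model is deployed behind an HTTP endpoint, downstream services can consume it like any other API. The sketch below assumes an OpenAI-compatible endpoint (a common convention for model servers such as vLLM and TGI); the URL, model name, and API key are placeholders, not Aviso AI’s actual deployment details.

```python
# Illustrative client call to a deployed open-source LLM behind an
# OpenAI-compatible endpoint; URL, model name, and key are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.example.internal/v1",  # hypothetical internal endpoint
    api_key="placeholder-token",
)

response = client.chat.completions.create(
    model="llama-2-70b-chat",  # whichever open-source model was deployed
    messages=[
        {"role": "system", "content": "You are a revenue-intelligence assistant."},
        {"role": "user", "content": "Draft a follow-up email for a stalled deal."},
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```

Keeping the interface OpenAI-compatible means the application code stays the same even when the underlying open-source model or serving configuration changes.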
TrueFoundry became the single pane of glass for admins and ML Teams
TrueFoundry became the single dashboard through which the different projects within the company deployed their ML models. This allowed easier context-sharing between teams, since everyone, especially the admins, could see what deployments and model builds the different teams were working on.