We are back with another episode of True ML Talks. This time, we dive deep into MLOps and LLM applications at Twilio, and we are speaking with Pruthvi Shetty.
Pruthvi is a Staff Data Scientist at Twilio, where he leads the company's Gen AI efforts, which we'll deep-dive into today. Before Twilio, he led ML at SAP and at ZapLabs, a startup acquired by Anywhere RE.
📌
Our conversation with Pruthvi covers the following aspects:
- ML and GenAI applications and use cases around GTM
- XGPT: Twilio's Powerhouse for Go-to-Market Teams
- Battling OpenAI Rate Limits
- Experimenting with Open-Source LLMs
- RFP Genie: Automating RFP Responses
- Workflow for Traditional ML Models
Twilio has a long history of leveraging machine learning (ML) and data science to optimize its products and services. However, the recent advancements in Generative AI (GenAI) have opened up new opportunities to further enhance the way GTM teams operate.
While GenAI is a powerful tool, Twilio has not abandoned its traditional ML roots. The company continues to use ML for various GTM tasks, such as propensity modeling and lead generation.
Twilio recognized the potential of Generative AI (GenAI) early on and built a dedicated team to explore its applications. This team, led by Pruthvi, has built a suite of GenAI-powered tools specifically for GTM teams. One of the key tools they built is XGPT.
XGPT was developed in response to two issues with using publicly available GenAI models like ChatGPT: sending internal data to a public service raises security concerns, and generic models lack the Twilio-specific context that GTM teams need.

XGPT tackles both issues by keeping data within Twilio's own infrastructure and grounding answers in internal sources through a retrieval-augmented generation (RAG) pipeline.
We've had it for about four to five months now. Currently, we are answering about 15,000 questions a month, and we've seen a super good lift in the power users of our applications. That's been XGPT so far. - Pruthvi
XGPT is a secure, customizable platform built on Twilio's own data. It is not just one model but a suite of products, each tailored to specific GTM roles and needs; these include FlexGPT for customer service representatives and SegGPT for segmentation tasks.
A custom RAG pipeline gathers all relevant information for XGPT, spanning both public and private data. This information comes from various sources, such as content management systems, internal documents, call transcripts, Salesforce notes, and product documentation.
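As a rough illustration of the retrieval step in such a pipeline, the sketch below ranks a tiny corpus against a question using term-frequency cosine similarity. This is only a stand-in: the real pipeline uses learned embeddings and a vector store, and the document names and contents here are invented.

```python
import math
import re
from collections import Counter

# Invented mini-corpus standing in for Twilio's real sources
# (CMS pages, call transcripts, Salesforce notes, product docs).
DOCS = {
    "pricing-faq": "Twilio pricing is usage based; SMS and voice are billed per message and per minute.",
    "flex-overview": "Twilio Flex is a programmable contact center platform for customer service teams.",
    "sla-doc": "Twilio offers a 99.95 percent uptime SLA for many products.",
}

def _tf(text: str) -> Counter:
    """Term-frequency vector over lowercased word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank documents by cosine similarity of term-frequency vectors,
    a toy stand-in for the embedding search a real RAG pipeline uses."""
    q = _tf(question)
    def score(doc: str) -> float:
        d = _tf(doc)
        dot = sum(q[t] * d[t] for t in q)
        norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
        return dot / norm if norm else 0.0
    ranked = sorted(DOCS, key=lambda name: score(DOCS[name]), reverse=True)
    return ranked[:k]

print(retrieve("What is the uptime SLA?"))
```

The top-ranked documents would then be passed as context to the LLM along with the user's question.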
Offline embeddings for FlexGPT and the other applications are created using tools like Space and Chroma, with custom tweaks to ensure scalability and control. Beyond text, XGPT also understands audio and visual data through multimodal embeddings: Whisper transcribes product demos, while a vision model extracts information from charts and diagrams. These embeddings are then converted to Face embeddings, allowing XGPT to link its answers back to the relevant sources.
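Structurally, the multimodal ingestion step might look like the sketch below: every source item is normalized to text before embedding, with its origin kept for source linking. The `transcribe_audio` and `describe_image` functions are stubs standing in for Whisper and the vision model, and all file names are invented.

```python
# Structural sketch of multimodal ingestion; model calls are stubbed out.

def transcribe_audio(path: str) -> str:
    # Stub: a real pipeline would run Whisper here.
    return f"[transcript of {path}]"

def describe_image(path: str) -> str:
    # Stub: a real pipeline would run a vision model here.
    return f"[description of chart in {path}]"

def to_text(item: dict) -> str:
    """Normalize any source item to text before embedding, keeping the
    original source path so answers can link back to it."""
    kind = item["kind"]
    if kind == "audio":
        return transcribe_audio(item["path"])
    if kind == "image":
        return describe_image(item["path"])
    return item["text"]

items = [
    {"kind": "audio", "path": "demo.mp3"},
    {"kind": "image", "path": "arch.png"},
    {"kind": "text", "text": "Flex is Twilio's contact center product.", "path": "flex.md"},
]
texts = [to_text(i) for i in items]
```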
The main LLM processing is handled by the OpenAI API; in specific cases, such as RFPs, Llama is used for interpretation. Parallelization and batching strategies optimize processing and avoid rate limits. An interpretation layer filters and contextualizes questions before they reach the LLM, and XGPT links each answer to the relevant documentation so users can explore further.
Heroku hosts the applications, ensuring stability and performance, and Docker containers enable easy deployment and scaling. Data is securely stored in Postgres. Airtable tracks questions and feedback, which feeds continuous improvements to XGPT, while CloudWatch monitors metrics for optimal performance.
The team is continually improving XGPT and its RAG flow, with further capabilities planned for the future.
Twilio's XGPT, a powerhouse for go-to-market teams, faced a significant obstacle: OpenAI's rate limits. Answering questions iteratively, the initial version quickly hit these limits. Rotating API keys offered a temporary solution, but OpenAI's organizational rate limit proved more challenging.
To solve this challenge, the team's first step was to follow OpenAI's published best practices for avoiding rate limits and parallelizing calls. This provided a solid foundation, but further optimization was needed. Twilio's engineers also devised a clever solution: strategically batching API calls to stay under OpenAI's limits. This involved carefully grouping questions while preserving the application's user experience. To further improve efficiency, engineers assigned strategic weights to different tasks, ensuring that critical questions received priority while less urgent requests were still processed.
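The two tactics, weighted prioritization and batched calls with retry backoff, can be sketched as follows. `call_llm_batch` is a stub for a real batched OpenAI request, and the weights, batch size, and retry counts are illustrative, not Twilio's actual values.

```python
import heapq
import random
import time

class RateLimitError(Exception):
    """Stand-in for the rate-limit error a real API client would raise."""

def call_llm_batch(questions):
    # Stub: a real implementation would send one batched API request.
    return [f"answer to: {q}" for q in questions]

def answer_all(weighted_questions, batch_size=3, max_retries=5):
    """Answer (weight, question) pairs: lower weight = higher priority.
    Questions are popped in priority order and sent in batches, with
    exponential backoff on rate-limit errors."""
    heap = list(weighted_questions)
    heapq.heapify(heap)
    answers = []
    while heap:
        batch = [heapq.heappop(heap)[1] for _ in range(min(batch_size, len(heap)))]
        for attempt in range(max_retries):
            try:
                answers.extend(call_llm_batch(batch))
                break
            except RateLimitError:
                # Back off exponentially, with jitter, before retrying.
                time.sleep(2 ** attempt + random.random())
    return answers

qs = [
    (0, "critical: outage question"),
    (2, "routine: doc question"),
    (1, "important: pricing question"),
]
print(answer_all(qs))
```

Because the heap orders by weight, critical questions always enter a batch before routine ones, matching the prioritization described above.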
While both ChatGPT and Llama are powerful language models, Twilio opted for Llama for the first layer of interpretation in XGPT for a few key reasons: it was a cost-effective fit for the task, it diversified their LLM usage, and it demonstrated their commitment to the open-source community.
RFP Genie is another generative AI tool developed by Twilio's internal team. It automates the process of responding to RFPs, which can be a time-consuming and tedious task for GTM teams.
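A sketch of what such an RFP flow could look like: pull numbered questions out of an RFP document, then draft a response for each. `draft_answer` is a stub for the real retrieval-plus-LLM step, and the RFP text is invented.

```python
import re

def extract_questions(rfp_text: str) -> list[str]:
    """Pull out lines that look like numbered RFP questions, e.g. '1. ...?'."""
    return re.findall(r"^\s*\d+\.\s*(.+\?)\s*$", rfp_text, flags=re.MULTILINE)

def draft_answer(question: str) -> str:
    # Stub: the real tool would retrieve internal docs and call an LLM.
    return f"Draft response to: {question}"

rfp = """\
1. Does your platform support SMS?
2. What is your uptime SLA?
Some boilerplate paragraph that is not a question.
3. How is data encrypted at rest?
"""

answers = {q: draft_answer(q) for q in extract_questions(rfp)}
```

Drafts would then go to a human reviewer rather than straight into the RFP response.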
In the introduction, we briefly touched on the traditional ML models still used for GTM at Twilio, such as propensity and lead-generation models.
The traditional ML workflow leverages a powerful combination of tools and technologies.
Keep watching the TrueML YouTube series and reading the TrueML blog series.
TrueFoundry is an ML deployment PaaS built on Kubernetes that speeds up developer workflows while giving teams full flexibility to test and deploy models, with full security and control for the infra team. Through our platform, we enable machine learning teams to deploy and monitor models in 15 minutes with 100% reliability, scalability, and the ability to roll back in seconds, allowing them to save cost and release models to production faster for real business value.
Join AI/ML leaders for the latest on product, community, and GenAI developments