
Enabling a Fortune 100 Healthcare Company to ship 30+ LLM use cases in less than a year

The client in this study is a US-based, Fortune 100 healthcare company that invests heavily in healthcare research and cutting-edge technology. Given its sheer size (50K+ employees), its functions range from manufacturing, research, and supply chain management to internal ones like HR, operations, and customer experience.

Given the company's inclination to be an early adopter of new technology, when LLMs were released, the team went to the drawing board and identified a set of 30+ use cases with an impact potential of more than $500 Mn per year. With this ambitious goal in mind, the team started to take on these use cases and build its core Generative AI stack to:

  • Quickly deliver high-impact LLM use cases: To unlock topline growth and cost reduction across functions like research, customer experience, document search, etc.
  • Let teams reuse each other’s work: By incrementally making available to every new project all the assets (data parsers, models, data features, etc.) developed by other teams. This would ensure that every new use case takes less time to build than the previous one.

In addition to bringing the cutting edge to their use cases, the team wanted to democratize AI to increase its adoption. It wanted to enable:

  • 1-Click deployment of business rules and existing models: So that any user can directly use models and rules that have been implemented once, without needing a Data Scientist.
  • A single pane of glass to manage all deployed models: Data movement regulations forced the company to deploy models separately in each region where it operates. This created a management nightmare for deploying these models and monitoring their performance. The team wanted to simplify this process for the ML and DevOps teams.

Through collaboration between the client team and TrueFoundry, we were able to:

  • Achieve a 60-80% reduction in Time to Value (TTV) of LLM use cases: With access to use case templates and the option to deploy each element of a use case (model/UI/DB/embedding model/data parsers/splitters) with a single click, the team could ship a use case in 1 week.
  • Democratize the use of AI: The team created a discoverable marketplace of all internal business rules and models, from which any non-ML user could run inference directly through the UI and receive results by email.
  • Simplify model management: The team could ensure its deployed models delivered business ROI by monitoring all of them through a single pane of glass. We were also able to significantly simplify the release and update process for these models.

About the Client

The client is a Fortune 100 healthcare major with a history of more than 100 years. It has a footprint across 120+ countries and a significant positive impact on public health in them. Intense research is in its DNA, and it stays committed to being at the forefront of technology. Its research and development division employs 7,000+ people and spends more than $10 Bn.

The client already had multiple internal teams developing use cases for different business verticals. With the release of Large Language Models, most verticals went to the drawing board to reimagine their processes. Delivering these use cases was delegated to the Data Science team.

The Data Science team was responsible both for building the use cases and for creating tooling to make individual BU Data Science teams more efficient. This unique combination of vertical and horizontal charters presents interesting challenges and opportunities.

Unlocking the business potential of LLMs 

With 30+ LLM use cases scoped out by the team, the leaders realized that without building additional Generative AI capability, it would take years and tens of millions of dollars to execute all of them.

These use cases were spread across multiple domains:

  1. Research: Helping research teams by summarizing articles and papers, keeping them up to date with the latest developments, and, at an advanced level, even helping devise new experiment ideas and propose tests.
  2. Customer Welfare: Developing applications to improve the experience of their customers and of the populations of the countries they operate in, helping improve general health there. These included applications like a QnA bot for answering patients' questions, generating educational content on drugs and vaccine administration, etc.
  3. HR and Internal Operations: Helping streamline and automate processes like resume matching, candidate profiling, talent acquisition, etc., which had typically been hugely time-consuming manual work.

Decreasing Time to Value of Artificial Intelligence

The company's leadership identified that, with multiple business verticals and multiple Data Science teams operating within the company, one team was often blind to the work done by another.

Knowledge transfer between the teams was scarce. When it did happen, the team trying to build on another team's work faced a huge lag before it could make the asset (model/UI/business logic, etc.) useful for itself. This was caused by:

  • Limited discoverability of work done across teams: Teams had little visibility into each other's projects and the assets generated in them.
  • Documentation alone is not enough: Documentation often becomes outdated or incomplete, and takes time to read and implement. This introduces friction when teams want to reuse each other's work.
  • Dependence on the Engineering team for reuse: Reusing someone's work also meant involving the engineering team to deploy the models.
  • Growing effort to maintain models: Since most models had been deployed separately in each region the company operates in, maintaining them (updates and changes), or simply monitoring whether they were performing well, required data scientists to manually query model performance in every region.
Managing models deployed across different regions is difficult

The team had started development on both fronts

When TrueFoundry started to explore a partnership with the team, work toward both of these objectives was already underway. However, 3-4 months into development, the team started facing some challenges:

A few LLM use cases were contracted to consulting companies

The company was already working with some of the top consulting and implementation companies. It decided to allocate some of the use cases to them and, to validate the idea, started with one use case. Some of the issues faced here were:

  • Each use case cost $500K-$1 Mn for V1: The team understood that scaling up, refining, and maintaining the use cases through this route would not reach the level of impact they had envisioned.
  • Slow process: Time to Value for each use case was 3-4 months, so for 30 use cases the team would have had to either wait 2-3 years or spend significantly more.
  • Capability building was limited: Since the field changes every day, the team realized that without strengthening its own capabilities, it would be impossible to keep the wheel rolling in the long term.

The internal ML team had also started building another use case

The internal ML team started development on one of the use cases themselves. However, they found it difficult to keep up with the pace at which the field was developing. Some of their main challenges were:

  1. Limited access to 3rd-party APIs and tools: Anything that required sending data out was beyond the team's scope. They also lacked built-in support for some of the tools that simplify model fine-tuning, testing, etc., and hence had to figure out these components on their own.
  2. Dependence on DevOps: Since the LLM/GenAI paradigm of Machine Learning required orchestrating infrastructure at a scale previously unknown to the team, they faced long delays in adding support for whatever new tools became available in the market.
  3. Experimentation was constrained: The team could only use models the infra team could support, so they never knew whether they had reached the best achievable quality. Moreover, they faced lags when taking up more complex tasks like LoRA fine-tuning.

The Generative AI marketplace was reduced to discovery only, without deployment of resources

The team envisioned a Generative AI marketplace where all the ML teams could publish their work (models, data features, parsers, pre-processing logic, etc.). The marketplace had to host:

  1. Internally developed ML models: For easy incremental training and deployment
  2. LLM assets: To help develop end-to-end LLM applications with models, DBs, UI, etc.
  3. Base models: Including LLMs, regression, time-series models, etc.
  4. Code utilities: Data loaders, parsers, etc.
  5. Apps: Fully functional internal applications for different use cases
Team's vision for Generative AI marketplace

However, as the team started development, they realized it would take a long time to build the underlying orchestration layer that could fulfill this vision:

  1. Deploying models was difficult: Unless models were deployed in the same environment in which they were developed, it was very difficult to ensure the same performance levels.
  2. Models/services were not dockerized: Dockerizing models was not common practice, and Data Scientists were reluctant to take on any additional steps.
  3. Orchestrating infrastructure was complicated: It required taking care of GPU scaling, auto-scaling, and reliability.

Hence, the team decided to limit the marketplace to letting teams discover each other's work. They removed executability, one of the core features, from the initial version of the marketplace.

The team wanted to ship business rules as a Python library

However, they realized that this approach would not work because:

  1. It would compromise discoverability: Without a UI front for the rules, other teams would struggle to find them.
  2. Version control of the rules would be impossible: Since the rules would execute on users' local machines, ensuring that all users have the same library version would be impossible; whenever a fix or change was made, different users would end up running different versions.

The company decided to co-build their AI stack with TrueFoundry

Two high-value LLM use cases were delivered in <3 Months

The client team decided to develop 2 high-value use cases using the LLM module of the TrueFoundry platform. These use cases were as follows:

Market report summarization

An internal team used to analyze different market intelligence reports and generate a summary report. This weekly activity meant:

  1. 100s of Hours spent each month
  2. Limited coverage of available information

The team wanted to create an LLM-based solution that could summarize these reports and provide a QnA interface over them (a minimal sketch of the summarization step follows the figure below):

Proposed solution for summarizing Market Reports
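To make the shape of such a solution concrete, here is a minimal map-reduce summarization sketch. The `LLM_URL` endpoint, the `call_llm` helper, and the report file names are hypothetical placeholders for the team's internally deployed model and document store, not the actual implementation.

```python
"""Map-reduce summarization sketch for long market intelligence reports."""
import requests

LLM_URL = "https://llm.internal.example.com/generate"  # hypothetical endpoint


def call_llm(prompt: str) -> str:
    """Send a prompt to an internally hosted LLM and return its completion."""
    resp = requests.post(LLM_URL, json={"prompt": prompt}, timeout=120)
    resp.raise_for_status()
    return resp.json()["text"]


def summarize_report(text: str, chunk_words: int = 1500) -> str:
    """Map: summarize each chunk. Reduce: merge the chunk summaries."""
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(0, len(words), chunk_words)]
    partial = [call_llm(f"Summarize the key findings:\n\n{c}") for c in chunks]
    return call_llm("Combine these into one market summary:\n\n"
                    + "\n\n".join(partial))


# Weekly run over the new reports (hypothetical file names).
for path in ["report_week_1.txt", "report_week_2.txt"]:
    print(summarize_report(open(path).read()))
```

Splitting first keeps each prompt within the model's context window; the reduce step then merges the partial summaries into a single report.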

Vaccine Intelligence Chatbot

Through this use case, the company wanted to increase awareness about vaccines by developing a QnA chatbot that can search the available documents on vaccine administration and clarify any doubts a patient might have.

Increasing vaccination rates: Through this use case, the company was trying to address any hesitation a vaccine taker may have due to the misinformation often associated with vaccines, which creates stigma around them.

TrueFoundry helped reduce Time to Delivery to 1/5th of the initial estimate

Building these use cases required putting multiple components together. We provided the team with a template to assemble the parts of the RAG (Retrieval-Augmented Generation) pipeline. This included components like the following (a simplified sketch of how they fit together appears after the workflow diagram):

  1. Open-source LLM deployment: Deploy models like LLaMA 2, BLOOM, etc., along with different quantized versions of those models
  2. Model fine-tuning: We helped the team simply plug in their data sources and trigger fine-tuning runs on optimized infrastructure configurations.
  3. Data loading, splitting, and chunking micro-service: To break data into logical chunks before embedding
  4. Backend service: To accept user queries and return responses
  5. Embedding model: To convert the chunks of text into their representative vectors
  6. Vector database: To store the vectorized chunks of data
  7. Final model deployment: Deploy the final model scalably
Workflow for developing a RAG system
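Below is a minimal, end-to-end sketch of the same pipeline shape, assuming a few stand-ins: sentence-transformers plays the embedding model, a small in-memory class plays the vector database, and `call_llm`/`LLM_URL` are hypothetical placeholders for a deployed open-source LLM.

```python
"""Sketch of the RAG pipeline: split -> embed -> store -> retrieve -> generate."""
import numpy as np
import requests
from sentence_transformers import SentenceTransformer

LLM_URL = "https://llm.internal.example.com/generate"  # hypothetical endpoint


def split_into_chunks(text: str, chunk_size: int = 500) -> list[str]:
    """Data loading/splitting step: break a document into logical chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


class InMemoryVectorStore:
    """Stand-in for a production vector database."""

    def __init__(self, embedder: SentenceTransformer):
        self.embedder = embedder
        self.chunks: list[str] = []
        self.vectors: np.ndarray | None = None

    def add(self, chunks: list[str]) -> None:
        # Re-embeds everything on each call; fine for a sketch.
        self.chunks.extend(chunks)
        self.vectors = np.asarray(
            self.embedder.encode(self.chunks, normalize_embeddings=True))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = self.embedder.encode([query], normalize_embeddings=True)[0]
        scores = self.vectors @ q  # cosine similarity on normalized vectors
        return [self.chunks[i] for i in np.argsort(scores)[::-1][:k]]


def call_llm(prompt: str) -> str:
    """Hypothetical call to the deployed open-source LLM."""
    resp = requests.post(LLM_URL, json={"prompt": prompt}, timeout=120)
    resp.raise_for_status()
    return resp.json()["text"]


def answer(store: InMemoryVectorStore, question: str) -> str:
    """Backend service step: retrieve relevant chunks, then generate an answer."""
    context = "\n\n".join(store.search(question))
    return call_llm(f"Answer using only this context:\n{context}\n\n"
                    f"Question: {question}")


# Usage: index the documents once, then serve QnA queries.
store = InMemoryVectorStore(SentenceTransformer("all-MiniLM-L6-v2"))
store.add(split_into_chunks(open("vaccine_faq.txt").read()))  # hypothetical file
print(answer(store, "Who should not receive this vaccine?"))
```

In production, each numbered component above runs as its own deployable service; collapsing them into one script simply makes the data flow easy to see.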

TrueFoundry powered the AI marketplace of the company

TrueFoundry became the rails powering the internal marketplace. To enable this, we helped the team:

  1. Initiate the marketplace components: With ready-to-use assets provided by TrueFoundry
  2. Implement an async inference architecture: This ensured that no requests got dropped and that the same API endpoint could serve requests with very different response times (10-15+ minutes if the dataset is huge). A simplified sketch of this pattern follows the list below.
  3. Set up use case pipelines like the RAG pipeline: With all the components like data parsers, chunking logic, models, etc. available to the teams, they could replicate what they did for Vaccine Intelligence and report summarization on any new use case in <1 month
  4. Add discoverability through a UI: We provided the team with APIs built on TrueFoundry deployments and jobs, which they integrated with a UI, making inference from any model, and deployment of any component, a single click for teams, with no need to read documentation.
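As an illustration, here is a minimal sketch of the async inference pattern using FastAPI. The endpoint names and payload shape are illustrative, not the client's actual API, and the in-memory dictionary stands in for a durable queue and result store.

```python
"""Async inference sketch: submit returns immediately; work runs in background."""
import uuid

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
results: dict[str, dict] = {}  # in production: durable queue + result store


def run_inference(request_id: str, payload: dict) -> None:
    """Stand-in for a job that may take 10-15+ minutes on a large dataset."""
    output = {"summary": f"processed {len(payload.get('rows', []))} rows"}
    results[request_id] = {"status": "done", "output": output}


@app.post("/submit")
def submit(payload: dict, background_tasks: BackgroundTasks) -> dict:
    request_id = str(uuid.uuid4())
    results[request_id] = {"status": "pending"}
    background_tasks.add_task(run_inference, request_id, payload)
    return {"request_id": request_id}  # caller gets an ID, not a held connection


@app.get("/result/{request_id}")
def result(request_id: str) -> dict:
    """Clients poll here (or, in production, are notified by email)."""
    return results.get(request_id, {"status": "unknown"})
```

Because /submit returns immediately with a request ID, long-running jobs never hold a connection open, so the same endpoint can serve both second-long and 15-minute requests without dropping any.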

“TrueFoundry has acted as a partner in enabling us to unlock LLMOps capabilities at scale. The team did extra work to support any new model we needed. Today, we can proudly say we are a leader in our space in using LLMs. The TrueFoundry team offered us a novel model of “product team as a service,” bringing hard-to-find skills augmented by the platform. In ever-changing technology areas like Gen AI, TrueFoundry offered enterprises a low-risk, high-reward engagement mechanism.”

- Global Head of Data Science

Business users can seamlessly run inference on business rules

All the business logic was packaged into APIs run on cloud servers using TrueFoundry. We ensured these APIs were structured like a Python library for ease of use (a minimal sketch follows the list below). This ensured:

  1. No version management issue
  2. Simple Execution through UI
  3. Email notifications when the results were available
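A sketch of that shape, assuming a hypothetical rules endpoint and rule name: each client method reads like a library call but delegates execution to the server, so exactly one version of every rule is live at a time.

```python
"""Business rules exposed as an API behind a library-like Python client."""
import requests


class BusinessRules:
    """Each method mirrors a library call but runs on the cloud deployment."""

    def __init__(self, base_url: str = "https://rules.internal.example.com"):
        self.base_url = base_url  # hypothetical internal endpoint

    def _call(self, rule: str, **inputs) -> dict:
        # One server-side version of every rule; users never diverge locally.
        resp = requests.post(f"{self.base_url}/rules/{rule}",
                             json=inputs, timeout=60)
        resp.raise_for_status()
        return resp.json()

    def eligibility_check(self, patient_id: str) -> dict:  # hypothetical rule
        return self._call("eligibility_check", patient_id=patient_id)


# Feels like importing a library, but every call hits the managed deployment.
rules = BusinessRules()
print(rules.eligibility_check(patient_id="P-1024"))
```

This resolves the version-control problem of shipping rules as a local library: fixes land on the server once, and every user picks them up on their next call.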

TrueFoundry is the single pane of glass for all the deployed models

All the regional clusters are connected to TrueFoundry, so the team can view and manage every deployed model from a single control plane (illustrated in pattern form below).
TrueFoundry helped team manage models deployed in different clusters
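To contrast with the manual per-region querying described earlier, here is a sketch of the aggregation pattern that a single control plane provides. The regional URLs, endpoint path, and response shape are hypothetical stand-ins, not TrueFoundry's actual API.

```python
"""Single-pane-of-glass sketch: aggregate model status across regions."""
import requests

# Hypothetical per-region inference clusters.
REGIONS = {
    "us-east": "https://models.us-east.internal.example.com",
    "eu-west": "https://models.eu-west.internal.example.com",
    "apac": "https://models.apac.internal.example.com",
}


def collect_model_status() -> list[tuple[str, str, str, bool]]:
    """Poll each regional cluster and merge model health into one view."""
    rows = []
    for region, base_url in REGIONS.items():
        # Assumed endpoint shape; a control plane would do this continuously.
        resp = requests.get(f"{base_url}/v1/models/status", timeout=10)
        resp.raise_for_status()
        for model in resp.json()["models"]:
            rows.append((region, model["name"], model["version"],
                         model["healthy"]))
    return rows


if __name__ == "__main__":
    for region, name, version, healthy in collect_model_status():
        print(f"{region:10} {name:30} v{version:8} "
              f"{'OK' if healthy else 'DEGRADED'}")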

Interacting with TrueFoundry to monitor, update, and release models across regions helped the team:

  1. Decrease model deployment time by 60-80%
  2. Improve model ROI by monitoring their performance

Way Ahead

As the partnership between the two companies progresses, we are learning a lot about the practical problems an ML team of this scale faces. We are able to battle-test the platform while also developing new, more mature features. Together, we are determined to build state-of-the-art technology that lets Data Science teams focus purely on delivering value through ML use cases, without ever needing to orchestrate infrastructure or lose time on engineering tasks.
