True ML Talks #10 - LLMs & GenAI @ Meta with Engineering Director

We are back with another episode of True ML Talks. In this, we dive deep into LLM's and Generative AI's, and we are speaking with Anant.

Anant is an engineering leader. Currently, an engineering director at Meta, working on some of the smart glasses product experiences. He started as an open source contributor and has held key roles at both startups and tech giants. At Mozilla, he has helped define and implement WebRTC among other browser technologies. He was also an early Firebase engineer, and Ozlo's first engineer.

📌

Our conversations with Anant will cover below aspects:
- Fine-tuning Stable Diffusion
- Building Embeddings for Hacker News
- The Debate on Long-Term Sustainability
- The Need for Improved Experiment Tracking and Accessible Documentation in LLMOps
- The Power and Potential of OpenAI Plugins
- The Power of Language Models in Recovery Flow and Plugin Adaptation

Anant spoke to us in his personal capacity and his views do not represent those of the organization (Meta) he is affiliated with.

Watch the full episode below:

Key Observations from Anant's experiments:

LoRA: An intriguing technique in fine-tuning large language models.

LoRA's Accessibility: Compared to other techniques like Dreambooth, LoRA is more accessible and involves adding a few layers to a pre-trained model without altering its weights.
Cost-Effectiveness and Simplicity: LoRA is cost-effective and can be used by non-technical individuals through platforms like Google Colab. Its simplicity enables quick iteration and experimentation.
Potential in Text and Image: While initially applied to images, LoRA's potential in text-related tasks is being explored.
Future of Fine-tuning: LoRA is favored over traditional techniques due to its cost-effectiveness, accessibility, and compatibility with older graphics cards.

Building Embeddings for Hacker News: Challenges and Model Selection

Understand Fundamentals: Choose appropriate solutions based on project scale and requirements.
Grasping Embeddings: Embeddings are collections of floating-point numbers, and storing them in an SQLite database proved effective.
Model Selection: Open-source embedding libraries were preferred over cloud providers. The Instructor large model was chosen based on benchmarking and prototyping using MTEB by Hugging Face.
Start from First Principles: Gain a deep understanding of the chosen solution and focus on project requirements rather than hype.
Scale-Up with Clear Need: Scaling up should be done when it aligns with a clear requirement.

The Debate on Long-Term Sustainability: Large Models vs. Smaller, Fine-tuned Models

One of the significant discussions in the MLOps ecosystem revolves around the long-term sustainability of large generalized models versus smaller, fine-tuned models tailored to specific datasets or use cases. This debate gains insights from a leaked memo that suggests the potential commoditization of large language models (LLMs).

The leaked memo, although not an official stance, indicates a growing sentiment that LLMs are likely to become more accessible and replicable. This development has sparked excitement within the community, particularly among those with an open-source background. Recent advancements have made it easier to replicate LLMs, addressing previous concerns about data acquisition and model training costs.

Projects like RunwayML and the diffusion model have contributed to an open-source movement, enabling the release of models on platforms like GitHub. This democratizes access to LLMs, allowing hobbyists and hackers to explore and experiment. While not all LLMs are open source, licensed open-source options are available, fostering a diverse range of contributors.

The benefits of open development and widespread involvement are emphasized, as it prevents power from being concentrated in the hands of a few entities. Moreover, open development provides transparency and security, taking into account global factors and the potential interest from nation-states.

Anticipating the commoditization of LLMs, there is a parallel drawn with the cloud computing landscape. Users will have the flexibility to choose among different providers, similar to cloud service options like AWS, Azure, and Google Compute. This allows for healthy competition and innovation within the ecosystem.

The debate also considers the interplay between large models and smaller, on-device models. Both types have their place in the MLOps ecosystem, with computation occurring at multiple layers. While simpler tasks can be efficiently performed on devices, more resource-intensive tasks can be offloaded to servers. The choice of deployment depends on the specific use case, with a hybrid approach being advocated for, rather than favoring one side over the other.

In the quest for long-term sustainability, the MLOps community must carefully consider the practicality and advantages of large models versus fine-tuned models. Striking a balance and leveraging the strengths of each approach will shape the future of AI model development and deployment, ensuring continued progress within the field.

📌

Performance Comparison: Large Language Models vs. Smaller Versions
Another aspect discussed within the MLOps community is the performance comparison between large language models and their smaller counterparts. Large models tend to excel in tasks like generating blogs or poems, showcasing their impressive capabilities. However, smaller models often struggle to match their performance, particularly when dealing with smaller datasets.
It is important to scrutinize the evaluation methods and tests supporting such claims to ensure accurate comparisons. While acknowledging that different use cases and evaluation approaches may exist, a thorough understanding of the limitations and performance differences between large and small models is crucial.

The Need for Improved Experiment Tracking and Accessible Documentation in LLMOps

As the field of machine learning progresses, the importance of robust tooling, including MLOps frameworks, becomes evident. However, with the rise of LLMOps (Large Language Model Operations), there is a specific need for tailored tools to support developers working with LLMs. In this discussion, the focus is on the lessons learned and the recommendations for experiment tracking and accessible documentation in LLMOps.

Reflecting on the learning journey, it becomes apparent that proper experiment documentation is essential. Initially, there was a lack of emphasis on maintaining a training diary or structured tracking system, which led to challenges during experiments. Particularly in the LoRA fine-tuning project, managing numerous hyperparameters became overwhelming without a systematic approach to track the values and corresponding outputs.

Recognizing the value of thorough documentation, the necessity for a reliable training diary or integrated tracking system becomes evident. Unfortunately, readily available solutions were scarce, presenting a challenge for finding a suitable tool. However, the discovery of Weights & Biases (wandb.ai), a startup offering experiment tracking and visualization tools, proved to be beneficial. Recommending these tools to others, it is acknowledged that incorporating them earlier in the process could have improved experiment management.

Additionally, the importance of accessible documentation within the machine learning community is stressed. Insufficient information about hyperparameters and their effects hindered understanding and hindered the optimization of experiments.

Addressing Data Security Concerns in MLOps: Ensuring Privacy and Trust

Data security is a paramount concern within the realm of MLOps, raising questions and prompting discussions within the community. In an exploration of this critical issue, let's delve into the approach taken by Open AI, as well as the broader perspective surrounding data privacy.

Open AI's stance on data privacy is deemed reasonable, particularly for consumers utilizing services like ChatGPT. Considering ChatGPT is offered as a free product, users find value in the platform, justifying the exchange of their data to enhance the models. It is seen as a fair trade-off, with users willingly contributing their conversations to improve the service, considering the resource-intensive nature of running such platforms.

For ChatGPT Plus subscribers who pay a monthly fee, the option to opt out of data usage is available. However, this choice comes with the consequence of losing conversation history. Yet, given the affordable price of the subscription and the immense value derived from the service, users generally perceive this trade-off as reasonable. They express satisfaction with the arrangement, understanding that their data contributes to improving the model while subsidizing the cost.

Enterprises seeking to leverage AI models for specific use cases have unique requirements concerning data security. Open AI has already taken steps to address these concerns through partnerships, such as Microsoft Azure's Secure Enclave. These collaborations provide secure environments where data remains under the enterprise's control. Additionally, partnerships like Anthropic's integration with AWS Bedrock offer secure enclaves for running cloud models, assuaging concerns about data leaving the premises. These industry moves are poised to offer suitable solutions for enterprises focused on data security.

Resolving the issues of data privacy and security requires the collective efforts of companies like Open AI, Azure, and other major players. For instance, Google, with its in-house capabilities, is well-positioned to effectively address these concerns. It is important to adopt a balanced perspective on data privacy, recognizing that reputable companies can build trust with their customers, who may be willing to trade some privacy for the value provided by AI services.

The Power and Potential of OpenAI Plugins

OpenAI plugins are a groundbreaking development that showcases the true power and potential of AI language models. When diving into the concept of plugins, it becomes apparent how remarkable they are in enabling interactions with the model without the need for writing code. Instead, the focus shifts towards leveraging English communication skills to instruct the model effectively. This realization can be a mind-blowing moment for developers and non-technical individuals alike.

Plugins revolve around providing instructions to the AI model in English, specifically concerning API descriptions and triggers. By crafting a one-page document that details the API schema and specification, users can effectively communicate when and how to trigger their plugin. This emphasizes the importance of strong English language skills in harnessing the capabilities of ChatGPT.

This innovative approach has led to comparisons with previous research, such as the Toolformer paper, highlighting that similar problems have been explored in the past. However, OpenAI's language models, particularly the ChatGPT, demonstrate significant advancements in quality and performance compared to existing open source models.

The quality disparity primarily stems from the core language model's competence in coding-related tasks. ChatGPT excels in handling code-related instructions, which translates into its ability to efficiently dispatch and utilize plugins. It showcases the critical role of the underlying model's proficiency in coding tasks when implementing plugins effectively.

While OpenAI currently holds a substantial lead in terms of quality, it is essential to give open source models time to catch up. The open source community continuously strives to bridge the gap and enhance the capabilities of their models. The fact that OpenAI acknowledges the potential of open source models and explores avenues like multimodal capabilities is encouraging. Sam Altman's recent interview with Lex Friedman highlights OpenAI's perspective, indicating that the focus is shifting away from a parameter race to differentiating factors.

As open source models evolve and reach the level of GPT-3.5 and 4, it is reasonable to expect the availability of plugin functionality in open source frameworks as well. The progress in the field holds promise for the future, where open source models and plugins can revolutionize the way developers interact with AI systems.

The Power of Language Models in Recovery Flow and Plugin Adaptation

The rise of language models, such as OpenAI's GPT, has brought English to the forefront as a new programming language in many ways. Leveraging English to instruct AI models and trigger plugins has opened up new possibilities for developers.

When it comes to plugins, the focus is not on micromanaging their usage but rather on instructing how to use them effectively. By providing instructions on plugin utilization, developers enable the AI model to determine the appropriate moments to trigger the plugins. However, it's important to note that the current implementation has limitations, such as allowing only three plugins to be enabled at a time and occasional mistakes in plugin triggering.

Nevertheless, the power of language models lies in their recovery flow. Even when the model doesn't initially understand or trigger a plugin correctly, the user experience remains positive. In contrast to traditional voice assistants like Alexa, where explicit and precise commands are required, language models like ChatGPT offer a different experience.

With ChatGPT, if the model misses the mark, users can confidently provide explicit follow-up instructions to correct the mistake. The model's understanding and responsiveness create a sense of trust and collaboration. Users feel that their instructions will be acknowledged and followed. The recovery capability of language models is a game-changer compared to older generation assistants, which often led to frustration and disappointment.

The power of language models, especially when combined with plugins, lies in their ability to recover from errors seamlessly. Users appreciate the model's acknowledgment of confusion, the polite apology, and the willingness to rectify the mistake. This level of recovery and adaptability is unparalleled in previous assistant technologies, even with the advancements in multi-turn dialogue systems.

The architecture of language models enables a phenomenal level of recovery, making it an ideal platform for plugin integration. The seamless integration and collaboration between developers and the model enhance the overall user experience. As developers explore the potential of plugins within this architecture, the possibilities for creating dynamic and adaptable AI systems are vast.

With language models serving as the foundation for AI-driven interactions, recovery flow and plugin adaptation become essential components in building advanced and user-friendly systems. The combination of natural language understanding and responsiveness positions language models as transformative tools in the MLOps landscape.

📌

Buying GPUs for MLOps: Challenges and Impulse Purchases
Accessing high-performance GPUs through cloud providers is frustrating for hobbyists, Enterprise prioritization and long-term commitments pose challenges.
Cloud-based GPU access is time-consuming for hobbyists needing short GPU time. Setting up a personal GPU is challenging, including manual configuration and dependency management.
Despite the convenience of pre-configured cloud GPU images, Anant values the control and reliability of his personal GPU. Overcoming challenges, his decision to buy a dedicated GPU proved beneficial for MLOps.

Staying Informed: Anant Narayanan's Approach to Keeping Up with New Developments

Twitter: Use Twitter as a knowledge hub for AI-related updates. Follow experts and organizations in the AI community to quickly discover new developments and stay informed.
Podcasts: Listen to industry podcasts for valuable insights. Recommended podcasts include "Stratechery" by Ben Thompson and "Dithering" co-hosted by Ben Thompson and John Gruber. These podcasts cover a wide range of tech-related topics, including AI advancements.
Offline Interactions: Attend in-person events and gatherings to gain valuable insights and perspectives. Look for comprehensive event listings on platforms like cerebralvalley.ai. Engage in face-to-face interactions, participate in hackathons, and exchange ideas with other professionals to expand knowledge and build meaningful connections.

Read our previous blogs in the True ML Talks series:

‍

True ML Talks # 9 - Machine Learning Platform @ DoorDash

In this blog, we dive deep into Doordash’s Machine Learning Platform. Understand DoorDash’s ML architecture, how ML is used at DoorDash.

TrueFoundry Blog TrueFoundry

‍

Keep watching the TrueML youtube series and reading the TrueML blog series.

TrueFoundry is a ML Deployment PaaS over Kubernetes to speed up developer workflows while allowing them full flexibility in testing and deploying models while ensuring full security and control for the Infra team. Through our platform, we enable Machine learning Teams to deploy and monitor models in 15 minutes with 100% reliability, scalability, and the ability to roll back in seconds - allowing them to save cost and release Models to production faster, enabling real business value realisation.

Discuss About your ML Pipeline Challenges with us here

Built for Speed: ~10ms Latency, Even Under Load

Blazingly fast way to build, track and deploy your models!

Handles 350+ RPS on just 1 vCPU — no tuning needed
Production-ready with full enterprise support

Get Started with Truefoundry Now Talk to the Expert

TrueFoundry AI Gateway delivers ~3–4 ms latency, handles 350+ RPS on 1 vCPU, scales horizontally with ease, and is production-ready, while LiteLLM suffers from high latency, struggles beyond moderate RPS, lacks built-in scaling, and is best for light or prototype workloads.

Built for Speed: ~10ms Latency, Even Under Load

Schedule your Demo Now

The fastest way to build, govern and scale your AI

Book a Demo

True ML Talks #10 - LLMs and GenAI with engineering director @ Meta

Watch the full episode below:

Key Observations from Anant's experiments:

LoRA: An intriguing technique in fine-tuning large language models.

Building Embeddings for Hacker News: Challenges and Model Selection

The Debate on Long-Term Sustainability: Large Models vs. Smaller, Fine-tuned Models

The Need for Improved Experiment Tracking and Accessible Documentation in LLMOps

Addressing Data Security Concerns in MLOps: Ensuring Privacy and Trust

The Power and Potential of OpenAI Plugins

The Power of Language Models in Recovery Flow and Plugin Adaptation

Staying Informed: Anant Narayanan's Approach to Keeping Up with New Developments

Read our previous blogs in the True ML Talks series:

Built for Speed: ~10ms Latency, Even Under Load

True ML Talks #23 - MLOps and LLMs Applications @ GitLab

Gartner on AI Gateways: Here’s what Enterprise AI Teams Should Know

Best LLM Observability Tools

Exploring Vertex AI Alternatives for 2026

Data Residency Comparison for AI Gateways

The Complete Guide to AI Gateways and MCP Servers

True ML Talks #10 - LLMs and GenAI with engineering director @ Meta

Watch the full episode below:

Key Observations from Anant's experiments:

LoRA: An intriguing technique in fine-tuning large language models.

Building Embeddings for Hacker News: Challenges and Model Selection

The Debate on Long-Term Sustainability: Large Models vs. Smaller, Fine-tuned Models

The Need for Improved Experiment Tracking and Accessible Documentation in LLMOps

Addressing Data Security Concerns in MLOps: Ensuring Privacy and Trust

The Power and Potential of OpenAI Plugins

The Power of Language Models in Recovery Flow and Plugin Adaptation

Staying Informed: Anant Narayanan's Approach to Keeping Up with New Developments

Read our previous blogs in the True ML Talks series:

Built for Speed: ~10ms Latency, Even Under Load

Discover More

True ML Talks #23 - MLOps and LLMs Applications @ GitLab

Gartner on AI Gateways: Here’s what Enterprise AI Teams Should Know

Best LLM Observability Tools

Exploring Vertex AI Alternatives for 2026

Data Residency Comparison for AI Gateways

The Complete Guide to AI Gateways and MCP Servers

Subscribe to our newsletter