In the latest episode of TrueML Talks, Nikunj, co-founder at True Foundry, delves into an enlightening conversation with Vincent, a foundational figure at Snorkel AI. As a company that finds itself at the heart of AI's evolving landscape, Snorkel AI's journey from academia to leading the charge in data-centric AI development offers profound insights. Vincent shares his experiences from the early days at Stanford AI Lab to steering product and design at Snorkel AI, shedding light on the intricacies of machine learning (ML), Large Language Models (LLMs), and the impact of generative AI on the industry. We touched upon the following topics:
- The Evolution of Snorkel AI
- Data-Centric AI Development
- Transition to Product Leadership
- Generative AI and Open Models
- Career Advice for AI Enthusiasts
Vincent describes Snorkel AI's roots as an academic project focused on weak supervision and programmatic labeling. This approach laid the groundwork for what Snorkel AI has become today: a company that helps enterprises navigate AI application development. Vincent's journey from graduate student to leader at Snorkel AI shows how strong academic research can turn into a startup, and what Snorkel is today. At Stanford, the team collaborated with doctors and created tailored datasets for them, which gave the research a real-life use case. He also covers his time at Y Combinator, sharing his early days and his hunger for growth and learning in tech.
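To make the idea of programmatic labeling concrete, here is a minimal sketch in plain Python. It is an illustration only, not Snorkel's library or its API: the task (spam vs. not spam), the three labeling functions, and the helper names `majority_vote` and `weak_label` are all hypothetical, and the votes are combined with a simple majority vote rather than a learned label model.

```python
from collections import Counter

# Hypothetical label values for an illustrative spam-detection task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Heuristic: messages containing a link are likely spam."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    """Heuristic: messages mentioning a prize are likely spam."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_starts_with_greeting(text):
    """Heuristic: messages that open with a greeting are likely not spam."""
    return HAM if text.lower().startswith(("hi", "hello")) else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_starts_with_greeting]


def majority_vote(votes):
    """Combine noisy votes: ignore abstentions, abstain on ties or no votes."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def weak_label(text):
    """Apply every labeling function to one example and combine the votes."""
    return majority_vote([lf(text) for lf in LABELING_FUNCTIONS])


if __name__ == "__main__":
    for message in [
        "Claim your prize at http://example.com",
        "Hello, are we still on for lunch?",
        "Meeting moved to 3pm",
    ]:
        print(message, "->", weak_label(message))
```

Each labeling function encodes one cheap, imperfect rule and may abstain. Combining many such rules yields training labels without hand-annotating every example, which is the core idea behind weak supervision.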
Vincent recalls that, in the beginning, creating databases meant little more than passing large data sheets between teams, an unorganized task; this has since changed. He elaborates on the company's focus on helping enterprise teams manage, curate, and label data at scale, taking on the janitorial tasks of AI development. This data-centric approach lets companies align AI closely with their unique objectives and datasets, emphasizing the critical role of data in programming AI systems. He also mentions that in industries like banking and healthcare, data accuracy cannot be left to chance, as a single mistake on an LLM's part can be fatal for operations.
Coming from an ML background, Vincent shares how his role as Head of Product (AI/ML) and Design lets him talk directly to data scientists and ML engineers. This helps him understand their use cases and pain points, which he can incorporate directly into the product. Because he is involved across different domains within Snorkel, he can steer the product according to customers' needs.
The generative AI age and the proliferation of open models have significantly influenced the AI landscape. Vincent explains that LLMs are the newest tool for generating training datasets, but they often struggle with the accuracy of the data they produce. As discussed earlier, LLM-generated data can suit generalized use cases and demo-level tasks, but not use cases where accuracy is critical, in domains like banking, finance, insurance, and healthcare.
Vincent's hot take on the current state of AI development emphasizes the pivotal shift towards open-source models and data, proposing a more holistic approach to sharing AI innovations. He argues that the true essence of open-sourcing in AI should extend beyond merely releasing the model weights; it should include making datasets, development processes, and the rationale behind model training accessible. This approach fosters a collaborative ecosystem that accelerates innovation, ensures reproducibility, and builds safer AI systems. By advocating for the open data movement, Vincent highlights the importance of transparency in AI development, enabling a broader community to contribute to and benefit from the advancements in the field. This perspective not only challenges the conventional practices of AI sharing but also calls for a comprehensive strategy that could democratize AI development, ensuring that the benefits of AI technologies are widely distributed and accessible.
Vincent mentions that hackathon-level work is not enough: you have to get your hands dirty and build something you would actually use, which will help you get results and stand out. Reflecting on his journey, Vincent offers advice to those embarking on their AI careers. He emphasizes the value of hands-on experience, encouraging individuals to build and iterate on AI projects that address real-world challenges. This experiential learning, coupled with collaboration and a passion for exploration, is pivotal in navigating the rapidly evolving AI domain.