Weekly talks and fireside chats about everything that has to do with the new space emerging around DevOps for Machine Learning aka MLOps aka Machine Learning Operations.
Domino: Communication-Free LLM Training Engine // Guanhua Wang // #278
Guanhua Wang is a Senior Researcher on the DeepSpeed team at Microsoft. Before joining Microsoft, Guanhua earned his Computer Science PhD from UC Berkeley.
Domino: Communication-Free LLM Training Engine // MLOps Podcast #278 with Guanhua "Alex" Wang, Senior Researcher at Microsoft.
// Abstract
Given the popularity of generative AI, Large Language Models (LLMs) often consume hundreds or thousands of GPUs to parallelize and accelerate the training process. Communication overhead becomes more pronounced when training LLMs at scale. To eliminate communication overhead in distributed LLM training, we propose Domino, which provides a generic scheme to hide communication behind computation. By breaking the data dependency of a single batch training into smaller independent pieces, Domino pipelines these independent pieces of training and provides a generic strategy of fine-grained communication and computation overlapping. Extensive results show that compared with Megatron-LM, Domino achieves up to 1.3x speedup for LLM training on Nvidia DGX-H100 GPUs.
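The idea of hiding communication behind computation can be sketched in miniature. The toy Python below is an illustration only, not Domino's actual implementation: it splits a batch into independent chunks and launches each chunk's simulated all-reduce in a background thread, so that communication overlaps with the next chunk's compute.

```python
# Toy sketch of communication/computation overlap (not the real Domino):
# split a batch into independent chunks, then let chunk i's "all-reduce"
# run in the background while chunk i+1 is being computed.
import threading
import time

def compute(chunk):
    time.sleep(0.05)          # stand-in for a forward/backward pass
    return [x * 2 for x in chunk]

def communicate(result, out, idx):
    time.sleep(0.05)          # stand-in for an all-reduce over the network
    out[idx] = result

def train_step_overlapped(batch, n_chunks=2):
    size = len(batch) // n_chunks
    chunks = [batch[i * size:(i + 1) * size] for i in range(n_chunks)]
    out = [None] * n_chunks
    threads = []
    for i, chunk in enumerate(chunks):
        result = compute(chunk)                # compute chunk i
        t = threading.Thread(target=communicate, args=(result, out, i))
        t.start()                              # comm for chunk i overlaps compute of chunk i+1
        threads.append(t)
    for t in threads:
        t.join()                               # wait for all communication to finish
    return [x for part in out for x in part]

print(train_step_overlapped([1, 2, 3, 4]))  # [2, 4, 6, 8]
```

In a real training engine the compute would be GPU kernels and the communication an asynchronous collective (e.g. `torch.distributed.all_reduce(..., async_op=True)`), but the scheduling pattern is the same.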
// Bio
Guanhua Wang is a Senior Researcher on the DeepSpeed team at Microsoft. His research focuses on large-scale LLM training and serving. Previously, he led the ZeRO++ project at Microsoft, which cut model training time by more than half inside Microsoft and LinkedIn. He also led and was a major contributor to Microsoft Phi-3 model training. He holds a CS PhD from UC Berkeley, advised by Prof. Ion Stoica.
// MLOps Swag/Merch
https://shop.mlops.community/
// Related Links
Website: https://guanhuawang.github.io/
DeepSpeed hiring: https://www.microsoft.com/en-us/research/project/deepspeed/opportunities/
Large Model Training and Inference with DeepSpeed // Samyam Rajbhandari // LLMs in Prod Conference: https://youtu.be/cntxC3g22oU
--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Guanhua on LinkedIn: https://www.linkedin.com/in/guanhua-wang/
Timestamps:
[00:00] Guanhua's preferred coffee
[00:17] Takeaways
[01:36] Please like, share, leave a review, and subscribe to our MLOps channels!
[01:47] Phi model explanation
[06:29] Small Language Models optimization challenges
[07:29] DeepSpeed overview and benefits
[10:58] Crazy unimplemented AI ideas
[17:15] Post-training quantization vs QAT
[19:44] Quantization over distillation
[24:15] Using LoRAs
[27:04] LLM scaling sweet spot
[28:28] Quantization techniques
[32:38] Domino overview
[38:02] Training performance benchmark
[42:44] Data dependency-breaking strategies
[49:14] Wrap up
--------
49:47
AI's Next Frontier // Aditya Naganath // #277
Thanks to the High Signal Podcast by Delphina:
https://go.mlops.community/HighSignalPodcast
Aditya Naganath is an experienced investor at Kleiner Perkins. He has a passion for connecting with people over coffee to discuss tech, products, ideas, and markets.
AI's Next Frontier // MLOps Podcast #277 with Aditya Naganath, Principal at Kleiner Perkins.
// Abstract
LLMs have ushered in an unmistakable supercycle in the world of technology. The low-hanging use cases have largely been picked off. The next frontier will be AI coworkers who sit alongside knowledge workers, doing work side by side. At the infrastructure level, one of the most important primitives ever invented, the data center, is being fundamentally rethought in this new wave.
// Bio
Aditya Naganath joined Kleiner Perkins' investment team in 2022 with a focus on artificial intelligence, enterprise software applications, infrastructure, and security. Prior to joining Kleiner Perkins, Aditya was a product manager at Google focusing on growth initiatives for the next billion users team. He was previously a technical lead at Palantir Technologies and earlier held software engineering roles at Twitter and Nextdoor, where he was a Kleiner Perkins fellow. Aditya earned a patent during his time at Twitter for a technical analytics product he co-created.
Originally from Mumbai, India, Aditya graduated magna cum laude from Columbia University with a bachelor's degree in Computer Science and holds an MBA from Stanford University. Outside of work, you can find him playing guitar with a hard rock band, competing in chess or on the squash courts, and fostering puppies. He is also an avid poker player.
// MLOps Swag/Merch
https://shop.mlops.community/
// Related Links
Faith's Hymn by Beautiful Chorus: https://open.spotify.com/track/1bDv6grQB5ohVFI8UDGvKK?si=4b00752eaa96413b
Substack: https://adityanaganath.substack.com/?utm_source=substack&utm_medium=web&utm_campaign=substack_profile
With thanks to the High Signal Podcast by Delphina: https://go.mlops.community/HighSignalPodcast
Building the Future of AI in Software Development // Varun Mohan // MLOps Podcast #195 - https://youtu.be/1DJKq8StuTo
Do Re MI for Training Metrics: Start at the Beginning // Todd Underwood // AIQCON - https://youtu.be/DxyOlRdCofo
--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Aditya on LinkedIn: https://www.linkedin.com/in/aditya-naganath/
--------
57:30
PyTorch for Control Systems and Decision Making // Vincent Moens // #276
Dr. Vincent Moens is an Applied Machine Learning Research Scientist at Meta and the author of TorchRL and TensorDict in PyTorch.
PyTorch for Control Systems and Decision Making // MLOps Podcast #276 with Vincent Moens, Research Engineer at Meta.
// Abstract
PyTorch is widely adopted across the machine learning community for its flexibility and ease of use in applications such as computer vision and natural language processing. However, supporting the reinforcement learning, decision-making, and control communities is equally crucial, as these fields drive innovation in areas like robotics, autonomous systems, and game-playing. This podcast explores the intersection of PyTorch and these fields, covering practical tips and tricks for working with PyTorch, an in-depth look at TorchRL, and discussions on debugging techniques, optimization strategies, and testing frameworks. By examining these topics, listeners will understand how to effectively use PyTorch for control systems and decision-making applications.
// Bio
Vincent Moens is a research engineer on the PyTorch core team at Meta, based in London. As the maintainer of TorchRL (https://github.com/pytorch/rl) and TensorDict (https://github.com/pytorch/tensordict), Vincent plays a key role in supporting the decision-making community within the PyTorch ecosystem.
Alongside his technical role in the PyTorch community, Vincent also actively contributes to AI-related research projects.
Before joining Meta, Vincent worked as an ML researcher at Huawei and AIG.
Vincent holds a Medical Degree and a PhD in Computational Neuroscience.
// MLOps Swag/Merch
https://shop.mlops.community/
// Related Links
Musical recommendation: https://open.spotify.com/artist/1Uff91EOsvd99rtAupatMP?si=jVkoFiq8Tmq0fqK_OIEglg
Website: https://github.com/vmoens
TorchRL: https://github.com/pytorch/rl
TensorDict: https://github.com/pytorch/tensordict
LinkedIn post: https://www.linkedin.com/posts/vincent-moens-9bb91972_join-the-tensordict-discord-server-activity-7189297643322253312-Wo9J?utm_source=share&utm_medium=member_desktop
--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Vincent on LinkedIn: https://www.linkedin.com/in/mvi/
--------
56:39
AI-Driven Code: Navigating Due Diligence & Transparency in MLOps // Matt van Itallie // #275
Matt Van Itallie is the founder and CEO of Sema. Prior to this, he was Vice President of Customer Support and Customer Operations at Social Solutions.
AI-Driven Code: Navigating Due Diligence & Transparency in MLOps // MLOps Podcast #275 with Matt van Itallie, Founder and CEO of Sema.
// Abstract
Matt Van Itallie, founder and CEO of Sema, discusses how comprehensive codebase evaluations play a crucial role in MLOps and technical due diligence. He highlights the impact of Generative AI on code transparency and explains the Generative AI Bill of Materials (GBOM), which helps identify and manage risks in AI-generated code. This talk offers practical insights for technical and non-technical audiences, showing how proper diligence can enhance value and mitigate risks in machine learning operations.
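As a rough illustration of what a bill of materials for AI-generated code might track, consider a per-file tally like the sketch below. The field names are hypothetical, not Sema's actual GBOM schema.

```python
# Hypothetical GBOM-style inventory (illustrative field names only, not
# Sema's actual schema): tally how much of a codebase is AI-generated.
from dataclasses import dataclass

@dataclass
class FileRecord:
    path: str
    total_lines: int
    ai_generated_lines: int

def genai_share(records):
    """Fraction of all lines attributed to GenAI across the codebase."""
    total = sum(r.total_lines for r in records)
    ai = sum(r.ai_generated_lines for r in records)
    return ai / total if total else 0.0

records = [
    FileRecord("app/api.py", 200, 50),
    FileRecord("app/utils.py", 100, 100),
]
print(genai_share(records))  # 0.5
```

A real GBOM would carry far more detail (provenance, licensing, model used), but an aggregate like this is the kind of signal an investor could act on during diligence.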
// Bio
Matt Van Itallie is the Founder and CEO of Sema. He and his team have developed Comprehensive Codebase Scans, the most thorough and easily understandable assessment of a codebase and engineering organization. These scans are crucial for private equity and venture capital firms looking to make informed investment decisions. Sema has evaluated code within organizations that have a collective value of over $1 trillion. In 2023, Sema served 7 of the 9 largest global investors, along with market-leading strategic investors, private equity, and venture capital firms, providing them with critical insights.
In addition, Sema is at the forefront of Generative AI Code Transparency, which measures how much code created by GenAI is in a codebase. They are the inventors behind the Generative AI Bill of Materials (GBOM), an essential resource for investors to understand and mitigate risks associated with AI-generated code.
Before founding Sema, Matt was a Private Equity operating executive and a management consultant at McKinsey. He graduated from Harvard Law School and has had some interesting adventures, like hiking a third of the Appalachian Trail and biking from Boston to Seattle.
Full bio: https://alistar.fm/bio/matt-van-itallie
// MLOps Swag/Merch
https://shop.mlops.community/
--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Matt on LinkedIn: https://www.linkedin.com/in/mvi/
--------
57:01
PyTorch's Combined Effort in Large Model Optimization // Michael Gschwind // #274
Dr. Michael Gschwind is a Director / Principal Engineer for PyTorch at Meta Platforms. At Meta, he led the rollout of GPU Inference for production services.
PyTorch's Combined Effort in Large Model Optimization // MLOps Podcast #274 with Michael Gschwind, Director / Principal Engineer for PyTorch at Meta Platforms.
// Abstract
Explore PyTorch's role in boosting model performance, on-device AI processing, and collaborations with tech giants like ARM and Apple. Michael shares his journey from gaming console accelerators to AI, emphasizing the power of community and innovation in driving advancements.
// Bio
Dr. Michael Gschwind is a Director / Principal Engineer for PyTorch at Meta Platforms. At Meta, he led the rollout of GPU inference for production services. He led the development of MultiRay and TextRay, the first deployment of LLMs at a scale exceeding a trillion queries per day shortly after rollout. He created the strategy and led the implementation of PyTorch optimizations with Better Transformer and Accelerated Transformers, bringing Flash Attention, PT2 compilation, and ExecuTorch into the mainstream for LLMs and GenAI models. Most recently, he led the enablement of on-device AI for large language models on mobile and edge devices.
// MLOps Swag/Merch
https://mlops-community.myshopify.com/
// Related Links
Website: https://en.m.wikipedia.org/wiki/Michael_Gschwind
--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Michael on LinkedIn: https://www.linkedin.com/in/michael-gschwind-3704222/?utm_source=share&utm_campaign=share_via&utm_content=profile&utm_medium=ios_app
Timestamps:
[00:00] Michael's preferred coffee
[00:21] Takeaways
[01:59] Please like, share, leave a review, and subscribe to our MLOps channels!
[02:10] Gaming to AI Accelerators
[11:34] Torchchat goals
[18:53] PyTorch benchmarking and competitiveness
[21:28] Optimizing MLOps models
[24:52] GPU optimization tips
[29:36] Cloud vs On-device AI
[38:22] Abstraction across devices
[42:29] PyTorch developer experience
[45:33] AI and MLOps-related antipatterns
[48:33] When to optimize
[53:26] Efficient edge AI models
[56:57] Wrap up