In this episode, we talk with Will Brown, a research lead at Prime Intellect, about his journey into reinforcement learning (RL) and multi-agent systems, exploring their theoretical foundations and practical applications. We discuss the importance of RL in the current LLM pipeline and the challenges it faces. We also discuss applying agentic workflows to real-world applications and the ongoing evolution of AI development.

Chapters
00:00 Introduction to Reinforcement Learning and Will's Journey
03:10 Theoretical Foundations of Multi-Agent Systems
06:09 Transitioning from Theory to Practical Applications
09:01 The Role of Game Theory in AI
11:55 Exploring the Complexity of Games and AI
14:56 Optimization Techniques in Reinforcement Learning
17:58 The Evolution of RL in LLMs
21:04 Challenges and Opportunities in RL for LLMs
23:56 Key Components for Successful RL Implementation
27:00 Future Directions in Reinforcement Learning
36:29 Exploring Agentic Reinforcement Learning Paradigms
38:45 The Role of Intermediate Results in RL
41:16 Multi-Agent Systems: Challenges and Opportunities
45:08 Distributed Environments and Decentralized RL
49:31 Prompt Optimization Techniques in RL
52:25 Statistical Rigor in Evaluations
55:49 Future Directions in Reinforcement Learning
59:50 Task-Specific Models vs. General Models
01:02:04 Insights on Random Verifiers and Learning Dynamics
01:04:39 Real-World Applications of RL and Evaluation Challenges
01:05:58 Prime RL Framework: Goals and Trade-offs
01:10:38 Open Source vs. Closed Source Models
01:13:08 Continuous Learning and Knowledge Improvement

Music:
"Kid Kodi" — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
"Palms Down" — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
--------
1:05:43
EP16: AI News and Papers
In this episode, we discuss various topics in AI, including the challenges of the conference review process, the capabilities of Kimi K2 Thinking, advancements in TPU technology, the significance of real-world data in robotics, and recent innovations in AI research. We also talk about the cool "Chain-of-Thought Hijacking" paper, how to use simple ideas to scale RL, and the implications of the Kosmos project, which aims to enable autonomous scientific discovery through AI.

Papers and links:
Chain-of-Thought Hijacking - https://arxiv.org/pdf/2510.26418
Kosmos: An AI Scientist for Autonomous Discovery - https://t.co/9pCr6AUXAe
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe - https://relieved-cafe-fe1.notion.site/JustRL-Scaling-a-1-5B-LLM-with-a-Simple-RL-Recipe-24f6198b0b6b80e48e74f519bfdaf0a8

Chapters
00:00 Navigating the Peer Review Process
04:17 Kimi K2 Thinking: A New Era in AI
12:27 The Future of Tool Calls in AI
17:12 Exploring Google's New TPUs
22:04 The Importance of Real-World Data in Robotics
28:10 World Models: The Next Frontier in AI
31:36 Nvidia's Dominance in AI Partnerships
32:08 Exploring Recent AI Research Papers
37:46 Chain of Thought Hijacking: A New Threat
43:05 Simplifying Reinforcement Learning Training
54:03 Kosmos: AI for Autonomous Scientific Discovery

Music:
"Kid Kodi" — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
"Palms Down" — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
--------
59:20
EP15: The Information Bottleneck and Scaling Laws with Alex Alemi
In this episode, we sit down with Alex Alemi, an AI researcher at Anthropic (previously at Google Brain and Disney), to explore the powerful framework of the information bottleneck and its profound implications for modern machine learning.

We break down what the information bottleneck really means: a principled approach to retaining only the most informative parts of data while compressing away the irrelevant. We discuss why compression is still important in our era of big data, how it prevents overfitting, and why it's essential for building models that generalize well.

We also dive into scaling laws: why they matter, what we can learn from them, and what they tell us about the future of AI research.

Papers and links:
Alex's website - https://www.alexalemi.com/
Scaling exponents across parameterizations and optimizers - https://arxiv.org/abs/2407.05872
Deep Variational Information Bottleneck - https://arxiv.org/abs/1612.00410
Layer by Layer: Uncovering Hidden Representations in Language Models - https://arxiv.org/abs/2502.02013
Information in Infinite Ensembles of Infinitely-Wide Neural Networks - https://proceedings.mlr.press/v118/shwartz-ziv20a.html

Music:
“Kid Kodi” — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
“Palms Down” — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
--------
1:22:50
EP14: AI News and Papers
In this episode, we talked about AI news and recent papers. We explored the complexities of using AI models in healthcare (the Nature Medicine paper on GPT-5's fragile intelligence in medical contexts). We discussed the delicate balance between leveraging LLMs as powerful research tools and the risks of over-reliance, touching on issues such as hallucinations, medical disagreements among practitioners, and the need for better education on responsible AI use in healthcare.

We also talked about Stanford's "Cartridges" paper, which presents an innovative approach to long-context language models. The paper tackles the expensive computational costs of billion-token context windows by compressing KV caches through a clever "self-study" method using synthetic question-answer pairs and context distillation. We discussed the implications for personalization, composability, and making long-context models more practical.

Additionally, we explored the "Continuous Autoregressive Language Models" paper and touched on insights from the Smol Training Playbook.

Papers discussed:
The fragile intelligence of GPT-5 in medicine: https://www.nature.com/articles/s41591-025-04008-8
Cartridges: Lightweight and general-purpose long context representations via self-study: https://arxiv.org/abs/2506.06266
Continuous Autoregressive Language Models: https://arxiv.org/abs/2510.27688
The Smol Training Playbook: https://huggingface.co/spaces/HuggingFaceTB/smol-training-playbook

Music:
“Kid Kodi” — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
“Palms Down” — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.

This is an experimental format for us, just news and papers without a guest interview. Let us know what you think!
--------
57:20
EP13: Recurrent-Depth Models and Latent Reasoning with Jonas Geiping
In this episode, we host Jonas Geiping from the ELLIS Institute & Max Planck Institute for Intelligent Systems, Tübingen AI Center, Germany. We talked about his broad research on recurrent-depth models and latent reasoning in large language models (LLMs): what these models can and can't do, the challenges and next breakthroughs in the field, world models, and the future of developing better models. We also talked about safety and interpretability, and the role of scaling laws in AI development.

Chapters
00:00 Introduction and Guest Introduction
01:03 Peer Review in Preprint Servers
06:57 New Developments in Coding Models
09:34 Open Source Models in Europe
11:00 Dynamic Layers in LLMs
26:05 Training Playbook Insights
30:05 Recurrent Depth Models and Reasoning Tasks
43:59 Exploring Recursive Reasoning Models
46:46 The Role of World Models in AI
48:41 Innovations in AI Training and Simulation
50:39 The Promise of Recurrent Depth Models
52:34 Navigating the Future of AI Algorithms
54:44 The Bitter Lesson of AI Development
59:11 Advising the Next Generation of Researchers
01:06:42 Safety and Interpretability in AI Models
01:10:46 Scaling Laws and Their Implications
01:16:19 The Role of PhDs in AI Research

Links and papers:
Jonas' website - https://jonasgeiping.github.io/
Scaling up test-time compute with latent reasoning: A recurrent depth approach - https://arxiv.org/abs/2502.05171
The Smol Training Playbook: The Secrets to Building World-Class LLMs - https://huggingface.co/spaces/HuggingFaceTB/smol-training-playbook
VaultGemma: A Differentially Private Gemma Model - https://arxiv.org/abs/2510.15001

Music:
“Kid Kodi” — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
“Palms Down” — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
Two AI researchers, Ravid Shwartz Ziv and Allen Roush, discuss the latest trends, news, and research within Generative AI, LLMs, GPUs, and Cloud Systems.