
Episode 66: The Agent Paradox - Why Moderna's Most Productive AI Systems Aren't Agents
2026/01/08 | 42 mins.
"Surprise. We don't have agents. I actually went in and did an audit of all the LLM applications that we've developed internally. And if you were to take Anthropic's definition of workflow versus agent, we don't have agents. I would not classify any of our applications as agents."

Eric Ma, who leads Research Data Science in the Data Science and AI group at Moderna, joins Hugo to talk about moving past the hype of autonomous agents and building reliable, high-value workflows.

We discuss:
* Reliable Workflows: Prioritize rigid workflows over dynamic AI agents to ensure reliability and minimize stochasticity in production environments (a minimal code sketch follows at the end of this entry);
* Permission Mapping: The true challenge in regulated environments is security, specifically mapping permissions across source documents, vector stores, and model weights;
* Trace Log Risk: LLM execution traces pose a regulatory risk by inadvertently leaking restricted data such as trade secrets or personal information;
* High-Value Data Work: LLMs excel at transforming archived documents and freeform forms into required formats, offloading significant "janitorial" work from scientists;
* "Non-LLM" First: Solve problems with simpler tools like Python or traditional ML models before reaching for LLMs, to ensure robustness and eliminate generative AI stochasticity;
* Contextual Evaluation: Tailor evaluation rigor to consequences; low-stakes tools can be "vibe-checked," while patient-safety outputs demand exhaustive error characterization;
* Serverless Biotech Backbone: Serverless infrastructure like Modal and reactive notebooks such as Marimo empower biotech data scientists to deploy rapidly without heavy infrastructure overhead.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript in NotebookLM. If you do so, let us know anything you find in the comments!

👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It's a live cohort with hands-on exercises and office hours. Our final cohort is in Q1, 2026. Here is a 35% discount code for readers. 👈
https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgch

👉 Eric & Hugo have a free upcoming livestream workshop: Building Tools for Thinking with AI (register to join live or get the recording afterwards) 👈

Show notes
* Eric's website
* Eric Ma on LinkedIn
* Eric's blog
* Eric's data science newsletter
* Building Effective AI Agents by the Anthropic team
* Wow, Marimo from Eric's blog
* Wow, Modal from Eric's blog
* Upcoming Events on Luma
* Watch the podcast video on YouTube
* Join the final cohort of our Building AI Applications course in Q1, 2026 (35% off for listeners)

Timestamps
00:00 Defining Agents and Workflows
02:04 Challenges in Regulated Environments
04:24 Eric Ma's Role at Moderna, Leading Research Data Science in the Data Science and AI Group
12:37 Document Reformatting and Automation
15:42 Data Security and Permission Mapping
20:05 Choosing the Right Model for Production
20:41 Evaluating Model Changes with Benchmarks
23:10 Vibe-Based Evaluation vs. Formal Testing
27:22 Security and Fine-Tuning in LLMs
28:45 Challenges and Future of Fine-Tuning
34:00 Security Layers and Information Leakage
37:48 Wrap-Up and Final Remarks
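A quick aside for builders: the workflow-versus-agent distinction Eric references (in the spirit of Anthropic's "Building Effective AI Agents") can be made concrete in a few lines of code. The sketch below shows a rigid document-reformatting workflow where the control flow lives entirely in code and the model only handles individual transformation steps; `call_llm` and the step functions are hypothetical placeholders, not Moderna's implementation or any specific SDK.

```python
# Minimal sketch of "workflow, not agent": the sequence of steps is fixed in
# code, and the LLM is only used for individual transformation steps.
# `call_llm` is a hypothetical placeholder for whatever model client you use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Wire this up to your model client of choice.")

def extract_fields(document: str) -> str:
    """Step 1: pull required fields out of a freeform or archived document."""
    return call_llm(f"Extract the study ID, date, and author from:\n{document}")

def reformat_to_template(fields: str, template: str) -> str:
    """Step 2: rewrite the extracted fields into the required format."""
    return call_llm(f"Fill in this template:\n{template}\nUsing these fields:\n{fields}")

def validate(output: str) -> bool:
    """Step 3: a plain-Python check, no LLM and no stochasticity."""
    return all(key in output for key in ("Study ID:", "Date:", "Author:"))

def document_reformatting_workflow(document: str, template: str) -> str:
    """The control flow is decided here, in code, never by the model."""
    fields = extract_fields(document)
    output = reformat_to_template(fields, template)
    if not validate(output):
        raise ValueError("Output failed validation; flag for human review.")
    return output
```

An agent, by contrast, would hand the model a set of tools and let it decide which steps to take and in what order, which is exactly the stochasticity the episode argues against for production use.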

Episode 65: The Rise of Agentic Search
2025/12/19 | 51 mins.
"We're really moving from a world where humans are authoring search queries and humans are executing those queries and humans are digesting the results to a world where AI is doing that for us."

Jeff Huber, CEO and co-founder of Chroma, joins Hugo to talk about how agentic search and retrieval are changing the very nature of search and software for builders and users alike.

We discuss:
* "Context engineering": the strategic design and engineering of what context gets fed to the LLM (data, tools, memory, and more), which is now essential for building reliable, agentic AI systems;
* Why simply stuffing large context windows is no longer feasible due to "context rot" as AI applications become more goal-oriented and capable of multi-step tasks;
* A framework for precisely curating and providing only the most relevant, high-precision information to ensure accurate and dependable AI systems;
* The "agent harness": the collection of tools and capabilities an agent can access, and how to construct these advanced systems;
* Emerging best practices for builders, including hybrid search as a robust default (sketched at the end of this entry), creating "golden datasets" for evaluation, and leveraging sub-agents to break down complex tasks;
* The major unsolved challenge of agent evaluation, emphasizing a shift towards iterative, data-centric approaches.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript in NotebookLM. If you do so, let us know anything you find in the comments!

👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It's a live cohort with hands-on exercises and office hours. Our final cohort is in Q1, 2026. Here is a 35% discount code for readers. 👈
https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgch

Oh! One more thing: we've just announced a Vanishing Gradients livestream for January 21 that you may dig:
* A Builder's Guide to Agentic Search & Retrieval with Doug Turnbull and John Berryman (register to join live or get the recording afterwards)

Show notes
* Jeff Huber on Twitter
* Jeff Huber on LinkedIn
* Try Chroma!
* Context Rot: How Increasing Input Tokens Impacts LLM Performance by The Chroma Team
* AI Agent Harness, 3 Principles for Context Engineering, and the Bitter Lesson Revisited
* From Context Engineering to AI Agent Harnesses: The New Software Discipline
* Generative Benchmarking by The Chroma Team
* Effective context engineering for AI agents by The Anthropic Team
* Making Sense of Millions of Conversations for AI Agents by Ivan Leo (Manus) and Hugo
* How we built our multi-agent research system by The Anthropic Team
* Upcoming Events on Luma
* Watch the podcast video on YouTube
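One concrete note on "hybrid search as a robust default": a common way to combine keyword and vector retrieval is reciprocal rank fusion, sketched below in plain Python. This is a generic illustration under my own assumptions, not Chroma's API; the ranked document IDs are assumed to come from whatever keyword and vector retrievers you already run.

```python
# Hybrid search sketch: merge a keyword-ranked list and a vector-ranked list
# with reciprocal rank fusion (RRF). Purely illustrative; the ranked doc IDs
# would come from your own keyword (e.g. BM25) and vector retrievers.
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Score each doc by the sum of 1 / (k + rank) across the input rankings."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_7", "doc_2", "doc_9"]  # e.g. from a BM25 index
vector_hits = ["doc_2", "doc_5", "doc_7"]   # e.g. from embedding similarity
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# doc_2 and doc_7 rank highest because both retrievers agree on them.
```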

Episode 64: Data Science Meets Agentic AI with Michael Kennedy (Talk Python)
2025/12/03 | 1h 2 mins.
"We have been sold a story of complexity."

Michael Kennedy (Talk Python) argues we can escape this by relentlessly focusing on the problem at hand, reducing costs by orders of magnitude in software, data, and AI.

In this episode, Michael joins Hugo to dig into the practical side of running Python systems at scale. They connect these ideas to the data science workflow, exploring which software engineering practices allow AI teams to ship faster and with more confidence. They also detail how to deploy systems without unnecessary complexity and how Agentic AI is fundamentally reshaping development workflows.

We talk through:
- Escaping complexity hell to reduce costs and gain autonomy
- The specific software practices, like the "Docker Barrier", that matter most for data scientists
- How to replace complex cloud services with a simple, robust $30/month stack
- The shift from writing code to "systems thinking" in the age of Agentic AI
- How to manage the people-pleasing psychology of AI agents to prevent broken code
- Why struggle is still essential for learning, even when AI can do the work for you

LINKS
Talk Python In Production, the Book! (https://talkpython.fm/books/python-in-production)
Just Enough Python for Data Scientists Course (https://training.talkpython.fm/courses/just-enough-python-for-data-scientists)
Agentic AI Programming for Python Course (https://training.talkpython.fm/courses/agentic-ai-programming-for-python)
Talk Python To Me (https://talkpython.fm/) and a recent episode with Hugo as guest: Building Data Science with Foundation LLM Models (https://talkpython.fm/episodes/show/526/building-data-science-with-foundation-llm-models)
Python Bytes podcast (https://pythonbytes.fm/)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Watch the podcast video on YouTube (https://youtube.com/live/jfSRxxO3aRo?feature=share)
Join the final cohort of our Building AI Applications course starting Jan 12, 2026 (35% off for listeners): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav

Episode 63: Why Gemini 3 Will Change How You Build AI Agents with Ravin Kumar (Google DeepMind)
2025/11/22 | 1h
Gemini 3 is a few days old, and the massive leap in performance and model reasoning has big implications for builders: as models begin to self-heal, builders are literally tearing out the functionality they built just months ago... ripping out the defensive coding and reshipping their agent harnesses entirely.

Ravin Kumar (Google DeepMind) joins Hugo to break down exactly why the rapid evolution of models like Gemini 3 is changing how we build software. They detail the shift from simple tool calling to building reliable "Agent Harnesses", explore the architectural tradeoffs between deterministic workflows and high-agency systems, the nuance of preventing context rot in massive windows, and why proper evaluation infrastructure is the only way to manage the chaos of autonomous loops.

They talk through:
- The implications of models that can "self-heal" and fix their own code
- The two cultures of agents: LLM workflows with a few tools versus when you should unleash high-agency, autonomous systems
- Inside NotebookLM: moving from prototypes to viral production features like Audio Overviews
- Why Needle in a Haystack benchmarks often fail to predict real-world performance
- How to build agent harnesses that turn model capabilities into product velocity (a minimal sketch follows after the links)
- The shift from measuring latency to managing time-to-compute for reasoning tasks

LINKS
From Context Engineering to AI Agent Harnesses: The New Software Discipline, a podcast Hugo did with Lance Martin (LangChain) (https://high-signal.delphina.ai/episode/context-engineering-to-ai-agent-harnesses-the-new-software-discipline)
Context Rot: How Increasing Input Tokens Impacts LLM Performance (https://research.trychroma.com/context-rot)
Effective context engineering for AI agents by Anthropic (https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Watch the podcast video on YouTube (https://youtu.be/CloimQsQuJM)
Join the final cohort of our Building AI Applications course starting Jan 12, 2026: https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav
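To give a rough feel for what an "agent harness" means in practice, here is a minimal sketch: the harness owns the tool registry, the loop, and the stop conditions, while the model only chooses the next action. `call_model` and the toy tools are hypothetical placeholders under my own assumptions, not the Gemini API or anything Ravin's team ships.

```python
# Minimal agent-harness sketch: the harness supplies tools, runs the loop,
# and enforces a step budget; the model decides which tool to call next.
# `call_model` is a hypothetical placeholder, not any specific SDK.

def call_model(messages: list[dict]) -> dict:
    """Return either {"tool": name, "args": {...}} or {"answer": text}."""
    raise NotImplementedError("Wire this up to your model client of choice.")

TOOLS = {
    "search_docs": lambda query: f"(search results for {query!r})",
    "run_python": lambda code: f"(output of running {code!r})",
}

def run_agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):           # step budget: never loop forever
        decision = call_model(messages)
        if "answer" in decision:         # the model says it is done
            return decision["answer"]
        tool = TOOLS[decision["tool"]]   # the harness, not the model, executes tools
        result = tool(**decision["args"])
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("Step budget exhausted; surface to a human or retry.")
```

Everything around this loop, such as which tools are exposed, how results are summarized back into context, and when to stop, is where the evaluation and context-engineering work discussed in the episode actually happens.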

Episode 62: Practical AI at Work: How Execs and Developers Can Actually Use LLMs
2025/10/31 | 59 mins.
Many leaders are trapped between chasing ambitious, ill-defined AI projects and the paralysis of not knowing where to start. Dr. Randall Olson argues that the real opportunity isn't in moonshots, but in the "trillions of dollars of business value" available right now. As co-founder of Wyrd Studios, he bridges the gap between data science, AI engineering, and executive strategy to deliver a practical framework for execution.

In this episode, Randy and Hugo lay out how to find and solve "boring but valuable" problems, like an EdTech company automating 20% of its support tickets with a simple retrieval bot instead of a complex AI tutor. They discuss how to move incrementally along the "agentic spectrum" and why treating AI evaluation with the same rigor as software engineering is non-negotiable for building a disciplined, high-impact AI strategy.

They talk through:
- How a non-technical leader can prototype a complex insurance claim classifier using just photos and a ChatGPT subscription
- The agentic spectrum: why you should start by automating meeting summaries before attempting to build fully autonomous agents
- The practical first step for any executive: building a personal knowledge base with meeting transcripts and strategy docs to get tailored AI advice
- Why treating AI evaluation with the same rigor as unit testing is essential for shipping reliable products
- The organizational shift required to unlock long-term AI gains, even if it means a short-term productivity dip

LINKS
Randy on LinkedIn (https://www.zenml.io/llmops-database)
Wyrd Studios (https://thewyrdstudios.com/)
Stop Building AI Agents (https://www.decodingai.com/p/stop-building-ai-agents)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)

🎓 Learn more in Hugo's course, Building AI Applications for Data Scientists and Software Engineers: https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20
Next cohort starts November 3: come build with us!



Vanishing Gradients