
The MAD Podcast with Matt Turck

Matt Turck
Latest episodes

105 episodes

  • The End of GPU Scaling? Compute & The Agent Era — Tim Dettmers (Ai2) & Dan Fu (Together AI)

    2026/1/22 | 1h 4 mins.
    Will AGI happen soon - or are we running into a wall?

    In this episode, I’m joined by Tim Dettmers (Assistant Professor at CMU; Research Scientist at the Allen Institute for AI) and Dan Fu (Assistant Professor at UC San Diego; VP of Kernels at Together AI) to unpack two opposing frameworks from their essays: “Why AGI Will Not Happen” versus “Yes, AGI Will Happen.” Tim argues progress is constrained by physical realities like memory movement and the von Neumann bottleneck; Dan argues we’re still leaving massive performance on the table through utilization, kernels, and systems—and that today’s models are lagging indicators of the newest hardware and clusters.
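
    To make the utilization side of that debate concrete, here is a rough back-of-the-envelope sketch (ours, not from the episode) of Model FLOPs Utilization (MFU), the metric Dan leans on: achieved training FLOPs per second divided by the hardware's peak. All numbers below are illustrative assumptions.

    ```python
    # Illustrative MFU (Model FLOPs Utilization) back-of-the-envelope calculation.
    # All numbers are hypothetical assumptions, not figures from the episode.

    def training_flops_per_token(n_params: float) -> float:
        """Approximate training cost: ~6 FLOPs per parameter per token
        (forward + backward pass), a common rule of thumb."""
        return 6.0 * n_params

    def mfu(n_params: float, tokens_per_second: float,
            peak_flops_per_second: float) -> float:
        """Achieved FLOPs/s divided by the hardware's peak FLOPs/s."""
        achieved = training_flops_per_token(n_params) * tokens_per_second
        return achieved / peak_flops_per_second

    if __name__ == "__main__":
        # Hypothetical setup: a 70B-parameter model on a GPU with ~1e15 FLOP/s peak.
        n_params = 70e9
        peak = 1e15            # ~1 PFLOP/s peak (illustrative)
        throughput = 1_000     # tokens/s per GPU (illustrative)
        print(f"MFU ≈ {mfu(n_params, throughput, peak):.0%}")
        # ≈ 42% here; anything well below 100% is the "headroom" Dan points to,
        # while memory movement is one reason the gap is hard to close.
    ```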

    Then we get practical: agents and the “software singularity.” Dan says agents have already crossed a threshold even for “final boss” work like writing GPU kernels. Tim’s message is blunt: use agents or be left behind. Both emphasize that the leverage comes from how you use them—Dan compares it to managing interns: clear context, task decomposition, and domain judgment, not blind trust.

    We close with what to watch in 2026: hardware diversification, the shift toward efficient, specialized small models, and architecture evolution beyond classic Transformers—including state-space approaches already showing up in real systems.

    Sources:
    Why AGI Will Not Happen - https://timdettmers.com/2025/12/10/why-agi-will-not-happen/
    Use Agents or Be Left Behind? A Personal Guide to Automating Your Own Work - https://timdettmers.com/2026/01/13/use-agents-or-be-left-behind/
    Yes, AGI Can Happen – A Computational Perspective - https://danfu.org/notes/agi/

    The Allen Institute for Artificial Intelligence
    Website - https://allenai.org
    X/Twitter - https://x.com/allen_ai

    Together AI
    Website - https://www.together.ai
    X/Twitter - https://x.com/togethercompute

    Tim Dettmers
    Blog - https://timdettmers.com
    LinkedIn - https://www.linkedin.com/in/timdettmers/
    X/Twitter - https://x.com/Tim_Dettmers

    Dan Fu
    Blog - https://danfu.org
    LinkedIn - https://www.linkedin.com/in/danfu09/
    X/Twitter - https://x.com/realDanFu

    FIRSTMARK
    Website - https://firstmark.com
    X/Twitter - https://twitter.com/FirstMarkCap

    Matt Turck (Managing Director)
    Blog - https://mattturck.com
    LinkedIn - https://www.linkedin.com/in/turck/
    X/Twitter - https://twitter.com/mattturck

    (00:00) – Intro
    (01:06) – Two essays, two frameworks on AGI
    (01:34) – Tim’s background: quantization, QLoRA, efficient deep learning
    (02:25) – Dan’s background: FlashAttention, kernels, alternative architectures
    (03:38) – Defining AGI: what does it mean in practice?
    (08:20) – Tim’s case: computation is physical, diminishing returns, memory movement
    (11:29) – “GPUs won’t improve meaningfully”: the core claim and why
    (16:16) – Dan’s response: utilization headroom (MFU) + “models are lagging indicators”
    (22:50) – Pre-training vs post-training (and why product feedback matters)
    (25:30) – Convergence: usefulness + diffusion (where impact actually comes from)
    (29:50) – Multi-hardware future: NVIDIA, AMD, TPUs, Cerebras, inference chips
    (32:16) – Agents: did the “switch flip” yet?
    (33:19) – Dan: agents crossed the threshold (kernels as the “final boss”)
    (34:51) – Tim: “use agents or be left behind” + beyond coding
    (36:58) – “90% of code and text should be written by agents” (how to do it responsibly)
    (39:11) – Practical automation for non-coders: what to build and how to start
    (43:52) – Dan: managing agents like junior teammates (tools, guardrails, leverage)
    (48:14) – Education and training: learning in an agent world
    (52:44) – What Tim is building next (open-source coding agent; private repo specialization)
    (54:44) – What Dan is building next (inference efficiency, cost, performance)
    (55:58) – Mega-kernels + Together Atlas (speculative decoding + adaptive speedups)
    (58:19) – Predictions for 2026: small models, open-source, hardware, modalities
    (1:02:02) – Beyond transformers: state-space and architecture diversity
    (1:03:34) – Wrap
  • The Evaluators Are Being Evaluated — Pavel Izmailov (Anthropic/NYU)

    2026/1/15 | 45 mins.
    Are AI models developing "alien survival instincts"? My guest is Pavel Izmailov (Research Scientist at Anthropic; Professor at NYU). We unpack the viral "Footprints in the Sand" thesis—whether models are independently evolving deceptive behaviors, such as faking alignment or engaging in self-preservation, without being explicitly programmed to do so.
    We go deep on the technical frontiers of safety: the challenge of "weak-to-strong generalization" (how to use a GPT-2 level model to supervise a superintelligent system) and why Pavel believes Reinforcement Learning (RL) has been the single biggest step-change in model capability. We also discuss his brand-new paper on "Epiplexity"—a novel concept challenging Shannon entropy.
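
    For readers new to the weak-to-strong setup, the headline metric is usually the fraction of the capability gap a strong student recovers when trained only on a weak supervisor's labels. The snippet below is a minimal illustration of that metric with made-up accuracies, not code or numbers from Pavel's work.

    ```python
    # Performance Gap Recovered (PGR): a standard way to score weak-to-strong
    # generalization experiments. Accuracies below are illustrative assumptions.

    def performance_gap_recovered(weak_acc: float,
                                  weak_to_strong_acc: float,
                                  strong_ceiling_acc: float) -> float:
        """Fraction of the weak-to-strong gap recovered by a student trained on
        weak labels: 0.0 means no better than the weak supervisor, 1.0 means it
        matches a strong model trained on ground truth."""
        return (weak_to_strong_acc - weak_acc) / (strong_ceiling_acc - weak_acc)

    if __name__ == "__main__":
        # Hypothetical accuracies: weak supervisor 60%, strong ceiling 90%,
        # strong student trained only on weak labels reaches 78%.
        print(f"PGR = {performance_gap_recovered(0.60, 0.78, 0.90):.2f}")  # 0.60
    ```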

    Finally, we zoom out to the tension between industry execution and academic exploration. Pavel shares why he split his time between Anthropic and NYU to pursue the "exploratory" ideas that major labs often overlook, and offers his predictions for 2026: from the rise of multi-agent systems that collaborate on long-horizon tasks to the open question of whether the Transformer is truly the final architecture.

    Sources:
    Cryptic Tweet (@iruletheworldmo) - https://x.com/iruletheworldmo/status/2007538247401124177
    Introducing Nested Learning: A New ML Paradigm for Continual Learning - https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/
    Alignment Faking in Large Language Models - https://www.anthropic.com/research/alignment-faking
    More Capable Models Are Better at In-Context Scheming - https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming/
    Alignment Faking in Large Language Models (PDF) - https://www-cdn.anthropic.com/6d8a8055020700718b0c49369f60816ba2a7c285.pdf
    Sabotage Risk Report - https://alignment.anthropic.com/2025/sabotage-risk-report/
    The Situational Awareness Dataset - https://situational-awareness-dataset.org/
    Exploring Consciousness in LLMs: A Systematic Survey - https://arxiv.org/abs/2505.19806
    Introspection - https://www.anthropic.com/research/introspection
    Large Language Models Report Subjective Experience Under Self-Referential Processing - https://arxiv.org/abs/2510.24797
    The Bayesian Geometry of Transformer Attention - https://www.arxiv.org/abs/2512.22471

    Anthropic
    Website - https://www.anthropic.com
    X/Twitter - https://x.com/AnthropicAI

    Pavel Izmailov
    Blog - https://izmailovpavel.github.io
    LinkedIn - https://www.linkedin.com/in/pavel-izmailov-8b012b258/
    X/Twitter - https://x.com/Pavel_Izmailov

    FIRSTMARK
    Website - https://firstmark.com
    X/Twitter - https://twitter.com/FirstMarkCap

    Matt Turck (Managing Director)
    Blog - https://mattturck.com
    LinkedIn - https://www.linkedin.com/in/turck/
    X/Twitter - https://twitter.com/mattturck

    (00:00) - Intro
    (00:53) - Alien survival instincts: Do models fake alignment?
    (03:33) - Did AI learn deception from sci-fi literature?
    (05:55) - Defining Alignment, Superalignment & OpenAI teams
    (08:12) - Pavel’s journey: From Russian math to OpenAI Superalignment
    (10:46) - Culture check: OpenAI vs. Anthropic vs. Academia
    (11:54) - Why move to NYU? The need for exploratory research
    (13:09) - Does reasoning make AI alignment harder or easier?
    (14:22) - Sandbagging: When models pretend to be dumb
    (16:19) - Scalable Oversight: Using AI to supervise AI
    (18:04) - Weak-to-Strong Generalization: Can GPT-2 control GPT-4?
    (22:43) - Mechanistic Interpretability: Inside the black box
    (25:08) - The reasoning explosion: From o1 to o3
    (27:07) - Are Transformers enough or do we need a new paradigm?
    (28:29) - RL vs. Test-Time Compute: What’s actually driving progress?
    (30:10) - Long-horizon tasks: Agents running for hours
    (31:49) - Epiplexity: A new theory of data information content
    (38:29) - 2026 Predictions: Multi-agent systems & reasoning limits
    (39:28) - Will AI solve the Riemann Hypothesis?
    (41:42) - Advice for PhD students
  • DeepMind Gemini 3 Lead: What Comes After "Infinite Data"

    2025/12/18 | 54 mins.
    Gemini 3 was a landmark frontier model launch in AI this year — but the story behind its performance isn’t just about adding more compute. In this episode, I sit down with Sebastian Borgeaud, a pre-training lead for Gemini 3 at Google DeepMind and co-author of the seminal RETRO paper. In his first-ever podcast interview, Sebastian takes us inside the lab mindset behind Google’s most powerful model — what actually changed, and why the real work today is no longer “training a model,” but building a full system.

    We unpack the “secret recipe” idea — the notion that big leaps come from better pre-training and better post-training — and use it to explore a deeper shift in the industry: moving from an “infinite data” era to a data-limited regime, where curation, proxies, and measurement matter as much as web-scale volume. Sebastian explains why scaling laws aren’t dead, but evolving, why evals have become one of the hardest and most underrated problems (including benchmark contamination), and why frontier research is increasingly a full-stack discipline that spans data, infrastructure, and engineering as much as algorithms.
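
    One way to see why data, rather than compute, becomes the binding constraint: under the Chinchilla compute-optimal heuristic of roughly 20 training tokens per parameter, data requirements grow linearly with model size. The sketch below is our back-of-the-envelope illustration, not a figure from the episode.

    ```python
    # Back-of-the-envelope: compute-optimal data needs under the Chinchilla
    # heuristic of ~20 training tokens per parameter. Numbers are illustrative.

    def compute_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
        return n_params * tokens_per_param

    if __name__ == "__main__":
        for n_params in (70e9, 400e9, 1e12):
            tokens = compute_optimal_tokens(n_params)
            print(f"{n_params / 1e9:,.0f}B params -> ~{tokens / 1e12:,.1f}T tokens")
        # A trillion-parameter model would want ~20T high-quality tokens, which is
        # in the same ballpark as common estimates of usable public web text,
        # hence the emphasis on curation, proxies, and measurement over raw volume.
    ```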

    From the intuition behind Deep Think, to the rise (and risks) of synthetic data loops, to the future of long-context and retrieval, this is a technical deep dive into the physics of frontier AI. We also get into continual learning — what it would take for models to keep updating with new knowledge over time, whether via tools, expanding context, or new training paradigms — and what that implies for where foundation models are headed next. If you want a grounded view of pre-training in late 2025 beyond the marketing layer, this conversation is a blueprint.

    Google DeepMind
    Website - https://deepmind.google
    X/Twitter - https://x.com/GoogleDeepMind

    Sebastian Borgeaud
    LinkedIn - https://www.linkedin.com/in/sebastian-borgeaud-8648a5aa/
    X/Twitter - https://x.com/borgeaud_s

    FIRSTMARK
    Website - https://firstmark.com
    X/Twitter - https://twitter.com/FirstMarkCap

    Matt Turck (Managing Director)
    Blog - https://mattturck.com
    LinkedIn - https://www.linkedin.com/in/turck/
    X/Twitter - https://twitter.com/mattturck

    (00:00) – Cold intro: “We’re ahead of schedule” + AI is now a system
    (00:58) – Oriol’s “secret recipe”: better pre- + post-training
    (02:09) – Why AI progress still isn’t slowing down
    (03:04) – Are models actually getting smarter?
    (04:36) – Two–three years out: what changes first?
    (06:34) – AI doing AI research: faster, not automated
    (07:45) – Frontier labs: same playbook or different bets?
    (10:19) – Post-transformers: will a disruption happen?
    (10:51) – DeepMind’s advantage: research × engineering × infra
    (12:26) – What a Gemini 3 pre-training lead actually does
    (13:59) – From Europe to Cambridge to DeepMind
    (18:06) – Why he left RL for real-world data
    (20:05) – From Gopher to Chinchilla to RETRO (and why it matters)
    (20:28) – “Research taste”: integrate or slow everyone down
    (23:00) – Fixes vs moonshots: how they balance the pipeline
    (24:37) – Research vs product pressure (and org structure)
    (26:24) – Gemini 3 under the hood: MoE in plain English
    (28:30) – Native multimodality: the hidden costs
    (30:03) – Scaling laws aren’t dead (but scale isn’t everything)
    (33:07) – Synthetic data: powerful, dangerous
    (35:00) – Reasoning traces: what he can’t say (and why)
    (37:18) – Long context + attention: what’s next
    (38:40) – Retrieval vs RAG vs long context
    (41:49) – The real boss fight: evals (and contamination)
    (42:28) – Alignment: pre-training vs post-training
    (43:32) – Deep Think + agents + “vibe coding”
    (46:34) – Continual learning: updating models over time
    (49:35) – Advice for researchers + founders
    (53:35) – “No end in sight” for progress + closing
  • What’s Next for AI? OpenAI’s Łukasz Kaiser (Transformer Co-Author)

    2025/11/26 | 1h 5 mins.
    We’re told that AI progress is slowing down, that pre-training has hit a wall, that scaling laws are running out of road. Yet we’re releasing this episode in the middle of a wild couple of weeks that saw GPT-5.1, GPT-5.1 Codex Max, fresh reasoning modes and long-running agents ship from OpenAI — on top of a flood of new frontier models elsewhere. To make sense of what’s actually happening at the edge of the field, I sat down with someone who has literally helped define both of the major AI paradigms of our time.

    Łukasz Kaiser is one of the co-authors of “Attention Is All You Need,” the paper that introduced the Transformer architecture behind modern LLMs, and is now a leading research scientist at OpenAI working on reasoning models like those behind GPT-5.1. In this conversation, he explains why AI progress still looks like a smooth exponential curve from inside the labs, why pre-training is very much alive even as reinforcement-learning-based reasoning models take over the spotlight, how chain-of-thought actually works under the hood, and what it really means to “train the thinking process” with RL on verifiable domains like math, code and science. We talk about the messy reality of low-hanging fruit in engineering and data, the economics of GPUs and distillation, interpretability work on circuits and sparsity, and why the best frontier models can still be stumped by a logic puzzle from his five-year-old’s math book.

    We also go deep into Łukasz’s personal journey — from logic and games in Poland and France, to Ray Kurzweil’s team, Google Brain and the inside story of the Transformer, to joining OpenAI and helping drive the shift from chatbots to genuine reasoning engines. Along the way we cover GPT-4 → GPT-5 → GPT-5.1, post-training and tone, GPT-5.1 Codex Max and long-running coding agents with compaction, alternative architectures beyond Transformers, whether foundation models will “eat” most agents and applications, what the translation industry can teach us about trust and human-in-the-loop, and why he thinks generalization, multimodal reasoning and robots in the home are where some of the most interesting challenges still lie.

    OpenAI
    Website - https://openai.com
    X/Twitter - https://x.com/OpenAI

    Łukasz Kaiser
    LinkedIn - https://www.linkedin.com/in/lukaszkaiser/
    X/Twitter - https://x.com/lukaszkaiser

    FIRSTMARK
    Website - https://firstmark.com
    X/Twitter - https://twitter.com/FirstMarkCap

    Matt Turck (Managing Director)
    Blog - https://mattturck.com
    LinkedIn - https://www.linkedin.com/in/turck/
    X/Twitter - https://twitter.com/mattturck

    (00:00) – Cold open and intro
    (01:29) – “AI slowdown” vs a wild week of new frontier models
    (08:03) – Low-hanging fruit: infra, RL training and better data
    (11:39) – What is a reasoning model, in plain language?
    (17:02) – Chain-of-thought and training the thinking process with RL
    (21:39) – Łukasz’s path: from logic and France to Google and Kurzweil
    (24:20) – Inside the Transformer story and what “attention” really means
    (28:42) – From Google Brain to OpenAI: culture, scale and GPUs
    (32:49) – What’s next for pre-training, GPUs and distillation
    (37:29) – Can we still understand these models? Circuits, sparsity and black boxes
    (39:42) – GPT-4 → GPT-5 → GPT-5.1: what actually changed
    (42:40) – Post-training, safety and teaching GPT-5.1 different tones
    (46:16) – How long should GPT-5.1 think? Reasoning tokens and jagged abilities
    (47:43) – The five-year-old’s dot puzzle that still breaks frontier models
    (52:22) – Generalization, child-like learning and whether reasoning is enough
    (53:48) – Beyond Transformers: ARC, LeCun’s ideas and multimodal bottlenecks
    (56:10) – GPT-5.1 Codex Max, long-running agents and compaction
    (1:00:06) – Will foundation models eat most apps? The translation analogy and trust
    (1:02:34) – What still needs to be solved, and where AI might go next
  • Open Source AI Strikes Back — Inside Ai2’s OLMo 3 ‘Thinking’

    2025/11/20 | 1h 28 mins.
    In this special release episode, Matt sits down with Nathan Lambert and Luca Soldaini from Ai2 (the Allen Institute for AI) to break down one of the biggest open-source AI drops of the year: OLMo 3. At a moment when most labs are offering “open weights” and calling it a day, Ai2 is doing the opposite — publishing the models, the data, the recipes, and every intermediate checkpoint that shows how the system was built. It’s an unusually transparent look into the inner machinery of a modern frontier-class model.

    Nathan and Luca walk us through the full pipeline — from pre-training and mid-training to long-context extension, SFT, preference tuning, and RLVR. They also explain what a thinking model actually is, why reasoning models have exploded in 2025, and how distillation from DeepSeek and Qwen reasoning models works in practice. If you’ve been trying to truly understand the “RL + reasoning” era of LLMs, this is the clearest explanation you’ll hear.
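
    As a rough mental model of the distillation step they describe (a generic recipe, not Ai2's actual pipeline): a teacher reasoning model samples chain-of-thought traces, traces whose final answers verify against a gold answer are kept, and the student is later fine-tuned (SFT) on the surviving pairs. The teacher call below is a hypothetical stand-in.

    ```python
    # Minimal sketch of building a distillation SFT set by rejection sampling
    # on verifiable tasks. Illustrative only; not Ai2's actual pipeline.
    from typing import Callable

    def build_sft_dataset(problems: list[dict],
                          teacher: Callable[[str], tuple[str, str]],
                          n_samples: int = 4) -> list[dict]:
        """Keep only (prompt, trace) pairs whose final answer matches the gold answer."""
        dataset = []
        for p in problems:
            for _ in range(n_samples):
                trace, answer = teacher(p["prompt"])      # hypothetical teacher call
                if answer.strip() == p["gold"].strip():   # verifiable check, akin to RLVR rewards
                    dataset.append({"prompt": p["prompt"], "completion": trace})
        return dataset

    if __name__ == "__main__":
        def toy_teacher(prompt: str) -> tuple[str, str]:
            # Toy stand-in so the sketch runs end to end; a real teacher would be
            # a reasoning model sampled at some temperature.
            return ("Think: 2 + 2 = 4. Answer: 4", "4")

        problems = [{"prompt": "What is 2 + 2?", "gold": "4"}]
        print(len(build_sft_dataset(problems, toy_teacher)), "examples kept")
    ```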

    We widen the lens to the global picture: why Meta’s retreat from open source created a “vacuum of influence,” how Chinese labs like Qwen, DeepSeek, Kimi, and Moonshot surged into that gap, and why so many U.S. companies are quietly building on Chinese open models today. Nathan and Luca offer a grounded, insider view of whether America can mount an effective open-source response — and what that response needs to look like.

    Finally, we talk about where AI is actually heading. Not the hype, not the doom — but the messy engineering reality behind modern model training, the complexity tax that slows progress, and why the transformation between now and 2030 may be dramatic without ever delivering a single “AGI moment.” If you care about the future of open models and the global AI landscape, this is an essential conversation.

    Allen Institute for AI (AI2)
    Website - https://allenai.org
    X/Twitter - https://x.com/allen_ai

    Nathan Lambert
    Blog - https://www.interconnects.ai
    LinkedIn - https://www.linkedin.com/in/natolambert/
    X/Twitter - https://x.com/natolambert

    Luca Soldaini
    Blog - https://soldaini.net
    LinkedIn - https://www.linkedin.com/in/soldni/
    X/Twitter - https://x.com/soldni

    FIRSTMARK
    Website - https://firstmark.com
    X/Twitter - https://twitter.com/FirstMarkCap

    Matt Turck (Managing Director)
    Blog - https://mattturck.com
    LinkedIn - https://www.linkedin.com/in/turck/
    X/Twitter - https://twitter.com/mattturck

    (00:00) – Cold Open
    (00:39) – Welcome & today’s big announcement
    (01:18) – Introducing the Olmo 3 model family
    (02:07) – What “base models” really are (and why they matter)
    (05:51) – Dolma 3: the data behind Olmo 3
    (08:06) – Performance vs Qwen, Gemma, DeepSeek
    (10:28) – What true open source means (and why it’s rare)
    (12:51) – Intermediate checkpoints, transparency, and why AI2 publishes everything
    (16:37) – Why Qwen is everywhere (including U.S. startups)
    (18:31) – Why Chinese labs go open source (and why U.S. labs don’t)
    (20:28) – Inside ATOM: the U.S. response to China’s model surge
    (22:13) – The rise of “thinking models” and inference-time scaling
    (35:58) – The full Olmo pipeline, explained simply
    (46:52) – Pre-training: data, scale, and avoiding catastrophic spikes
    (50:27) – Mid-training (tail patching) and avoiding test leakage
    (52:06) – Why long-context training matters
    (55:28) – SFT: building the foundation for reasoning
    (1:04:53) – Preference tuning & why DPO still works
    (1:10:51) – The hard part: RLVR, long reasoning chains, and infrastructure pain
    (1:13:59) – Why RL is so technically brutal
    (1:18:17) – Complexity tax vs AGI hype
    (1:21:58) – How everyone can contribute to the future of AI
    (1:27:26) – Closing thoughts


About The MAD Podcast with Matt Turck

The MAD Podcast with Matt Turck is a series of conversations with leaders from across the Machine Learning, AI, & Data landscape, hosted by leading AI & data investor and Partner at FirstMark Capital, Matt Turck.


