Don't Fight Backprop: Goodfire's Vision for Intentional Design, w/ Dan Balsam & Tom McGrath

Cognitive Revolution

Podcasts | March 05, 2026 | 287 Thousand views | 1:49:53

TL;DR

Goodfire CTO Dan Balsam and Chief Scientist Tom McGrath discuss their $150M Series B and their new 'intentional design' research agenda, which aims to shape training dynamics and loss landscapes instead of only reverse-engineering trained models. They also cover advances in geometric interpretability, which maps continuous conceptual manifolds rather than discrete features.

🎯 Intentional Design Paradigm 3 insights

Don't Fight Backprop

Goodfire advocates shaping loss landscapes so models naturally learn desired behaviors rather than imposing constraints that gradient descent will inevitably circumvent.

Frozen Probe Hallucination Reduction

Their proof-of-concept reduces hallucinations by running detection probes on a frozen copy of the model during training. Because the model being trained cannot alter the frozen copy's representations, learning correct behavior is easier than learning to evade detection.
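The idea of a detector that stays fixed while the model changes can be illustrated with a very small toy. This is a sketch under strong assumptions, not Goodfire's method: the "model" is just a softmax over three candidate answers, and the hypothetical `FROZEN_PROBE_SCORES` stand in for a hallucination probe read off a frozen copy. Because those scores never change during training, the only way to lower the penalty is to move probability toward the answer the probe considers grounded.

```python
import math

# Hypothetical fixed scores from a probe on a frozen model copy (assumption):
# answers 0 and 1 look hallucinated, answer 2 looks grounded.
FROZEN_PROBE_SCORES = [0.9, 0.8, 0.1]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_penalty(logits, probe_scores=FROZEN_PROBE_SCORES):
    """Expected probe score under the policy defined by the logits."""
    probs = softmax(logits)
    return sum(p * s for p, s in zip(probs, probe_scores))


def train(logits, steps=200, lr=1.0, probe_scores=FROZEN_PROBE_SCORES):
    """Gradient descent on the expected penalty; the probe scores stay frozen."""
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        mean_score = sum(p * s for p, s in zip(probs, probe_scores))
        # Exact gradient of the expected penalty w.r.t. logit i: p_i * (s_i - mean).
        grads = [p * (s - mean_score) for p, s in zip(probs, probe_scores)]
        logits = [z - lr * g for z, g in zip(logits, grads)]
    return logits


if __name__ == "__main__":
    start = [2.0, 1.0, 0.0]  # the toy policy initially prefers a flagged answer
    end = train(start)
    print("penalty before:", round(expected_penalty(start), 3))
    print("penalty after: ", round(expected_penalty(end), 3))
    print("final probabilities:", [round(p, 3) for p in softmax(end)])
```

In this toy the policy has no parameters that influence the detector, which is the property the frozen copy is meant to provide; a real setup would involve a full language model, a learned probe, and a task objective alongside the penalty.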

Safety-First Deployment Stance

While intentional design offers promise, the researchers acknowledge these techniques remain immature and should not yet be applied to frontier models.

🔬 Geometric Interpretability 3 insights

Beyond Discrete Features

The field is shifting from sparse autoencoders that label discrete concepts toward mapping continuous geometric manifolds that represent conceptual relationships in latent space.

Manifolds Enable True Generalization

Understanding these geometric structures is necessary for circuit explanations that generalize across all possible inputs rather than merely tracing individual execution paths.

Structure in Representations

Concepts like days of the week form structured geometric patterns such as circular manifolds rather than random disconnected points, driven by co-occurrence statistics in training data.
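A minimal sketch of what a circular manifold means in practice: the seven days are placed at evenly spaced angles on a unit circle, so each day's nearest neighbors are the adjacent days, including the wrap-around from Sunday to Monday. The two-dimensional embedding and the helper names `circular_embedding` and `nearest_neighbors` are illustrative assumptions, not representations extracted from any real model.

```python
import math

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def circular_embedding(index, period=len(DAYS)):
    """Place item `index` at an evenly spaced angle on the unit circle."""
    angle = 2 * math.pi * index / period
    return (math.cos(angle), math.sin(angle))


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def nearest_neighbors(day, k=2):
    """Return the k days whose embeddings are most similar to `day`."""
    target = circular_embedding(DAYS.index(day))
    others = [d for d in DAYS if d != day]
    others.sort(
        key=lambda d: cosine(target, circular_embedding(DAYS.index(d))),
        reverse=True,
    )
    return others[:k]


if __name__ == "__main__":
    for day in DAYS:
        print(day, "->", sorted(nearest_neighbors(day)))
```

On a circle, similarity falls off smoothly with distance in the weekly cycle, which is the kind of structure a list of unrelated discrete features would not capture.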

🧠 Research Breakthroughs 2 insights

Separating Memory from Reasoning

Goodfire demonstrated that removing weights specialized for fact memorization can actually improve model performance on certain reasoning tasks.

Debugging Medical AI

Their Prima collaboration revealed that an Alzheimer's diagnosis model relied on DNA fragment length rather than intended biological markers, showcasing interpretability's value for detecting spurious correlations.

Bottom Line

The speakers argue that AI safety depends less on constraining model behavior after training and more on intentionally designing loss landscapes and training dynamics, so that desirable capabilities become the path of least resistance for gradient descent.

