AutoGrad Changed Everything (Not Transformers) [Dr. Jeff Beck]

| Podcasts | December 31, 2025 | 18.9K views | 1:16:38

TL;DR

Dr. Jeff Beck argues that AutoGrad—not Transformers—enabled the modern AI revolution by turning neural network development into an engineering discipline, but current systems remain limited by function approximation alone; achieving human-like intelligence requires scalable Bayesian models structured like the brain and grounded in the causal physics of the world.

🧠 Bayesian Brain & The Scientific Method

Bayesian inference encapsulates scientific inquiry

Beck describes Bayesian methods as the algorithmic embodiment of the scientific method: explicit generative models encode hypotheses, and new data is weighed against prior beliefs to build and test those hypotheses.
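The hypothesis-testing loop Beck describes can be sketched as sequential Bayesian updating. A minimal illustration, not from the episode: two candidate generative models for a coin, with belief updated flip by flip.

```python
# Minimal sketch of Bayesian hypothesis testing: two candidate generative
# models for a coin, updated sequentially as data arrives. All numbers are
# illustrative.

def bayes_update(prior, likelihoods):
    """Return the posterior over hypotheses given per-hypothesis likelihoods."""
    unnorm = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Generative models: P(heads) under each hypothesis.
models = {"fair": 0.5, "biased": 0.9}
posterior = {"fair": 0.5, "biased": 0.5}  # uniform prior

for flip in ["H", "H", "H", "T", "H"]:  # observed data
    lik = {h: (p if flip == "H" else 1 - p) for h, p in models.items()}
    posterior = bayes_update(posterior, lik)

print(posterior)  # belief shifts toward the hypothesis that explains the data
```

Each flip plays the role of an experiment: the generative model that better predicts the observation gains posterior mass.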

The brain performs optimal cue combination

Behavioral experiments show humans and animals integrate sensory information (e.g., visual and auditory cues) based on relative reliability on a trial-by-trial basis, behaving as if they optimally account for uncertainty.
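The optimal strategy in these experiments is inverse-variance weighting of the cues. A sketch under Gaussian assumptions, with illustrative numbers (not the episode's data):

```python
# Optimal (maximum-likelihood) combination of two independent Gaussian cues,
# e.g. a visual and an auditory estimate of the same location. Each cue is
# weighted by its reliability (inverse variance). Values are illustrative.

def combine_cues(mu_v, var_v, mu_a, var_a):
    """Reliability-weighted fusion of two independent Gaussian cues."""
    w_v = 1.0 / var_v
    w_a = 1.0 / var_a
    mu = (w_v * mu_v + w_a * mu_a) / (w_v + w_a)
    var = 1.0 / (w_v + w_a)  # combined estimate is more reliable than either cue
    return mu, var

# Vision is 4x more reliable than audition on this trial.
mu, var = combine_cues(mu_v=0.0, var_v=1.0, mu_a=4.0, var_a=4.0)
print(mu, var)  # the estimate is pulled mostly toward the visual cue
```

Behaving "as if optimally accounting for uncertainty" means the measured weights track the cue variances trial by trial, as this formula predicts.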

Cognition is primarily filtering, not processing

Approximately 90% of brain function involves deciding what information to ignore; the system uses self-supervised learning to maintain low-level statistical models of the world without conscious perception of all inputs.

Causality and Modeling Reality

Causal models are computational conveniences

Variables like momentum in physics are chosen not because they are directly observed, but because they render models Markovian and computationally tractable; causal structure simplifies prediction and intervention.
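The momentum point can be made concrete: position alone is not a Markov state, but the pair (position, velocity) is. A toy free-fall sketch with illustrative constants:

```python
# Sketch: why adding momentum-like variables makes a model Markovian.
# For a ball in free fall, the next position cannot be predicted from the
# current position alone, but (position, velocity) is a sufficient state.
# Constants are illustrative.

DT, G = 0.1, -9.8  # time step, gravitational acceleration

def step(state):
    """One Euler step: (position, velocity) -> next (position, velocity)."""
    x, v = state
    return (x + v * DT, v + G * DT)

# Two trajectories pass through the same position with different velocities:
a, b = (10.0, 0.0), (10.0, -5.0)
na, nb = step(a), step(b)
# Same position now, different positions next step:
# position alone does not determine the future, so it is not Markov.
print(na[0], nb[0])
```

Momentum is not directly observed; it is posited exactly because it makes the next state a function of the current one, which is what renders prediction and intervention tractable.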

Macroscopic causation aligns with affordances

We care about causal relationships at the scale where we can act (macroscopic), not microscopic particle interactions, unless technology (like nuclear engineering) extends our affordances to smaller scales.

Downward causation validates macroscopic variables

A macroscopic description (like 'pressure' or 'culture') is only justified if it exhibits downward causation—meaning the macro-level description summarizes system behavior sufficiently to make microscopic details irrelevant for prediction.

🔧 AutoGrad: The Real AI Revolution

AutoGrad transformed AI into an engineering problem

Automatic differentiation allowed researchers to experiment rapidly with architectures, nonlinearities, and memory structures without deriving learning rules by hand, enabling the shift from careful mathematical construction to iterative engineering.
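The core mechanism is small enough to sketch from scratch. This toy reverse-mode autodiff (a from-scratch illustration, not any particular library's API) shows the point: compose any computation from primitives, and gradients come for free, with no hand-derived learning rule.

```python
# Toy reverse-mode automatic differentiation. Each Var records its parents
# and the local derivative of the operation that produced it; backward()
# applies the chain rule through the recorded graph.

class Var:
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __mul__(self, other):
        # d(self*other)/d(self) = other.value, and vice versa
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        # d(self+other)/d(self) = d(self+other)/d(other) = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, grad=1.0):
        self.grad += grad
        for parent, local in self.parents:
            parent.backward(grad * local)  # chain rule

# d/dx (x*x + 3x) at x=2 is 2x + 3 = 7, derived automatically:
x = Var(2.0)
three = Var(3.0)
y = x * x + three * x
y.backward()
print(y.value, x.grad)  # 10.0 7.0
```

Swap in a different architecture or nonlinearity and the same `backward()` still yields correct gradients: that is the experimentation loop AutoGrad unlocked.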

Transformers may not be architecturally special

While often cited as a key breakthrough, Beck argues that scaling alone drives capability; Mamba (a state space model) achieves similar performance to Transformers when scaled, suggesting architecture is less important than previously thought.

Engineering solved backpropagation's theoretical limits

Backprop was historically dismissed due to vanishing gradients and biological implausibility, but engineering tricks (residual connections, normalization) made it viable once experimentation became possible through AutoGrad.
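A numeric sketch of why one such trick works, with purely illustrative numbers: through a deep chain of layers whose local derivative is around 0.5, the plain gradient is a product that vanishes, while a residual connection contributes an extra identity term (1 + f'(x)) that keeps the signal alive.

```python
# Vanishing gradients vs. residual connections, reduced to scalars.
# Plain chain y = f(f(...f(x))): gradient is a product of local derivatives.
# Residual chain y = x + f(x) per layer: each factor becomes (1 + f'(x)).

DEPTH, LOCAL = 50, 0.5  # illustrative depth and local derivative magnitude

plain = 1.0
residual = 1.0
for _ in range(DEPTH):
    plain *= LOCAL            # through f(x): factor f'(x)
    residual *= 1.0 + LOCAL   # through x + f(x): factor 1 + f'(x)

print(plain)     # ~8.9e-16: effectively zero after 50 layers
print(residual)  # large; in practice normalization keeps magnitudes in range
```

The identity path guarantees the gradient factor never falls below 1 in magnitude, which is why residual connections (plus normalization to control the growth) made very deep backprop viable.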

🚀 Beyond Function Approximation

Current AI lacks world structure

Modern systems are sophisticated function approximators but lack the structured, causal models of the physical world (objects, relations, affordances) that form the "atomic elements of thought" in biological intelligence.

Active inference needs hyperscaling

While current active inference models remain small "toy" grid-world simulations, new mathematical tools for approximate Bayesian inference now enable building brain-like models that can scale to real-world complexity.

AGI requires mirroring brain and world structure

To move beyond current limitations, AI must incorporate how the brain actually works (Bayesian inference, efficient coding) and how the world is actually structured (causal macroscopic physics), not just scale up pattern matching.

Bottom Line

To achieve human-like intelligence, we must move beyond scaling function approximators and build scalable, structured Bayesian models that mirror both brain architecture and the causal structure of the physical world.

More from Machine Learning Street Talk

Solving the Wrong Problem Works Better - Robert Lange (1:18:07)

Robert Lange from Sakana AI explains how evolutionary systems like Shinka Evolve demonstrate that scientific breakthroughs require co-evolving problems and solutions through diverse stepping stones, while current LLMs remain constrained by human-defined objectives and fail to generate autonomous novelty.

12 days ago · 8 points
"Vibe Coding is a Slot Machine" - Jeremy Howard (1:26:40)

Deep learning pioneer Jeremy Howard argues that 'vibe coding' with AI is a dangerous slot machine that produces unmaintainable code through an illusion of control, contrasting it with his philosophy that true software engineering insight emerges from interactive exploration (REPLs/notebooks) and deep engagement with models, drawing on his foundational ULMFiT research to demonstrate how understanding—not gambling—drives sustainable productivity.

22 days ago · 9 points
If You Can't See Inside, How Do You Know It's THINKING? [Dr. Jeff Beck] (46:57)

Dr. Jeff Beck argues that agency cannot be verified from external behavior alone, requiring instead evidence of internal planning and counterfactual reasoning, while advocating for energy-based models and joint embedding architectures as biologically plausible alternatives to standard function approximation.

about 2 months ago · 10 points