AutoGrad Changed Everything (Not Transformers) [Dr. Jeff Beck]

| Podcasts | December 31, 2025 | 18.9K views | 1:16:38

TL;DR

Dr. Jeff Beck argues that AutoGrad—not Transformers—enabled the modern AI revolution by turning neural network development into an engineering discipline, but current systems remain limited by function approximation alone; achieving human-like intelligence requires scalable Bayesian models structured like the brain and grounded in the causal physics of the world.

🧠 Bayesian Brain & The Scientific Method

Bayesian inference encapsulates scientific inquiry

Beck describes Bayesian methods as the algorithmic embodiment of the scientific method: explicit generative models encode hypotheses, and new data is weighed against prior beliefs to build and test those hypotheses.
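The hypothesis-testing loop Beck describes can be sketched as sequential Bayesian updating. A minimal illustration, not from the episode: two candidate generative models for a coin, with belief updated flip by flip.

```python
# Minimal sketch of Bayesian hypothesis testing: two candidate generative
# models for a coin, updated sequentially as data arrives. All numbers are
# illustrative.

def bayes_update(prior, likelihoods):
    """Return the posterior over hypotheses given per-hypothesis likelihoods."""
    unnorm = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Generative models: P(heads) under each hypothesis.
models = {"fair": 0.5, "biased": 0.9}
posterior = {"fair": 0.5, "biased": 0.5}  # uniform prior

for flip in ["H", "H", "H", "T", "H"]:  # observed data
    lik = {h: (p if flip == "H" else 1 - p) for h, p in models.items()}
    posterior = bayes_update(posterior, lik)

print(posterior)  # belief shifts toward the hypothesis that explains the data
```

Each flip plays the role of an experiment: the generative model that better predicts the observation gains posterior mass.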

The brain performs optimal cue combination

Behavioral experiments show humans and animals integrate sensory information (e.g., visual and auditory cues) based on relative reliability on a trial-by-trial basis, behaving as if they optimally account for uncertainty.
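The optimal strategy in these experiments is inverse-variance weighting of the cues. A sketch under Gaussian assumptions, with illustrative numbers (not the episode's data):

```python
# Optimal (maximum-likelihood) combination of two independent Gaussian cues,
# e.g. a visual and an auditory estimate of the same location. Each cue is
# weighted by its reliability (inverse variance). Values are illustrative.

def combine_cues(mu_v, var_v, mu_a, var_a):
    """Reliability-weighted fusion of two independent Gaussian cues."""
    w_v = 1.0 / var_v
    w_a = 1.0 / var_a
    mu = (w_v * mu_v + w_a * mu_a) / (w_v + w_a)
    var = 1.0 / (w_v + w_a)  # combined estimate is more reliable than either cue
    return mu, var

# Vision is 4x more reliable than audition on this trial.
mu, var = combine_cues(mu_v=0.0, var_v=1.0, mu_a=4.0, var_a=4.0)
print(mu, var)  # the estimate is pulled mostly toward the visual cue
```

Behaving "as if optimally accounting for uncertainty" means the measured weights track the cue variances trial by trial, as this formula predicts.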

Cognition is primarily filtering, not processing

Approximately 90% of brain function involves deciding what information to ignore; the system uses self-supervised learning to maintain low-level statistical models of the world without conscious perception of all inputs.

Causality and Modeling Reality

Causal models are computational conveniences

Variables like momentum in physics are chosen not because they are directly observed, but because they render models Markovian and computationally tractable; causal structure simplifies prediction and intervention.
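The momentum point can be made concrete: position alone is not a Markov state, but the pair (position, velocity) is. A toy free-fall sketch with illustrative constants:

```python
# Sketch: why adding momentum-like variables makes a model Markovian.
# For a ball in free fall, the next position cannot be predicted from the
# current position alone, but (position, velocity) is a sufficient state.
# Constants are illustrative.

DT, G = 0.1, -9.8  # time step, gravitational acceleration

def step(state):
    """One Euler step: (position, velocity) -> next (position, velocity)."""
    x, v = state
    return (x + v * DT, v + G * DT)

# Two trajectories pass through the same position with different velocities:
a, b = (10.0, 0.0), (10.0, -5.0)
na, nb = step(a), step(b)
# Same position now, different positions next step:
# position alone does not determine the future, so it is not Markov.
print(na[0], nb[0])
```

Momentum is not directly observed; it is posited exactly because it makes the next state a function of the current one, which is what renders prediction and intervention tractable.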

Macroscopic causation aligns with affordances

We care about causal relationships at the scale where we can act (macroscopic), not microscopic particle interactions, unless technology (like nuclear engineering) extends our affordances to smaller scales.

Downward causation validates macroscopic variables

A macroscopic description (like 'pressure' or 'culture') is only justified if it exhibits downward causation—meaning the macro-level description summarizes system behavior sufficiently to make microscopic details irrelevant for prediction.

🔧 AutoGrad: The Real AI Revolution

AutoGrad transformed AI into an engineering problem

Automatic differentiation allowed researchers to experiment rapidly with architectures, nonlinearities, and memory structures without deriving learning rules by hand, enabling the shift from careful mathematical construction to iterative engineering.
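The core mechanism is small enough to sketch from scratch. This toy reverse-mode autodiff (a from-scratch illustration, not any particular library's API) shows the point: compose any computation from primitives, and gradients come for free, with no hand-derived learning rule.

```python
# Toy reverse-mode automatic differentiation. Each Var records its parents
# and the local derivative of the operation that produced it; backward()
# applies the chain rule through the recorded graph.

class Var:
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __mul__(self, other):
        # d(self*other)/d(self) = other.value, and vice versa
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        # d(self+other)/d(self) = d(self+other)/d(other) = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, grad=1.0):
        self.grad += grad
        for parent, local in self.parents:
            parent.backward(grad * local)  # chain rule

# d/dx (x*x + 3x) at x=2 is 2x + 3 = 7, derived automatically:
x = Var(2.0)
three = Var(3.0)
y = x * x + three * x
y.backward()
print(y.value, x.grad)  # 10.0 7.0
```

Swap in a different architecture or nonlinearity and the same `backward()` still yields correct gradients: that is the experimentation loop AutoGrad unlocked.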

Transformers may not be architecturally special

While often cited as a key breakthrough, Beck argues that scaling alone drives capability; Mamba (a state space model) achieves similar performance to Transformers when scaled, suggesting architecture is less important than previously thought.

Engineering solved backpropagation's theoretical limits

Backprop was historically dismissed due to vanishing gradients and biological implausibility, but engineering tricks (residual connections, normalization) made it viable once experimentation became possible through AutoGrad.
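A numeric sketch of why one such trick works, with purely illustrative numbers: through a deep chain of layers whose local derivative is around 0.5, the plain gradient is a product that vanishes, while a residual connection contributes an extra identity term (1 + f'(x)) that keeps the signal alive.

```python
# Vanishing gradients vs. residual connections, reduced to scalars.
# Plain chain y = f(f(...f(x))): gradient is a product of local derivatives.
# Residual chain y = x + f(x) per layer: each factor becomes (1 + f'(x)).

DEPTH, LOCAL = 50, 0.5  # illustrative depth and local derivative magnitude

plain = 1.0
residual = 1.0
for _ in range(DEPTH):
    plain *= LOCAL            # through f(x): factor f'(x)
    residual *= 1.0 + LOCAL   # through x + f(x): factor 1 + f'(x)

print(plain)     # ~8.9e-16: effectively zero after 50 layers
print(residual)  # large; in practice normalization keeps magnitudes in range
```

The identity path guarantees the gradient factor never falls below 1 in magnitude, which is why residual connections (plus normalization to control the growth) made very deep backprop viable.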

🚀 Beyond Function Approximation

Current AI lacks world structure

Modern systems are sophisticated function approximators but lack the structured, causal models of the physical world (objects, relations, affordances) that form the "atomic elements of thought" in biological intelligence.

Active inference needs hyperscaling

While current active inference models remain small "toy" grid-world simulations, new mathematical tools for approximate Bayesian inference now enable building brain-like models that can scale to real-world complexity.

AGI requires mirroring brain and world structure

To move beyond current limitations, AI must incorporate how the brain actually works (Bayesian inference, efficient coding) and how the world is actually structured (causal macroscopic physics), not just scale up pattern matching.

Bottom Line

To achieve human-like intelligence, we must move beyond scaling function approximators and build scalable, structured Bayesian models that mirror both brain architecture and the causal structure of the physical world.

More from Machine Learning Street Talk

Solving the Wrong Problem Works Better - Robert Lange (1:18:07)

Robert Lange from Sakana AI explains how evolutionary systems like Shinka Evolve demonstrate that scientific breakthroughs require co-evolving problems and solutions through diverse stepping stones, while current LLMs remain constrained by human-defined objectives and fail to generate autonomous novelty.

12 days ago · 8 points
"Vibe Coding is a Slot Machine" - Jeremy Howard (1:26:40)

Deep learning pioneer Jeremy Howard argues that 'vibe coding' with AI is a dangerous slot machine that produces unmaintainable code through an illusion of control, contrasting it with his philosophy that true software engineering insight emerges from interactive exploration (REPLs/notebooks) and deep engagement with models, drawing on his foundational ULMFiT research to demonstrate how understanding—not gambling—drives sustainable productivity.

22 days ago · 9 points
If You Can't See Inside, How Do You Know It's THINKING? [Dr. Jeff Beck] (46:57)

Dr. Jeff Beck argues that agency cannot be verified from external behavior alone, requiring instead evidence of internal planning and counterfactual reasoning, while advocating for energy-based models and joint embedding architectures as biologically plausible alternatives to standard function approximation.

about 2 months ago · 10 points