AutoGrad Changed Everything (Not Transformers) [Dr. Jeff Beck]
TL;DR
Dr. Jeff Beck argues that AutoGrad, not Transformers, enabled the modern AI revolution by turning neural network development into an engineering discipline. Current systems nonetheless remain limited to function approximation; in his view, achieving human-like intelligence requires scalable Bayesian models structured like the brain and grounded in the causal physics of the world.
🧠 Bayesian Brain & The Scientific Method 3 insights
Bayesian inference encapsulates scientific inquiry
Beck describes Bayesian methods as the algorithmic embodiment of the scientific method: hypotheses are stated as explicit generative models, tested against new data, and revised in light of what was already known.
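As a rough illustration of that loop, here is a minimal sketch of Bayes' rule over two competing hypotheses. The coin-flip setup, the hypothesis names, and the `posterior` helper are all assumptions made for this example; they do not come from the talk.

```python
def posterior(prior, likelihoods):
    """Apply Bayes' rule over a discrete set of hypotheses."""
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    evidence = sum(unnormalized.values())  # probability of the observation
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical generative models: each one states P(heads) for a coin.
hypotheses = {"fair": 0.5, "biased": 0.8}
belief = {"fair": 0.5, "biased": 0.5}  # prior: no preference

# Each new observation is scored under every hypothesis, then beliefs are revised.
for flip in ["H", "H", "T", "H"]:
    lik = {h: (p if flip == "H" else 1.0 - p) for h, p in hypotheses.items()}
    belief = posterior(belief, lik)

print(belief)
```

The prior plays the role of existing knowledge, the likelihood is the generative model's prediction, and the posterior becomes the prior for the next observation.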
The brain performs optimal cue combination
Behavioral experiments show humans and animals integrate sensory information (e.g., visual and auditory cues) based on relative reliability on a trial-by-trial basis, behaving as if they optimally account for uncertainty.
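The textbook form of reliability-weighted integration, for two independent Gaussian cues, weights each cue by its inverse variance. The sketch below assumes that Gaussian setting; the function name and the example numbers are hypothetical and not taken from any particular experiment.

```python
def combine_cues(mu_a, var_a, mu_b, var_b):
    """Fuse two independent Gaussian cues by inverse-variance weighting."""
    precision_a = 1.0 / var_a
    precision_b = 1.0 / var_b
    weight_a = precision_a / (precision_a + precision_b)
    mean = weight_a * mu_a + (1.0 - weight_a) * mu_b
    variance = 1.0 / (precision_a + precision_b)  # never larger than either input variance
    return mean, variance


# Hypothetical trial: a sharp visual estimate and a noisier auditory estimate.
mean, variance = combine_cues(mu_a=10.0, var_a=1.0, mu_b=14.0, var_b=4.0)
print(mean, variance)
```

The fused estimate sits closer to the more reliable cue, and its variance is lower than that of either cue alone, which is the signature the behavioral experiments look for.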
Cognition is primarily filtering, not processing
Beck estimates that roughly 90% of brain function involves deciding what information to ignore; the brain uses self-supervised learning to maintain low-level statistical models of the world, and most inputs never reach conscious perception.
⚡ Causality and Modeling Reality 3 insights
Causal models are computational conveniences
Variables like momentum in physics are chosen not because they are directly observed, but because they render models Markovian and computationally tractable; causal structure simplifies prediction and intervention.
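A toy example of why momentum makes a model Markovian: for a particle moving at constant velocity (unit mass assumed, so velocity stands in for momentum), position alone does not determine the next position, but the pair (position, velocity) does. The `step` function and the numbers are illustrative assumptions.

```python
def step(position, velocity, dt=1.0):
    """Constant-velocity motion: (position, velocity) is a complete Markov state."""
    return position + velocity * dt, velocity


# Two particles observed at the same position but moving in opposite directions.
state_a = (0.0, 1.0)
state_b = (0.0, -1.0)

next_a = step(*state_a)
next_b = step(*state_b)

# Same current position, different futures: position alone is not a sufficient state.
print(state_a[0] == state_b[0], next_a[0], next_b[0])
```

Without the velocity variable, predicting the next position would require looking back at earlier positions; adding it lets the current state alone carry everything needed for prediction.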
Macroscopic causation aligns with affordances
We care about causal relationships at the scale where we can act (macroscopic), not microscopic particle interactions, unless technology (like nuclear engineering) extends our affordances to smaller scales.
Downward causation validates macroscopic variables
A macroscopic description (like 'pressure' or 'culture') is only justified if it exhibits downward causation—meaning the macro-level description summarizes system behavior sufficiently to make microscopic details irrelevant for prediction.
🔧 AutoGrad: The Real AI Revolution 3 insights
AutoGrad transformed AI into an engineering problem
Automatic differentiation allowed researchers to experiment rapidly with architectures, nonlinearities, and memory structures without deriving learning rules by hand, enabling the shift from careful mathematical construction to iterative engineering.
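To make the idea concrete, here is a minimal sketch of reverse-mode automatic differentiation on scalars. It is a toy written for this summary, not the API of any real AutoGrad library; the `Value` class and its methods are hypothetical.

```python
import math


class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._grad_fns = grad_fns  # one function per parent: upstream grad -> parent grad

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (lambda g: g * (1.0 - t * t),))

    def backward(self):
        # Build a topological order, then apply the chain rule from output to inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, grad_fn in zip(node._parents, node._grad_fns):
                parent.grad += grad_fn(node.grad)


# A one-neuron "network": no learning rule was derived by hand.
w, x, b = Value(0.5), Value(2.0), Value(-0.3)
y = (w * x + b).tanh()
y.backward()
print(w.grad, x.grad, b.grad)
```

Swapping `tanh` for another nonlinearity only requires defining its local derivative once; the gradient of any composite expression then comes for free, which is the kind of rapid experimentation described above.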
Transformers may not be architecturally special
Although Transformers are often cited as a key breakthrough, Beck argues that scaling is the main driver of capability: Mamba (a state space model) reportedly achieves performance similar to Transformers when scaled, suggesting architecture matters less than previously thought.
Engineering solved backpropagation's theoretical limits
Backprop was historically dismissed due to vanishing gradients and biological implausibility, but engineering tricks (residual connections, normalization) made it viable once experimentation became possible through AutoGrad.
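The effect of a residual connection on vanishing gradients can be seen even in a deliberately simplified scalar chain. The sketch below assumes identical `tanh` layers with a single shared weight; the function name, depth, and weight are hypothetical choices for illustration, not a model of any real network.

```python
import math


def chain_gradient(depth, weight, x, residual):
    """Derivative of the final output w.r.t. the input of a scalar `depth`-layer chain."""
    grad = 1.0
    for _ in range(depth):
        pre = weight * x
        local = weight * (1.0 - math.tanh(pre) ** 2)  # d tanh(w * x) / dx
        if residual:
            x = x + math.tanh(pre)   # skip connection: x_{k+1} = x_k + f(x_k)
            grad *= 1.0 + local      # the identity path contributes the "1"
        else:
            x = math.tanh(pre)       # plain layer: x_{k+1} = f(x_k)
            grad *= local
    return grad


plain = chain_gradient(depth=50, weight=0.5, x=1.0, residual=False)
skip = chain_gradient(depth=50, weight=0.5, x=1.0, residual=True)
print(plain, skip)
```

In the plain chain every layer multiplies the gradient by a factor of at most 0.5, so it shrinks geometrically with depth; with the skip connection each factor is at least 1, so the signal reaching the input does not vanish.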
🚀 Beyond Function Approximation 3 insights
Current AI lacks world structure
Modern systems are sophisticated function approximators but lack the structured, causal models of the physical world (objects, relations, affordances) that form the "atomic elements of thought" in biological intelligence.
Active inference needs hyperscaling
While current active inference models remain small "toy" grid-world simulations, new mathematical tools for approximate Bayesian inference now enable building brain-like models that can scale to real-world complexity.
AGI requires mirroring brain and world structure
To move beyond current limitations, AI must incorporate how the brain actually works (Bayesian inference, efficient coding) and how the world is actually structured (causal macroscopic physics), not just scale up pattern matching.
Bottom Line
To achieve human-like intelligence, we must move beyond scaling function approximators and build scalable, structured Bayesian models that mirror both brain architecture and the causal structure of the physical world.