If You Can't See Inside, How Do You Know It's THINKING? [Dr. Jeff Beck]
TL;DR
Dr. Jeff Beck argues that agency cannot be verified from external behavior alone, requiring instead evidence of internal planning and counterfactual reasoning, while advocating for energy-based models and joint embedding architectures as biologically plausible alternatives to standard function approximation.
🧠 Agency: Internal Mechanisms vs. External Behavior (4 insights)
Agency is a spectrum of computational sophistication
Beck posits that no structural line separates agents from objects; instead, agency exists on a continuum defined by the complexity of internal states, the time scales over which they persist, and the context-dependence of policies.
Planning and counterfactuals define true agency
The defining characteristic of an agent is not sophisticated behavior but the capacity for planning and counterfactual reasoning—internally simulating future consequences of actions—which remains invisible to external observers measuring only input-output mappings.
Physical embodiment is necessary for genuine agency
Beck asserts that a high-fidelity simulation of a brain lacks agency unless physically embedded in the world; only through physical interaction and causal disconnection from immediate environmental impulses can something be considered an agent rather than a sophisticated reflex machine.
Agency requires adopting the intentional stance
Following Dennett, we treat systems 'as if' they are agents when modeling them as planners yields the most parsimonious explanation of their behavior, though we can never confirm agency from behavior alone without inspecting their internal computations.
⚡ Energy-Based Models and Bayesian Inference (3 insights)
EBMs impose constraints on latent variables
Unlike standard neural networks that optimize only weights for input-output mapping, energy-based models apply cost functions to internal states, requiring dual minimization of both hidden state energy and prediction error.
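The dual minimization can be sketched with a toy linear energy-based model (my illustration, not a formulation from the episode): inference descends the energy with respect to the hidden state `z` while the weights are fixed, and learning then takes a gradient step on the weights `W` at the inferred state.

```python
import numpy as np

# Toy EBM sketch (assumed linear-Gaussian form, for illustration only):
# E(x, z; W) = ||x - W z||^2 + lam * ||z||^2
# Inference minimizes over the latent z; learning minimizes over W.

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))     # decoder weights
x = rng.normal(size=4)          # observed input
lam, lr = 0.1, 0.05

def energy(x, z, W, lam):
    recon = x - W @ z
    return recon @ recon + lam * z @ z

# Inference: gradient descent on the hidden state z, not the weights.
z = np.zeros(2)
for _ in range(200):
    grad_z = -2 * W.T @ (x - W @ z) + 2 * lam * z
    z -= lr * grad_z

# Learning: one gradient step on W at the inferred latent state.
grad_W = -2 * np.outer(x - W @ z, z)
W -= lr * grad_W
```

The key contrast with a standard feed-forward network is that the hidden state itself is an optimization variable with its own cost, not just an intermediate activation.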
Energy minimization approximates Bayesian inference
Energy functions correspond to log probabilities in Bayesian frameworks, where minimizing energy equals maximum a posteriori (MAP) estimation without expensive normalization; the Free Energy Principle adds an entropy regularization term to this approach.
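In symbols (a standard correspondence, not quoted from the episode), identifying energy with a negative log probability makes energy minimization equivalent to MAP estimation, and the variational free energy adds the entropy term:

```latex
E(x, z) = -\log p(x, z), \qquad
\hat{z}_{\mathrm{MAP}} = \arg\min_z E(x, z) = \arg\max_z p(z \mid x),
```
```latex
F[q] = \mathbb{E}_{q(z)}\!\left[E(x, z)\right] - H[q]
     = -\log p(x) + \mathrm{KL}\!\left(q(z) \,\|\, p(z \mid x)\right)
     \;\ge\; -\log p(x).
```

Note that the MAP step never requires the normalizer \(p(x)\), which is the "expensive normalization" the summary refers to.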
Test-time training risks methodological inconsistency
Modern approaches that activate latent variable optimization only during deployment (test-time training) are problematic because the base network was trained without accounting for these dynamics, unlike traditional EBMs where latent optimization occurs throughout training.
🔗 Joint Embeddings and Non-Contrastive Learning (3 insights)
JEPA prioritizes latent prediction over pixel reconstruction
Joint Embedding Predictive Architectures compress inputs and outputs into abstract representations before prediction, enabling conceptual understanding rather than forcing models to reconstruct irrelevant pixel-level details.
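A toy linear sketch of that idea (illustrative assumptions throughout; real JEPAs use deep encoders, masking, and stop-gradients): the loss compares predicted and target embeddings, so pixel-level detail never enters the objective.

```python
import numpy as np

# JEPA-style sketch (assumed linear encoders and predictor, for illustration):
# predict the *embedding* of the target from the embedding of the context.

rng = np.random.default_rng(1)
d_in, d_emb = 8, 3
Ex = rng.normal(size=(d_emb, d_in))   # context encoder
Ey = rng.normal(size=(d_emb, d_in))   # target encoder
P  = rng.normal(size=(d_emb, d_emb))  # predictor in latent space

x = rng.normal(size=d_in)             # context view
y = rng.normal(size=d_in)             # target view

def jepa_loss(x, y):
    sx = Ex @ x          # embed context
    sy = Ey @ y          # embed target
    pred = P @ sx        # predict target embedding from context embedding
    diff = pred - sy
    return diff @ diff   # error in latent space, never in pixel space
```

Contrast this with a generative objective such as `||decoder(pred) - y||^2`, which would force the model to account for every low-level detail of `y`.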
Non-contrastive methods avoid biological implausibility
By eliminating the need for negative sampling and contrastive divergence through regularization techniques (as in Barlow Twins or VICReg), these architectures avoid biologically implausible backpropagation requirements while preventing representational collapse to trivial solutions.
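A simplified VICReg-style loss (coefficients and details here are my illustrative assumptions, not the published recipe) shows how regularization alone can prevent collapse without any negative pairs: a variance term keeps embedding dimensions spread out, and a covariance term decorrelates them.

```python
import numpy as np

# Non-contrastive loss sketch (VICReg-flavored, simplified): no negative
# samples; collapse is penalized directly by the variance term.

rng = np.random.default_rng(2)
za = rng.normal(size=(16, 4))              # embeddings of one view (batch x dim)
zb = za + 0.1 * rng.normal(size=(16, 4))   # embeddings of the augmented view

def vicreg_loss(za, zb, eps=1e-4):
    # Invariance: matched pairs should agree.
    inv = np.mean((za - zb) ** 2)
    # Variance: push each dimension's std toward at least 1 (anti-collapse).
    std = np.sqrt(za.var(axis=0) + eps)
    var = np.mean(np.maximum(0.0, 1.0 - std))
    # Covariance: penalize off-diagonal correlations (redundancy reduction,
    # as in Barlow Twins).
    zc = za - za.mean(axis=0)
    cov = (zc.T @ zc) / (len(za) - 1)
    off_diag = cov - np.diag(np.diag(cov))
    covl = np.sum(off_diag ** 2) / za.shape[1]
    return inv + var + covl
```

If all embeddings collapse to a constant, the invariance term is zero but the variance term alone keeps the loss high, which is exactly the trivial solution the regularizers exist to rule out.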
End-to-end embedding learning replaces pre-processing
Rather than using fixed pre-trained encoders (like VAEs) as separate pre-processing stages, modern approaches aim to learn optimal compression jointly with the prediction task, treating embedding discovery as part of the optimization problem.
Bottom Line
When evaluating AI systems, inspect internal planning mechanisms and counterfactual reasoning capabilities rather than relying solely on behavioral sophistication, and prioritize physically embodied architectures that incorporate energy-based constraints and joint embeddings over pure function approximation.
More from Machine Learning Street Talk
Solving the Wrong Problem Works Better - Robert Lange
Robert Lange from Sakana AI explains how evolutionary systems like Shinka Evolve demonstrate that scientific breakthroughs require co-evolving problems and solutions through diverse stepping stones, while current LLMs remain constrained by human-defined objectives and fail to generate autonomous novelty.
"Vibe Coding is a Slot Machine" - Jeremy Howard
Deep learning pioneer Jeremy Howard argues that 'vibe coding' with AI is a dangerous slot machine that produces unmaintainable code through an illusion of control, contrasting it with his philosophy that true software engineering insight emerges from interactive exploration (REPLs/notebooks) and deep engagement with models, drawing on his foundational ULMFiT research to demonstrate how understanding—not gambling—drives sustainable productivity.
What If Intelligence Didn't Evolve? It "Was There" From the Start! - Blaise Agüera y Arcas
Blaise Agüera y Arcas argues that intelligence is not an evolutionary invention but a fundamental physical property that emerges through phase transitions from noise to complex programs, with life representing 'embodied computation' where function, not matter, defines living systems.
Abstraction & Idealization: AI's Plato Problem [Mazviita Chirimuuta]
Mazviita Chirimuuta argues that AI's assumption of discoverable mathematical "source code" underlying messy reality repeats Plato's idealism, warning that scientific abstraction is a practical tool for limited human cognition rather than a window into eternal truths about mind or mechanism.