If You Can't See Inside, How Do You Know It's THINKING? [Dr. Jeff Beck]
TL;DR
Dr. Jeff Beck argues that agency cannot be verified from external behavior alone, requiring instead evidence of internal planning and counterfactual reasoning, while advocating for energy-based models and joint embedding architectures as biologically plausible alternatives to standard function approximation.
🧠 Agency: Internal Mechanisms vs. External Behavior (4 insights)
Agency is a spectrum of computational sophistication
Beck posits that no structural line separates agents from objects; instead, agency exists on a continuum defined by the complexity of internal states, the time scales over which they persist, and the context-dependence of policies.
Planning and counterfactuals define true agency
The defining characteristic of an agent is not sophisticated behavior but the capacity for planning and counterfactual reasoning—internally simulating future consequences of actions—which remains invisible to external observers measuring only input-output mappings.
Physical embodiment is necessary for genuine agency
Beck asserts that a high-fidelity simulation of a brain lacks agency unless physically embedded in the world; only through physical interaction and causal disconnection from immediate environmental impulses can something be considered an agent rather than a sophisticated reflex machine.
Agency requires adopting the intentional stance
Following Dennett, we treat systems 'as if' they are agents when modeling them as planners yields the most parsimonious explanation of their behavior, though we can never confirm agency from behavior alone without inspecting their internal computations.
⚡ Energy-Based Models and Bayesian Inference (3 insights)
EBMs impose constraints on latent variables
Unlike standard neural networks that optimize only weights for input-output mapping, energy-based models apply cost functions to internal states, requiring dual minimization of both hidden state energy and prediction error.
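The dual minimization can be sketched with a toy linear energy-based model (my illustration, not a formulation from the episode): inference descends the energy with respect to the hidden state `z` while the weights are fixed, and learning then takes a gradient step on the weights `W` at the inferred state.

```python
import numpy as np

# Toy EBM sketch (assumed linear-Gaussian form, for illustration only):
# E(x, z; W) = ||x - W z||^2 + lam * ||z||^2
# Inference minimizes over the latent z; learning minimizes over W.

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))     # decoder weights
x = rng.normal(size=4)          # observed input
lam, lr = 0.1, 0.05

def energy(x, z, W, lam):
    recon = x - W @ z
    return recon @ recon + lam * z @ z

# Inference: gradient descent on the hidden state z, not the weights.
z = np.zeros(2)
for _ in range(200):
    grad_z = -2 * W.T @ (x - W @ z) + 2 * lam * z
    z -= lr * grad_z

# Learning: one gradient step on W at the inferred latent state.
grad_W = -2 * np.outer(x - W @ z, z)
W -= lr * grad_W
```

The key contrast with a standard feed-forward network is that the hidden state itself is an optimization variable with its own cost, not just an intermediate activation.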
Energy minimization approximates Bayesian inference
Energy functions correspond to log probabilities in Bayesian frameworks, where minimizing energy equals maximum a posteriori (MAP) estimation without expensive normalization; the Free Energy Principle adds an entropy regularization term to this approach.
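In symbols (a standard correspondence, not quoted from the episode), identifying energy with a negative log probability makes energy minimization equivalent to MAP estimation, and the variational free energy adds the entropy term:

```latex
E(x, z) = -\log p(x, z), \qquad
\hat{z}_{\mathrm{MAP}} = \arg\min_z E(x, z) = \arg\max_z p(z \mid x),
```
```latex
F[q] = \mathbb{E}_{q(z)}\!\left[E(x, z)\right] - H[q]
     = -\log p(x) + \mathrm{KL}\!\left(q(z) \,\|\, p(z \mid x)\right)
     \;\ge\; -\log p(x).
```

Note that the MAP step never requires the normalizer \(p(x)\), which is the "expensive normalization" the summary refers to.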
Test-time training risks methodological inconsistency
Modern approaches that activate latent variable optimization only during deployment (test-time training) are problematic because the base network was trained without accounting for these dynamics, unlike traditional EBMs where latent optimization occurs throughout training.
🔗 Joint Embeddings and Non-Contrastive Learning (3 insights)
JEPA prioritizes latent prediction over pixel reconstruction
Joint Embedding Predictive Architectures compress inputs and outputs into abstract representations before prediction, enabling conceptual understanding rather than forcing models to reconstruct irrelevant pixel-level details.
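A toy linear sketch of that idea (illustrative assumptions throughout; real JEPAs use deep encoders, masking, and stop-gradients): the loss compares predicted and target embeddings, so pixel-level detail never enters the objective.

```python
import numpy as np

# JEPA-style sketch (assumed linear encoders and predictor, for illustration):
# predict the *embedding* of the target from the embedding of the context.

rng = np.random.default_rng(1)
d_in, d_emb = 8, 3
Ex = rng.normal(size=(d_emb, d_in))   # context encoder
Ey = rng.normal(size=(d_emb, d_in))   # target encoder
P  = rng.normal(size=(d_emb, d_emb))  # predictor in latent space

x = rng.normal(size=d_in)             # context view
y = rng.normal(size=d_in)             # target view

def jepa_loss(x, y):
    sx = Ex @ x          # embed context
    sy = Ey @ y          # embed target
    pred = P @ sx        # predict target embedding from context embedding
    diff = pred - sy
    return diff @ diff   # error in latent space, never in pixel space
```

Contrast this with a generative objective such as `||decoder(pred) - y||^2`, which would force the model to account for every low-level detail of `y`.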
Non-contrastive methods avoid biological implausibility
By eliminating the need for negative sampling and contrastive divergence through regularization techniques (as in Barlow Twins or VICReg), these architectures avoid biologically implausible backpropagation requirements while preventing representational collapse to trivial solutions.
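A simplified VICReg-style loss (coefficients and details here are my illustrative assumptions, not the published recipe) shows how regularization alone can prevent collapse without any negative pairs: a variance term keeps embedding dimensions spread out, and a covariance term decorrelates them.

```python
import numpy as np

# Non-contrastive loss sketch (VICReg-flavored, simplified): no negative
# samples; collapse is penalized directly by the variance term.

rng = np.random.default_rng(2)
za = rng.normal(size=(16, 4))              # embeddings of one view (batch x dim)
zb = za + 0.1 * rng.normal(size=(16, 4))   # embeddings of the augmented view

def vicreg_loss(za, zb, eps=1e-4):
    # Invariance: matched pairs should agree.
    inv = np.mean((za - zb) ** 2)
    # Variance: push each dimension's std toward at least 1 (anti-collapse).
    std = np.sqrt(za.var(axis=0) + eps)
    var = np.mean(np.maximum(0.0, 1.0 - std))
    # Covariance: penalize off-diagonal correlations (redundancy reduction,
    # as in Barlow Twins).
    zc = za - za.mean(axis=0)
    cov = (zc.T @ zc) / (len(za) - 1)
    off_diag = cov - np.diag(np.diag(cov))
    covl = np.sum(off_diag ** 2) / za.shape[1]
    return inv + var + covl
```

If all embeddings collapse to a constant, the invariance term is zero but the variance term alone keeps the loss high, which is exactly the trivial solution the regularizers exist to rule out.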
End-to-end embedding learning replaces pre-processing
Rather than using fixed pre-trained encoders (like VAEs) as separate pre-processing stages, modern approaches aim to learn optimal compression jointly with the prediction task, treating embedding discovery as part of the optimization problem.
Bottom Line
When evaluating AI systems, inspect internal planning mechanisms and counterfactual reasoning capabilities rather than relying solely on behavioral sophistication, and prioritize physically embodied architectures that incorporate energy-based constraints and joint embeddings over pure function approximation.
More from Machine Learning Street Talk
Solving the Wrong Problem Works Better - Robert Lange
Robert Lange from Sakana AI explains how evolutionary systems like Shinka Evolve demonstrate that scientific breakthroughs require co-evolving problems and solutions through diverse stepping stones, while current LLMs remain constrained by human-defined objectives and fail to generate autonomous novelty.
"Vibe Coding is a Slot Machine" - Jeremy Howard
Deep learning pioneer Jeremy Howard argues that 'vibe coding' with AI is a dangerous slot machine that produces unmaintainable code through an illusion of control, contrasting it with his philosophy that true software engineering insight emerges from interactive exploration (REPLs/notebooks) and deep engagement with models, drawing on his foundational ULMFiT research to demonstrate how understanding—not gambling—drives sustainable productivity.
What If Intelligence Didn't Evolve? It "Was There" From the Start! - Blaise Agüera y Arcas
Blaise Agüera y Arcas argues that intelligence is not an evolutionary invention but a fundamental physical property that emerges through phase transitions from noise to complex programs, with life representing 'embodied computation' where function, not matter, defines living systems.
Abstraction & Idealization: AI's Plato Problem [Mazviita Chirimuuta]
Mazviita Chirimuuta argues that AI's assumption of discoverable mathematical "source code" underlying messy reality repeats Plato's idealism, warning that scientific abstraction is a practical tool for limited human cognition rather than a window into eternal truths about mind or mechanism.