If You Can't See Inside, How Do You Know It's THINKING? [Dr. Jeff Beck]
TL;DR
Dr. Jeff Beck argues that agency cannot be verified from external behavior alone, requiring instead evidence of internal planning and counterfactual reasoning, while advocating for energy-based models and joint embedding architectures as biologically plausible alternatives to standard function approximation.
🧠 Agency: Internal Mechanisms vs. External Behavior 4 insights
Agency is a spectrum of computational sophistication
Beck posits that no structural line separates agents from objects; instead, agency exists on a continuum defined by the complexity of internal states, the time scales over which they persist, and the context-dependence of policies.
Planning and counterfactuals define true agency
The defining characteristic of an agent is not sophisticated behavior but the capacity for planning and counterfactual reasoning—internally simulating future consequences of actions—which remains invisible to external observers measuring only input-output mappings.
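As a toy illustration of why input-output observation can miss planning, the sketch below compares a fixed reflex rule with a policy that simulates each action before choosing. The corridor world, the function names, and the goal positions are all assumptions made up for this example, not anything from the talk.

```python
def reflex_policy(position):
    """Fixed stimulus-response rule: no internal model, no lookahead."""
    return "right"

def planning_policy(position, goal, length=5):
    """Imagines the outcome of each action with an internal model and
    picks the action whose imagined outcome lands closest to the goal."""
    moves = {"left": -1, "right": +1}

    def imagine(action):
        # Internal simulation of the corridor: positions clamp at the walls.
        return min(max(position + moves[action], 0), length - 1)

    return min(moves, key=lambda a: abs(imagine(a) - goal))

positions = range(5)

# Observed world: the goal sits at the right end of the corridor.
observed_reflex = [reflex_policy(p) for p in positions]
observed_planner = [planning_policy(p, goal=4) for p in positions]

# Counterfactual world: the goal moves to the left end.
counterfactual_planner = [planning_policy(p, goal=0) for p in positions]

print(observed_reflex == observed_planner)  # same behavior in the observed world
print(counterfactual_planner)               # only the planner adapts
```

In the observed world the two policies produce the same input-output mapping, so behavior alone does not separate them. The difference lies in the internal simulation step, which only shows up when the situation changes.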
Physical embodiment is necessary for genuine agency
Beck asserts that a high-fidelity simulation of a brain lacks agency unless physically embedded in the world; only through physical interaction and causal disconnection from immediate environmental impulses can something be considered an agent rather than a sophisticated reflex machine.
Agency requires adopting the intentional stance
Following Dennett, we treat systems 'as if' they are agents when modeling them as planners yields the most parsimonious explanation of their behavior, though behavior alone can never confirm agency without inspecting the internal computations.
⚡ Energy-Based Models and Bayesian Inference 3 insights
EBMs impose constraints on latent variables
Unlike standard neural networks that optimize only weights for input-output mapping, energy-based models apply cost functions to internal states, requiring dual minimization of both hidden state energy and prediction error.
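The dual minimization can be sketched with a one-weight toy model: an inner loop relaxes the hidden state to minimize the total energy, and an outer loop updates the weight at the relaxed state. The quadratic energy, the scalar weight `w`, the penalty strength `lam`, and the data are illustrative assumptions, not a model described in the talk.

```python
def relax_latent(x, y, w, lam=1.0, steps=50, lr=0.1):
    """Inner loop: gradient descent on the hidden state z for the energy
    E = 0.5 * (y - w*z)**2 + 0.5 * lam * (z - x)**2
    (prediction error plus a cost on the hidden state itself)."""
    z = x  # start the hidden state at the input
    for _ in range(steps):
        grad_z = -w * (y - w * z) + lam * (z - x)
        z -= lr * grad_z
    return z

def train(data, w=0.5, lam=1.0, epochs=200, lr=0.05):
    """Outer loop: update the weight using the relaxed hidden state."""
    for _ in range(epochs):
        for x, y in data:
            z = relax_latent(x, y, w, lam)
            grad_w = -(y - w * z) * z  # dE/dw evaluated at the relaxed z
            w -= lr * grad_w
    return w

# Toy data generated by y = 2 * x (an assumption for this sketch).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_learned = train(data)
print(w_learned)
```

Because the hidden state is optimized during training as well as at prediction time, the weight is learned with those dynamics already in the loop.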
Energy minimization approximates Bayesian inference
Energy functions correspond to negative log probabilities in Bayesian frameworks, so minimizing energy amounts to maximum a posteriori (MAP) estimation and skips the expensive normalization constant; the Free Energy Principle adds an entropy regularization term to this approach.
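A minimal sketch of that correspondence, assuming a one-dimensional Gaussian prior and Gaussian likelihood (both chosen here only for illustration): the energy is the negative log joint with constants dropped, and gradient descent on it reaches the same point as the closed-form MAP estimate without ever computing a normalizer.

```python
def energy_grad(z, x, sigma=1.0, mu0=0.0, tau=2.0):
    """Gradient of E(z) = (x - z)**2 / (2*sigma**2) + (z - mu0)**2 / (2*tau**2),
    the negative log of likelihood times prior, up to an additive constant."""
    return (z - x) / sigma ** 2 + (z - mu0) / tau ** 2

def map_by_descent(x, steps=500, lr=0.1):
    """Minimize the energy over the latent z by plain gradient descent."""
    z = 0.0
    for _ in range(steps):
        z -= lr * energy_grad(z, x)
    return z

def map_closed_form(x, sigma=1.0, mu0=0.0, tau=2.0):
    """Posterior mode for the Gaussian-Gaussian case (precision-weighted mean)."""
    precision = 1 / sigma ** 2 + 1 / tau ** 2
    return (x / sigma ** 2 + mu0 / tau ** 2) / precision

x_obs = 3.0
z_hat = map_by_descent(x_obs)
z_star = map_closed_form(x_obs)
print(z_hat, z_star)
```

The dropped constant is exactly the normalization term, which does not affect where the minimum lies.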
Test-time training risks methodological inconsistency
Modern approaches that activate latent variable optimization only during deployment (test-time training) are problematic because the base network was trained without accounting for these dynamics, unlike traditional EBMs where latent optimization occurs throughout training.
đź”— Joint Embeddings and Non-Contrastive Learning 3 insights
JEPA prioritizes latent prediction over pixel reconstruction
Joint Embedding Predictive Architectures compress inputs and outputs into abstract representations before prediction, enabling conceptual understanding rather than forcing models to reconstruct irrelevant pixel-level details.
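The difference between the two loss targets can be sketched with hand-written stand-ins. In a real joint embedding architecture the encoder and predictor are learned; here `encode` and `predict` are hypothetical fixed functions, and the input vectors are made-up numbers whose last three coordinates play the role of irrelevant detail.

```python
def encode(x, keep=2):
    """Hypothetical fixed encoder: keep only the first `keep` coordinates,
    treated here as the abstract 'content' of the input."""
    return x[:keep]

def predict(z):
    """Hypothetical predictor: the content shifts by +1 between context and target."""
    return [v + 1.0 for v in z]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# First two coordinates carry content; the last three are unpredictable detail.
x_context = [0.0, 1.0, 0.3, -0.7, 0.9]
x_target = [1.0, 2.0, -0.8, 0.4, -0.2]

# Loss in latent space: compare predicted and actual embeddings.
latent_loss = mse(predict(encode(x_context)), encode(x_target))

# Loss in input space: the same predictor is charged for every raw coordinate.
pixel_loss = mse(predict(x_context), x_target)

print(latent_loss, pixel_loss)
```

The latent loss is zero because the content is predicted correctly, while the input-space loss is dominated by coordinates that carry no predictable structure.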
Non-contrastive methods avoid biological implausibility
Regularization techniques (as in Barlow Twins or VICReg) remove the need for negative sampling and contrastive divergence, which avoids the biologically implausible parts of contrastive training while still preventing representational collapse to trivial solutions.
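As a rough sketch of how regularization can stand in for negative samples, the functions below follow the general shape of the variance and covariance terms used in VICReg: a hinge that penalizes embedding dimensions whose standard deviation falls below a target, and a penalty on off-diagonal covariance. This is a simplified stdlib version written for illustration, with made-up batches, and it omits the invariance term and all training code.

```python
import math

def variance_penalty(batch, gamma=1.0, eps=1e-4):
    """Hinge on the per-dimension standard deviation: nonzero when a
    dimension's spread across the batch drops below gamma."""
    n, d = len(batch), len(batch[0])
    total = 0.0
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        total += max(0.0, gamma - math.sqrt(var + eps))
    return total / d

def covariance_penalty(batch):
    """Sum of squared off-diagonal covariances, scaled by the dimension count."""
    n, d = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cov = sum((row[i] - means[i]) * (row[j] - means[j])
                          for row in batch) / (n - 1)
                total += cov ** 2
    return total / d

collapsed = [[0.5, 0.5]] * 4  # every input maps to the same embedding
spread = [[-1.5, -1.5], [-1.5, 1.5], [1.5, -1.5], [1.5, 1.5]]
redundant = [[-1.5, -1.5], [-0.5, -0.5], [0.5, 0.5], [1.5, 1.5]]

print(variance_penalty(collapsed), variance_penalty(spread))
print(covariance_penalty(spread), covariance_penalty(redundant))
```

The collapsed batch is penalized directly by the variance term, so no negative pairs are needed to rule out the trivial solution, and the covariance term discourages dimensions that merely copy each other.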
End-to-end embedding learning replaces pre-processing
Rather than using fixed pre-trained encoders (like VAEs) as separate pre-processing stages, modern approaches aim to learn optimal compression jointly with the prediction task, treating embedding discovery as part of the optimization problem.
Bottom Line
When evaluating AI systems, inspect internal planning mechanisms and counterfactual reasoning capabilities rather than relying solely on behavioral sophistication, and prioritize physically embodied architectures that incorporate energy-based constraints and joint embeddings over pure function approximation.