Stanford CS221 | Autumn 2025 | Lecture 12: Bayesian Networks I

| Podcasts | March 09, 2026 | 857 views | 1:17:36

TL;DR

This lecture transitions from model-free and model-based reinforcement learning to probabilistic reasoning, introducing Bayesian networks as a framework for representing uncertain world states. It establishes the probability fundamentals (joint distributions, marginalization, and conditioning) as einops-style tensor operations, which provide the mathematical foundation for efficient inference in complex domains.

🧠 Model-Based vs. Model-Free Intelligence

Model-free methods are direct but inflexible

Approaches like Q-learning compile rewards and transitions into direct value predictions, so if the reward function changes, the agent cannot adapt without retraining from scratch.

Model-based reasoning enables flexible planning

Understanding how the world works allows agents to recompute optimal policies on the fly when objectives change, whereas Q-values permanently bake in the original reward structure.
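To make the replanning point concrete, here is a minimal sketch using value iteration, a standard model-based planner; it is not necessarily the example used in the lecture. The three-state chain, the reward tables `R_right` and `R_left`, and the `value_iteration` helper are all hypothetical. The transition model `T` stays fixed, and only the reward changes between the two calls.

```python
import numpy as np

def value_iteration(T, R, gamma=0.9, iters=200):
    """T[s, a, s2] = transition probability, R[s, a, s2] = reward.
    Returns the greedy policy as one action index per state."""
    V = np.zeros(T.shape[0])
    for _ in range(iters):
        # Q[s, a] = sum over s2 of T[s, a, s2] * (R[s, a, s2] + gamma * V[s2])
        Q = np.einsum("sap,sap->sa", T, R + gamma * V[None, None, :])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# Hypothetical 3-state chain: action 0 moves left, action 1 moves right.
T = np.zeros((3, 2, 3))
for s in range(3):
    T[s, 0, max(s - 1, 0)] = 1.0
    T[s, 1, min(s + 1, 2)] = 1.0

# Original objective: reward 1 for arriving at state 2.
R_right = np.zeros((3, 2, 3))
R_right[:, :, 2] = 1.0

# Changed objective: reward 1 for arriving at state 0 instead.
R_left = np.zeros((3, 2, 3))
R_left[:, :, 0] = 1.0

# Same world model, new reward: just replan.
policy_right = value_iteration(T, R_right)  # moves right in every state
policy_left = value_iteration(T, R_left)    # moves left in every state
```

A learned Q-table for `R_right` would have no way to produce `policy_left` without further training, because the reward is already folded into its values.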

Bayesian networks model uncertain world states

The lecture shifts from deterministic search and MDPs to representing the world probabilistically, addressing how agents reason under uncertainty about multiple interrelated state variables.

📊 Probability as Tensor Operations

Joint distributions represent complete world states

A joint distribution over random variables assigns probabilities to every possible assignment of values, serving as a comprehensive 'source of truth' for all possible world configurations.

Probability tables are multi-dimensional tensors

Joint distributions map directly to tensors where each axis corresponds to a random variable, enabling efficient computation using linear algebra rather than manual table lookups.
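As a small illustration of the table-as-tensor view, the sketch below stores a joint distribution over two binary variables in a NumPy array. The variables (Weather, Traffic) and the probabilities are made up for illustration and do not come from the lecture.

```python
import numpy as np

# Hypothetical joint distribution P(Weather, Traffic); values are illustrative.
# Axis 0: Weather (0 = sunny, 1 = rainy); axis 1: Traffic (0 = light, 1 = heavy).
joint = np.array([
    [0.45, 0.15],   # sunny
    [0.10, 0.30],   # rainy
])

# A valid joint distribution is non-negative and sums to 1.
assert np.all(joint >= 0)
assert np.isclose(joint.sum(), 1.0)

# Looking up one assignment is plain indexing: P(Weather=rainy, Traffic=heavy).
p_rainy_heavy = joint[1, 1]
```

Adding a third variable would add a third axis, so the array grows exponentially with the number of variables.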

Einops notation expresses all probability laws

Marginalization and other operations can be written compactly in einops (einsum) notation: any axis whose label is missing from the output pattern is summed over, which corresponds to marginalizing out that variable.
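A minimal sketch of this notation follows. It uses NumPy's `np.einsum` as a stand-in for the einops calls referred to in the summary, which is an assumption about tooling, and the joint distribution over three binary variables A, B, C is randomly generated purely for illustration.

```python
import numpy as np

# Hypothetical joint P(A, B, C) over three binary variables: random
# non-negative numbers, normalized so the whole table sums to 1.
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()

# Labels a, b, c name the axes. Any label missing from the output is summed out.
p_a = np.einsum("abc->a", joint)     # P(A): B and C marginalized
p_ac = np.einsum("abc->ac", joint)   # P(A, C): B marginalized

# The same results written as explicit axis sums.
assert np.allclose(p_a, joint.sum(axis=(1, 2)))
assert np.allclose(p_ac, joint.sum(axis=1))
```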

🔍 Core Inference Operations

Marginalization collapses uncertain variables

To ignore a variable, sum probabilities over all assignments that differ only in that variable, effectively collapsing the probability table by removing the corresponding dimension.
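Written as a formula for two variables, with A kept and B ignored (the symbols are illustrative, not taken from the lecture), marginalization is:

```latex
P(A = a) = \sum_{b} P(A = a, B = b)
```

Each term in the sum is one row of the table that agrees on A = a and differs only in the value of B.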

Conditioning selects and renormalizes evidence

Observing evidence selects compatible assignments from the joint distribution, then divides by the evidence probability to renormalize, yielding a valid probability distribution over remaining variables.
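The two steps, select then renormalize, can be sketched on a small table. The joint distribution P(Weather, Traffic) below is hypothetical, and the evidence Traffic = heavy is chosen only for illustration.

```python
import numpy as np

# Hypothetical joint P(Weather, Traffic); rows: sunny/rainy, columns: light/heavy.
joint = np.array([
    [0.45, 0.15],
    [0.10, 0.30],
])

# Evidence: Traffic = heavy (column index 1).
selected = joint[:, 1]             # step 1: keep compatible assignments -> [0.15, 0.30]
p_evidence = selected.sum()        # P(Traffic=heavy) = 0.45
posterior = selected / p_evidence  # step 2: renormalize -> P(Weather | Traffic=heavy)

# posterior is approximately [0.333, 0.667] and sums to 1.
```

The selected slice on its own sums to 0.45, not 1, which is why the division by the evidence probability is needed.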

Inference queries act like SQL on databases

Probabilistic inference treats the joint distribution as a database, allowing queries that specify evidence variables and request probabilities for query variables while automatically marginalizing all unmentioned variables.
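One way to sketch such a query interface is a single function that conditions on the evidence and sums out everything unmentioned. The `query` helper, its parameters, and the variable names below are hypothetical, and the brute-force approach only works when the full joint table fits in memory.

```python
import numpy as np

def query(joint, names, targets, evidence):
    """Return P(targets | evidence) with one axis per target variable,
    ordered as in `names`.

    joint: array with one axis per variable; names: axis names in order;
    targets: variable names to keep; evidence: dict of name -> observed index.
    """
    # Conditioning, step 1: select the slice compatible with the evidence.
    index = tuple(evidence.get(n, slice(None)) for n in names)
    selected = joint[index]
    remaining = [n for n in names if n not in evidence]
    # Marginalize every remaining variable that is not a query target.
    drop = tuple(i for i, n in enumerate(remaining) if n not in targets)
    unnormalized = selected.sum(axis=drop)
    # Conditioning, step 2: renormalize by the evidence probability.
    return unnormalized / unnormalized.sum()

# Hypothetical joint over three binary variables, randomly generated.
names = ["rain", "traffic", "late"]
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()

# P(rain | late = 1); "traffic" is never mentioned, so it is summed out.
p_rain_given_late = query(joint, names, ["rain"], {"late": 1})
```

The caller names only the query and evidence variables, much like naming columns in a SELECT and a WHERE clause.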

Bottom Line

To reason under uncertainty, represent the world as a joint probability distribution over random variables, then use marginalization to ignore unknowns and conditioning to incorporate evidence. These are the foundations that Bayesian networks will make computationally tractable for complex domains.
