Stanford CS221 | Autumn 2025 | Lecture 14: Bayesian Networks and Learning

| Podcasts | March 09, 2026 | 853 views | 1:20:30

TL;DR

This lecture explains how to learn Bayesian network parameters from fully observed data through simple counting and normalization, while reviewing probabilistic inference methods and d-separation rules for determining conditional independence.

🔗 Conditional Independence and Inference 3 insights

D-separation determines conditional independence

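Two variables are conditionally independent given a set C if every path between them is blocked by C. Blocking is judged by three graphical patterns: a chain or a common cause is blocked when its middle node is in C, while a common effect is blocked when neither it nor any of its descendants is in C.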

Explaining away occurs in V-structures

Conditioning on a common effect (like Alarm) or its descendants makes its parents (Burglary and Earthquake) dependent, creating the explaining away phenomenon.
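
Explaining away can be checked numerically with a minimal sketch. The prior `EPS = 0.05` and the deterministic rule Alarm = Burglary or Earthquake are assumptions chosen for illustration, not values taken from the lecture.

```python
from itertools import product

EPS = 0.05  # assumed prior probability of a burglary and of an earthquake


def joint(b, e, a):
    """P(B=b, E=e, A=a), assuming the alarm fires exactly when B or E is 1."""
    p_b = EPS if b else 1 - EPS
    p_e = EPS if e else 1 - EPS
    p_a = 1.0 if a == (b or e) else 0.0
    return p_b * p_e * p_a


def prob_burglary(evidence):
    """P(B=1 | evidence), summing the joint over assignments consistent with evidence."""
    num = den = 0.0
    for b, e, a in product([0, 1], repeat=3):
        assignment = {"B": b, "E": e, "A": a}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(b, e, a)
        den += p
        if b == 1:
            num += p
    return num / den


p_given_alarm = prob_burglary({"A": 1})
p_given_alarm_and_quake = prob_burglary({"A": 1, "E": 1})
print(round(p_given_alarm, 4), round(p_given_alarm_and_quake, 4))
```

Under these assumptions, observing the alarm raises the probability of a burglary well above its prior, and additionally observing the earthquake drops it back to the prior: the earthquake explains the alarm away.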

Inference uses exact or sampling methods

Exact inference answers a query by summing the joint distribution over the unobserved variables, while approximate algorithms such as rejection sampling and Gibbs sampling estimate the same probabilities from samples.
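
As a rough sketch of the sampling approach, rejection sampling draws full assignments from the network, discards those that disagree with the evidence, and averages the query variable over what remains. The network here (Burglary and Earthquake each with assumed prior `EPS`, Alarm = Burglary or Earthquake) and the function names are hypothetical.

```python
import random

EPS = 0.05  # assumed prior for both Burglary and Earthquake


def sample_network(rng):
    """Draw one full assignment by sampling parents first, then the child."""
    b = int(rng.random() < EPS)
    e = int(rng.random() < EPS)
    a = int(b or e)  # assumed deterministic alarm
    return {"B": b, "E": e, "A": a}


def rejection_sampling(query_var, evidence, num_samples, seed=0):
    """Estimate P(query_var = 1 | evidence) from samples that match the evidence."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(num_samples):
        sample = sample_network(rng)
        if all(sample[k] == v for k, v in evidence.items()):
            kept += 1
            hits += sample[query_var]
    return hits / kept if kept else float("nan")


estimate = rejection_sampling("B", {"A": 1}, num_samples=200_000)
print(estimate)
```

With these assumed numbers the exact answer is 0.05 / 0.0975, about 0.513, and the estimate should land close to it. Most samples are rejected because the alarm rarely fires, which is the usual weakness of rejection sampling when evidence is unlikely.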

📊 Parameter Learning from Complete Data 3 insights

Fully observed setting enables direct counting

When training data contains complete assignments to all variables, parameter estimation reduces to counting occurrences and normalizing into probability distributions.

Local distributions are estimated independently

Each node's conditional probability table is learned separately by counting only the relevant parent-child value combinations and ignoring other variables.
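
In symbols, the count-and-normalize estimate for one node can be sketched as follows. The notation is an assumption for illustration: $x_i$ is a value of node $i$, $x_{\mathrm{Parents}(i)}$ is an assignment to its parents, and $\mathrm{count}(\cdot)$ is the number of training examples containing that combination.

```latex
\hat{p}\bigl(x_i \mid x_{\mathrm{Parents}(i)}\bigr)
  = \frac{\mathrm{count}\bigl(x_{\mathrm{Parents}(i)},\, x_i\bigr)}
         {\sum_{x_i'} \mathrm{count}\bigl(x_{\mathrm{Parents}(i)},\, x_i'\bigr)}
```

Only node $i$ and its parents appear in the formula, which is why every other variable in the training example can be ignored when filling in this table.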

Multi-parent nodes require stratified counting

For nodes with multiple parents, maintain separate count tables for each parent value combination and normalize each into a conditional distribution.
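
A minimal sketch of stratified counting for a node with two parents follows. The dataset and the helper name `learn_cpt` are hypothetical; each row is a fully observed (Burglary, Earthquake, Alarm) triple.

```python
from collections import Counter, defaultdict

# Hypothetical fully observed samples of (Burglary, Earthquake, Alarm).
data = [
    (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 1),
    (1, 0, 1), (1, 0, 1),
    (0, 1, 1), (0, 1, 0),
    (1, 1, 1),
]


def learn_cpt(samples, child_index, parent_indices):
    """Keep one count table per parent assignment, then normalize each table."""
    counts = defaultdict(Counter)
    for row in samples:
        parents = tuple(row[i] for i in parent_indices)
        counts[parents][row[child_index]] += 1
    return {
        parents: {value: n / sum(table.values()) for value, n in table.items()}
        for parents, table in counts.items()
    }


alarm_cpt = learn_cpt(data, child_index=2, parent_indices=(0, 1))
for parents, dist in sorted(alarm_cpt.items()):
    print(parents, dist)
```

Each key of `alarm_cpt` is one (Burglary, Earthquake) combination, and its value is a distribution over Alarm that sums to 1. Parent combinations that never occur in the data get no entry at all, so a practical version would need some policy for them, such as smoothing.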

🎬 Practical Learning Examples 2 insights

Single variables use frequency counts

Learning a standalone movie rating distribution requires only counting how often each rating value occurs and dividing by the total number of observations.
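
For the single-variable case, a short sketch with made-up ratings (the data is hypothetical):

```python
from collections import Counter

# Hypothetical observed ratings on a 1-5 scale.
ratings = [1, 3, 4, 4, 4, 5, 5, 5, 5, 5]

counts = Counter(ratings)                  # count each rating value
total = sum(counts.values())               # total number of observations
p_rating = {r: n / total for r, n in counts.items()}  # normalize

print(p_rating)
```

Ratings that never appear in the data, such as 2 here, are simply absent from `p_rating`, which amounts to assigning them probability zero.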

Conditional probabilities stratify by parents

To learn P(Rating|Genre), count rating occurrences separately within each genre category (Drama vs. Comedy) and normalize within each group independently.
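
A sketch of the two-variable case on a small made-up dataset of (genre, rating) pairs. The data and variable names are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical fully observed (genre, rating) pairs.
data = [("drama", 4), ("drama", 4), ("drama", 5), ("comedy", 1), ("comedy", 5)]

# P(Genre): count genres over the whole dataset and normalize.
genre_counts = Counter(genre for genre, _ in data)
p_genre = {g: n / len(data) for g, n in genre_counts.items()}

# P(Rating | Genre): one count table per genre, each normalized on its own.
rating_counts = defaultdict(Counter)
for genre, rating in data:
    rating_counts[genre][rating] += 1
p_rating_given_genre = {
    g: {r: n / sum(table.values()) for r, n in table.items()}
    for g, table in rating_counts.items()
}

print(p_genre)
print(p_rating_given_genre)
```

Note that the normalizer differs per genre (3 for drama, 2 for comedy in this data), so a rating of 5 gets probability 1/3 under drama but 1/2 under comedy.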

Bottom Line

When data is fully observed, Bayesian network parameter learning requires no iterative optimization: count how often each variable's values co-occur with each assignment of its parents, then normalize those counts into local conditional probability tables.
