Stanford CS221 | Autumn 2025 | Lecture 14: Bayesian Networks and Learning

| Podcasts | March 09, 2026 | 1:20:30

TL;DR

This lecture explains how to learn Bayesian network parameters from fully observed data through simple counting and normalization, while reviewing probabilistic inference methods and d-separation rules for determining conditional independence.

🔗 Conditional Independence and Inference 3 insights

D-separation determines conditional independence

Two variables are conditionally independent given a set C if every path between them is blocked by C, judged by three graphical patterns: chains and common causes are blocked when the middle node is in C, while common effects (v-structures) are blocked unless the middle node or one of its descendants is in C.
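A minimal sketch of these blocking rules for a single triple along a path, assuming the standard textbook formulation (the function and pattern names are mine, not from the lecture):

    # Minimal sketch of the three canonical d-separation triples.
    def triple_blocked(pattern: str, middle_in_C: bool,
                       descendant_in_C: bool = False) -> bool:
        """Return True if the triple's path segment is blocked by the
        conditioning set C, for each of the three canonical patterns."""
        if pattern == "chain":          # A -> B -> C
            return middle_in_C
        if pattern == "common_cause":   # A <- B -> C
            return middle_in_C
        if pattern == "common_effect":  # A -> B <- C (v-structure)
            # Blocked unless B or one of its descendants is observed.
            return not (middle_in_C or descendant_in_C)
        raise ValueError(f"unknown pattern: {pattern}")

    # A path is blocked if any triple on it is blocked; two variables are
    # d-separated given C if every path between them is blocked.
    print(triple_blocked("chain", middle_in_C=True))           # True
    print(triple_blocked("common_effect", middle_in_C=False))  # True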

Explaining away occurs in V-structures

Conditioning on a common effect (like Alarm) or any of its descendants makes its otherwise independent parents (Burglary and Earthquake) dependent: once one cause is known to hold, it explains away the evidence and lowers the posterior probability of the other.
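A small enumeration makes the effect concrete. The CPT numbers below are made up for illustration, not taken from the lecture:

    from itertools import product

    # Hypothetical CPTs for B(urglary), E(arthquake), A(larm).
    p_b = {1: 0.01, 0: 0.99}                      # P(Burglary)
    p_e = {1: 0.02, 0: 0.98}                      # P(Earthquake)
    p_a = {(1, 1): 0.95, (1, 0): 0.90,            # P(Alarm=1 | B, E)
           (0, 1): 0.30, (0, 0): 0.01}

    def joint(b, e, a):
        pa1 = p_a[(b, e)]
        return p_b[b] * p_e[e] * (pa1 if a == 1 else 1 - pa1)

    def query(b_val, evidence):
        # P(B=b_val | evidence) by summing the joint over all assignments.
        num = den = 0.0
        for b, e, a in product([0, 1], repeat=3):
            w = {"B": b, "E": e, "A": a}
            if all(w[k] == v for k, v in evidence.items()):
                den += joint(b, e, a)
                if b == b_val:
                    num += joint(b, e, a)
        return num / den

    print(query(1, {"A": 1}))          # ~0.37: alarm raises P(burglary)
    print(query(1, {"A": 1, "E": 1}))  # ~0.03: earthquake explains it away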

Inference uses exact or sampling methods

Exact inference marginalizes joint probability tables directly, while approximate algorithms like rejection sampling and Gibbs sampling estimate probabilities through simulation.
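As a sketch of the contrast, here is rejection sampling on a toy two-node network, checked against the exact marginalized answer; the network and its numbers are invented for illustration:

    import random

    # Toy network Rain -> Wet, with made-up parameters.
    P_RAIN = 0.3
    P_WET_GIVEN = {1: 0.9, 0: 0.2}   # P(Wet=1 | Rain)

    def exact():
        # Exact inference: marginalize the joint table directly.
        num = P_RAIN * P_WET_GIVEN[1]
        den = num + (1 - P_RAIN) * P_WET_GIVEN[0]
        return num / den

    def rejection_sample(n=100_000, seed=0):
        # Sample from the prior; keep only samples consistent with Wet=1.
        rng = random.Random(seed)
        kept = hits = 0
        for _ in range(n):
            rain = rng.random() < P_RAIN
            wet = rng.random() < P_WET_GIVEN[int(rain)]
            if wet:                   # reject samples contradicting evidence
                kept += 1
                hits += rain
        return hits / kept

    print(exact())              # P(Rain=1 | Wet=1) ~ 0.659
    print(rejection_sample())   # simulation estimate, close to exact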

📊 Parameter Learning from Complete Data 3 insights

Fully observed setting enables direct counting

When training data contains complete assignments to all variables, parameter estimation reduces to counting occurrences and normalizing into probability distributions.
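In symbols, this count-and-normalize recipe is the closed-form maximum-likelihood estimate (the notation is mine and may differ from the lecture's):

    \hat{p}(x \mid \mathrm{pa}(x)) = \frac{\mathrm{count}(x,\, \mathrm{pa}(x))}{\sum_{x'} \mathrm{count}(x',\, \mathrm{pa}(x))}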

Local distributions are estimated independently

Each node's conditional probability table is learned separately by counting only the relevant parent-child value combinations and ignoring other variables.
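One way to sketch this per-node estimation, with a toy dataset and variable names of my own choosing:

    from collections import defaultdict

    def estimate_cpts(data, parents):
        """data: list of dicts mapping variable -> value (complete rows).
        parents: dict mapping variable -> tuple of its parent variables.
        Returns {var: {parent_values: {value: probability}}}."""
        counts = {v: defaultdict(lambda: defaultdict(int)) for v in parents}
        for row in data:
            for v, pa in parents.items():
                pa_vals = tuple(row[p] for p in pa)   # only v and its parents
                counts[v][pa_vals][row[v]] += 1
        cpts = {}
        for v, table in counts.items():
            cpts[v] = {}
            for pa_vals, c in table.items():
                total = sum(c.values())
                cpts[v][pa_vals] = {x: n / total for x, n in c.items()}
        return cpts

    data = [{"G": "drama", "R": 4}, {"G": "drama", "R": 5},
            {"G": "comedy", "R": 3}, {"G": "comedy", "R": 3}]
    print(estimate_cpts(data, {"G": (), "R": ("G",)}))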

Multi-parent nodes require stratified counting

For nodes with multiple parents, maintain separate count tables for each parent value combination and normalize each into a conditional distribution.
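A concrete two-parent sketch, with invented (Genre, Weekend) -> Rating observations:

    from collections import defaultdict

    rows = [("drama",  True,  5), ("drama",  True,  4),
            ("drama",  False, 3), ("comedy", True,  4),
            ("comedy", False, 2), ("comedy", False, 2)]

    counts = defaultdict(lambda: defaultdict(int))
    for genre, weekend, rating in rows:
        counts[(genre, weekend)][rating] += 1     # one table per parent combo

    cpt = {}
    for combo, table in counts.items():
        total = sum(table.values())
        cpt[combo] = {r: n / total for r, n in table.items()}  # normalize each

    print(cpt[("drama", True)])     # {5: 0.5, 4: 0.5}
    print(cpt[("comedy", False)])   # {2: 1.0}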

🎬 Practical Learning Examples 2 insights

Single variables use frequency counts

Learning a standalone movie rating distribution requires simply counting occurrences of each rating value and dividing by the total number of observations.
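For instance (made-up ratings, not lecture data):

    from collections import Counter

    # Marginal distribution from raw frequency counts.
    ratings = [5, 4, 5, 3, 4, 5, 2, 4]
    counts = Counter(ratings)
    n = len(ratings)
    p_rating = {r: c / n for r, c in counts.items()}
    print(p_rating)   # e.g. P(Rating=5) = 3/8 = 0.375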

Conditional probabilities are stratified by parent value

To learn P(Rating|Genre), count rating occurrences separately within each genre category (Drama vs. Comedy) and normalize within each group independently.
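A short sketch of that grouping, with invented observations:

    from collections import Counter, defaultdict

    # P(Rating | Genre): count within each genre, normalize within each group.
    observations = [("drama", 5), ("drama", 4), ("drama", 5),
                    ("comedy", 3), ("comedy", 2), ("comedy", 3)]

    by_genre = defaultdict(Counter)
    for genre, rating in observations:
        by_genre[genre][rating] += 1

    p = {g: {r: c / sum(cnt.values()) for r, c in cnt.items()}
         for g, cnt in by_genre.items()}
    print(p["drama"])    # {5: 0.67, 4: 0.33} (approximately)
    print(p["comedy"])   # {3: 0.67, 2: 0.33} (approximately)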

Bottom Line

When data is fully observed, Bayesian network parameter learning requires no iterative optimization: simply count co-occurrences of variables with their parents and normalize these counts into local conditional probability tables.
