Stanford CS221 | Autumn 2025 | Lecture 14: Bayesian Networks and Learning
TL;DR
This lecture explains how to learn Bayesian network parameters from fully observed data through simple counting and normalization, while reviewing probabilistic inference methods and d-separation rules for determining conditional independence.
🔗 Conditional Independence and Inference 3 insights
D-separation determines conditional independence
Two variables are conditionally independent given a set C if every path between them is blocked by C, where blocking is checked against three graphical patterns: chains (A → B → C) and common causes (A ← B → C) are blocked when the middle variable is in C, while common effects (A → B ← C) are blocked unless the collider or one of its descendants is in C.
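A quick way to see the chain pattern in action is to enumerate a tiny network. The sketch below (all CPT numbers are made up for illustration) builds a chain A → B → C and confirms numerically that conditioning on the middle variable B makes C independent of A:

```python
import itertools

# Minimal sketch (made-up numbers): a chain A -> B -> C. Conditioning on the
# middle variable B blocks the path, so C should be independent of A given B.
pA = {0: 0.7, 1: 0.3}
pB_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}   # pB_given_A[a][b]
pC_given_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # pC_given_B[b][c]

def joint(a, b, c):
    return pA[a] * pB_given_A[a][b] * pC_given_B[b][c]

def prob(query, given):
    """P(query | given) by enumerating the full joint table."""
    match = lambda assign, cond: all(assign[k] == v for k, v in cond.items())
    states = [dict(zip("ABC", v)) for v in itertools.product([0, 1], repeat=3)]
    num = sum(joint(s["A"], s["B"], s["C"]) for s in states if match(s, {**given, **query}))
    den = sum(joint(s["A"], s["B"], s["C"]) for s in states if match(s, given))
    return num / den

# Same value regardless of A: conditioning on B blocks the chain.
print(prob({"C": 1}, {"B": 1, "A": 0}))  # 0.7
print(prob({"C": 1}, {"B": 1, "A": 1}))  # 0.7
```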
Explaining away occurs in V-structures
Conditioning on a common effect (like Alarm) or any of its descendants makes its otherwise independent parents (Burglary and Earthquake) dependent: once the alarm is known to have sounded, learning that an earthquake occurred lowers the probability of a burglary. This is the explaining-away phenomenon.
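The same brute-force enumeration shows explaining away numerically. In this sketch the priors and alarm probabilities are invented, not the lecture's numbers: observing the alarm raises the burglary probability to about 0.50, but additionally observing an earthquake drops it to about 0.02:

```python
import itertools

# Explaining-away sketch (all numbers made up). B and E are independent
# causes of A; observing A makes them dependent, and additionally observing
# E "explains away" the alarm, lowering the probability of B.
pB = {0: 0.99, 1: 0.01}
pE = {0: 0.98, 1: 0.02}
pA_given_BE = {           # pA_given_BE[(b, e)][a]
    (0, 0): {0: 0.999, 1: 0.001},
    (0, 1): {0: 0.60,  1: 0.40},
    (1, 0): {0: 0.10,  1: 0.90},
    (1, 1): {0: 0.05,  1: 0.95},
}

def joint(b, e, a):
    return pB[b] * pE[e] * pA_given_BE[(b, e)][a]

def prob_B1(given):
    """P(B=1 | given) by brute-force enumeration of the joint."""
    states = [dict(zip("BEA", v)) for v in itertools.product([0, 1], repeat=3)]
    ok = [s for s in states if all(s[k] == v for k, v in given.items())]
    num = sum(joint(s["B"], s["E"], s["A"]) for s in ok if s["B"] == 1)
    den = sum(joint(s["B"], s["E"], s["A"]) for s in ok)
    return num / den

print(prob_B1({"A": 1}))            # ~0.50: alarm alone raises P(B=1)
print(prob_B1({"A": 1, "E": 1}))    # ~0.02: the earthquake explains it away
```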
Inference uses exact or sampling methods
Exact inference sums out variables from the joint distribution directly, which is exponential in the number of variables in general; approximate algorithms such as rejection sampling and Gibbs sampling instead estimate probabilities from simulated samples.
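Rejection sampling is the simplest of the approximate methods to sketch: draw complete samples from the prior in topological order, discard any that contradict the evidence, and average the query over the survivors. The sketch below reuses the invented alarm CPTs from the previous example:

```python
import random

# Rejection-sampling sketch (made-up CPT numbers, same alarm-style network):
# sample every variable from the prior in topological order, throw away
# samples that contradict the evidence A=1, average the query over the rest.
random.seed(0)

def bern(p):
    return 1 if random.random() < p else 0

def sample_prior():
    b = bern(0.01)                                   # P(B=1)
    e = bern(0.02)                                   # P(E=1)
    p_alarm = {(0, 0): 0.001, (0, 1): 0.40,
               (1, 0): 0.90,  (1, 1): 0.95}[(b, e)]  # P(A=1 | B, E)
    return b, e, bern(p_alarm)

kept = [(b, e) for b, e, a in (sample_prior() for _ in range(200_000)) if a == 1]
print(sum(b for b, _ in kept) / len(kept))  # estimate of P(B=1 | A=1), ~0.50
```

Note the weakness that motivates Gibbs sampling: when the evidence is rare, almost every sample is rejected, so few survivors remain to average over.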
📊 Parameter Learning from Complete Data 3 insights
Fully observed setting enables direct counting
When every training example assigns a value to every variable, maximum-likelihood parameter estimation reduces to counting occurrences and normalizing the counts into probability distributions.
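Concretely, the maximum-likelihood estimate of each local conditional distribution has a closed form: the count of each parent–child configuration, normalized over the child's values (generic notation, not necessarily the lecture's):

$$\hat{p}(x_v \mid x_{\mathrm{Parents}(v)}) = \frac{\mathrm{count}(x_{\mathrm{Parents}(v)},\, x_v)}{\sum_{x_v'} \mathrm{count}(x_{\mathrm{Parents}(v)},\, x_v')}$$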
Local distributions estimate independently
Each node's conditional probability table is learned separately by counting only the relevant parent–child value combinations and ignoring all other variables; this works because the likelihood factorizes into one independent term per local distribution.
Multi-parent nodes require stratified counting
For nodes with multiple parents, maintain separate count tables for each parent value combination and normalize each into a conditional distribution.
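A sketch of this stratified counting, with hypothetical variable names and invented data (none of it from the lecture), keyed on tuples of parent values:

```python
from collections import Counter, defaultdict

# Sketch of CPT estimation for a node with several parents: one count table
# per parent-value combination, each normalized into a distribution.
def fit_cpt(rows, parents, child):
    counts = defaultdict(Counter)
    for row in rows:
        key = tuple(row[p] for p in parents)   # which stratum this row is in
        counts[key][row[child]] += 1
    return {key: {v: n / sum(c.values()) for v, n in c.items()}
            for key, c in counts.items()}

rows = [
    {"G": "drama",  "Jim": 4, "Martha": 5},
    {"G": "drama",  "Jim": 4, "Martha": 4},
    {"G": "comedy", "Jim": 5, "Martha": 5},
    {"G": "comedy", "Jim": 5, "Martha": 4},
]
# Hypothetical node with two parents: P(Martha | G, Jim)
print(fit_cpt(rows, parents=["G", "Jim"], child="Martha"))
```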
🎬 Practical Learning Examples 2 insights
Single variables use frequency counts
Learning a standalone movie rating distribution requires simply counting occurrences of each rating value and dividing by the total number of observations.
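As a minimal sketch with made-up ratings:

```python
from collections import Counter

# Standalone-rating case (invented data): counts divided by the total.
ratings = [5, 4, 5, 3, 5, 4]
p_rating = {r: n / len(ratings) for r, n in Counter(ratings).items()}
print(p_rating)  # {5: 0.5, 4: 0.333..., 3: 0.166...}
```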
Conditional probabilities stratify by parents
To learn P(Rating|Genre), count rating occurrences separately within each genre category (Drama vs. Comedy) and normalize within each group independently.
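And the stratified version, again with invented data: ratings are counted per genre, and each genre's counts are normalized on their own:

```python
from collections import Counter, defaultdict

# P(Rating | Genre) sketch (made-up data): count within each genre,
# then normalize within each genre independently.
data = [("drama", 4), ("drama", 5), ("drama", 4),
        ("comedy", 5), ("comedy", 5), ("comedy", 3)]
counts = defaultdict(Counter)
for genre, rating in data:
    counts[genre][rating] += 1
p_rating_given_genre = {g: {r: n / sum(c.values()) for r, n in c.items()}
                        for g, c in counts.items()}
print(p_rating_given_genre["drama"])   # {4: 0.666..., 5: 0.333...}
print(p_rating_given_genre["comedy"])  # {5: 0.666..., 3: 0.333...}
```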
Bottom Line
When data is fully observed, Bayesian network parameter learning requires no iterative optimization—simply count co-occurrences of variables with their parents and normalize these counts into local conditional probability tables.