Stanford CS221 | Autumn 2025 | Lecture 14: Bayesian Networks and Learning
TL;DR
This lecture explains how to learn Bayesian network parameters from fully observed data through simple counting and normalization, while reviewing probabilistic inference methods and d-separation rules for determining conditional independence.
🔗 Conditional Independence and Inference
D-separation determines conditional independence
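Two variables are conditionally independent given a set C if every path between them is blocked by C. A chain or common-cause path is blocked when its middle node is in C; a common-effect (V-structure) path is blocked when neither the common effect nor any of its descendants is in C.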
Explaining away occurs in V-structures
Conditioning on a common effect (like Alarm) or its descendants makes its parents (Burglary and Earthquake) dependent, creating the explaining away phenomenon.
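Explaining away can be checked numerically by enumerating the joint distribution of the Burglary, Earthquake, and Alarm network. The sketch below is illustrative only: the prior values and the deterministic-OR alarm are assumptions made for this example, not numbers from the lecture.

```python
from itertools import product

# Hypothetical parameters, chosen only for illustration.
P_B = 0.05  # P(Burglary = 1)
P_E = 0.05  # P(Earthquake = 1)

def p_alarm(b, e):
    """P(Alarm = 1 | b, e). Assumed: the alarm rings exactly when b or e is 1."""
    return 1.0 if (b or e) else 0.0

def joint(b, e, a):
    """Joint probability as the product of the three local distributions."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = p_alarm(b, e) if a else 1 - p_alarm(b, e)
    return pb * pe * pa

def prob_burglary(evidence):
    """P(B = 1 | evidence), summing the joint over assignments consistent with evidence."""
    num = den = 0.0
    for b, e, a in product([0, 1], repeat=3):
        world = {"B": b, "E": e, "A": a}
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(b, e, a)
        den += p
        if b == 1:
            num += p
    return num / den

prior = prob_burglary({})
given_alarm = prob_burglary({"A": 1})
given_alarm_and_quake = prob_burglary({"A": 1, "E": 1})

print(f"P(B=1)           = {prior:.3f}")
print(f"P(B=1 | A=1)     = {given_alarm:.3f}")
print(f"P(B=1 | A=1,E=1) = {given_alarm_and_quake:.3f}")
```

Under these assumed numbers, observing the alarm raises the probability of a burglary well above its prior, and additionally observing the earthquake brings it back down to the prior: the earthquake explains the alarm away.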
Inference uses exact or sampling methods
Exact inference computes a queried probability by summing the joint distribution over the unobserved variables, while approximate algorithms such as rejection sampling and Gibbs sampling estimate it from simulated samples.
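As a rough sketch of the sampling approach, the code below estimates P(Burglary = 1 | Alarm = 1) by rejection sampling and compares it with the exact value. The priors, the deterministic-OR alarm, and the function names are assumptions made for illustration; Gibbs sampling is not shown.

```python
import random

# Hypothetical parameters, chosen only for illustration.
P_B, P_E = 0.05, 0.05

def sample_world(rng):
    """Draw one full assignment by sampling each node given its parents."""
    b = rng.random() < P_B
    e = rng.random() < P_E
    a = b or e  # assumed: the alarm rings exactly when b or e is true
    return b, e, a

def estimate_burglary_given_alarm(num_samples, seed=0):
    """Rejection sampling: discard samples that contradict the evidence Alarm = 1."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(num_samples):
        b, _e, a = sample_world(rng)
        if not a:
            continue  # sample rejected: it disagrees with the evidence
        accepted += 1
        hits += b
    return hits / accepted if accepted else float("nan")

estimate = estimate_burglary_given_alarm(200_000)
exact = P_B / (1 - (1 - P_B) * (1 - P_E))  # closed form for this small network
print(f"estimate = {estimate:.3f}, exact = {exact:.3f}")
```

Most samples are rejected here because the evidence is rare, which is the usual weakness of rejection sampling.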
📊 Parameter Learning from Complete Data
Fully observed setting enables direct counting
When training data contains complete assignments to all variables, parameter estimation reduces to counting occurrences and normalizing into probability distributions.
Local distributions are estimated independently
Each node's conditional probability table is learned separately by counting only the relevant parent-child value combinations and ignoring other variables.
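The count-and-normalize estimate for a single node can be written as one formula. The notation is an assumption for this summary: x is a value of node X_i, u is a joint value of its parents, and count(·) is the number of training examples matching the given values.

```latex
\hat{p}\bigl(X_i = x \mid \mathrm{Parents}(X_i) = u\bigr)
  = \frac{\mathrm{count}(X_i = x,\ \mathrm{Parents}(X_i) = u)}
         {\sum_{x'} \mathrm{count}(X_i = x',\ \mathrm{Parents}(X_i) = u)}
```

Only X_i and its parents appear in the formula, which is why every other variable in the training example can be ignored when estimating this table.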
Multi-parent nodes require stratified counting
For nodes with multiple parents, maintain separate count tables for each parent value combination and normalize each into a conditional distribution.
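A minimal sketch of stratified counting for a node with several parents follows. The function name `learn_cpt` and the toy data are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

def learn_cpt(data, child, parents):
    """Estimate P(child | parents) from fully observed rows by counting and normalizing."""
    # One count table per combination of parent values.
    counts = defaultdict(lambda: defaultdict(int))
    for row in data:
        parent_values = tuple(row[p] for p in parents)
        counts[parent_values][row[child]] += 1

    # Normalize each count table into its own conditional distribution.
    cpt = {}
    for parent_values, child_counts in counts.items():
        total = sum(child_counts.values())
        cpt[parent_values] = {v: c / total for v, c in child_counts.items()}
    return cpt

# Hypothetical fully observed data for Burglary (B), Earthquake (E), Alarm (A).
data = [
    {"B": 0, "E": 0, "A": 0},
    {"B": 0, "E": 0, "A": 0},
    {"B": 0, "E": 0, "A": 1},
    {"B": 1, "E": 0, "A": 1},
    {"B": 1, "E": 0, "A": 1},
    {"B": 0, "E": 1, "A": 1},
    {"B": 0, "E": 1, "A": 0},
]

alarm_cpt = learn_cpt(data, child="A", parents=["B", "E"])
for parent_values, dist in sorted(alarm_cpt.items()):
    print(parent_values, dist)
```

Parent combinations that never appear in the data, such as (B=1, E=1) in this toy set, get no entry at all in this sketch.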
🎬 Practical Learning Examples
Single variables use frequency counts
Learning a standalone movie rating distribution requires only counting the occurrences of each rating value and dividing by the total number of observations.
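For a single variable, the whole procedure fits in a few lines. The rating values below are made up for illustration.

```python
from collections import Counter

# Hypothetical observed ratings on a 1-5 scale.
ratings = [5, 4, 5, 3, 4, 5, 1, 4]

counts = Counter(ratings)     # how often each rating value occurs
total = sum(counts.values())  # total number of observations
p_rating = {r: c / total for r, c in sorted(counts.items())}

print(p_rating)  # {1: 0.125, 3: 0.125, 4: 0.375, 5: 0.375}
```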
Conditional probabilities stratify by parents
To learn P(Rating|Genre), count rating occurrences separately within each genre category (Drama vs. Comedy) and normalize within each group independently.
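The same idea with one parent can be sketched as follows: ratings are counted within each genre, and each genre's counts are normalized on their own. The (genre, rating) pairs are invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical (genre, rating) observations.
observations = [
    ("drama", 4), ("drama", 4), ("drama", 5), ("drama", 1),
    ("comedy", 1), ("comedy", 5), ("comedy", 5),
]

# Keep a separate rating counter for each genre.
counts_by_genre = defaultdict(Counter)
for genre, rating in observations:
    counts_by_genre[genre][rating] += 1

# Normalize within each genre, independently of the other genre.
p_rating_given_genre = {
    genre: {r: c / sum(counts.values()) for r, c in counts.items()}
    for genre, counts in counts_by_genre.items()
}

for genre, dist in p_rating_given_genre.items():
    print(genre, dist)
```

Each genre's distribution sums to 1 on its own, so the number of drama examples has no effect on the comedy estimates.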
Bottom Line
When data is fully observed, Bayesian network parameter learning requires no iterative optimization: count how often each variable's value occurs with each combination of its parents' values, then normalize those counts into local conditional probability tables.