Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 2 - Score matching

Stanford Online

| Podcasts | April 14, 2026 | 18.4 Thousand views | 1:48:48

TL;DR

This lecture introduces score matching as an alternative to DDPM for generative modeling. Instead of predicting noise directly, the model estimates the gradient of the log probability density (the 'score'), and Langevin dynamics uses that estimate to move samples from noise toward the data distribution.

📐 The Score Function Fundamentals

Score defined as gradient of log probability

The score is defined as ∇ₓ log p(x), representing the direction of steepest ascent toward regions of higher probability density in the data space.
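As a minimal sketch of this definition, the snippet below compares the analytic score of a 1-D Gaussian with a finite-difference gradient of its log density. The Gaussian parameters and the helper names (`gaussian_log_density`, `gaussian_score`, `finite_difference_score`) are illustrative assumptions, not something taken from the lecture.

```python
import numpy as np

def gaussian_log_density(x, mu=0.0, sigma=1.0):
    """Log density of a 1-D Gaussian N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def gaussian_score(x, mu=0.0, sigma=1.0):
    """Analytic score: d/dx log p(x) = -(x - mu) / sigma^2."""
    return -(x - mu) / sigma**2

def finite_difference_score(log_density, x, h=1e-5):
    """Central-difference approximation of d/dx log p(x)."""
    return (log_density(x + h) - log_density(x - h)) / (2.0 * h)

xs = np.array([-2.0, 0.0, 3.0])
analytic = gaussian_score(xs, mu=1.0, sigma=2.0)
numeric = finite_difference_score(
    lambda x: gaussian_log_density(x, mu=1.0, sigma=2.0), xs
)

# The score is positive left of the mean and negative right of it,
# so it always points back toward the high-density region around mu.
print(analytic)
print(numeric)
```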

Log probability eliminates normalizing constants

Unlike ∇ₓ p(x), which requires the intractable normalizing constant Z, ∇ₓ log p(x) does not depend on Z at all: log Z is constant in x, so ∇ₓ log Z = 0.
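The cancellation can be written out in two lines. The symbol p̃ for the unnormalized density is notation assumed here for illustration.

```latex
p(x) = \frac{\tilde{p}(x)}{Z}, \qquad Z = \int \tilde{p}(x)\,dx

\nabla_x \log p(x)
  = \nabla_x \log \tilde{p}(x) - \nabla_x \log Z
  = \nabla_x \log \tilde{p}(x)
```

Because Z is an integral over all x, it does not vary with x, which is why the second term vanishes.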

Numerical stability in low-density regions

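Because ∇ₓ log p(x) = ∇ₓ p(x) / p(x), the score is the density gradient rescaled by the density itself. In sparse regions of the data space, where p(x) and ∇ₓ p(x) are both very small, this rescaling keeps the score at a usable magnitude and avoids the numerical instability of working with vanishing values.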
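One way to see this numerically is with a standard normal density evaluated far in the tail. This is a sketch under assumed values: the choice of a standard normal and the evaluation points 10 and 40 are illustrative only.

```python
import numpy as np

def density(x):
    """Standard normal density p(x)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def density_gradient(x):
    """Gradient of the density: dp/dx = -x * p(x)."""
    return -x * density(x)

def score_from_log_density(x):
    """Score computed from log p(x) = -x^2/2 + const, i.e. -x."""
    return -x

# In the tail the raw gradient is vanishingly small, the score is not.
tail_gradient = density_gradient(10.0)
tail_score = score_from_log_density(10.0)

# Further out, p(x) underflows to exactly 0.0 in float64, so forming
# grad p / p explicitly gives 0/0 = nan, while the log-density route is fine.
p_far = density(40.0)
with np.errstate(invalid="ignore", divide="ignore"):
    naive_score = np.float64(density_gradient(40.0)) / np.float64(p_far)
stable_score = score_from_log_density(40.0)

print(tail_gradient, tail_score)
print(p_far, naive_score, stable_score)
```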

🎯 Sampling via Langevin Dynamics

Following scores leads to high-density regions

Starting from random noise, iteratively following the score direction moves samples toward regions of higher probability under the data distribution.
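A minimal sketch of pure score following, assuming a toy 1-D Gaussian target with mean 2.0 and standard deviation 0.5 whose score is known in closed form. The function names, step size, and step count are illustrative assumptions.

```python
import numpy as np

def score(x, mu=2.0, sigma=0.5):
    """Score of the assumed target N(mu, sigma^2)."""
    return -(x - mu) / sigma**2

def follow_score(x0, step_size=0.01, n_steps=500):
    """Deterministic update x <- x + step_size * score(x), with no noise."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + step_size * score(x)
    return x

rng = np.random.default_rng(0)
start = rng.normal(size=1000)   # initial points drawn from N(0, 1) noise
end = follow_score(start)

print(start.mean(), start.std())
print(end.mean(), end.std())
```

In this toy setting every starting point ends up at the single highest-density point, so the final spread is essentially zero.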

Stochastic sampling ensures diversity

Langevin sampling adds a stochastic noise term to the score-following process, preventing mode collapse and ensuring exploration of the full distribution rather than just its highest-density points.
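The sketch below adds the noise term to a score-following loop, using the common discretization x ← x + ε·score(x) + √(2ε)·z with z drawn from a standard normal. The toy Gaussian target, the step size, and the number of steps are assumptions chosen for illustration.

```python
import numpy as np

def score(x, mu=2.0, sigma=0.5):
    """Score of the assumed target N(mu, sigma^2)."""
    return -(x - mu) / sigma**2

def langevin_sample(x0, step_size=0.001, n_steps=2000, seed=0):
    """Score step plus Gaussian noise scaled by sqrt(2 * step_size)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x + step_size * score(x) + np.sqrt(2.0 * step_size) * noise
    return x

start = np.random.default_rng(1).normal(size=5000)  # start from N(0, 1) noise
samples = langevin_sample(start)

# The chains settle around the target mean and keep a spread close to
# the target standard deviation instead of collapsing to one point.
print(samples.mean(), samples.std())
```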

MCMC method with theoretical guarantees

Langevin dynamics is a Markov chain Monte Carlo (MCMC) method whose convergence follows from the Fokker-Planck equation: when the score is accurate, the step size is small, and the chain runs long enough, the samples converge to the true data distribution.
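The connection to the Fokker-Planck equation can be sketched as follows. The notation (ρ_t for the density of the samples at time t, W_t for Brownian motion) is assumed here and may differ from the lecture's.

```latex
% Continuous-time Langevin dynamics targeting p
dx_t = \nabla_x \log p(x_t)\,dt + \sqrt{2}\,dW_t

% Fokker-Planck equation for the density \rho_t of x_t
\partial_t \rho_t = -\nabla \cdot \big(\rho_t \nabla \log p\big) + \Delta \rho_t

% Substituting \rho_t = p shows that p is stationary
-\nabla \cdot \big(p \nabla \log p\big) + \Delta p
  = -\nabla \cdot \nabla p + \Delta p = 0
```

The sampler used in practice is a discretization of the first line, so the guarantee holds only approximately for a finite step size.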

🧮 Denoising Score Matching Training

Unknown true score requires approximation

Since the true data distribution p_data is unknown, the score cannot be computed directly, necessitating methods to estimate it without explicit knowledge of the density.

Gaussian perturbations provide tractable targets

Adding Gaussian noise to a data point x defines a perturbation kernel q_σ(x̃|x) = N(x̃; x, σ²I) whose score with respect to x̃ is known analytically: -(x̃ - x)/σ². This gives a regression target, enabling supervised learning.
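The target follows directly from the Gaussian log density. In the sketch below, writing the noisy sample as x̃ = x + σε with ε drawn from a standard normal is an assumed reparameterization used for illustration.

```latex
q_\sigma(\tilde{x} \mid x) = \mathcal{N}(\tilde{x};\, x,\, \sigma^2 I)

\log q_\sigma(\tilde{x} \mid x) = -\frac{\lVert \tilde{x} - x \rVert^2}{2\sigma^2} + \text{const}

\nabla_{\tilde{x}} \log q_\sigma(\tilde{x} \mid x) = -\frac{\tilde{x} - x}{\sigma^2}
  = -\frac{\varepsilon}{\sigma}
  \quad \text{for } \tilde{x} = x + \sigma\varepsilon
```

The last form shows that the target is the added noise, rescaled by -1/σ.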

L2 loss on score predictions

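The model s_θ is trained to minimize the squared error between its prediction s_θ(x̃) and the known score of the Gaussian perturbation, -(x̃ - x)/σ². The minimizer of this loss is the score of the noised data distribution, so the model effectively learns a denoising score function.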
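A toy version of this objective can be solved in closed form. The sketch below assumes 1-D data drawn from N(0, 1) and a linear model s_θ(x̃) = a·x̃ + b fitted by least squares; both are stand-ins for the real dataset and the neural network trained by gradient descent. In this setting the noised data follows N(0, 1 + σ²), whose score is -x̃/(1 + σ²), so the fitted slope can be checked against a known value.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5
n = 200_000

x_clean = rng.normal(size=n)               # assumed toy data: p_data = N(0, 1)
eps = rng.normal(size=n)
x_noisy = x_clean + sigma * eps            # sample from q_sigma(x_noisy | x_clean)
target = -(x_noisy - x_clean) / sigma**2   # score of the Gaussian perturbation

# Linear model s_theta(x) = a * x + b, fitted by minimizing the mean
# squared error against the target (the denoising score matching loss).
features = np.stack([x_noisy, np.ones(n)], axis=1)
(a, b), *_ = np.linalg.lstsq(features, target, rcond=None)

# Score of the noised data distribution N(0, 1 + sigma^2) is a line
# through the origin with this slope.
true_slope = -1.0 / (1.0 + sigma**2)

print(a, b, true_slope)
```

Although each individual target depends on the clean point x, the fitted slope lands close to the slope of the noised distribution's score, which illustrates why regressing on the conditional target is enough.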

Bottom Line

Train models to estimate the score function (gradient of log probability) of progressively noised data using denoising score matching, then use Langevin dynamics to move samples from noise toward the data distribution, all without computing intractable normalizing constants.
