Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 2 - Score matching

| Podcasts | April 14, 2026 | 2 Thousand views | 1:48:48

TL;DR

This lecture introduces score matching as an alternative to DDPM for generative modeling: instead of predicting noise directly, models estimate the gradient of the log probability density (the 'score') and use Langevin dynamics to guide samples from noise toward the data distribution.

📐 The Score Function Fundamentals 3 insights

Score defined as gradient of log probability

The score is defined as ∇ₓ log p(x), representing the direction of steepest ascent toward regions of higher probability density in the data space.
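As a minimal sketch of this definition, the snippet below computes the analytic score of a 1-D Gaussian, -(x - μ)/σ², and checks it against a finite difference of the log-density (the Gaussian parameters are illustrative, not from the lecture):

```python
import numpy as np

# Score of a 1-D Gaussian N(mu, sigma^2): grad_x log p(x) = -(x - mu) / sigma^2.
# mu and sigma are illustrative choices.
mu, sigma = 1.0, 2.0

def log_p(x):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def score(x):
    return -(x - mu) / sigma ** 2

# Check the analytic score against a central finite difference of log p(x).
x = np.linspace(-3.0, 5.0, 9)
h = 1e-5
fd = (log_p(x + h) - log_p(x - h)) / (2 * h)
print(np.max(np.abs(fd - score(x))))  # tiny: the two agree
```

At each x the finite-difference gradient points toward the mode at μ, the direction of steepest ascent in log-density.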

Log probability eliminates normalizing constants

Unlike ∇p(x), which requires knowing the intractable normalizing constant Z, the gradient of log p(x) eliminates Z entirely, since ∇ log Z = 0.
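The cancellation can be written out in one line for an unnormalized density p̃(x) with p(x) = p̃(x)/Z:

```latex
p(x) = \frac{\tilde p(x)}{Z}, \qquad Z = \int \tilde p(x)\,dx
\quad\Longrightarrow\quad
\nabla_x \log p(x) = \nabla_x \log \tilde p(x) - \nabla_x \log Z
                   = \nabla_x \log \tilde p(x),
```

because Z is a constant with respect to x. The score is therefore computable from the unnormalized density alone.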

Numerical stability in low-density regions

Because ∇ log p = ∇p/p, the score can be computed directly in log space; this sidesteps the numerical instability of evaluating ∇p/p explicitly when p(x) underflows to very small values in sparse regions of the data space.
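A quick illustration of the underflow problem, using a standard normal far in its tail (the evaluation point x = 40 is an illustrative choice):

```python
import numpy as np

# Far in the tail, p(x) underflows to 0.0 in float64, so grad p / p is 0/0,
# while the log-space score of N(0, 1), namely -x, stays perfectly stable.
x = 40.0
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # exp(-800) underflows to 0.0
grad_p = -x * p                             # also 0.0
with np.errstate(invalid="ignore"):
    unstable = grad_p / p                   # 0/0 -> nan
stable = -x                                 # score computed in log space
print(p, unstable, stable)                  # 0.0 nan -40.0
```

The direct ratio ∇p/p is undefined where p underflows, while the log-space score remains exact.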

🎯 Sampling via Langevin Dynamics 3 insights

Following scores leads to high-density regions

Starting from random noise, iteratively following the score direction moves samples toward regions of higher probability under the data distribution.

Stochastic sampling ensures diversity

Langevin sampling adds a stochastic noise term to the score-following process, preventing mode collapse and ensuring exploration of the full distribution rather than just its highest-density points.

MCMC method with theoretical guarantees

Langevin dynamics is a Markov chain Monte Carlo method; an analysis via the Fokker-Planck equation shows that it converges to the true data distribution when the score is known exactly and the step size is taken to zero.
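The section above can be sketched as an unadjusted Langevin sampler, x ← x + ε·score(x) + √(2ε)·z. As a stand-in target (an assumption for this sketch, not the lecture's example) we use N(0, 1), whose score -x is known in closed form; the step size and step count are likewise illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    # Score of the stand-in target N(0, 1): grad_x log p(x) = -x.
    return -x

# Unadjusted Langevin dynamics: x <- x + eps * score(x) + sqrt(2 * eps) * z.
eps, n_steps = 0.01, 500
x = rng.normal(0.0, 5.0, size=10_000)  # chains initialized far from the target
for _ in range(n_steps):
    z = rng.standard_normal(x.shape)
    x = x + eps * score(x) + np.sqrt(2 * eps) * z

print(x.mean(), x.std())  # both close to the target's 0 and 1
```

Dropping the noise term √(2ε)·z would turn this into plain gradient ascent on log p, collapsing every chain onto the mode; the stochastic term is what makes the stationary distribution the full target rather than its peak.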

🧮 Denoising Score Matching Training 3 insights

Unknown true score requires approximation

Since the true data distribution p_data is unknown, the score cannot be computed directly, necessitating methods to estimate it without explicit knowledge of the density.

Gaussian perturbations provide tractable targets

Adding Gaussian noise to data creates a perturbed distribution q_σ(x̃|x) with an analytically known score equal to -(x̃ - x)/σ², enabling supervised learning.
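The stated target follows directly from the Gaussian log-density of the perturbation kernel, written here for the isotropic case q_σ(x̃|x) = N(x̃; x, σ²I):

```latex
\log q_\sigma(\tilde x \mid x) = -\frac{\lVert \tilde x - x \rVert^2}{2\sigma^2} + \text{const}
\quad\Longrightarrow\quad
\nabla_{\tilde x} \log q_\sigma(\tilde x \mid x) = -\frac{\tilde x - x}{\sigma^2}.
```

Every (x, x̃) pair thus comes with an exact regression target, with no access to p_data's density required.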

L2 loss on score predictions

The model s_θ is trained to minimize the squared error between its predicted score and the known score of the noised data, effectively learning a denoising score function; since the target -(x̃ - x)/σ² is a rescaling of the added noise, this is equivalent up to scale to DDPM-style noise prediction.
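The loss above can be sketched end to end in a toy setting. The assumptions here are all illustrative, not from the lecture: 1-D data x ~ N(0, 1), a single noise level σ, and a deliberately simple linear model s_θ(x̃) = θ·x̃. In this setting the minimizer of the DSM loss is the score of the noised marginal N(0, 1 + σ²), i.e. θ* = -1/(1 + σ²):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a single noise level (illustrative choices).
sigma = 0.5
x = rng.standard_normal(50_000)                      # clean samples from N(0, 1)
x_tilde = x + sigma * rng.standard_normal(x.shape)   # Gaussian-perturbed samples
target = -(x_tilde - x) / sigma**2                   # known score of q_sigma(x_tilde | x)

# Linear score model s_theta(x) = theta * x, trained on the DSM L2 loss
# E[(s_theta(x_tilde) - target)^2] by full-batch gradient descent.
theta, lr = 0.0, 0.3
for _ in range(200):
    resid = theta * x_tilde - target
    grad = 2.0 * np.mean(resid * x_tilde)
    theta -= lr * grad

# The minimizer is the score of the noised marginal N(0, 1 + sigma^2),
# i.e. theta* = -1 / (1 + sigma^2) = -0.8 for sigma = 0.5.
print(theta)
```

A neural network replaces the linear model in practice (typically conditioned on the noise level), but the loss and the tractable target are exactly as in this sketch.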

Bottom Line

Train models to estimate the score function (gradient of log probability) of progressively noised data using denoising score matching, then use Langevin dynamics to sample from noise toward the data distribution without computing intractable normalizing constants.
