Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 8 - Trending Topics

Stanford Online

| Podcasts | June 01, 2026 | 27.3 Thousand views | 1:49:34

TL;DR

This final lecture synthesizes the evolution of generative modeling from discrete-time diffusion to continuous-time flow matching, emphasizing that by 2026 flow matching, specifically its rectified flow variants, has become the industry default for efficient image generation.

🔄 Core Generation Paradigms 3 insights

Diffusion predicts noise to reverse corruption

Diffusion models learn the reverse process by minimizing an L2 loss between the Gaussian noise added to an image and the model's prediction of that noise, an objective derived from an evidence lower bound (ELBO) on the data likelihood.
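
The noise-prediction objective can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's code: the linear beta schedule, the function name `ddpm_noise_loss`, and the placeholder `zero_model` (standing in for a trained network) are all assumptions.

```python
import numpy as np

def ddpm_noise_loss(x0, t, alpha_bar, predict_noise, rng):
    """Simplified DDPM objective: mean squared error between true and predicted noise."""
    eps = rng.standard_normal(x0.shape)                      # Gaussian noise to add
    a = alpha_bar[t].reshape(-1, *([1] * (x0.ndim - 1)))     # per-sample signal level
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps           # noised image at step t
    eps_hat = predict_noise(x_t, t)                          # model's noise estimate
    return np.mean((eps - eps_hat) ** 2)

# Assumed linear beta schedule with 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 3, 4, 4))          # stand-in batch of "images"
t = rng.integers(0, 1000, size=8)               # one random timestep per sample
zero_model = lambda x_t, t: np.zeros_like(x_t)  # hypothetical placeholder network
loss = ddpm_noise_loss(x0, t, alpha_bar, zero_model, rng)
```

A model that always predicts zero scores a loss near the noise variance (about 1), while a perfect noise predictor would drive it to zero.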

Score matching estimates data distribution gradients

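Score-based methods estimate the score, the gradient of the log probability density with respect to the data, and use it to move from noise to data with Langevin dynamics. Because the score does not depend on the normalizing constant, the intractable constant never has to be computed. For Gaussian noising, the score of the noised sample equals the negative noise divided by the noise standard deviation, which ties score estimation to noise prediction.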
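
As a rough sketch of Langevin dynamics, the example below samples from a one-dimensional Gaussian whose score is known in closed form. The target parameters `MU` and `SIGMA`, the step size, and the function names are illustrative assumptions. In practice the score would come from a trained network.

```python
import numpy as np

def langevin_sample(score, x_init, step_size, n_steps, rng):
    """Unadjusted Langevin dynamics: follow the score, plus injected Gaussian noise."""
    x = np.array(x_init, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step_size * score(x) + np.sqrt(2.0 * step_size) * noise
    return x

MU, SIGMA = 3.0, 0.5  # assumed toy target N(MU, SIGMA^2)

def gaussian_score(x):
    """Analytic score of N(MU, SIGMA^2): d/dx log p(x) = -(x - MU) / SIGMA^2."""
    return -(x - MU) / SIGMA**2

rng = np.random.default_rng(0)
x_init = 5.0 * rng.standard_normal(5000)  # start far from the target
samples = langevin_sample(gaussian_score, x_init, step_size=0.01, n_steps=500, rng=rng)
```

Writing a sample as `x = MU + SIGMA * eps` gives `gaussian_score(x) = -eps / SIGMA`, which is the noise-to-score relationship in this toy case.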

Flow matching reframes generation as mass transport

Flow matching treats generation as moving probability mass via vector fields (velocities) from a prior to a target distribution, governed by the continuity equation and numerically solved as an ODE.
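
A small sketch of the sampling side: given a velocity field, generation is forward Euler integration of an ODE from t = 0 (prior) to t = 1 (target). The velocity field here is hand-derived for a toy case, transporting N(0, 1) to N(M, S^2), and stands in for a learned network. `M`, `S`, and the function names are assumptions.

```python
import numpy as np

def integrate_flow(velocity, x0, n_steps):
    """Forward Euler integration of dx/dt = velocity(x, t) from t=0 to t=1."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * velocity(x, t)
    return x

M, S = 2.0, 0.5  # assumed target mean and standard deviation

def velocity(x, t):
    """Velocity of the path x_t = (1 - t) * x0 + t * (M + S * x0), written in terms of x_t."""
    return (S - 1.0) * (x - t * M) / (1.0 - t + t * S) + M

rng = np.random.default_rng(0)
x0 = rng.standard_normal(1000)            # samples from the prior
x1 = integrate_flow(velocity, x0, n_steps=10)
```

Each prior sample is carried to `M + S * x0`, so the pushed-forward samples follow the target distribution.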

📈 Continuous Formulations 3 insights

SDEs unify discrete approaches

Stochastic differential equations generalize discrete noising steps into continuous-time forward processes with a drift term and a diffusion term. In this view, DDPM corresponds to the variance-preserving (VP) formulation and noise-conditional score networks correspond to the variance-exploding (VE) formulation.
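
For reference, the two formulations are commonly written as follows. The notation is an assumption rather than taken from the lecture: $f$ is the drift, $g$ the diffusion coefficient, $w$ a Wiener process, $\beta(t)$ the continuous noise schedule, and $\sigma(t)$ the noise scale.

```latex
\begin{aligned}
\text{general forward SDE:}\quad & dx = f(x,t)\,dt + g(t)\,dw \\
\text{variance-preserving (DDPM limit):}\quad & dx = -\tfrac{1}{2}\,\beta(t)\,x\,dt + \sqrt{\beta(t)}\,dw \\
\text{variance-exploding (score-network limit):}\quad & dx = \sqrt{\frac{d\,[\sigma^2(t)]}{dt}}\,dw
\end{aligned}
```

The VP drift shrinks the signal as noise is added, so the total variance stays bounded. The VE process has no drift, so variance grows with $\sigma(t)$.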

Reverse processes require score estimation

The reverse-time SDE depends on the score function, meaning models must estimate this quantity to transform noise back into clean images through either stochastic or deterministic trajectories.
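
In the standard notation (assumed here, with forward drift $f$, diffusion coefficient $g$, marginal density $p_t$, and reverse-time Wiener process $\bar{w}$), the stochastic and deterministic reverse trajectories are usually written as:

```latex
\begin{aligned}
\text{reverse-time SDE:}\quad & dx = \left[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\right]dt + g(t)\,d\bar{w} \\
\text{probability flow ODE:}\quad & dx = \left[f(x,t) - \tfrac{1}{2}\,g(t)^2\,\nabla_x \log p_t(x)\right]dt
\end{aligned}
```

Both depend on the unknown score $\nabla_x \log p_t(x)$, which is the quantity the model is trained to approximate.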

Rectified flow enables faster inference

Rectified flow variants straighten probability paths between distributions, allowing high-quality image generation with significantly fewer numerical integration steps than traditional curved diffusion trajectories.
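
A minimal sketch of how straight-path training pairs can be built, assuming the convention that t = 0 is noise and t = 1 is data. The function names and the toy data are hypothetical; a real setup would regress a network's output onto `v_target`.

```python
import numpy as np

def rectified_flow_pair(x0, x1, t):
    """Point on the straight line from x0 (noise) to x1 (data), and its constant velocity."""
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))  # broadcast one t per sample
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0                          # same velocity at every t on the line
    return x_t, v_target

def flow_matching_loss(v_pred, v_target):
    """L2 regression loss between predicted and target velocities."""
    return np.mean((v_pred - v_target) ** 2)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 2))                # noise samples
x1 = rng.standard_normal((4, 2)) + 5.0          # stand-in for data samples
t = rng.uniform(size=4)
x_t, v_target = rectified_flow_pair(x0, x1, t)

# With a perfectly straight path, one Euler step of size (1 - t) reaches the data point.
x_end = x_t + (1.0 - t)[:, None] * v_target
```

Because the velocity is constant along a straight line, a single large integration step incurs no discretization error in this idealized case. That is the intuition behind using fewer sampling steps.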

🎯 Latent Representation and Control 3 insights

VAEs compress and structure latent spaces

Variational autoencoders compress redundant high-dimensional pixel data into compact latent representations by combining a reconstruction loss with a KL divergence term that regularizes the latent distribution toward a prior.
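
The two-term objective can be sketched as below, assuming a diagonal Gaussian encoder, a standard normal prior, and a squared-error reconstruction term. The `beta` weight and the function name are illustrative assumptions.

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Reconstruction error plus KL( N(mu, exp(log_var)) || N(0, I) ), averaged over the batch."""
    recon = np.sum((x - x_recon) ** 2, axis=1)
    # Closed-form KL between a diagonal Gaussian and the standard normal prior.
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)
    return np.mean(recon + beta * kl), np.mean(recon), np.mean(kl)

x = np.ones((2, 5))
x_recon = np.ones((2, 5))                  # perfect reconstruction for illustration
mu = np.ones((2, 3))                       # latent means shifted away from the prior
log_var = np.zeros((2, 3))                 # unit variance
total, recon_term, kl_term = vae_loss(x, x_recon, mu, log_var)
```

Here the reconstruction term is zero and the KL term is 1.5 (0.5 per latent dimension), so the penalty comes entirely from the latent means drifting away from the prior.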

Classifier-free guidance strengthens alignment

Classifier-free guidance combines the conditional and unconditional predictions of a single model at inference time, typically extrapolating past the conditional prediction, to strengthen alignment between generated images and text prompts without training a separate classifier.
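
The combination rule is short enough to write out. This sketch uses one common parameterization, in which a scale of 1 recovers the conditional prediction. The function name and the toy arrays are assumptions.

```python
import numpy as np

def classifier_free_guidance(pred_uncond, pred_cond, guidance_scale):
    """Move from the unconditional prediction toward (and past) the conditional one."""
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)

pred_uncond = np.array([0.0, 1.0])   # stand-in model output without the prompt
pred_cond = np.array([1.0, 3.0])     # stand-in model output with the prompt
guided = classifier_free_guidance(pred_uncond, pred_cond, guidance_scale=2.0)
```

Scales between 0 and 1 interpolate between the two predictions, while scales above 1 extrapolate beyond the conditional prediction.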

Multi-modal encoders bridge vision and language

Architectures like CLIP use contrastive learning to align image and text representations in shared spaces, enabling effective conditioning for text-to-image generation systems.
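
A rough NumPy sketch of a symmetric contrastive objective of the kind CLIP uses, assuming matched image-text pairs sit on the diagonal of the similarity matrix. The fixed temperature value and the function name are assumptions. In CLIP itself the temperature is learned.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over cosine similarities; row i of each input is a matched pair."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # pairwise similarity matrix

    def cross_entropy_diag(l):
        l = l - l.max(axis=1, keepdims=True)      # subtract row max for numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))       # correct match is the diagonal entry

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

emb = np.eye(4)                                   # toy embeddings, one per pair
aligned_loss = clip_contrastive_loss(emb, emb)
mismatched_loss = clip_contrastive_loss(emb, np.roll(emb, 1, axis=0))
```

The loss is near zero when each image embedding matches its own caption and large when the pairs are shuffled. That gap is the signal that pulls the two modalities into a shared space.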

Bottom Line

Master flow matching with rectified flow paths: it has become the dominant paradigm for image generation by 2026 because its straighter paths need fewer integration steps at inference than traditional diffusion or score-based methods.
