Stanford CS25: Transformers United V6 | From Representation Learning to World Modeling

Podcasts | April 22, 2026 | 6.28K views | 1:11:03

TL;DR

This lecture explores JEPA (Joint Embedding Predictive Architecture) as an energy-based framework for world modeling that makes predictions in latent space rather than pixel space. Hazel Nam introduces Causal JEPA, a method that combines object-centric slot attention with aggressive masking to teach models how physical objects move and interact.
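As a rough illustration of the lecture's central idea, predicting in latent space rather than pixel space, here is a minimal, hypothetical PyTorch sketch of a JEPA-style training step. The module shapes, the names jepa_step and ema_update, the MLP encoders, and the 75% masking ratio are all illustrative assumptions; this is not the lecture's Causal JEPA architecture, which additionally uses object-centric slot attention.

```python
# Hypothetical sketch of a JEPA-style objective: regress the latents of
# masked regions from the latents of the visible context. All names,
# sizes, and architectural choices here are illustrative assumptions.
import torch
import torch.nn as nn

EMBED_DIM, INPUT_DIM = 256, 768

context_encoder = nn.Sequential(
    nn.Linear(INPUT_DIM, EMBED_DIM), nn.GELU(), nn.Linear(EMBED_DIM, EMBED_DIM))
target_encoder = nn.Sequential(
    nn.Linear(INPUT_DIM, EMBED_DIM), nn.GELU(), nn.Linear(EMBED_DIM, EMBED_DIM))
predictor = nn.Sequential(
    nn.Linear(EMBED_DIM, EMBED_DIM), nn.GELU(), nn.Linear(EMBED_DIM, EMBED_DIM))

# The target encoder tracks the context encoder via EMA and gets no gradients.
target_encoder.load_state_dict(context_encoder.state_dict())
for p in target_encoder.parameters():
    p.requires_grad = False

def jepa_step(x, mask):
    """x: (batch, patches, INPUT_DIM) features; mask: (batch, patches) bool."""
    # Encode the visible context only (masked patches zeroed for simplicity).
    context = context_encoder(x * (~mask).unsqueeze(-1).float())
    with torch.no_grad():
        targets = target_encoder(x)   # latent targets for the full input
    preds = predictor(context)        # predict latents of the hidden regions
    # The loss lives in latent space, not pixel space.
    return (preds - targets)[mask].pow(2).mean()

@torch.no_grad()
def ema_update(momentum=0.996):
    # Slow-moving targets keep the regression stable during training.
    for p_t, p_c in zip(target_encoder.parameters(), context_encoder.parameters()):
        p_t.mul_(momentum).add_(p_c, alpha=1 - momentum)

# Usage: aggressive masking (here 75%, an assumed ratio), then one update.
x = torch.randn(4, 16, INPUT_DIM)
mask = torch.rand(4, 16) < 0.75
loss = jepa_step(x, mask)
loss.backward()
ema_update()
```

A latent-space loss lets the model ignore unpredictable pixel-level detail, and the EMA target encoder (the mechanism used in I-JEPA) is one common way to keep the latent targets stable and avoid representational collapse.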

More from Stanford Online

Stanford CS25: Transformers United V6 | Overview of Transformers (1:16:46)

Stanford's CS25 introductory lecture traces the evolution from hand-engineered features to Transformer architectures, explaining how self-attention mechanisms enable parallel processing and long-context modeling, while exploring how billion-parameter language models develop emergent reasoning capabilities through next-token prediction on internet-scale data.

Stanford Robotics Seminar ENGR319 | Spring 2026 | Mechanical Intelligence in Locomotion (51:44)

This seminar introduces meso-scale (~1 kg) robots as a critical gap between micro- and macro-scale robotics. Applying Shannon's information theory to legged locomotion, it demonstrates that mechanical redundancy (morphological intelligence) enables reliable locomotion over unpredictable terrain without sensors, and that biological gait-switching strategies can overcome inherent speed limitations.
