Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 5: GPUs, TPUs
This lecture introduces GPU architecture for language model training, explaining the shift from serial CPU execution to parallel GPU throughput, the critical importance of memory hierarchies, and the SIMT programming model essential for efficient deep learning systems.