Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 15: Mid/Post-Training
TL;DR
This lecture explains how post-training transforms raw pre-trained models like GPT-3 into instruction-following systems like ChatGPT through supervised fine-tuning and reinforcement learning, emphasizing that high-quality data curation matters more than algorithmic sophistication.
🔄 The Post-Training Imperative 2 insights
Bridging the GPT-3 to ChatGPT Gap
Post-training extracts specific instruction-following behaviors from the broad capabilities of pre-trained models, transforming limited-utility base models into reliable assistants capable of complex prompt completion.
The Two-Phase Recipe
The process involves supervised fine-tuning on human demonstrations followed by reinforcement learning alignment to shape model outputs toward human preferences and safety requirements.
📊 Evolution of SFT Data 3 insights
From FLAN to Synthetic Generation
Early approaches like FLAN repurposed existing NLP benchmarks into multitask formats, while modern methods leverage model-generated synthetic data or distill frontier models to create instruction-following datasets.
The Alpaca Breakthrough
Distilling ChatGPT outputs to fine-tune open models demonstrated that chat-style data reliably induces conversational behavior, sparking open-source efforts to replicate proprietary systems.
Shift Toward Agentic Data
Contemporary SFT pipelines have moved beyond simple chat formats to focus on tool use and agentic capabilities, reflecting the current frontier of language model applications.
⚖️ Data Quality and Scale 2 insights
Quality Over Quantity
Unlike pre-training which requires massive scale, SFT benefits more from fewer high-quality examples because strong pre-trained models generalize effectively from limited demonstrations.
Pitfalls of Legacy Datasets
FLAN's construction from existing benchmarks introduced unnatural formatting and hallucinated summaries, illustrating how deficiencies in source data propagate through post-training.
🔒 The Secretive Frontier 2 insights
Data as Trade Secret
Frontier labs now treat post-training data as competitive intelligence, with detailed annotation guidelines and human feedback processes remaining unpublished unlike earlier academic papers.
Distillation vs Human Data
While open-source models often rely on distillation from proprietary systems, frontier labs employ complex human data collection pipelines that represent the primary differentiator in model capabilities.
Bottom Line
In post-training, meticulous data curation outweighs algorithmic complexity, where small amounts of high-quality human demonstrations extract more useful behaviors from pre-trained models than large-scale noisy datasets.
More from Stanford Online
View all
Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 16: Post-Training - RLVR
This lecture explains why RLHF hits overoptimization limits with learned reward models, and how RLVR (Reinforcement Learning from Verifiable Rewards) enables unlimited compute scaling on verifiable tasks like math and coding through simpler algorithms like GRPO.
Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 14: Data
This lecture details the pre-training data pipeline, covering the transformation of raw HTML and PDFs into linear text and classifier-based filtering strategies to curate domain-specific datasets, while emphasizing the strategic trade-off between data quality and training duration.
Stanford MS&E435 Economics of the AI Supercycle | Spring 2026 | Infrastructure, Capstone Case
Sachin Katti, OpenAI's head of industrial compute, details the infrastructure economics driving the AI supercycle, explaining how the company plans to scale to 30 gigawatts by 2030 while navigating the shift from training to inference-heavy agentic workloads and managing massive energy and supply chain constraints.
Stanford CS25: Transformers United V6 I Advancing Science and Medicine with Collaborative AI Agents
Google DeepMind researcher Vivek Natarajan discusses the development of Co-Scientist, an AI system designed to act as a collaborative partner for scientific discovery by moving beyond fast System 1 thinking to rigorous System 2 reasoning, emphasizing that true scientific AI requires the generality of human cognition rather than narrow specialization.