Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 15: Mid/Post-Training

Stanford Online

| Podcasts | May 27, 2026 | 17.3 Thousand views | 1:19:55

TL;DR

This lecture explains how post-training transforms raw pre-trained models like GPT-3 into instruction-following systems like ChatGPT through supervised fine-tuning and reinforcement learning, emphasizing that high-quality data curation matters more than algorithmic sophistication.

🔄 The Post-Training Imperative 2 insights

Bridging the GPT-3 to ChatGPT Gap

Post-training extracts specific instruction-following behaviors from the broad capabilities of pre-trained models, transforming limited-utility base models into reliable assistants capable of complex prompt completion.

The Two-Phase Recipe

The process involves supervised fine-tuning on human demonstrations followed by reinforcement learning alignment to shape model outputs toward human preferences and safety requirements.

📊 Evolution of SFT Data 3 insights

From FLAN to Synthetic Generation

Early approaches like FLAN repurposed existing NLP benchmarks into multitask formats, while modern methods leverage model-generated synthetic data or distill frontier models to create instruction-following datasets.

The Alpaca Breakthrough

Distilling ChatGPT outputs to fine-tune open models demonstrated that chat-style data reliably induces conversational behavior, sparking open-source efforts to replicate proprietary systems.

Shift Toward Agentic Data

Contemporary SFT pipelines have moved beyond simple chat formats to focus on tool use and agentic capabilities, reflecting the current frontier of language model applications.

⚖️ Data Quality and Scale 2 insights

Quality Over Quantity

Unlike pre-training which requires massive scale, SFT benefits more from fewer high-quality examples because strong pre-trained models generalize effectively from limited demonstrations.

Pitfalls of Legacy Datasets

FLAN's construction from existing benchmarks introduced unnatural formatting and hallucinated summaries, illustrating how deficiencies in source data propagate through post-training.

🔒 The Secretive Frontier 2 insights

Data as Trade Secret

Frontier labs now treat post-training data as competitive intelligence, with detailed annotation guidelines and human feedback processes remaining unpublished unlike earlier academic papers.

Distillation vs Human Data

While open-source models often rely on distillation from proprietary systems, frontier labs employ complex human data collection pipelines that represent the primary differentiator in model capabilities.

Bottom Line

In post-training, meticulous data curation outweighs algorithmic complexity, where small amounts of high-quality human demonstrations extract more useful behaviors from pre-trained models than large-scale noisy datasets.

Watch on YouTube

More from Stanford Online

Stanford Robotics Seminar ENGR319 | Spring 2026 | Towards Trustworthy Autonomy

Stanford Online

Stanford Robotics Seminar ENGR319 | Spring 2026 | Towards Trustworthy Autonomy

As learning-based robotics deploy at scale—exemplified by Waymo's 500,000 weekly rides—they face dangerous 'semantic anomalies' where context causes system-level confusion rather than visual novelty. The speaker presents a 'fast and slow' reasoning framework using lightweight embedding models for real-time detection and large language models for safety interventions, enabling trustworthy autonomy without requiring perfect prediction models.

4 days ago · 9 points

Stanford MS&E435 Economics of the AI Supercycle | Spring 2026 | Applications, Coding AI

Stanford Online

Stanford MS&E435 Economics of the AI Supercycle | Spring 2026 | Applications, Coding AI

Vercel founder Guillermo Rauch explains how AI coding agents have expanded the software development market by 10-100x, driving a fundamental shift from traditional web services to 'agentic infrastructure' where tokens replace pixels as the primary commodity and deployment becomes the critical value creator.

18 days ago · 9 points

Stanford MS&E435 Economics of the AI Supercycle | Spring 2026 | Building AI Factories

Stanford Online

Stanford MS&E435 Economics of the AI Supercycle | Spring 2026 | Building AI Factories

Crusoe Energy CEO Chase Lockmiller explains how AI data centers represent history's second-largest infrastructure investment, driven by the economic potential of scalable 'digital labor.' He reveals Crusoe's strategy of building massive AI factories in stranded-power locations like Abilene, Texas, to overcome the industry's critical bottleneck: energized data center capacity.

25 days ago · 9 points

AI in Healthcare Series: Inside the Rise of AI in Healthcare, Open Evidence and Cyber Risks

Stanford Online

AI in Healthcare Series: Inside the Rise of AI in Healthcare, Open Evidence and Cyber Risks

Former U.S. Chief Data Scientist DJ Patil warns that healthcare systems are dangerously unprepared for AI-enabled cyberattacks from nation states, while simultaneously seeing rapid democratization of medical knowledge through tools like Open Evidence that are fundamentally reshaping the doctor-patient relationship.

26 days ago · 10 points

Browse more: 🎙️ Podcasts All Videos All Categories