The 20-year journey to fully autonomous cars with Dmitri Dolgov of Waymo
TL;DR
Waymo Co-CEO Dmitri Dolgov details the 20-year technical evolution from Google's self-driving moonshot to 500,000 weekly autonomous rides, explaining why full autonomy requires augmenting end-to-end AI with structured intermediate representations and a 'three teachers' training framework rather than relying solely on scaled-up vision models.
🧠 Technical Architecture & The 'Three Teachers'
Multi-modal 360-degree sensing stack
Waymo vehicles use cameras, LiDAR, and radar with full 360-degree coverage, processing all sensor data locally on specialized onboard computers without real-time cloud dependency for safety-critical driving decisions.
Foundation model specialization pipeline
A large off-board foundation model that understands physical and social driving dynamics is specialized into three high-capacity 'teachers': the Waymo Driver (backbone), the Simulator (synthetic environments), and the Critic (value judgment system). These teachers are then distilled into smaller models that run inference on the vehicle.
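The teacher-to-onboard step described above is a form of knowledge distillation: a small student model is trained to match the softened output distribution of a large teacher. A minimal sketch of that objective, where all logits, names, and the temperature value are illustrative assumptions rather than Waymo's actual code:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: higher T softens the teacher's distribution,
    # exposing more of its "dark knowledge" about near-miss alternatives.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Cross-entropy of the student's softened distribution against the
    # teacher's: the standard knowledge-distillation training objective.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(p_teacher, p_student))

# Hypothetical logits over, say, a discretized set of candidate trajectories.
teacher = [4.0, 1.0, 0.5]   # large off-board teacher model
student = [2.0, 1.5, 0.5]   # small onboard model being trained
loss = distillation_loss(teacher, student)
```

The loss is minimized when the student reproduces the teacher's distribution exactly, which is what lets a much smaller network inherit behavior learned by a model too large for in-car inference.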
Edge vs. cloud intelligence split
While all real-time driving decisions happen on-device, non-critical post-ride tasks, such as detecting left-behind items or checking vehicle cleanliness, run on cloud-based models that can request that the car return to a depot for cleaning.
⚠️ Why Pure End-to-End AI Falls Short
The 'talking horse' limitation of VLMs
While off-the-shelf vision-language models fine-tuned to output trajectories can handle nominal driving cases, they remain orders of magnitude away from the safety requirements needed for full autonomy and fail on the long tail of edge cases.
Augmented architecture with structured representations
Waymo combines end-to-end learning with explicit intermediate representations of objects, road geometry, and traffic rules to enable efficient simulation, additional safety validation layers, and reward function specification.
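One concrete benefit of explicit intermediate representations is that safety rules become directly checkable against structured scene state rather than buried in an opaque network. A minimal sketch; every name, field, and threshold here is an assumption for illustration, not Waymo's actual schema:

```python
from dataclasses import dataclass

@dataclass
class TrackedObject:
    # Hypothetical structured record for one perceived object.
    kind: str               # e.g. "vehicle", "pedestrian", "cyclist"
    distance_m: float       # range to the object along our path
    closing_speed_mps: float  # positive when we are closing on it

def violates_following_gap(obj: TrackedObject, min_gap_s: float = 2.0) -> bool:
    # An explicit, auditable rule the structured representation enables:
    # flag a lead object we would reach in under min_gap_s seconds.
    if obj.closing_speed_mps <= 0:
        return False  # not closing, so no time-gap violation
    return obj.distance_m / obj.closing_speed_mps < min_gap_s

lead = TrackedObject("vehicle", distance_m=15.0, closing_speed_mps=10.0)
flag = violates_following_gap(lead)  # 1.5 s time gap, under the 2.0 s rule
```

The same structured state can seed simulator scenarios and serve as input to reward functions, which is the efficiency argument the section makes for augmenting pure end-to-end learning.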
Reinforcement Learning-based Fine-Tuning (RLFT)
Similar to RLHF in LLMs, Waymo uses closed-loop simulation with the Critic model providing reward signals to keep driving behavior in distribution and handle multi-agent social interactions that pure imitation learning cannot capture.
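Conceptually, a closed-loop RLFT step rolls candidate behaviors out in simulation and uses the Critic's reward to decide what to reinforce. A toy sketch of that loop, with a hand-written stand-in critic and a one-dimensional "simulator"; all functions and parameters here are illustrative assumptions:

```python
import random

random.seed(0)  # deterministic rollouts for the sketch

def critic_reward(trajectory):
    # Stand-in critic: penalizes deviation from the lane center (0.0) and
    # jerky lateral motion. A real Critic would be a learned value model.
    smoothness = -sum(abs(b - a) for a, b in zip(trajectory, trajectory[1:]))
    accuracy = -sum(abs(x) for x in trajectory)
    return smoothness + accuracy

def rollout(policy_bias, steps=10):
    # Toy closed-loop simulation: lateral offsets produced by a policy
    # with a fixed steering bias plus small noise.
    pos, traj = 0.0, []
    for _ in range(steps):
        pos += policy_bias + random.gauss(0, 0.05)
        traj.append(pos)
    return traj

def rlft_step(candidate_biases):
    # One fine-tuning step (sketch): simulate each candidate behavior,
    # score it with the critic, and keep the best-scoring one to reinforce.
    scored = [(critic_reward(rollout(b)), b) for b in candidate_biases]
    return max(scored)[1]

best = rlft_step([-0.2, 0.0, 0.3])  # the unbiased policy should win
```

A real system would update model weights from the reward signal rather than select among fixed candidates, but the loop structure of simulate, score with the Critic, and reinforce is the part this sketch mirrors.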
📈 20 Years of Iteration & Scale
Beyond scaling laws and dead ends
Achieving full autonomy required specific architectural breakthroughs such as transformers, massive advances in compute, and domain-specific training recipes; neither avoiding dead-end approaches nor scaling data alone was sufficient.
Hierarchy of optimization goals
The system prioritizes superhuman safety as the primary constraint, followed by smoothness for passenger comfort, predictability for other road users, and social integration into the 'body language' of traffic ecosystems.
Current operational scale
Waymo now provides over 500,000 fully autonomous rides weekly, having grown out of Google's 2009 self-driving project; Dolgov joined as an early engineer and rose through successive roles to become Co-CEO in 2021.
Bottom Line
Full autonomous driving requires augmenting end-to-end AI with structured intermediate representations and closed-loop simulation to solve the long tail of edge cases, not just scaling up vision-language models.
More from Stripe
Creating prediction markets (and suing the CFTC) with Tarek Mansour and Luana Lopes Lara
Kalshi founders Tarek Mansour and Luana Lopes Lara recount their four-year battle to launch the first CFTC-regulated prediction market in the US, culminating in a lawsuit against their own regulator to offer election contracts, and why their 'permission-first' approach ultimately enabled $10+ billion monthly volumes.
Bret Taylor of Sierra on AI agents, outcome-based pricing, and the OpenAI board
Bret Taylor explores how AI agents are shifting from polished but forgetful tools to messy, context-rich systems that leverage markdown memory and code repository structures, predicting software engineering will evolve from writing code to crafting 'harnesses' of documentation while enterprises move beyond APIs toward agent-accessible infrastructure.
Garrett Langley of Flock Safety on building technology to solve crime
Garrett Langley explains how Flock Safety grew from a neighborhood project solving car break-ins to a $500M ARR company serving 6,000+ cities by building solar-powered license plate cameras, AI search tools, and drones that help law enforcement clear over one million crimes annually through real-time data coordination.
Reiner Pope of MatX on accelerating AI with transformer-optimized chips
Former Google TPU architect Reiner Pope explains how Google's early research and custom silicon investments enabled its AI resurgence, while outlining MatX's strategy to build transformer-specific chips that solve the latency-throughput trade-off through a hybrid memory architecture, requiring $500M and massive supply chain scale to compete with incumbents.