The 20-year journey to fully autonomous cars with Dmitri Dolgov of Waymo
TL;DR
Waymo Co-CEO Dmitri Dolgov details the 20-year technical evolution from Google's self-driving moonshot to 500,000 weekly autonomous rides, explaining why full autonomy requires augmenting end-to-end AI with structured intermediate representations and a 'three teachers' training framework rather than relying solely on scaled-up vision models.
🧠 Technical Architecture & The 'Three Teachers'
Multi-modal 360-degree sensing stack
Waymo vehicles use cameras, LiDAR, and radar with full 360-degree coverage, processing all sensor data locally on specialized onboard computers without real-time cloud dependency for safety-critical driving decisions.
Foundation model specialization pipeline
A large off-board foundation model that understands the physical and social dynamics of driving is specialized into three high-capacity 'teachers': the Waymo Driver (backbone), the Simulator (synthetic environment), and the Critic (value judgment system). These teachers are then distilled into smaller models for onboard inference.
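The talk does not say which distillation loss Waymo uses. As an illustration only, here is the generic soft-target recipe: a student is trained to match a teacher's temperature-softened output distribution by minimizing a scaled KL divergence. The "student" is reduced to a vector of free logits standing in for a smaller network's outputs, and the maneuver labels, temperature, and learning rate are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float], student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0.0)
    return temperature ** 2 * kl


def distill_step(teacher_logits: List[float], student_logits: List[float],
                 temperature: float = 2.0, lr: float = 0.5) -> List[float]:
    """One gradient-descent step on distillation_loss w.r.t. the student logits.

    The gradient of T^2 * KL(p_t || p_s) with respect to student logit z_i
    is T * (p_s[i] - p_t[i]).
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return [z - lr * temperature * (ps - pt)
            for z, ps, pt in zip(student_logits, p_s, p_t)]


if __name__ == "__main__":
    # Hypothetical teacher scores over three candidate maneuvers
    # (e.g. proceed, yield, stop); the student starts with no preference.
    teacher = [2.0, 0.5, -1.0]
    student = [0.0, 0.0, 0.0]
    print(f"initial loss: {distillation_loss(teacher, student):.4f}")
    for _ in range(200):
        student = distill_step(teacher, student)
    print(f"final loss:   {distillation_loss(teacher, student):.6f}")
```

Softening with a temperature above 1 exposes the teacher's relative preferences among non-top options, which is the usual argument for distilling from soft targets rather than hard labels.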
Edge vs. cloud intelligence split
While all real-time driving decisions happen on-device, non-critical post-ride tasks, such as detecting items left behind by riders or checking vehicle cleanliness, run on cloud-based models that can request that the car return to a depot for cleaning.
⚠️ Why Pure End-to-End AI Falls Short
The 'talking horse' limitation of VLMs
While off-the-shelf vision-language models fine-tuned to output trajectories can handle nominal driving cases, they remain orders of magnitude away from the safety requirements needed for full autonomy and fail on the long tail of edge cases.
Augmented architecture with structured representations
Waymo combines end-to-end learning with explicit intermediate representations of objects, road geometry, and traffic rules to enable efficient simulation, additional safety validation layers, and reward function specification.
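To make "explicit intermediate representations" concrete, the sketch below shows one way a structured scene (objects with positions and footprints, plus a speed-limit rule) lets an independent validation layer check a trajectory proposed by a learned planner. Every class, field, and threshold here is hypothetical and not Waymo's schema; objects are treated as static, whereas a real check would use predicted motion.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrackedObject:
    """One perceived road user, reduced to a kind, a position, and a footprint."""
    kind: str
    x_m: float
    y_m: float
    radius_m: float


@dataclass(frozen=True)
class Scene:
    """Hypothetical explicit intermediate representation of the surroundings."""
    objects: Tuple[TrackedObject, ...]
    speed_limit_mps: float


# A trajectory is a sequence of (x, y, speed) waypoints, e.g. from a learned planner.
Waypoint = Tuple[float, float, float]


def validate_trajectory(trajectory: List[Waypoint], scene: Scene,
                        clearance_m: float = 1.0) -> List[str]:
    """Check a proposed trajectory against explicit rules; return any violations."""
    violations = []
    for i, (x, y, speed) in enumerate(trajectory):
        if speed > scene.speed_limit_mps:
            violations.append(
                f"waypoint {i}: {speed:.1f} m/s exceeds limit {scene.speed_limit_mps:.1f} m/s")
        for obj in scene.objects:
            # Static-object clearance check: distance to center vs. footprint + margin.
            if math.hypot(x - obj.x_m, y - obj.y_m) < obj.radius_m + clearance_m:
                violations.append(f"waypoint {i}: inside clearance of {obj.kind}")
    return violations


if __name__ == "__main__":
    scene = Scene(objects=(TrackedObject("pedestrian", 10.0, 0.5, 0.5),),
                  speed_limit_mps=13.4)
    straight = [(float(x), 0.0, 12.0) for x in range(0, 21, 2)]
    nudged = [(float(x), 3.0, 12.0) for x in range(0, 21, 2)]
    print(validate_trajectory(straight, scene))  # one violation at waypoint 5
    print(validate_trajectory(nudged, scene))    # []
```

Because the check operates on named objects and rules rather than raw pixels, the same representation can also drive simulation and serve as the vocabulary for writing reward terms.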
Reinforcement Learning-based Fine-Tuning (RLFT)
Similar to RLHF in LLMs, Waymo uses closed-loop simulation with the Critic model providing reward signals to keep driving behavior in distribution and handle multi-agent social interactions that pure imitation learning cannot capture.
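The following is a toy analogue of critic-guided fine-tuning in closed-loop simulation, not Waymo's method. Several substitutions are assumptions: the policy is a single knob (desired time headway in a 1-D car-following scenario); a hand-written `critic` stands in for a learned Critic; a squared-distance penalty to the imitation-learned reference plays the role of the KL penalty used in RLHF to keep behavior in distribution; and the cross-entropy method (a simple black-box policy-search algorithm) replaces whatever RL optimizer is actually used. All numbers are hypothetical.

```python
import random
import statistics
from typing import List, Tuple


def simulate(headway_s: float, steps: int = 100, dt: float = 0.1) -> Tuple[List[float], List[float]]:
    """Closed-loop 1-D rollout: the ego reacts to a lead car that brakes for 2 s.

    Returns per-step gaps (m) and ego accelerations (m/s^2).
    """
    ego_x, ego_v = 0.0, 10.0
    lead_x, lead_v = 30.0, 10.0
    gaps, accels = [], []
    for t in range(steps):
        lead_a = -3.0 if 40 <= t < 60 else 0.0
        lead_v = max(0.0, lead_v + lead_a * dt)
        lead_x += lead_v * dt
        desired_gap = 2.0 + headway_s * ego_v
        a = 0.4 * (lead_x - ego_x - desired_gap) + 0.8 * (lead_v - ego_v)
        a = max(-6.0, min(2.0, a))  # actuator limits
        ego_v = max(0.0, ego_v + a * dt)
        ego_x += ego_v * dt
        gaps.append(lead_x - ego_x)
        accels.append(a)
    return gaps, accels


def critic(gaps: List[float], accels: List[float]) -> float:
    """Hand-written stand-in for a learned Critic: higher is better."""
    safety = -100.0 * sum(1 for g in gaps if g < 2.0)  # near-collisions dominate
    comfort = -0.1 * sum(a * a for a in accels)        # penalize harsh accel/braking
    progress = -0.01 * sum(gaps)                       # don't lag far behind
    return safety + comfort + progress


def objective(headway_s: float, reference_s: float, beta: float = 5.0) -> float:
    """Critic reward minus a penalty for drifting from the imitation-learned policy."""
    return critic(*simulate(headway_s)) - beta * (headway_s - reference_s) ** 2


def fine_tune(reference_s: float, iters: int = 15, pop: int = 20, n_elite: int = 5,
              seed: int = 0) -> Tuple[float, float]:
    """Cross-entropy-method search over the headway, starting from the reference."""
    rng = random.Random(seed)
    mean, std = reference_s, 0.5
    best = (reference_s, objective(reference_s, reference_s))
    for _ in range(iters):
        samples = [min(4.0, max(0.3, rng.gauss(mean, std))) for _ in range(pop)]
        scored = sorted(((objective(h, reference_s), h) for h in samples), reverse=True)
        elites = [h for _, h in scored[:n_elite]]
        mean, std = statistics.mean(elites), max(0.05, statistics.pstdev(elites))
        if scored[0][0] > best[1]:
            best = (scored[0][1], scored[0][0])
    return best


if __name__ == "__main__":
    ref = 0.8  # hypothetical headway (s) learned by imitation
    h, score = fine_tune(ref)
    print(f"reference headway {ref:.2f}s -> fine-tuned {h:.2f}s (objective {score:.1f})")
```

The key point the sketch tries to capture is that the reward is computed on the outcome of a closed-loop rollout, where the ego's own actions change the situation it then faces, which is exactly what open-loop imitation of logged data does not score.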
📈 20 Years of Iteration & Scale
Beyond scaling laws and dead ends
Achieving full autonomy required specific breakthroughs, such as the transformer architecture, massive advances in compute, and domain-specific training recipes; avoiding wrong paths or relying on data scaling alone would not have been enough.
Hierarchy of optimization goals
The system prioritizes superhuman safety as the primary constraint, followed by smoothness for passenger comfort, predictability for other road users, and social integration into the 'body language' of traffic ecosystems.
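One way to read this hierarchy is as "safety as a hard constraint, then lexicographic ranking of the remaining goals." The sketch below encodes that reading; the `CandidatePlan` fields, cost values, and risk threshold are hypothetical, and the talk does not say the planner is strictly lexicographic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CandidatePlan:
    name: str
    collision_risk: float  # estimated probability of a safety-relevant event
    jerk_cost: float       # passenger comfort: lower is smoother
    surprise_cost: float   # deviation from what other road users expect
    social_cost: float     # e.g. blocking, hesitating, cutting in


def select_plan(candidates: List[CandidatePlan], max_risk: float) -> Optional[CandidatePlan]:
    """Safety filters plans; smoothness, predictability, social fit are then ranked in order."""
    safe = [c for c in candidates if c.collision_risk <= max_risk]
    if not safe:
        return None  # nothing acceptable; the caller must handle a fallback
    # Python compares tuples lexicographically, which encodes the priority order.
    return min(safe, key=lambda c: (c.jerk_cost, c.surprise_cost, c.social_cost))


if __name__ == "__main__":
    plans = [
        CandidatePlan("aggressive_merge", 0.30, 0.5, 0.2, 0.1),
        CandidatePlan("late_brake", 0.01, 3.0, 0.5, 0.2),
        CandidatePlan("early_yield", 0.02, 1.0, 0.8, 0.4),
    ]
    print(select_plan(plans, max_risk=0.05).name)  # early_yield
```

Note that `early_yield` wins even though `late_brake` has lower risk: once a plan clears the safety bar, safety is treated as satisfied rather than further minimized. A strict lexicographic order also lets a tiny smoothness difference override any predictability difference, so a practical planner would plausibly use tolerances or weights instead.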
Current operational scale
Waymo, which grew out of Google's 2009 self-driving project, now provides over 500,000 fully autonomous rides weekly. Dolgov, an early engineer on that project, rose through successive promotions to become Co-CEO in 2021.
Bottom Line
Full autonomous driving requires augmenting end-to-end AI with structured intermediate representations and closed-loop simulation to solve the long tail of edge cases, not just scaling up vision-language models.