The 20-year journey to fully autonomous cars with Dmitri Dolgov of Waymo
TL;DR
Waymo Co-CEO Dmitri Dolgov details the 20-year technical evolution from Google's self-driving moonshot to 500,000 weekly autonomous rides, explaining why full autonomy requires augmenting end-to-end AI with structured intermediate representations and a 'three teachers' training framework rather than relying solely on scaled-up vision models.
🧠 Technical Architecture & The 'Three Teachers'
Multi-modal 360-degree sensing stack
Waymo vehicles use cameras, LiDAR, and radar with full 360-degree coverage, processing all sensor data locally on specialized onboard computers without real-time cloud dependency for safety-critical driving decisions.
Foundation model specialization pipeline
A large off-board foundation model that captures physical and social driving dynamics is specialized into three high-capacity 'teachers': the Waymo Driver (backbone), the Simulator (synthetic environment), and the Critic (value judgment system). These teachers are then distilled into smaller models for on-vehicle inference.
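The summary does not describe how the teachers are distilled, so the following is only a minimal sketch of the general idea, assuming a standard temperature-softened distillation loss: the small student model is trained to match the teacher's output distribution. The maneuver labels, logit values, and temperature are invented for illustration and are not taken from Waymo.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer targets."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over three candidate maneuvers (yield, proceed, stop).
teacher = [2.0, 0.5, -1.0]
close_student = [1.8, 0.6, -0.9]   # nearly agrees with the teacher
far_student = [-1.0, 0.5, 2.0]     # ranks the maneuvers in reverse

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

A student that agrees with the teacher gets a loss near zero, and one that disagrees gets a larger loss, which is the signal a training loop would minimize.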
Edge vs. cloud intelligence split
While all real-time driving decisions happen on-device, non-critical post-ride tasks, such as detecting items left behind or checking vehicle cleanliness, run on cloud-based models that can request that the car return to the depot for cleaning.
⚠️ Why Pure End-to-End AI Falls Short
The 'talking horse' limitation of VLMs
While off-the-shelf vision-language models fine-tuned to output trajectories can handle nominal driving cases, they remain orders of magnitude away from the safety requirements needed for full autonomy and fail on the long tail of edge cases.
Augmented architecture with structured representations
Waymo combines end-to-end learning with explicit intermediate representations of objects, road geometry, and traffic rules to enable efficient simulation, additional safety validation layers, and reward function specification.
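To make the "additional safety validation layer" concrete, here is a small sketch under stated assumptions: if the scene is available as explicit objects rather than only as opaque network activations, an independent check can reject a planned trajectory that passes too close to any of them. The `TrackedObject` type, the circular footprints, the static-object simplification, and the 1 m threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class TrackedObject:
    kind: str      # illustrative label, e.g. "pedestrian" or "vehicle"
    x: float       # position in metres, vehicle-centred frame
    y: float
    radius: float  # coarse circular footprint in metres

def min_clearance(trajectory, objects):
    """Smallest gap in metres between any waypoint and any object footprint."""
    return min(
        (math.hypot(wx - o.x, wy - o.y) - o.radius
         for wx, wy in trajectory for o in objects),
        default=math.inf,  # an empty scene imposes no constraint
    )

def passes_safety_check(trajectory, objects, required_gap=1.0):
    """Independent validation layer: reject plans that come too close."""
    return min_clearance(trajectory, objects) >= required_gap

# Assumed scene: one stationary pedestrian just left of the lane centre.
scene = [TrackedObject("pedestrian", 5.0, 1.0, 0.5)]
straight = [(float(i), 0.0) for i in range(10)]   # stays on the centre line
swerve = [(float(i), -2.0) for i in range(10)]    # shifts 2 m to the right

straight_ok = passes_safety_check(straight, scene)
swerve_ok = passes_safety_check(swerve, scene)
```

The same object list could also feed a simulator or a reward function, which is the reuse the summary attributes to structured representations.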
Reinforcement Learning-based Fine-Tuning (RLFT)
Similar to RLHF in LLMs, Waymo uses closed-loop simulation with the Critic model providing reward signals to keep driving behavior in distribution and handle multi-agent social interactions that pure imitation learning cannot capture.
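The closed-loop idea can be sketched with a toy car-following problem. Everything here is an assumption made for illustration: the one-parameter policy (`gain`), the simulator dynamics, the critic's scoring formula, and the update rule, which is plain hill climbing on the critic's score and stands in for whatever RL algorithm is actually used. The point it shows is the loop itself: the policy acts, the simulator rolls the consequences forward, and the critic's score decides which policy change is kept.

```python
DESIRED_GAP = 20.0  # metres; assumed target following distance

def simulate(gain, steps=40, dt=0.5):
    """Closed-loop rollout: the policy's own actions determine later states."""
    gap, ego_speed, lead_speed = 10.0, 10.0, 10.0  # metres, m/s, m/s
    damping = 1.0
    log = []
    for _ in range(steps):
        accel = gain * (gap - DESIRED_GAP) + damping * (lead_speed - ego_speed)
        accel = max(-3.0, min(3.0, accel))  # actuator limits
        ego_speed = max(0.0, ego_speed + accel * dt)
        gap += (lead_speed - ego_speed) * dt
        log.append((gap, accel))
    return log

def critic(log):
    """Toy reward: penalise gap error and harsh acceleration."""
    cost = sum((gap - DESIRED_GAP) ** 2 + 0.1 * accel ** 2 for gap, accel in log)
    return -cost / len(log)

def fine_tune(gain, rounds=30, step=0.05):
    """Hill climbing on the critic's score (a stand-in for a real RL update)."""
    best = critic(simulate(gain))
    for _ in range(rounds):
        for candidate in (gain + step, gain - step):
            if candidate <= 0:
                continue
            score = critic(simulate(candidate))
            if score > best:  # keep only changes the critic prefers
                gain, best = candidate, score
                break
    return gain, best

initial_gain = 0.05
initial_score = critic(simulate(initial_gain))
tuned_gain, tuned_score = fine_tune(initial_gain)
```

Because the rollout is closed-loop, a policy change that looks fine on a single step but drifts over many steps is penalised, which open-loop imitation on logged data would not reveal.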
📈 20 Years of Iteration & Scale
Beyond scaling laws and dead ends
Reaching full autonomy took specific architectural breakthroughs such as transformers, large advances in compute, and domain-specific training recipes; avoiding dead ends or scaling data alone would not have been enough.
Hierarchy of optimization goals
The system prioritizes superhuman safety as the primary constraint, followed by smoothness for passenger comfort, predictability for other road users, and social integration into the 'body language' of traffic ecosystems.
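One way to read this hierarchy, offered only as an interpretation, is as a hard safety filter followed by a lexicographic ranking of the remaining options. The sketch below assumes that reading; the `Candidate` fields, the proxy metrics, and the threshold are hypothetical, and social integration is left out because the summary gives no measurable proxy for it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    min_gap_m: float   # closest approach to any road user (safety proxy)
    peak_jerk: float   # comfort proxy, lower is smoother
    deviation: float   # predictability proxy, lower is more expected

def choose(candidates, required_gap_m=1.0):
    """Filter on safety first, then rank by comfort, then predictability."""
    safe = [c for c in candidates if c.min_gap_m >= required_gap_m]
    if not safe:
        return None  # no acceptable plan; a real system would need a fallback
    # Tuples compare element by element, so jerk dominates deviation.
    return min(safe, key=lambda c: (c.peak_jerk, c.deviation))

options = [
    Candidate("squeeze_past", min_gap_m=0.4, peak_jerk=0.5, deviation=0.1),
    Candidate("slow_and_wait", min_gap_m=2.0, peak_jerk=2.0, deviation=0.5),
    Candidate("ease_off_early", min_gap_m=2.5, peak_jerk=2.0, deviation=0.2),
    Candidate("hard_brake", min_gap_m=3.0, peak_jerk=3.0, deviation=0.0),
]

picked = choose(options)
```

Here the smoothest option is discarded because it fails the safety filter, and the tie on comfort between the two middle options is broken by predictability.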
Current operational scale
Waymo now provides over 500,000 fully autonomous rides weekly. The company grew out of Google's self-driving project, which began in 2009, and Dolgov rose from early engineer to Co-CEO in 2021.
Bottom Line
Full autonomous driving requires augmenting end-to-end AI with structured intermediate representations and closed-loop simulation to solve the long tail of edge cases, not just scaling up vision-language models.