The 20-year journey to fully autonomous cars with Dmitri Dolgov of Waymo

| Podcasts | March 24, 2026 | 14.8K views | 1:02:33

TL;DR

Waymo Co-CEO Dmitri Dolgov details the 20-year technical evolution from Google's self-driving moonshot to 500,000 weekly autonomous rides, explaining why full autonomy requires augmenting end-to-end AI with structured intermediate representations and a 'three teachers' training framework rather than relying solely on scaled-up vision models.

🧠 Technical Architecture & The 'Three Teachers'

Multi-modal 360-degree sensing stack

Waymo vehicles use cameras, LiDAR, and radar with full 360-degree coverage, processing all sensor data locally on specialized onboard computers without real-time cloud dependency for safety-critical driving decisions.

Foundation model specialization pipeline

A large off-board foundation model that understands physical and social driving dynamics is specialized into three high-capacity 'teachers': the Waymo Driver (backbone), the Simulator (synthetic environment), and the Critic (value-judgment system). These teachers are then distilled into smaller models that run inference on the vehicle.
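The teacher-to-onboard-model step described above is a form of knowledge distillation. The sketch below shows the standard temperature-softened distillation objective as one way this could work; it is an illustrative recipe, not Waymo's actual training code, and all values are made up.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened outputs.

    This is the classic distillation objective by which a small
    on-vehicle model can inherit behavior from a large off-board teacher.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # The T^2 factor rescales gradients so the soft-target term
    # keeps its weight as temperature changes.
    return float(T * T * np.sum(p_t * (np.log(p_t) - np.log(p_s))))

teacher = np.array([2.0, 0.5, -1.0])
# A student that matches the teacher incurs near-zero loss.
assert distillation_loss(teacher.copy(), teacher) < 1e-9
# A disagreeing student is penalized.
assert distillation_loss(np.array([-1.0, 0.5, 2.0]), teacher) > 0.1
```

The same idea extends from single-output classifiers to trajectory or behavior models: the large teacher provides soft targets that carry far more signal per example than hard labels alone.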

Edge vs. cloud intelligence split

While all real-time driving decisions happen on-device, non-critical post-ride tasks, such as detecting items left behind or checking vehicle cleanliness, run on cloud-based models that can request that the car return to a depot for cleaning.

⚠️ Why Pure End-to-End AI Falls Short

The 'talking horse' limitation of VLMs

While off-the-shelf vision-language models fine-tuned to output trajectories can handle nominal driving cases, they remain orders of magnitude away from the safety requirements needed for full autonomy and fail on the long tail of edge cases.

Augmented architecture with structured representations

Waymo combines end-to-end learning with explicit intermediate representations of objects, road geometry, and traffic rules to enable efficient simulation, additional safety validation layers, and reward function specification.
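One way to picture why explicit intermediate representations help: once the scene is typed data rather than an opaque latent vector, safety checks, simulation hooks, and reward terms become ordinary code. The sketch below is purely illustrative; every class and field name is a hypothetical stand-in, not Waymo's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """A tracked road user with its current kinematic state."""
    kind: str            # e.g. "vehicle", "pedestrian", "cyclist"
    position: tuple      # (x, y) in meters, ego frame
    speed_mps: float

@dataclass
class LaneSegment:
    """A piece of explicit road geometry the planner can reason over."""
    lane_id: str
    speed_limit_mps: float

@dataclass
class SceneState:
    """Structured scene representation: unlike a raw end-to-end latent,
    each element can be simulated, validated, and scored directly."""
    ego_speed_mps: float
    current_lane: LaneSegment
    agents: list = field(default_factory=list)

def violates_speed_limit(scene: SceneState, margin: float = 0.0) -> bool:
    """An extra safety-validation layer is trivial to write
    against explicit state like this."""
    return scene.ego_speed_mps > scene.current_lane.speed_limit_mps + margin

lane = LaneSegment("lane_42", speed_limit_mps=13.4)  # ~30 mph
scene = SceneState(ego_speed_mps=15.0, current_lane=lane,
                   agents=[Agent("pedestrian", (12.0, 3.5), 1.4)])
assert violates_speed_limit(scene)
```

The point is not the specific fields but the property: rules and reward functions can be specified against named quantities, which a pure pixels-to-controls model does not expose.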

Reinforcement Learning-based Fine-Tuning (RLFT)

Similar to RLHF in LLMs, Waymo uses closed-loop simulation with the Critic model providing reward signals to keep driving behavior in distribution and handle multi-agent social interactions that pure imitation learning cannot capture.
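The closed-loop idea above can be sketched with a toy REINFORCE loop: a policy proposes a maneuver, a simulated rollout is scored by a critic, and the score nudges the policy toward high-reward behavior. The action names, rewards, and single-step setup are all invented for illustration; real RLFT operates on full driving rollouts, not a three-armed bandit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy action space: three candidate maneuvers for the same scene.
ACTIONS = ["yield", "nudge_forward", "hard_brake"]

def critic_reward(action: str) -> float:
    """Stand-in for the Critic teacher: scores a rollout's outcome.
    Here 'yield' is the socially appropriate, in-distribution choice."""
    return {"yield": 1.0, "nudge_forward": 0.2, "hard_brake": -0.5}[action]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.zeros(len(ACTIONS))   # policy parameters
lr = 0.5

for _ in range(200):              # closed-loop fine-tuning
    probs = softmax(logits)
    a = rng.choice(len(ACTIONS), p=probs)
    r = critic_reward(ACTIONS[a])
    # REINFORCE: grad of log pi(a) w.r.t. logits is one_hot(a) - probs;
    # scaling by reward pushes probability toward high-reward rollouts.
    grad = -probs
    grad[a] += 1.0
    logits += lr * r * grad

assert ACTIONS[int(np.argmax(softmax(logits)))] == "yield"
```

The analogy to RLHF is direct: the Critic plays the role of the reward model, and closed-loop simulation plays the role of sampling responses, letting the policy learn multi-agent behaviors that imitation of logged data alone cannot capture.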

📈 20 Years of Iteration & Scale

Beyond scaling laws and dead ends

Achieving full autonomy required specific architectural breakthroughs such as transformers, massive advances in compute, and domain-specific training recipes; merely avoiding wrong paths or scaling data was never going to be enough.

Hierarchy of optimization goals

The system prioritizes superhuman safety as the primary constraint, followed by smoothness for passenger comfort, predictability for other road users, and social integration into the 'body language' of traffic ecosystems.
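A priority hierarchy like this can be encoded lexicographically: safety dominates, and lower-priority goals only break ties. The sketch below shows that framing; it is one possible encoding for illustration, and the plan names and scores are invented, not Waymo's actual objective.

```python
from typing import NamedTuple

class PlanScore(NamedTuple):
    """Scores for one candidate plan; higher is better for each term.
    Field order encodes the priority hierarchy."""
    safety: float          # primary, non-negotiable constraint
    smoothness: float      # passenger comfort
    predictability: float  # legibility to other road users

def pick_plan(scored_plans):
    """Lexicographic selection: NamedTuples compare element by element
    in Python, so max() implements the hierarchy directly."""
    return max(scored_plans, key=lambda item: item[1])

plans = [
    ("swerve", PlanScore(safety=0.2, smoothness=0.9, predictability=0.1)),
    ("brake",  PlanScore(safety=0.9, smoothness=0.4, predictability=0.8)),
    ("coast",  PlanScore(safety=0.9, smoothness=0.6, predictability=0.5)),
]
# "swerve" loses on safety despite high comfort; "coast" beats
# "brake" on the smoothness tiebreaker.
assert pick_plan(plans)[0] == "coast"
```

In practice such hierarchies are usually soft (weighted or constrained optimization rather than strict lexicographic order), but the ranking of concerns is the same.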

Current operational scale

Waymo now provides over 500,000 fully autonomous rides weekly, having evolved from Google's 2009 self-driving project; Dolgov himself rose from early engineer to Co-CEO in 2021.

Bottom Line

Full autonomous driving requires augmenting end-to-end AI with structured intermediate representations and closed-loop simulation to solve the long tail of edge cases, not just scaling up vision-language models.
