The 20-year journey to fully autonomous cars with Dmitri Dolgov of Waymo

Podcasts | March 24, 2026 | 4.51 thousand views | 1:02:33

TL;DR

Waymo Co-CEO Dmitri Dolgov details the 20-year technical evolution from Google's self-driving moonshot to 500,000 weekly autonomous rides, explaining why full autonomy requires augmenting end-to-end AI with structured intermediate representations and a 'three teachers' training framework rather than relying solely on scaled-up vision models.

🧠 Technical Architecture & The 'Three Teachers'

Multi-modal 360-degree sensing stack

Waymo vehicles use cameras, LiDAR, and radar with full 360-degree coverage, processing all sensor data locally on specialized onboard computers without real-time cloud dependency for safety-critical driving decisions.

Foundation model specialization pipeline

A large off-board foundation model that captures the physical and social dynamics of driving is specialized into three high-capacity 'teachers': the Waymo Driver (backbone), the Simulator (synthetic environment), and the Critic (value judgment system). These teachers are then distilled into smaller models that run inference on the vehicle.
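The summary does not say which distillation objective Waymo uses. As a rough illustration of the general idea only, the sketch below uses the standard soft-target formulation: the student is trained to match the teacher's temperature-softened distribution, measured by KL divergence. The three "candidate maneuvers", the logit values, and the function names are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between the two softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores over three candidate maneuvers (yield, proceed, nudge).
teacher = [4.0, 1.0, 0.5]        # large off-board model
good_student = [3.8, 1.1, 0.4]   # small model that tracks the teacher closely
poor_student = [0.5, 1.0, 4.0]   # small model that disagrees with the teacher

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, poor_student))
```

A student that ranks the maneuvers the way the teacher does gets a loss near zero, while one that inverts the ranking gets a much larger loss; minimizing this quantity is what pulls the small on-vehicle model toward the large one.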

Edge vs. cloud intelligence split

All real-time driving decisions happen on-device. Non-critical post-ride tasks, such as detecting items left behind or checking vehicle cleanliness, run on cloud-based models, which can request that the car return to a depot for cleaning.

⚠️ Why Pure End-to-End AI Falls Short

The 'talking horse' limitation of VLMs

While off-the-shelf vision-language models fine-tuned to output trajectories can handle nominal driving cases, they remain orders of magnitude away from the safety requirements needed for full autonomy and fail on the long tail of edge cases.

Augmented architecture with structured representations

Waymo combines end-to-end learning with explicit intermediate representations of objects, road geometry, and traffic rules to enable efficient simulation, additional safety validation layers, and reward function specification.
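To make the "additional safety validation layer" idea concrete, here is a minimal sketch of how an explicit scene representation lets a planned trajectory be checked independently of the model that produced it. Everything in it is an assumption for illustration: the `TrackedObject` and `Scene` types, the circular object footprint, the 0.5 m clearance, and the speed-limit rule are not from the episode.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrackedObject:
    x: float       # metres
    y: float       # metres
    radius: float  # metres, circular footprint (a simplification)

@dataclass(frozen=True)
class Scene:
    objects: tuple      # tuple of TrackedObject
    speed_limit: float  # metres per second

def validate(trajectory, scene, dt=0.1, min_clearance=0.5):
    """Return a list of rule violations for waypoints spaced dt seconds apart."""
    violations = []
    # Geometry check: every waypoint must keep a minimum gap to every object.
    for i, (x, y) in enumerate(trajectory):
        for obj in scene.objects:
            gap = math.hypot(x - obj.x, y - obj.y) - obj.radius
            if gap < min_clearance:
                violations.append(f"clearance at step {i}")
    # Traffic-rule check: implied speed between waypoints must respect the limit.
    for i in range(1, len(trajectory)):
        (x0, y0), (x1, y1) = trajectory[i - 1], trajectory[i]
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        if speed > scene.speed_limit:
            violations.append(f"speed at step {i}")
    return violations

scene = Scene(objects=(TrackedObject(5.0, 0.0, 1.0),), speed_limit=15.0)
plan = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
print(validate(plan, scene))
```

The same structured scene could also feed a simulator or a reward function, which is the efficiency argument the summary attributes to intermediate representations.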

Reinforcement Learning-based Fine-Tuning (RLFT)

Similar to RLHF in LLMs, Waymo uses closed-loop simulation with the Critic model providing reward signals to keep driving behavior in distribution and handle multi-agent social interactions that pure imitation learning cannot capture.
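The loop described above (policy acts in a closed-loop simulator, a critic scores the rollout, the policy is nudged toward higher-scoring behavior) can be sketched with a toy example. This is not Waymo's training recipe: the policy is reduced to a categorical choice over three hypothetical following headways, the simulator is a one-dimensional car-following model with a braking lead vehicle, the critic is a hand-written score, and the update is an exact softmax policy gradient with a baseline.

```python
import math

HEADWAYS = [0.5, 1.5, 3.0]  # hypothetical candidate time headways, in seconds

def simulate(headway, steps=8, dt=1.0):
    """Closed-loop rollout: each step the ego car reacts to a braking lead car."""
    ego_pos, ego_v = 0.0, 10.0
    lead_pos, lead_v = 20.0, 10.0
    min_gap, max_brake = float("inf"), 0.0
    for _ in range(steps):
        lead_v = max(0.0, lead_v - 4.0 * dt)       # lead brakes at 4 m/s^2
        lead_pos += lead_v * dt
        gap = lead_pos - ego_pos
        desired_gap = headway * ego_v + 3.0        # 3 m standstill distance
        accel = max(-6.0, min(2.0, gap - desired_gap))
        ego_v = max(0.0, ego_v + accel * dt)
        ego_pos += ego_v * dt
        min_gap = min(min_gap, lead_pos - ego_pos)
        max_brake = max(max_brake, -accel)
    return min_gap, max_brake

def critic(min_gap, max_brake):
    """Toy reward in [0, 1]: zero on collision, otherwise penalize hard braking."""
    if min_gap <= 0.0:
        return 0.0
    return 1.0 - 0.5 * (max_brake / 6.0)

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(logits, rewards):
    return sum(p * r for p, r in zip(softmax(logits), rewards))

def fine_tune(logits, rewards, lr=0.4, iterations=300):
    """Gradient ascent on expected reward; the baseline is the current expectation."""
    logits = list(logits)
    for _ in range(iterations):
        probs = softmax(logits)
        baseline = sum(p * r for p, r in zip(probs, rewards))
        logits = [l + lr * p * (r - baseline)
                  for l, p, r in zip(logits, probs, rewards)]
    return logits

rewards = [critic(*simulate(h)) for h in HEADWAYS]
prior_logits = [2.0, 0.0, 0.0]  # stand-in for an imitation prior that tailgates
tuned_logits = fine_tune(prior_logits, rewards)
print(rewards, softmax(tuned_logits))
```

In this toy, the shortest headway collides once the lead car brakes and the longest one brakes harder than necessary, so fine-tuning shifts probability away from the tailgating prior toward the middle option. The point is only the shape of the loop: the reward comes from consequences that unfold in simulation, which imitation of logged driving alone does not provide.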

📈 20 Years of Iteration & Scale

Beyond scaling laws and dead ends

Achieving full autonomy required specific breakthroughs, including transformer architectures, large advances in compute, and domain-specific training recipes. Avoiding dead ends or scaling data alone would not have been enough.

Hierarchy of optimization goals

The system prioritizes superhuman safety as the primary constraint, followed by smoothness for passenger comfort, predictability for other road users, and social integration into the 'body language' of traffic ecosystems.
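One way to read this hierarchy is as a hard safety constraint followed by a lexicographic tie-break over the remaining goals. The sketch below encodes that reading; it is an interpretation, not a description of Waymo's planner. The `Candidate` fields, the `MAX_RISK` threshold, and the rounding used to treat near-equal scores as ties are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    collision_risk: float  # estimated probability, lower is better
    jerk: float            # comfort cost for the passenger
    surprise: float        # predictability cost for other road users
    social_cost: float     # how poorly the maneuver fits local traffic norms

MAX_RISK = 1e-6  # hypothetical safety threshold

def choose(candidates):
    """Filter by safety first, then rank by comfort, predictability, social fit."""
    safe = [c for c in candidates if c.collision_risk <= MAX_RISK]
    if not safe:
        # Nothing meets the constraint: fall back to the least risky option.
        return min(candidates, key=lambda c: c.collision_risk)
    # Rounding buckets near-equal scores so lower-priority goals can break ties.
    return min(safe, key=lambda c: (round(c.jerk, 1),
                                    round(c.surprise, 1),
                                    c.social_cost))

candidates = [
    Candidate("A", collision_risk=1e-3, jerk=0.1, surprise=0.1, social_cost=0.1),
    Candidate("B", collision_risk=1e-7, jerk=0.8, surprise=0.5, social_cost=0.2),
    Candidate("C", collision_risk=1e-7, jerk=0.82, surprise=0.2, social_cost=0.9),
    Candidate("D", collision_risk=1e-8, jerk=1.5, surprise=0.1, social_cost=0.1),
]
print(choose(candidates).name)
```

Candidate A is the smoothest option but is rejected outright on safety; B and C tie on comfort after bucketing, so predictability decides between them. A weighted sum would behave differently, since a large comfort gain could then outweigh a safety shortfall, which is exactly what treating safety as a constraint rules out.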

Current operational scale

Waymo now provides over 500,000 fully autonomous rides weekly. The company grew out of Google's 2009 self-driving project, where Dolgov started as an early engineer; after a series of promotions he became Co-CEO in 2021.

Bottom Line

Full autonomous driving requires augmenting end-to-end AI with structured intermediate representations and closed-loop simulation to solve the long tail of edge cases, not just scaling up vision-language models.
