Robots Are Finally Starting to Work

| Business & Entrepreneurship | April 16, 2026 | 40.1K views | 49:27

TL;DR

Physical Intelligence co-founder Quan Vuong explains why robotics is approaching its 'GPT-1 moment': cross-embodiment AI models trained on diverse hardware are beginning to exhibit emergent zero-shot capabilities and scaling laws previously unseen in the field.

🤖 The Cross-Embodiment Breakthrough

Generalist models outperform specialists

The Open X-Embodiment paper demonstrated that a single policy trained across 10 different robot platforms performed 50% better than individual models optimized for specific hardware, strong evidence that scaling laws apply to robotics.
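One practical obstacle to training a single policy across platforms is that each robot has a different action space. A common workaround is to map every robot's actions into one shared, padded action space and sample training batches uniformly across embodiments. The sketch below is a minimal illustration of that idea, with made-up dataset names and zero-padding as the (assumed) unification scheme, not the actual Open X-Embodiment pipeline:

```python
import random

# Hypothetical per-robot datasets: each sample is (observation, action).
# Action dimensionality differs across embodiments (6-DoF vs 7-DoF arm).
datasets = {
    "arm_6dof": [([0.1, 0.2], [0.5] * 6) for _ in range(4)],
    "arm_7dof": [([0.3, 0.4], [0.5] * 7) for _ in range(4)],
}

MAX_DOF = max(len(act) for ds in datasets.values() for _, act in ds)

def pad_action(action, dim=MAX_DOF):
    """Zero-pad actions into a shared space so one policy head fits all robots."""
    return action + [0.0] * (dim - len(action))

def mixed_batch(batch_size=4, seed=0):
    """Sample uniformly across embodiments, so every gradient step sees
    multiple robot platforms rather than one specialist's data."""
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        name = rng.choice(sorted(datasets))
        obs, act = rng.choice(datasets[name])
        batch.append((name, obs, pad_action(act)))
    return batch
```

The point of the padding is that a single output head can emit `MAX_DOF` values for every robot; each embodiment simply ignores the dimensions it does not have.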

Cross-embodiment solves hardware drift

Unlike single-robot approaches that suffer from hardware variations and software changes every few months, training on diverse platforms forces models to learn abstract 'robot control' rather than specific motor mappings.

Emergent zero-shot capabilities appearing

Current models can now perform complex precision tasks and multi-object reasoning zero-shot—capabilities that required hundreds of hours of data collection just last year.

🧠 From Language Models to Physical Control

Vision-language models enable semantic control

Research like RT-2 and PaLM-E showed that adapting powerful vision-language models to output robot actions transfers internet-scale knowledge to low-level control, allowing commands like 'move the can to Taylor Swift' without specific training.
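The trick that lets a vision-language model "output robot actions" is to discretize each continuous action dimension into a fixed number of bins, so actions become ordinary vocabulary tokens the model can emit like text. A minimal sketch of that tokenization, assuming illustrative bounds of [-1, 1] and 256 bins (RT-2 uses a comparable binning scheme, though these exact parameters are an assumption here):

```python
N_BINS = 256

def action_to_tokens(action, low=-1.0, high=1.0, n_bins=N_BINS):
    """Discretize each continuous action dimension into one of n_bins ids,
    so a language model can emit actions as ordinary tokens."""
    tokens = []
    for x in action:
        x = min(max(x, low), high)            # clip to the action bounds
        bin_id = int((x - low) / (high - low) * (n_bins - 1) + 0.5)
        tokens.append(bin_id)
    return tokens

def tokens_to_action(tokens, low=-1.0, high=1.0, n_bins=N_BINS):
    """Invert: map token ids back to bin-center continuous values."""
    return [low + t / (n_bins - 1) * (high - low) for t in tokens]
```

The round-trip error is bounded by the bin width, which is why a few hundred bins per dimension are enough for practical control.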

SayCan bridged planning and semantics

SayCan was the first demonstration of using language models for robotic planning, leveraging common sense knowledge to reduce the need for task-specific robot data.
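SayCan's core mechanism is multiplying two probabilities per candidate skill: the LLM's estimate that the skill advances the instruction ("say") and a learned value function's estimate that the skill can succeed in the current state ("can"). A toy sketch of that selection rule, with all scores invented for illustration:

```python
# Illustrative scores only; in SayCan these come from an LLM and a
# learned value function, respectively.
llm_scores = {                 # p(skill is relevant to the instruction)
    "pick up the sponge": 0.7,
    "go to the sink": 0.2,
    "pick up the apple": 0.1,
}
affordances = {                # p(skill succeeds in the current state)
    "pick up the sponge": 0.1, # sponge not reachable, so low affordance
    "go to the sink": 0.9,
    "pick up the apple": 0.8,
}

def saycan_select(llm_scores, affordances):
    """Pick the skill maximizing p_llm * p_affordance (SayCan's rule)."""
    return max(llm_scores, key=lambda s: llm_scores[s] * affordances[s])
```

Note how the affordance term vetoes the LLM's top choice: "pick up the sponge" is semantically best but physically infeasible, so "go to the sink" (0.2 × 0.9 = 0.18) wins.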

The three-pillar hierarchy

Robotics requires solving semantics (now unlocked via LLMs), planning, and real-time control; the last of these is the final frontier, where vision-language models are now making breakthroughs.

📊 Deployment Strategy & Data Reality

The data capture problem vs. generation problem

While robotic data is constantly generated in labs and industry, there has been no incentive to capture it in standardized formats; Open X-Embodiment was the first effort to aggregate multi-robot data but remains 'a drop in the bucket' compared to what's needed.

Mixed autonomy enables immediate scaling

Current systems are deployable today using a 'peeling an onion' approach: deploy a strong base model with human oversight for edge cases, then improve through real-world exposure rather than waiting for full autonomy.
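One common way to implement this mixed-autonomy loop is confidence gating: the robot acts on its own when the policy is confident and escalates to a human operator otherwise, and every escalation becomes training data. The sketch below is a generic illustration of that pattern, with a hypothetical threshold and callback, not Physical Intelligence's actual deployment stack:

```python
def run_step(policy_action, confidence, threshold=0.8, ask_human=None):
    """Confidence-gated autonomy: execute the model's action when it is
    confident; otherwise hand off to a human operator (the edge-case path).
    Returns (mode, action) so escalations can be logged as training data."""
    if confidence >= threshold:
        return ("autonomous", policy_action)
    return ("human", ask_human())

# Routine case runs autonomously; a tricky case escalates.
run_step("fold_shirt", 0.95)                                  # autonomous
run_step("fold_shirt", 0.4, ask_human=lambda: "untangle")     # human takes over
```

As the model improves on the logged edge cases, the threshold can be raised (or the human pool shrunk), which is the "peeling an onion" dynamic: each layer of human intervention is removed once the data justifies it.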

Real-world validation with YC companies

Physical Intelligence partnered with Weave to deploy laundry-folding robots in commercial laundromats, demonstrating that current models can handle deformable objects and unstructured environments with human assistance.

Bottom Line

Founders should start building vertical robotics applications now using mixed autonomy systems, as cross-embodiment AI models are rapidly generalizing across hardware and approaching the capability to perform complex physical tasks without task-specific training.
