Intro to NVIDIA Cosmos with Ming-Yu ft. Superintelligence | Cosmos Labs

| Podcasts | February 10, 2026 | 226K views | 49:03

TL;DR

NVIDIA Cosmos is a family of open world foundation models that generate synthetic training environments to solve the data-scarcity bottleneck in physical AI, essentially 'The Matrix for robots': machines learn visual-motor skills through interactive simulation before real-world deployment.

🌍 The Data Scarcity Challenge

Real-world data collection is prohibitively expensive

Collecting sufficient training data for physical AI is slow, costly, and never comprehensive enough to cover the diverse, unpredictable nature of the physical world.

Visual world requires richer representation than language

Unlike LLMs trained on internet text, physical AI demands world foundation models because visual-motor skills are difficult to describe in language but easy to model through pixels and physics.

🏗️ Three-Pillar Architecture

Cosmos Predict generates future world states

This base model creates video predictions from current images and actions, enabling trajectory forecasting and interactive closed-loop simulations where the world responds to robot actions.
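
A minimal sketch of that closed-loop idea, with a hypothetical `predict_next_frames` standing in for the world-model call (not the actual Cosmos Predict API):

```python
import numpy as np

def predict_next_frames(frame: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the world-model call: given the current
    camera frame and a robot action, return the predicted next frame."""
    return frame  # identity "world"; a real model generates photoreal video

def closed_loop_rollout(policy, initial_frame, horizon=16):
    """Roll a policy against the learned world model instead of a real
    robot: act, let the predicted world respond, act again."""
    frame, frames = initial_frame, []
    for _ in range(horizon):
        action = policy(frame)                      # policy reacts to pixels
        frame = predict_next_frames(frame, action)  # world responds to action
        frames.append(frame)
    return frames

# Usage: a trivial policy that always outputs a zero 7-DoF command.
rollout = closed_loop_rollout(lambda f: np.zeros(7),
                              np.zeros((224, 224, 3)), horizon=4)
print(len(rollout))  # 4 predicted frames
```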

Cosmos Transfer closes the sim-to-real gap

It transforms outputs from physics simulators like Isaac Sim into photorealistic video while maintaining Newtonian physics accuracy, and can augment demonstrations across diverse environments.
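
A toy sketch of that augmentation pattern; `stylize` is a hypothetical stand-in for the sim-to-real re-rendering step, not Cosmos Transfer's real interface:

```python
import numpy as np

def stylize(sim_frame: np.ndarray, prompt: str, seed: int) -> np.ndarray:
    """Hypothetical stand-in for the sim-to-real step: keep the simulator's
    structure (geometry, motion) but re-render the appearance per prompt."""
    rng = np.random.default_rng(abs(hash((prompt, seed))) % 2**32)
    return np.clip(sim_frame + rng.normal(0, 1, sim_frame.shape), 0, 255)

def augment_demo(sim_frames, environments):
    """Multiply one simulated demonstration into many visually distinct
    variants while the underlying physics stays identical."""
    return {env: [stylize(f, prompt=env, seed=i)
                  for i, f in enumerate(sim_frames)]
            for env in environments}

demo = [np.zeros((224, 224, 3)) for _ in range(8)]  # frames from a simulator
variants = augment_demo(demo, ["cluttered warehouse, dim light",
                               "bright manufacturing line"])
print({env: len(frames) for env, frames in variants.items()})
```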

Cosmos Reason provides video evaluation

Acting as a visual language model, it analyzes whether tasks are completed successfully, assigns reward signals for training, and breaks down edge cases into familiar physical interactions.
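
A hedged sketch of the VLM-as-judge pattern; `vlm_judge` is a hypothetical placeholder for the model call:

```python
def vlm_judge(clip, question: str) -> str:
    """Hypothetical placeholder for a visual language model that answers
    questions about a video clip."""
    return "yes"  # a real model would actually inspect the pixels

def reward_from_video(clip, task: str) -> float:
    """Convert a success/failure judgment on a clip into a scalar reward
    signal usable for policy training."""
    answer = vlm_judge(clip, f"Did the robot complete the task: {task}?")
    return 1.0 if answer.strip().lower().startswith("yes") else 0.0

print(reward_from_video(clip=None, task="place the mug on the shelf"))  # 1.0
```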

🤖 Cosmos Policy & Interactive Learning

Video-native policies outperform VLM-based approaches

Cosmos Policy is built atop video models rather than visual language models, enabling more precise prediction of pixel-level dynamics crucial for robot control.

Model predictive control through value functions

The system simulates multiple candidate action sequences, assigns a value to each predicted outcome, and selects the best trajectory, much like running a chess engine's search, but for physical tasks.
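
This maps onto sampling-based model predictive control. A self-contained sketch with toy dynamics and a hypothetical value function (names and dynamics are illustrative, not from Cosmos):

```python
import numpy as np

def world_model(state, action):
    """Hypothetical one-step predictor (the role Cosmos Predict plays)."""
    return state + action  # toy linear dynamics for illustration

def value_fn(state, goal):
    """Hypothetical value function: higher means a better predicted outcome."""
    return -np.linalg.norm(state - goal)

def mpc_step(state, goal, horizon=5, n_candidates=64, rng=None):
    """Sampling-based MPC: imagine many random action sequences through the
    world model, score each predicted end state with the value function,
    and return the first action of the best sequence."""
    rng = rng or np.random.default_rng(0)
    best_action, best_value = None, -np.inf
    for _ in range(n_candidates):
        seq = rng.normal(0, 0.5, size=(horizon, state.shape[0]))
        s = state
        for a in seq:              # imagined rollout, no real robot involved
            s = world_model(s, a)
        v = value_fn(s, goal)
        if v > best_value:
            best_value, best_action = v, seq[0]
    return best_action

state, goal = np.zeros(2), np.array([1.0, 1.0])
for _ in range(10):                # replan from the new state at every step
    state = world_model(state, mpc_step(state, goal))
print(state)                       # typically ends near the goal
```

Replanning from the newly observed state at every step is what makes this model predictive control rather than open-loop planning.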

Training on worlds enables interactive learning

Unlike passive data consumption, this approach allows models to learn from customized interactive experiences where actions change world states and generate environmental feedback.

🔓 Open Source Imperative

Physical AI requires hardware customization

Because robots have diverse sensor configurations ranging from three to seven cameras with various LiDAR setups, open weights and architecture are essential for developers to configure models to their specific hardware.
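
One way to picture this: a per-robot sensor description that sizes the model's inputs. The schema below is purely illustrative, not a Cosmos config format:

```python
from dataclasses import dataclass

@dataclass
class SensorConfig:
    """Illustrative per-robot sensor description; the field names are
    assumptions for this sketch, not a Cosmos schema."""
    num_cameras: int
    camera_resolution: tuple = (224, 224)
    lidar_channels: int = 0

    def input_channels(self) -> int:
        # RGB per camera plus one depth-like channel per LiDAR return.
        return self.num_cameras * 3 + self.lidar_channels

# Two robots with very different sensor suites configure the same open
# architecture to their own hardware.
humanoid = SensorConfig(num_cameras=7, lidar_channels=1)
forklift = SensorConfig(num_cameras=3, lidar_channels=2)
print(humanoid.input_channels(), forklift.input_channels())  # 22 11
```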

Domain specialization through post-training

Developers can specialize the 8B-parameter models for specific environments like warehouses or manufacturing lines by post-training on targeted data rather than relying solely on zero-shot prompting.
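
A generic post-training loop sketched in PyTorch; the tiny `nn.Sequential` stands in for loaded open weights, and `warehouse_batches` is hypothetical domain data:

```python
import torch
from torch import nn

# A tiny module stands in for open 8B weights loaded from a checkpoint.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss_fn = nn.MSELoss()

def warehouse_batches(n=20, batch=8, dim=64):
    """Hypothetical in-domain data: clips and targets collected from the
    target environment (e.g., one specific warehouse)."""
    for _ in range(n):
        x = torch.randn(batch, dim)
        yield x, x  # toy reconstruction target

# Post-training: keep optimizing the open weights on narrow, in-domain
# data instead of relying on zero-shot prompting alone.
model.train()
for x, y in warehouse_batches():
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
print(f"final loss: {loss.item():.4f}")
```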

Bottom Line

Developers should leverage Cosmos as an open foundation: generate unlimited synthetic training data, run closed-loop simulations against their specific robotic configurations, and post-train the models on domain-specific scenarios to overcome the prohibitive cost of real-world data collection.
