Moonlake: Multimodal, Interactive, and Efficient World Models — with Fan-yun Sun and Chris Manning

Latent Space

Podcasts | April 02, 2026 | 6.41 thousand views | 1:06:48

TL;DR

Moonlake founders Fan-yun Sun and Chris Manning argue that true world models require action-conditioned symbolic reasoning about physics and consequences, not just pixel prediction, enabling spatial intelligence with orders of magnitude less data than pure scaling approaches.

🌍 What World Models Actually Are 2 insights

Action-conditioned prediction separates world models from video generators

Unlike Sora-style video generators that predict pixels, true world models must predict the consequences of specific actions minutes into the future, requiring understanding of 3D physics and object permanence.
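To make the distinction concrete, here is a minimal sketch, not Moonlake's method: a toy 1-D world invented for illustration, in which a hypothetical `State` holds a block's position and velocity. The unconditioned predictor can only extrapolate what it already sees, while the action-conditioned step takes an assumed `push` input, so different action sequences lead to different futures.

```python
from dataclasses import dataclass


# Hypothetical toy world: a block on a line, described by position and velocity.
@dataclass(frozen=True)
class State:
    position: float
    velocity: float


def video_style_predict(state: State, dt: float = 1.0) -> State:
    """Unconditioned prediction: extrapolate current motion, with no notion of action."""
    return State(state.position + state.velocity * dt, state.velocity)


def world_model_step(state: State, push: float, dt: float = 1.0) -> State:
    """Action-conditioned prediction: the next state depends on the applied push."""
    velocity = state.velocity + push * dt
    return State(state.position + velocity * dt, velocity)


def rollout(state: State, actions: list[float]) -> State:
    """Apply a sequence of actions and return the final predicted state."""
    for push in actions:
        state = world_model_step(state, push)
    return state


start = State(position=0.0, velocity=1.0)
coast = rollout(start, [0.0, 0.0, 0.0])   # no intervention
brake = rollout(start, [-1.0, 0.0, 0.0])  # one braking push, then nothing
print(coast, brake)
```

The same starting state yields two different outcomes depending on the actions taken, which is exactly the information an unconditioned predictor has no way to express.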

Long-term consistency requires semantic abstraction

Keeping a game state or simulation coherent over long periods requires abstract symbolic representations rather than raw pixel processing, much as human cognition filters out most visual input instead of retaining it.

🏗️ Structure vs. Scale Thesis 2 insights

Symbolic abstraction enables five orders of magnitude efficiency gains

While not rejecting the bitter lesson entirely, Moonlake bets that structured reasoning traces incorporating geometry, physics, and affordances can teach a model what pixel-only approaches would need orders of magnitude more data and compute to learn.

Current video data lacks essential action labels

Mining observational videos from the internet fails to capture the actions causing state changes, making it difficult to learn causal relationships without expensive action-conditioned simulation data.
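The ambiguity can be illustrated with a small sketch under stated assumptions: a hypothetical light-switch world with hand-written transitions, not data from the episode. Dropping the action column, as observational video effectively does, leaves every state with more than one possible successor. Keeping the action makes each transition unambiguous.

```python
from collections import defaultdict

# Hypothetical logged transitions: (state, action, next_state).
transitions = [
    ("off", "press", "on"),
    ("off", "wait", "off"),
    ("on", "press", "off"),
    ("on", "wait", "on"),
]


def fit_table(samples):
    """Map each key to the set of next states observed for it."""
    table = defaultdict(set)
    for key, next_state in samples:
        table[key].add(next_state)
    return dict(table)


# Observational data: the action is not recorded, only state -> next_state.
observational = fit_table((s, s_next) for s, _, s_next in transitions)

# Action-labeled data: the key is the (state, action) pair.
action_labeled = fit_table(((s, a), s_next) for s, a, s_next in transitions)

# A key is ambiguous if more than one next state was observed for it.
ambiguous_obs = [k for k, v in observational.items() if len(v) > 1]
ambiguous_act = [k for k, v in action_labeled.items() if len(v) > 1]
print(ambiguous_obs, ambiguous_act)
```

In this toy setting the observational table cannot say what happens next from either state, while the action-labeled table has no ambiguous entries.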

💬 The Role of Language in Spatial Intelligence 2 insights

Language serves as a cognitive tool for abstraction

Following philosopher Dan Dennett, Manning argues language provides unique symbolic knowledge representation that enabled human evolutionary advantages in planning and tool-building beyond what vision alone provides.

Philosophical divergence from LeCun's visual-centric JEPA

Moonlake fundamentally disagrees with Yann LeCun's dismissal of symbolic representations, asserting that latent visual abstractions alone cannot achieve the causal reasoning and long-term planning necessary for embodied AI.

Bottom Line

Building world models requires structured symbolic reasoning about actions and physics rather than pure pixel prediction, leveraging language as a cognitive tool to achieve efficient, consistent spatial intelligence.
