How Google DeepMind Runs Agents at Scale — KP Sawhney & Ian Ballantyne, Google DeepMind

AI Engineer

Podcasts | May 24, 2026 | 5.89K views

TL;DR

Google DeepMind engineers Ian Ballantyne and KP Sawhney demonstrate their internal "Antigravity" agent platform, revealing how the organization manages massive-scale deployment through strict quota controls, hybrid model architectures, and collaborative multi-agent workflows while grappling with token consumption costs and evaluation complexity.

🛠️ Antigravity: Internal Agent Infrastructure

IDE-integrated agent manager with browser control

Antigravity provides a VS Code-style interface in which multiple agent instances can run side by side; agents can control a browser, inspect DOM elements, and capture screenshots or video recordings of their interactions.

Human-in-the-loop workflow

The system generates editable implementation plans and scratchpad notes that allow users to review, modify, or reject agent actions before proceeding.
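The review gate described above can be sketched as follows. This is a minimal illustration, not Antigravity's implementation: `Plan`, `Decision`, and `run_with_review` are hypothetical names, and the reviewer is modeled as a plain function that approves, edits, or rejects the plan before any step executes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional, Tuple


class Decision(Enum):
    APPROVE = "approve"
    EDIT = "edit"
    REJECT = "reject"


@dataclass
class Plan:
    steps: list[str]                                # editable implementation plan
    notes: list[str] = field(default_factory=list)  # scratchpad the reviewer can read


# Hypothetical reviewer signature: sees the plan, returns a decision plus,
# for EDIT, the replacement steps.
Reviewer = Callable[[Plan], Tuple[Decision, Optional[list[str]]]]


def run_with_review(plan: Plan, review: Reviewer, execute: Callable[[str], str]) -> list[str]:
    """Execute plan steps only after the human reviewer approves or edits the plan."""
    decision, edited_steps = review(plan)
    if decision is Decision.REJECT:
        return []  # nothing runs without sign-off
    if decision is Decision.EDIT:
        if edited_steps is None:
            raise ValueError("an EDIT decision must supply replacement steps")
        plan.steps = list(edited_steps)
    return [execute(step) for step in plan.steps]


if __name__ == "__main__":
    plan = Plan(steps=["read spec", "write code", "verify in browser"])
    # The reviewer drops the browser step before anything executes.
    trimmed = lambda p: (Decision.EDIT, p.steps[:2])
    print(run_with_review(plan, trimmed, lambda s: f"done: {s}"))
```

The key design choice is that execution is unreachable until the review function returns, so an agent cannot act on a plan the user has not seen.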

Built-in planning and tool use

Agents autonomously create to-do lists and use internal tools to analyze specifications, write code, and verify implementations through browser automation.

⚡ Scaling Challenges & Resource Management

Token hunger as primary bottleneck

Managing per-user and per-team quotas is critical at Google scale to prevent power users from overwhelming systems with token-hungry agentic workflows.
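A per-user and per-team quota check can be sketched as below. This is an illustrative sketch under stated assumptions, not DeepMind's system: `QuotaManager` and its limits are hypothetical, usage is counted in tokens over a fixed accounting window, and each request's token count is assumed to be known when it is charged.

```python
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    pass


@dataclass
class QuotaManager:
    user_limit: int  # hypothetical max tokens per user per window
    team_limit: int  # hypothetical max tokens per team per window
    user_used: dict[str, int] = field(default_factory=dict)
    team_used: dict[str, int] = field(default_factory=dict)

    def charge(self, user: str, team: str, tokens: int) -> None:
        """Record usage only if both the user and the team budget can absorb it."""
        new_user = self.user_used.get(user, 0) + tokens
        new_team = self.team_used.get(team, 0) + tokens
        # Check both limits before mutating, so a rejected request leaves no trace.
        if new_user > self.user_limit:
            raise QuotaExceeded(f"user {user}: {new_user} > {self.user_limit}")
        if new_team > self.team_limit:
            raise QuotaExceeded(f"team {team}: {new_team} > {self.team_limit}")
        self.user_used[user] = new_user
        self.team_used[team] = new_team

    def reset(self) -> None:
        """Start a new accounting window (for example, daily)."""
        self.user_used.clear()
        self.team_used.clear()
```

The team limit is what stops one power user's agents from exhausting a shared budget even when each individual user stays within their own cap. A production system would more likely reserve an estimate before the model call and reconcile with actual usage afterwards.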

Hybrid model routing strategy

DeepMind pairs free, locally run models such as Gemma 4 for general tasks with more capable API-served models for specific components, reducing cost and stretching quotas further.
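A minimal routing sketch follows. It assumes the split is made by task kind against a fixed allowlist; the summary does not say how DeepMind decides which components go to the API. `Router` and the task names are hypothetical, and both models are stand-in functions rather than real Gemma or API client calls.

```python
from dataclasses import dataclass
from typing import Callable

# A model is any function from prompt to completion; real clients would go here.
ModelFn = Callable[[str], str]


@dataclass
class Router:
    local: ModelFn                # e.g. a locally served open-weight model (assumed)
    remote: ModelFn               # e.g. a metered frontier-model API (assumed)
    remote_tasks: frozenset[str]  # hypothetical task kinds that justify the remote cost
    remote_calls: int = 0         # count of quota-consuming calls, for monitoring

    def run(self, task_kind: str, prompt: str) -> str:
        """Send only allowlisted task kinds to the remote model; everything else stays local."""
        if task_kind in self.remote_tasks:
            self.remote_calls += 1
            return self.remote(prompt)
        return self.local(prompt)


if __name__ == "__main__":
    router = Router(
        local=lambda p: f"[local] {p}",
        remote=lambda p: f"[remote] {p}",
        remote_tasks=frozenset({"architecture_review"}),
    )
    print(router.run("summarize", "release notes"))
    print(router.run("architecture_review", "new scheduler design"))
    print("remote calls:", router.remote_calls)
```

Defaulting to the local model means a new, unclassified task kind cannot silently start consuming paid quota.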

Incompatibility of subscription models

Traditional subscription pricing breaks down for agentic systems and calls for new metering approaches; for now, brute-force quota limits remain the primary defense.

🔮 Evolution of Deep Research & Multi-Agent Systems

Shift from monolithic to collaborative architecture

Deep Research is evolving from passing massive text blobs between stages to agents collaborating via shared filesystem workspaces.
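The difference between passing text blobs and sharing a workspace can be sketched with two cooperating stages that exchange file paths instead of content. This is a hypothetical illustration: `researcher` and `writer` are made-up names, the findings are placeholder strings standing in for model output, and a temporary directory stands in for the shared workspace.

```python
import tempfile
from pathlib import Path


def researcher(workspace: Path, topic: str) -> list[Path]:
    """Write findings to files in the shared workspace and hand back only their paths."""
    notes_dir = workspace / "notes"
    notes_dir.mkdir(parents=True, exist_ok=True)
    # Placeholder findings; a real agent would generate these with a model.
    findings = [f"{topic}: background", f"{topic}: open questions"]
    paths = []
    for i, finding in enumerate(findings):
        path = notes_dir / f"finding_{i}.md"
        path.write_text(finding + "\n", encoding="utf-8")
        paths.append(path)
    return paths


def writer(workspace: Path, note_paths: list[Path]) -> Path:
    """Read only the notes it was pointed at and write the report into the same workspace."""
    body = "".join(p.read_text(encoding="utf-8") for p in note_paths)
    report = workspace / "report.md"
    report.write_text("# Report\n" + body, encoding="utf-8")
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ws = Path(tmp)
        report_path = writer(ws, researcher(ws, "quota design"))
        print(report_path.read_text(encoding="utf-8"))
```

Because only paths cross the stage boundary, each agent can choose what to load into its context, and intermediate artifacts stay inspectable on disk after the run.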

Darwinian skill library approach

Internal users contribute to a sprawling library of skills where only the best survive through organizational natural selection, democratizing expert knowledge.
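One way to make "only the best survive" concrete is to track outcomes per skill and periodically cull skills that have had a fair trial but keep failing. The summary does not say how selection actually works at DeepMind; the success-rate fitness signal, the thresholds, and the names `SkillLibrary` and `cull` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SkillStats:
    uses: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.uses if self.uses else 0.0


class SkillLibrary:
    def __init__(self, min_uses: int = 20, min_success_rate: float = 0.5) -> None:
        self.min_uses = min_uses                  # trials before a skill can be judged (assumed)
        self.min_success_rate = min_success_rate  # survival threshold (assumed)
        self.stats: dict[str, SkillStats] = {}

    def contribute(self, name: str) -> None:
        """Any internal user can add a skill; it starts with no track record."""
        self.stats.setdefault(name, SkillStats())

    def record(self, name: str, success: bool) -> None:
        """Log one use of a skill and whether the task it was used for succeeded."""
        s = self.stats[name]
        s.uses += 1
        s.successes += int(success)

    def cull(self) -> list[str]:
        """Remove skills that have had enough trials and still underperform."""
        losers = [
            name for name, s in self.stats.items()
            if s.uses >= self.min_uses and s.success_rate < self.min_success_rate
        ]
        for name in losers:
            del self.stats[name]
        return losers
```

The `min_uses` floor protects newly contributed skills from being culled before they have been tried often enough to judge.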

Human-supervised digital assembly lines

The future vision involves humans overseeing parallel agent tracks that communicate efficiently, moving beyond current single-agent limitations.

🔍 Evaluation & Observability

Custom trajectory stores and tracing

DeepMind built internal tools, including agent trajectory stores, that pinpoint exact failure points and looping behavior and support hierarchical drill-down from an agent step to the raw model requests behind it.
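The three diagnostics described above (first failure, loop detection, drill-down to raw requests) can be sketched over a simple in-memory record. The data model is an assumption, not DeepMind's schema: `Trajectory`, `Step`, and `ModelRequest` are hypothetical, and a "loop" is taken to mean the same action repeated several times in a row.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelRequest:
    prompt: str
    response: str


@dataclass
class Step:
    action: str  # e.g. a tool call rendered as a string
    ok: bool
    requests: list[ModelRequest] = field(default_factory=list)  # raw calls behind this step


@dataclass
class Trajectory:
    task_id: str
    steps: list[Step] = field(default_factory=list)

    def first_failure(self) -> Optional[int]:
        """Index of the first failed step, or None if every step succeeded."""
        return next((i for i, s in enumerate(self.steps) if not s.ok), None)

    def find_loop(self, repeats: int = 3) -> Optional[int]:
        """Start index of the first run of one action repeated at least `repeats` times."""
        run_start = 0
        for i in range(1, len(self.steps) + 1):
            # A run ends at the end of the list or when the action changes.
            if i == len(self.steps) or self.steps[i].action != self.steps[run_start].action:
                if i - run_start >= repeats:
                    return run_start
                run_start = i
        return None

    def drill_down(self, step_index: int) -> list[ModelRequest]:
        """Go from one step down to the raw model requests that produced it."""
        return self.steps[step_index].requests
```

A real store would persist these records and index them across many tasks; the point here is only that each level of the hierarchy keeps a link to the level below it.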

Expensive evaluation infrastructure

Testing requires sandboxed environments and mock TPUs to validate agent harnesses without consuming scarce production compute resources.

Skills preferred over MCP

While both are supported, KP Sawhney favors skills combined with guardrailed CLI interactions over MCP, and sees MCP as potentially transient except for its role in authentication.
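A guardrailed CLI layer for skills might look like the sketch below: commands are parsed without a shell, checked against an allowlist, and run with a timeout. The policy contents, `check`, and `run_guarded` are hypothetical; only the standard-library calls (`shlex.split`, `subprocess.run`) are real. This sketch checks only the executable and its first subcommand, which is far weaker than a production policy would need to be.

```python
import shlex
import subprocess
from typing import Optional

# Hypothetical policy: executable -> allowed first subcommand (None means any arguments).
POLICY: dict[str, Optional[frozenset[str]]] = {
    "git": frozenset({"status", "diff", "log"}),
    "echo": None,
}


class GuardrailViolation(Exception):
    pass


def check(command: str) -> list[str]:
    """Parse a command into argv and validate it against POLICY without running it."""
    argv = shlex.split(command)
    if not argv:
        raise GuardrailViolation("empty command")
    exe, args = argv[0], argv[1:]
    if exe not in POLICY:
        raise GuardrailViolation(f"executable not allowed: {exe}")
    allowed_subcommands = POLICY[exe]
    if allowed_subcommands is not None and (not args or args[0] not in allowed_subcommands):
        raise GuardrailViolation(f"subcommand not allowed for {exe}: {args[:1]}")
    return argv


def run_guarded(command: str, timeout_s: float = 30.0) -> str:
    """Run an allowed command with no shell (so ';' or '|' are not interpreted) and a timeout."""
    argv = check(command)
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s, check=True)
    return result.stdout
```

Separating `check` from `run_guarded` lets a harness validate an agent's proposed command, or show it to a human, before anything executes.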

Bottom Line

Organizations scaling agents must implement strict quota controls and hybrid model routing while transitioning from monolithic architectures to collaborative multi-agent systems with shared workspaces to manage costs and improve reliability.
