How Google DeepMind Runs Agents at Scale — KP Sawhney & Ian Ballantyne, Google DeepMind

Podcasts | May 24, 2026

TL;DR

Google DeepMind engineers Ian Ballantyne and KP Sawhney demonstrate their internal "Antigravity" agent platform and explain how the organization deploys agents at massive scale: strict quota controls, hybrid model architectures, and collaborative multi-agent workflows, all while grappling with token consumption costs and evaluation complexity.

🛠️ Antigravity: Internal Agent Infrastructure

IDE-integrated agent manager with browser control

Antigravity provides a VS Code-style interface for managing agents: multiple agent instances can run side by side, and each can control a browser, inspect DOM elements, and capture screenshots or video recordings of its interactions.

Human-in-the-loop workflow

The system generates editable implementation plans and scratchpad notes that allow users to review, modify, or reject agent actions before proceeding.
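The talk does not show how Antigravity gates execution on review. As a minimal sketch, assuming a plan is a list of steps plus scratchpad notes and a human reviewer either returns an (optionally edited) plan or rejects it, the review gate looks like this. `Plan`, `run_with_review`, and the step strings are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Plan:
    """Hypothetical editable implementation plan produced by an agent."""
    goal: str
    steps: list[str]
    notes: list[str] = field(default_factory=list)  # scratchpad notes


# A reviewer returns the plan to execute (possibly edited), or None to reject it.
Reviewer = Callable[[Plan], Optional[Plan]]


def run_with_review(plan: Plan, review: Reviewer, execute: Callable[[str], str]) -> list[str]:
    """Run nothing until a human approves; then execute only the approved steps."""
    approved = review(plan)
    if approved is None:
        return []  # rejected: no agent actions are taken
    return [execute(step) for step in approved.steps]


def drop_deploy_steps(plan: Plan) -> Optional[Plan]:
    """Example reviewer: the human removes any step that deploys."""
    kept = [s for s in plan.steps if "deploy" not in s]
    return Plan(plan.goal, kept, plan.notes)


plan = Plan("Add login form", ["write form component", "add tests", "deploy to prod"])
print(run_with_review(plan, drop_deploy_steps, lambda step: f"done: {step}"))
# prints ['done: write form component', 'done: add tests']
```

The key design choice is that the reviewer sees the whole plan before any step runs, so an edit or rejection costs nothing beyond the planning tokens already spent.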

Built-in planning and tool use

Agents autonomously create to-do lists and use internal tools to analyze specifications, write code, and verify implementations through browser automation.

Scaling Challenges & Resource Management

Token hunger as primary bottleneck

Managing per-user and per-team quotas is critical at Google scale to prevent power users from overwhelming systems with token-hungry agentic workflows.
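The speakers describe per-user and per-team quotas but not their implementation. A minimal sketch, assuming a single fixed quota window and token counts known before each call, charges a request only if it fits both the user's and the team's remaining budget. `TokenQuota` and the limits used below are hypothetical.

```python
from collections import defaultdict


class TokenQuota:
    """Hypothetical per-user and per-team token budgets for one quota window."""

    def __init__(self, user_limit: int, team_limit: int):
        self.user_limit = user_limit
        self.team_limit = team_limit
        self.user_used: dict[str, int] = defaultdict(int)
        self.team_used: dict[str, int] = defaultdict(int)

    def try_consume(self, user: str, team: str, tokens: int) -> bool:
        """Charge `tokens` only if both the user and the team stay within budget."""
        # Check both limits before charging so a rejected request consumes nothing.
        if self.user_used[user] + tokens > self.user_limit:
            return False
        if self.team_used[team] + tokens > self.team_limit:
            return False
        self.user_used[user] += tokens
        self.team_used[team] += tokens
        return True

    def reset(self) -> None:
        """Start a new quota window (for example, daily)."""
        self.user_used.clear()
        self.team_used.clear()


quota = TokenQuota(user_limit=100_000, team_limit=250_000)
print(quota.try_consume("alice", "search", 90_000))   # True
print(quota.try_consume("alice", "search", 20_000))   # False: alice would exceed her limit
print(quota.try_consume("bob", "search", 100_000))    # True: team is now at 190k
print(quota.try_consume("carol", "search", 80_000))   # False: team would exceed 250k
```

The team limit is what stops a handful of power users from draining a shared budget even when each stays under the per-user cap.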

Hybrid model routing strategy

DeepMind mixes free, locally run models like Gemma 4 for general tasks with more capable models accessed through APIs for specific components, optimizing both cost and quota usage.
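How DeepMind decides which component goes where is not specified. As a minimal sketch, assuming routing is done by task kind, a static table sends everything to a local open-weights model except a short list of tasks that justify metered API calls. The backend labels and task names are hypothetical placeholders, not real model identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    backend: str   # hypothetical backend label
    metered: bool  # whether a call counts against API quota


LOCAL = Route(backend="local-open-weights", metered=False)
HOSTED = Route(backend="hosted-frontier-api", metered=True)

# Hypothetical: the only task kinds considered worth spending API quota on.
HOSTED_TASKS = {"architecture_review", "complex_refactor", "final_verification"}


def route(task_kind: str) -> Route:
    """Default to the free local model; reserve the hosted API for specific components."""
    return HOSTED if task_kind in HOSTED_TASKS else LOCAL


def metered_calls(task_kinds: list[str]) -> int:
    """Count how many calls in a workflow would consume API quota."""
    return sum(route(kind).metered for kind in task_kinds)


workflow = ["summarize_file", "write_docstring", "complex_refactor", "summarize_file"]
for kind in workflow:
    print(kind, "->", route(kind).backend)
print("metered calls:", metered_calls(workflow))  # metered calls: 1
```

A real router might classify tasks dynamically, but a default-to-local rule makes the quota cost of a workflow easy to predict.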

Incompatibility of subscription models

Traditional subscription pricing breaks down for agentic systems, whose token consumption varies enormously between users, so new metering approaches are needed; for now, brute-force quota limits remain the primary defense.

🔮 Evolution of Deep Research & Multi-Agent Systems

Shift from monolithic to collaborative architecture

Deep Research is evolving from passing massive text blobs between stages to agents collaborating via shared filesystem workspaces.
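The workspace layout Deep Research uses is not described. As a minimal sketch, assuming two hypothetical agents share one directory, the researcher writes each finding to its own file and the writer reads only what it needs from disk, instead of receiving one large string from the previous stage. The directory names and agent functions are illustrative.

```python
import tempfile
from pathlib import Path


def researcher(workspace: Path, topic: str) -> None:
    """Hypothetical agent: writes each finding to its own file in the shared workspace."""
    notes = workspace / "notes"
    notes.mkdir(exist_ok=True)
    for i, finding in enumerate([f"{topic}: source A", f"{topic}: source B"]):
        (notes / f"finding_{i}.md").write_text(finding)


def writer(workspace: Path) -> Path:
    """Hypothetical agent: reads the findings it needs and writes its draft back."""
    findings = sorted((workspace / "notes").glob("*.md"))
    body = "\n".join(f"- {path.read_text()}" for path in findings)
    draft = workspace / "draft.md"
    draft.write_text(f"# Report\n{body}\n")
    return draft


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    researcher(ws, "quotas")
    report = writer(ws).read_text()

print(report)
```

Because intermediate results live in files, a later agent (or a human) can open, edit, or re-read any single artifact without replaying the whole pipeline.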

Darwinian skill library approach

Internal users contribute to a sprawling library of skills in which only the best survive through a kind of organizational natural selection, democratizing access to expert knowledge.
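The selection mechanism behind the skill library is not spelled out. One way to model "only the best survive", assuming each skill invocation is recorded as a success or failure, is to keep new skills until they have enough usage data and then drop those whose success rate falls below a threshold. `Skill`, `select`, the thresholds, and the skill names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    author: str
    uses: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.uses if self.uses else 0.0


def record(skill: Skill, succeeded: bool) -> None:
    """Log one invocation outcome for a skill."""
    skill.uses += 1
    skill.successes += int(succeeded)


def select(library: list[Skill], min_uses: int = 20, min_rate: float = 0.6) -> list[Skill]:
    """Keep untested skills (they still get a chance) and proven ones; drop tested underperformers."""
    return [s for s in library if s.uses < min_uses or s.success_rate >= min_rate]


library = [
    Skill("query-helper", "alice", uses=50, successes=45),   # 90% success: survives
    Skill("legacy-deploy", "bob", uses=40, successes=10),    # 25% success: dropped
    Skill("new-lint-fixer", "carol", uses=3, successes=1),   # too new to judge: survives
]
record(library[2], succeeded=True)
print([s.name for s in select(library)])  # ['query-helper', 'new-lint-fixer']
```

The minimum-usage grace period matters: without it, a new contributor's skill could be culled after one unlucky run.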

Human-supervised digital assembly lines

The future vision involves humans overseeing parallel agent tracks that communicate efficiently, moving beyond current single-agent limitations.

🔍 Evaluation & Observability

Custom trajectory stores and tracing

DeepMind built internal tools, including agent trajectory stores, that pinpoint exact failure points, detect looping behavior, and support hierarchical drill-down all the way to the raw model requests.
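DeepMind's trajectory store schema is internal. As a minimal sketch, assuming a trajectory is a tree of spans (run, step, raw model request), a depth-first search can drill down to the deepest failing span, and a simple heuristic can flag a step whose last few actions are identical. `Span`, `first_failure`, `detect_loop`, and the span names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One node in a hypothetical trajectory: a run, a step, or a raw model request."""
    name: str
    detail: str = ""
    error: Optional[str] = None
    children: list[Span] = field(default_factory=list)

    def add(self, child: Span) -> Span:
        self.children.append(child)
        return child


def first_failure(span: Span, path: tuple = ()) -> Optional[tuple]:
    """Drill down depth-first and return the name path to the deepest failing span."""
    path = path + (span.name,)
    for child in span.children:
        found = first_failure(child, path)
        if found:
            return found
    return path if span.error else None


def detect_loop(span: Span, window: int = 3) -> bool:
    """Flag a step whose last `window` child actions are identical."""
    actions = [(c.name, c.detail) for c in span.children]
    return len(actions) >= window and len(set(actions[-window:])) == 1


run = Span("run:fix-bug")
edit = run.add(Span("step:edit-file"))
edit.add(Span("model_request", detail="prompt #1"))
edit.add(Span("model_request", detail="prompt #2", error="context length exceeded"))
tests = run.add(Span("step:run-tests"))
for _ in range(3):
    tests.add(Span("tool:pytest", detail="tests/test_auth.py"))

print(first_failure(run))  # ('run:fix-bug', 'step:edit-file', 'model_request')
print(detect_loop(tests))  # True
```

Checking children before the node itself means the reported path ends at the most specific failure, typically the raw model request, rather than at the run that merely inherited the error.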

Expensive evaluation infrastructure

Testing requires sandboxed environments and mock TPUs to validate agent harnesses without consuming scarce production compute resources.

Skills preferred over MCP

Although both are supported, KP Sawhney advocates skills combined with guardrailed CLI interactions over MCP (Model Context Protocol), viewing MCP as potentially transient except for its role in authentication.
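The talk does not define the guardrails in detail. As a minimal sketch, assuming "guardrail" means an allowlist of executables and subcommands enforced before a skill's command runs, the wrapper below parses the command with `shlex`, rejects anything off the list, and executes without a shell. `ALLOWED`, `check_command`, and `run_guarded` are hypothetical names.

```python
import shlex
import subprocess

# Hypothetical allowlist: executable -> permitted subcommands (None means any arguments).
ALLOWED = {
    "git": {"status", "diff", "log"},  # read-only git operations
    "ls": None,
}
# Rejected outright; run_guarded never uses a shell, so this is defense in depth.
FORBIDDEN_CHARS = set(";|&><`$")


class GuardrailError(ValueError):
    pass


def check_command(command: str) -> list[str]:
    """Parse a command string and reject anything outside the allowlist."""
    if FORBIDDEN_CHARS & set(command):
        raise GuardrailError("shell metacharacters are not allowed")
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise GuardrailError(f"executable not allowed: {argv[:1]}")
    subcommands = ALLOWED[argv[0]]
    if subcommands is not None and (len(argv) < 2 or argv[1] not in subcommands):
        raise GuardrailError(f"subcommand not allowed: {argv[1:2]}")
    return argv


def run_guarded(command: str, timeout: float = 30.0) -> str:
    """Run an allowlisted command without a shell and return its stdout."""
    argv = check_command(command)
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout, check=True)
    return result.stdout


for cmd in ["git diff --stat", "git push origin main", "ls -la; rm -rf /"]:
    try:
        print("allowed:", check_command(cmd))
    except GuardrailError as err:
        print("blocked:", err)
```

Keeping validation separate from execution means the agent gets a clear, explainable refusal instead of a failed or partially executed command.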

Bottom Line

Organizations scaling agents must implement strict quota controls and hybrid model routing while transitioning from monolithic architectures to collaborative multi-agent systems with shared workspaces to manage costs and improve reliability.
