Building Conversational Agents — Thor Schaeff and Philipp Schmid, Google DeepMind

Podcasts | April 30, 2026 | 6.13K views | 1:47:34

TL;DR

Google DeepMind engineers Thor Schaeff and Philipp Schmid show how to build conversational agents with the new Gemini Interactions API, a unified interface for both direct model inference and complex autonomous agents such as Deep Research. The API adds server-side state management and asynchronous execution.

🔄 The Interactions API Evolution

Industry-standard interface migration

The Interactions API replaces the proto-specific generateContent endpoint with a format resembling OpenAI and Anthropic APIs, using unified content blocks for text, images, audio, and function calls.
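As a rough illustration of what "unified content blocks" can mean in practice, the sketch below models text, image, and function-call blocks that share a `type` tag. The class names and fields (`TextBlock`, `ImageBlock`, `FunctionCallBlock`, `describe`) are hypothetical and chosen for this example; they are not the actual Interactions API schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


# Hypothetical block shapes for illustration only; not the real API schema.
@dataclass
class TextBlock:
    text: str
    type: str = "text"


@dataclass
class ImageBlock:
    data: bytes
    mime_type: str
    type: str = "image"


@dataclass
class FunctionCallBlock:
    name: str
    arguments: Dict[str, Any] = field(default_factory=dict)
    type: str = "function_call"


ContentBlock = Union[TextBlock, ImageBlock, FunctionCallBlock]


def describe(blocks: List[ContentBlock]) -> List[str]:
    """Dispatch on the shared `type` tag, as a client might when walking a response."""
    lines = []
    for block in blocks:
        if block.type == "text":
            lines.append(f"text: {block.text}")
        elif block.type == "image":
            lines.append(f"image: {block.mime_type}, {len(block.data)} bytes")
        elif block.type == "function_call":
            lines.append(f"call: {block.name}({block.arguments})")
    return lines
```

Because every block carries the same tag field, one loop can handle mixed text, media, and tool-call output without separate code paths per modality.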

Unified surface for models and agents

A single API endpoint handles both simple model calls (Gemini Flash) and autonomous agents (Deep Research), enabling seamless chaining where research outputs can feed directly into image generation models.

Flexible state management

Developers can leverage server-side state via previous interaction IDs to eliminate manual history management, or opt for client-side context control when custom context engineering is required.
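The idea of chaining turns by a previous interaction ID can be sketched with a toy in-memory server. Everything here is an assumption made for illustration: `ToyInteractionServer`, its `create` method, and the `previous_interaction_id` parameter name only mirror the wording of the talk summary and do not call the real service.

```python
import itertools


class ToyInteractionServer:
    """In-memory stand-in for a stateful API (hypothetical, not the real service)."""

    def __init__(self):
        self._store = {}                 # interaction id -> history up to that turn
        self._ids = itertools.count(1)

    def create(self, user_input, previous_interaction_id=None):
        # Server-side state: look up earlier turns by id instead of resending them.
        history = list(self._store.get(previous_interaction_id, []))
        history.append(("user", user_input))
        turn = sum(1 for role, _ in history if role == "user")
        reply = f"(reply to turn {turn})"
        history.append(("model", reply))
        interaction_id = f"int_{next(self._ids)}"
        self._store[interaction_id] = history
        return interaction_id, reply

    def history(self, interaction_id):
        return list(self._store[interaction_id])


server = ToyInteractionServer()
first_id, _ = server.create("Plan a trip to Oslo")
# The follow-up sends only the new message plus the id of the previous turn.
second_id, reply = server.create("Make it three days", previous_interaction_id=first_id)
```

Omitting the ID starts a fresh conversation, which corresponds to the client-side option: the caller keeps its own history and decides what context to send on each request.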

Performance & Architecture

Dramatic caching cost reduction

Server-side state preservation improves cache hit rates by 2-3x compared to client-managed context, reducing input token costs by 90% since the system no longer breaks cache keys with context modifications.
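Why context modifications hurt caching can be shown with a simplified prefix-cache model: a request benefits from the cache only for the tokens it shares, from the start, with the previous request. The sketch below is a toy under that assumption; the token lists and the function names `shared_prefix_len` and `cached_fraction` are made up for illustration and say nothing about how Gemini's implicit caching is actually implemented or priced.

```python
def shared_prefix_len(a, b):
    """Number of leading tokens two requests have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cached_fraction(requests):
    """Fraction of all input tokens that repeat the previous request's prefix."""
    total = cached = 0
    previous = []
    for request in requests:
        cached += shared_prefix_len(previous, request)
        total += len(request)
        previous = request
    return cached / total if total else 0.0


system = ["sys"] * 4

# Append-only history: each request extends the previous one unchanged.
append_only = [
    system + ["u1"],
    system + ["u1", "m1", "u2"],
    system + ["u1", "m1", "u2", "m2", "u3"],
]

# Client-side edits: earlier turns are rewritten, so the shared prefix ends early.
edited = [
    system + ["u1"],
    system + ["u1*", "m1", "u2"],
    system + ["u1*", "m1*", "u2", "m2", "u3"],
]

print(cached_fraction(append_only), cached_fraction(edited))
```

In this toy model the append-only conversation reuses a larger share of its input tokens than the edited one, which is the effect the summary attributes to server-managed state.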

Native asynchronous execution

Built-in background processing and webhook support allow agents to run multi-minute research tasks without maintaining long-lived HTTP connections, addressing production reliability concerns.
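The background pattern described above (start a job, get an ID back immediately, check on it later) can be sketched locally with a thread. This is a simulation under stated assumptions: `ToyBackgroundRunner`, `poll_until_done`, `slow_research`, and the status strings are hypothetical names for this example, and polling stands in for the webhook delivery the talk mentions.

```python
import threading
import time
import uuid


class ToyBackgroundRunner:
    """Runs tasks on threads and exposes their status by id (illustrative only)."""

    def __init__(self):
        self._jobs = {}
        self._lock = threading.Lock()

    def start(self, task, *args):
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "in_progress", "output": None}

        def run():
            try:
                result = {"status": "completed", "output": task(*args)}
            except Exception as exc:  # record the failure instead of losing it
                result = {"status": "failed", "output": repr(exc)}
            with self._lock:
                self._jobs[job_id] = result

        threading.Thread(target=run, daemon=True).start()
        return job_id  # returned immediately; no long-lived connection needed

    def get(self, job_id):
        with self._lock:
            return dict(self._jobs[job_id])


def poll_until_done(runner, job_id, interval=0.01, timeout=5.0):
    """Check the job periodically until it finishes, fails, or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = runner.get(job_id)
        if job["status"] == "completed":
            return job["output"]
        if job["status"] == "failed":
            raise RuntimeError(job["output"])
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")


def slow_research(topic):
    time.sleep(0.05)  # stands in for a multi-minute research task
    return f"report on {topic}"
```

A caller would do `job_id = runner.start(slow_research, "caching")` and later `poll_until_done(runner, job_id)`. With a webhook, the polling loop would be replaced by the service calling back once the job completes.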

Enhanced tool composition

New tool combination capabilities allow chaining Google Search with custom functions in a single request, alongside support for remote MCP (Model Context Protocol) servers.

🛠️ Practical Development Workflow

AI-assisted coding paradigm

The workshop emphasizes using AI coding agents (Cursor, Windsurf, Gemini CLI) rather than manual coding, leveraging pre-built 'Agent Skills' to scaffold the entire development environment instantly.

Simplified action-reaction loops

The new 'requires_action' pattern makes function calling explicit: developers check the output types for function calls, execute them locally, and append the results, repeating until the model generates a final response.
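The loop described above might look roughly like the following. The model here is a scripted stand-in (`toy_model`) so the example runs offline; the dictionary shapes, the `status` values other than 'requires_action', and the helper names `run_loop`, `get_weather`, and `TOOLS` are assumptions for illustration, not the real response format.

```python
def get_weather(city):
    """Local tool the model can ask for (hard-coded data for the example)."""
    return {"city": city, "temp_c": 21}


TOOLS = {"get_weather": get_weather}


def toy_model(history):
    """Scripted stand-in for a model: requests one tool call, then answers."""
    last = history[-1]
    if last["type"] == "function_result":
        r = last["result"]
        text = f"It is {r['temp_c']} C in {r['city']}."
        return {"status": "completed", "outputs": [{"type": "text", "text": text}]}
    call = {"type": "function_call", "id": "call_1",
            "name": "get_weather", "arguments": {"city": "Oslo"}}
    return {"status": "requires_action", "outputs": [call]}


def run_loop(user_text, model=toy_model, tools=TOOLS, max_steps=5):
    history = [{"type": "text", "text": user_text}]
    for _ in range(max_steps):
        response = model(history)
        if response["status"] != "requires_action":
            # Final answer: join the text outputs and stop.
            return "".join(o["text"] for o in response["outputs"] if o["type"] == "text")
        for output in response["outputs"]:
            if output["type"] == "function_call":
                # Execute locally, then append the result for the next turn.
                result = tools[output["name"]](**output["arguments"])
                history.append({"type": "function_result", "call_id": output["id"],
                                "name": output["name"], "result": result})
    raise RuntimeError("model did not finish within max_steps")


print(run_loop("What is the weather in Oslo?"))
```

The `max_steps` guard is a design choice for the sketch: it keeps a misbehaving model from looping forever on tool calls.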

Zero-barrier experimentation

All examples run on the free tier: a Google account on ai.dev (Google AI Studio) is enough, and no credit card is needed for API access during development.

Bottom Line

Migrate to the Interactions API for production agents to leverage server-side state and asynchronous processing, which eliminates manual context management while reducing costs through superior implicit caching.
