Building Conversational Agents — Thor Schaeff and Philipp Schmid, Google DeepMind
TL;DR
Google DeepMind engineers Thor Schaeff and Philipp Schmid demonstrate building conversational agents using the new Gemini Interactions API, a unified interface that supports both direct model inference and complex autonomous agents like Deep Research with server-side state management and asynchronous execution.
🔄 The Interactions API Evolution
Industry-standard interface migration
The Interactions API replaces the proto-specific generateContent endpoint with a format resembling OpenAI and Anthropic APIs, using unified content blocks for text, images, audio, and function calls.
Unified surface for models and agents
A single API endpoint handles both simple model calls (Gemini Flash) and autonomous agents (Deep Research), enabling seamless chaining where research outputs can feed directly into image generation models.
Flexible state management
Developers can leverage server-side state via previous interaction IDs to eliminate manual history management, or opt for client-side context control when custom context engineering is required.
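The previous-interaction-ID pattern can be sketched without calling the real service. The example below is a minimal in-memory simulation, not the Gemini SDK: `InMemoryInteractionServer`, `Interaction`, and the `previous_interaction_id` keyword are hypothetical names chosen to mirror the description above. It shows the client sending only the new turn plus an ID while the server reconstructs the history.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    id: str
    input: str
    output: str
    previous_interaction_id: Optional[str] = None


class InMemoryInteractionServer:
    """Hypothetical stand-in for a server that stores interaction history."""

    def __init__(self):
        self._store = {}
        self._ids = itertools.count(1)

    def history(self, interaction_id):
        """Walk the previous-ID chain and return (input, output) turns, oldest first."""
        turns = []
        while interaction_id is not None:
            item = self._store[interaction_id]
            turns.append((item.input, item.output))
            interaction_id = item.previous_interaction_id
        return list(reversed(turns))

    def create(self, input, previous_interaction_id=None):
        prior = self.history(previous_interaction_id)
        # Placeholder "model": it only reports how many earlier turns it could see.
        output = f"reply to {input!r} with {len(prior)} prior turn(s)"
        interaction = Interaction(
            f"int_{next(self._ids)}", input, output, previous_interaction_id
        )
        self._store[interaction.id] = interaction
        return interaction


server = InMemoryInteractionServer()
first = server.create("Plan a trip to Oslo")
# The client resends nothing from the first turn, only its ID.
second = server.create("Make it three days", previous_interaction_id=first.id)
print(second.output)
```

With client-side context control, the caller would instead keep the list of turns itself and pass the full (possibly trimmed or rewritten) history on every request.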
⚡ Performance & Architecture
Dramatic caching cost reduction
Server-side state preservation improves cache hit rates by 2-3x compared to client-managed context and reduces input token costs by 90%, because the stored history is no longer modified in ways that invalidate cache keys.
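Why context modifications hurt caching can be illustrated with a toy model. The sketch below assumes the cache is keyed on the longest shared prefix between consecutive requests, and it treats whole messages as the unit of comparison; the summary does not describe the real cache granularity, and `shared_prefix_len` is a hypothetical helper.

```python
def shared_prefix_len(previous, current):
    """Count how many leading items two request contexts have in common."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n


history = ["system: be concise", "user: hi", "model: hello", "user: summarize X"]

# Append-only growth, as with server-managed state: the old context is untouched.
append_only = history + ["model: summary", "user: shorter"]

# Client-side edit of an early message: everything after the edit misses the cache.
edited = ["system: be concise (v2)"] + history[1:] + ["model: summary", "user: shorter"]

print(shared_prefix_len(history, append_only))
print(shared_prefix_len(history, edited))
```

In this toy model the append-only request reuses all four earlier messages, while the edited request reuses none of them.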
Native asynchronous execution
Built-in background processing and webhook support allow agents to run multi-minute research tasks without maintaining long-lived HTTP connections, addressing production reliability concerns.
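The background pattern described above amounts to starting a task, dropping the connection, and checking back later. The following is a minimal polling sketch against a fake client: `FakeBackgroundAgent`, its `create`/`get` methods, the `background` flag, and the status strings are all assumptions made for illustration, not the actual API surface.

```python
import time


class FakeBackgroundAgent:
    """Hypothetical agent that finishes after a fixed number of status checks."""

    def __init__(self, polls_until_done=3):
        self._remaining = polls_until_done

    def create(self, input, background=True):
        # Returns immediately; the work continues "server-side".
        return {"id": "int_1", "status": "in_progress"}

    def get(self, interaction_id):
        self._remaining -= 1
        if self._remaining > 0:
            return {"id": interaction_id, "status": "in_progress"}
        return {"id": interaction_id, "status": "completed", "output": "research report"}


def wait_for_result(client, interaction_id, interval=0.01, timeout=5.0):
    """Poll with short requests until the task leaves the in-progress state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        interaction = client.get(interaction_id)
        if interaction["status"] != "in_progress":
            return interaction
        time.sleep(interval)
    raise TimeoutError(f"{interaction_id} did not finish within {timeout}s")


agent = FakeBackgroundAgent()
started = agent.create("Research solid-state batteries", background=True)
finished = wait_for_result(agent, started["id"])
print(finished["status"], finished["output"])
```

A webhook replaces the polling loop: the service calls the application back when the task finishes, so no request has to stay open or be repeated.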
Enhanced tool composition
New tool combination capabilities allow chaining Google Search with custom functions in a single request, alongside support for remote MCP (Model Context Protocol) servers.
🛠️ Practical Development Workflow
AI-assisted coding paradigm
The workshop emphasizes using AI coding agents (Cursor, Windsurf, Gemini CLI) rather than manual coding, leveraging pre-built 'Agent Skills' to scaffold the entire development environment instantly.
Simplified action-reaction loops
The new 'requires_action' pattern makes function calling explicit—developers check output types for function calls, execute locally, and append results until the model generates a final response.
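The loop described above can be sketched with a scripted stand-in for the model. Everything here is hypothetical: `ScriptedModel`, the dictionary shapes, and field names such as `function_call`, `function_result`, and `call_id` are assumptions that mirror the description, not the documented wire format.

```python
def get_weather(city):
    """Local tool executed by the client; returns canned data for the example."""
    return {"city": city, "temp_c": 21}


TOOLS = {"get_weather": get_weather}


class ScriptedModel:
    """Hypothetical model: asks for one tool call, then answers from its result."""

    def create(self, input):
        results = [item for item in input if item["type"] == "function_result"]
        if not results:
            return {
                "status": "requires_action",
                "outputs": [{"type": "function_call", "id": "call_1",
                             "name": "get_weather", "arguments": {"city": "Oslo"}}],
            }
        temp = results[-1]["result"]["temp_c"]
        return {"status": "completed",
                "outputs": [{"type": "text", "text": f"It is {temp} C in Oslo."}]}


def run_until_final(model, user_text, max_steps=5):
    """Check outputs for function calls, run them locally, append results, repeat."""
    context = [{"type": "text", "text": user_text}]
    for _ in range(max_steps):
        response = model.create(context)
        calls = [o for o in response["outputs"] if o["type"] == "function_call"]
        if not calls:
            return "".join(o["text"] for o in response["outputs"] if o["type"] == "text")
        for call in calls:
            result = TOOLS[call["name"]](**call["arguments"])
            context.append(call)
            context.append({"type": "function_result", "call_id": call["id"],
                            "name": call["name"], "result": result})
    raise RuntimeError("no final response within max_steps")


print(run_until_final(ScriptedModel(), "Weather in Oslo?"))
```

The `max_steps` bound is a design choice for the sketch: it keeps a misbehaving model from looping on tool calls indefinitely.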
Zero-barrier experimentation
All examples run on the free tier via AI.dev (Google AI Studio): a Google account is enough, and no credit card is required for API access during development.
Bottom Line
Migrate to the Interactions API for production agents to leverage server-side state and asynchronous processing, which eliminates manual context management while reducing costs through superior implicit caching.