Building Conversational Agents — Thor Schaeff and Philipp Schmid, Google DeepMind

AI Engineer

| Podcasts | April 30, 2026 | 4.72 Thousand views | 1:47:34

TL;DR

Google DeepMind engineers Thor Schaeff and Philipp Schmid demonstrate building conversational agents using the new Gemini Interactions API, a unified interface that supports both direct model inference and complex autonomous agents like Deep Research with server-side state management and asynchronous execution.

🔄 The Interactions API Evolution 3 insights

Industry-standard interface migration

The Interactions API replaces the proto-specific generateContent endpoint with a format resembling OpenAI and Anthropic APIs, using unified content blocks for text, images, audio, and function calls.

Unified surface for models and agents

A single API endpoint handles both simple model calls (Gemini Flash) and autonomous agents (Deep Research), enabling seamless chaining where research outputs can feed directly into image generation models.

Flexible state management

Developers can leverage server-side state via previous interaction IDs to eliminate manual history management, or opt for client-side context control when custom context engineering is required.

⚡ Performance & Architecture 3 insights

Dramatic caching cost reduction

Server-side state preservation improves cache hit rates by 2-3x compared to client-managed context, reducing input token costs by 90% since the system no longer breaks cache keys with context modifications.

Native asynchronous execution

Built-in background processing and webhook support allow agents to run multi-minute research tasks without maintaining long-lived HTTP connections, addressing production reliability concerns.

Enhanced tool composition

New tool combination capabilities allow chaining Google Search with custom functions in a single request, alongside support for remote MCP (Model Context Protocol) servers.

🛠️ Practical Development Workflow 3 insights

AI-assisted coding paradigm

The workshop emphasizes using AI coding agents (Cursor, Windsurf, Gemini CLI) rather than manual coding, leveraging pre-built 'Agent Skills' to scaffold the entire development environment instantly.

Simplified action-reaction loops

The new 'requires_action' pattern makes function calling explicit—developers check output types for function calls, execute locally, and append results until the model generates a final response.

Zero-barrier experimentation

All examples run on the free tier requiring only a Google account via AI.dev (Google AI Studio), with no credit card required for API access during development.

Bottom Line

Migrate to the Interactions API for production agents to leverage server-side state and asynchronous processing, which eliminates manual context management while reducing costs through superior implicit caching.

Watch on YouTube

More from AI Engineer

Human-in-the-Loop Automation with n8n — Liam McGarrigle

AI Engineer

Human-in-the-Loop Automation with n8n — Liam McGarrigle

Liam McGarrigle demonstrates building AI agents in n8n using visual workflows, emphasizing transparent orchestration over black-box automation through configurable memory, chat triggers, and tool integration for practical business applications.

about 7 hours ago · 9 points

Mastering AI Pricing: Flexible & Agile Monetization — Mayank Pant, Stripe

AI Engineer

Mastering AI Pricing: Flexible & Agile Monetization — Mayank Pant, Stripe

AI companies are growing three times faster than traditional SaaS but face unique pricing challenges due to unpredictable compute costs and razor-thin margins, requiring a shift from static subscription models to flexible hybrid pricing that prioritizes rapid iteration and customer-perceived value over technical metrics.

1 day ago · 10 points

Shipping complex AI applications — Braintrust & Trainline

AI Engineer

Shipping complex AI applications — Braintrust & Trainline

This workshop demonstrates how to bridge the gap between AI prototypes and production systems using Brain Trust's observability platform, featuring Trainline's experience deploying multi-agent AI applications serving 27 million users.

1 day ago · 10 points

Replacing 12K LoC with a 200 LoC Skill — David Gomes, Cursor

AI Engineer

Replacing 12K LoC with a 200 LoC Skill — David Gomes, Cursor

David Gomes from Cursor details how they replaced 15,000 lines of complex git work tree management code with a 200-line markdown skill using agent primitives, drastically reducing maintenance while enabling multi-repo support and flexible model comparisons, though requiring new approaches to ensure agent isolation.

3 days ago · 10 points

Browse more: 🎙️ Podcasts All Videos All Categories