Build & deploy AI-powered apps — Paige Bailey, Google DeepMind
TL;DR
Paige Bailey walks through Google DeepMind's recent Gemini 3.1 model releases and AI Studio tooling, showing how developers can use multimodal capabilities, sandboxed code execution, and real-time screen sharing to build production AI applications, often for under a cent per request.
🚀 Google's Expanded Model Portfolio
Gemini 3.1 Series Launch
Google released Gemini 3.1 Pro (largest and most capable), Flash (production workhorse), and Flash-Lite (ultra-fast and cheap) within the last month, with Augment Code replatforming its entire agent system to Pro for cost-performance optimization.
True Multimodal Architecture
Unlike competitors limited to text/code outputs, Gemini processes and generates video, images, audio, text, and code simultaneously, including interleaved formats and PDFs with embedded images.
Specialized Creative Models
New releases include Nano Banana 2 for image generation and editing, Veo 3.1 Lite for low-cost video generation, multimodal embeddings that place all content types in the same vector space, and Genie 3 for dynamic world-building.
🛠️ AI Studio Development Features
Sandboxed Code Execution
AI Studio provides isolated Python environments with pre-installed data science libraries, enabling models to write and execute code for computer vision tasks like drawing bounding boxes without local security risks.
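The bounding-box task above usually ends with a small coordinate conversion before anything can be drawn. The sketch below assumes the model returns each box as [ymin, xmin, ymax, xmax] on a 0 to 1000 grid, which is the convention Gemini's object-detection documentation describes; the function name and the sample box are hypothetical and not taken from the talk.

```python
def to_pixel_box(box_2d, image_width, image_height, scale=1000):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel (left, top, right, bottom).

    Assumption: coordinates are normalized to a 0..scale grid (scale=1000).
    """
    ymin, xmin, ymax, xmax = box_2d
    left = round(xmin * image_width / scale)
    top = round(ymin * image_height / scale)
    right = round(xmax * image_width / scale)
    bottom = round(ymax * image_height / scale)
    return left, top, right, bottom


# Hypothetical box on a 1280x720 image.
print(to_pixel_box([100, 250, 500, 750], image_width=1280, image_height=720))
# (320, 72, 960, 360)
```

The resulting tuple is in the order most drawing libraries expect for rectangles, so it can be passed on to whatever plotting code runs in the sandbox.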
Native YouTube Video Analysis
Developers can analyze a YouTube video directly from its URL. Video is sampled at 1 frame per second, and a 5-minute clip consumes roughly 27,600 tokens. The model can extract structured data such as timestamped tables, with Google Search grounding supplying citations.
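The figures above imply a rough per-second token rate, which is handy for budgeting longer clips. This is a back-of-the-envelope sketch only: it assumes token use scales linearly with duration at the quoted sampling rate, which the talk does not state, and the function name is hypothetical.

```python
# Figures quoted in the summary: a 5-minute clip consumes about 27,600 tokens.
CLIP_SECONDS = 5 * 60
CLIP_TOKENS = 27_600

# Implied rate: 27,600 / 300 = 92 tokens per second of video.
TOKENS_PER_SECOND = CLIP_TOKENS / CLIP_SECONDS


def estimate_video_tokens(duration_seconds):
    """Linear extrapolation (an assumption) from the quoted 5-minute figure."""
    return round(duration_seconds * TOKENS_PER_SECOND)


print(TOKENS_PER_SECOND)            # 92.0
print(estimate_video_tokens(600))   # 55200 for a 10-minute clip
```

Actual counts depend on the sampling rate and media settings in use, so treat the estimate as a planning number and check the reported usage on a real request.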
Side-by-Side Model Comparison
Compare mode runs multiple models side by side with identical prompts and tools, so developers can evaluate tradeoffs between accuracy, speed, and token cost (e.g., 3.1 Flash-Lite vs 3 Flash).
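The same comparison can be approximated outside the UI with a small harness. The sketch below is illustrative only: the two model functions are local stubs standing in for real API calls, and every name in it is hypothetical. It sends one prompt to each callable and records latency and token count.

```python
import time


def compare_models(models, prompt):
    """Run the same prompt through each callable and collect latency and token usage.

    `models` maps a label to a callable returning (text, tokens_used).
    """
    rows = []
    for name, call in models.items():
        start = time.perf_counter()
        text, tokens = call(prompt)
        elapsed = time.perf_counter() - start
        rows.append({"model": name, "seconds": elapsed, "tokens": tokens, "text": text})
    return rows


# Stub "models" (assumptions): replace these with real client calls.
def stub_lite(prompt):
    return "short answer", 40


def stub_flash(prompt):
    return "longer, more detailed answer", 120


for row in compare_models({"lite": stub_lite, "flash": stub_flash}, "Count the bricks."):
    print(row["model"], row["tokens"], f"{row['seconds']:.4f}s")
```

Keeping the prompt and tool configuration identical across callables is what makes the latency and token columns comparable.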
URL Context Grounding
Users can add public URLs to ground responses with inline citations, while Vertex AI offers similar retrieval capabilities for internal documents without requiring vector database infrastructure.
💰 Real-Time Applications & Economics
Gemini Live Interface
Gemini Live supports real-time screen sharing, video feeds, and audio conversations in multiple languages, with custom function calling and automatic Google Search integration for dynamic assistance.
Fractional Penny Economics
Gemini 3.1 Flash-Lite performs complex vision analysis and object detection (e.g., identifying Lego bricks) for well under one cent per request, making high-frequency multimodal applications economically viable in production.
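Per-request cost is simple arithmetic on token counts and per-million-token prices. In the sketch below, the prices and token counts are placeholder assumptions chosen for illustration, not figures from the talk or from a published price list; substitute current pricing before relying on the result.

```python
def request_cost_usd(input_tokens, output_tokens,
                     input_price_per_million, output_price_per_million):
    """Cost of one request given token counts and USD prices per million tokens."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Placeholder values (assumptions): one image-plus-prompt request.
cost = request_cost_usd(
    input_tokens=1_500,
    output_tokens=300,
    input_price_per_million=0.20,   # hypothetical price
    output_price_per_million=1.00,  # hypothetical price
)
print(f"${cost:.6f} per request")             # $0.000600 per request
print(f"${cost * 10_000:.2f} per 10,000 requests")  # $6.00 per 10,000 requests
```

At these placeholder rates a single request costs a small fraction of a cent, which is the kind of margin that makes per-frame or per-image calls practical.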
Configurable Inference Levels
Models offer minimal, low, medium, and high thinking levels to control token consumption and latency, with Flash-Lite optimized for minimal thinking to maximize speed without sacrificing accuracy on routine tasks.
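One way to use these levels is to route each task type to the cheapest setting that handles it. The sketch below is an assumption-laden illustration: the task categories, the routing table, and the request dictionary shape are all hypothetical and do not mirror any SDK's actual request format. Only the four level names come from the summary above.

```python
# The four levels named in the summary.
THINKING_LEVELS = ("minimal", "low", "medium", "high")

# Hypothetical routing table: routine tasks get less thinking.
ROUTING = {
    "classification": "minimal",
    "extraction": "low",
    "multi_step_reasoning": "high",
}


def pick_thinking_level(task_kind):
    """Return a thinking level for the task, defaulting to 'medium' when unknown."""
    return ROUTING.get(task_kind, "medium")


def build_request(prompt, task_kind):
    """Assemble an illustrative request dict (not a real SDK payload)."""
    level = pick_thinking_level(task_kind)
    assert level in THINKING_LEVELS
    return {"prompt": prompt, "thinking_level": level}


print(build_request("Is this brick red?", "classification"))
# {'prompt': 'Is this brick red?', 'thinking_level': 'minimal'}
```

Starting routine tasks at the lowest level and raising it only when evaluation shows accuracy dropping keeps both latency and token spend down.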
Bottom Line
Start building with Gemini 3.1 Flash-Lite in AI Studio, and enable code execution and grounding tools to deploy production-grade multimodal applications at under a penny per request.