Codex and Subagents — Vaibhav Srivastav & Katia Gil Guzman, OpenAI
TL;DR
OpenAI's Katia Gil Guzman and Vaibhav Srivastav unveil Codex's evolution into a full software engineering agent, demonstrating new plugins that bundle workflows, automations for background task scheduling, and subagent capabilities powered by mini models to handle complex parallel development tasks.
🚀 Model Infrastructure & Performance
Hierarchical model stack for different workloads
Codex uses GPT-5.4 for complex, long-running tasks, while the new Mini and Nano models handle short tasks and cost-effective subagent parallelization.
Subagent parallelization capabilities
Mini models enable distribution of complex feature requests across multiple concurrent subagents, significantly accelerating development workflows.
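The fan-out pattern described here can be sketched in a few lines. This is a minimal illustration, not Codex's implementation: `run_subagent`, `fan_out`, and `split_and_run` are hypothetical names, and the subagent call is simulated with a short sleep rather than a real model request.

```python
import asyncio


async def run_subagent(subtask: str) -> str:
    # Hypothetical stand-in for a request to a small model.
    # It only simulates work; no real API is called.
    await asyncio.sleep(0.01)
    return f"done: {subtask}"


async def fan_out(subtasks: list[str], max_concurrency: int = 3) -> list[str]:
    # Cap how many subagents run at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(subtask: str) -> str:
        async with semaphore:
            return await run_subagent(subtask)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(s) for s in subtasks))


def split_and_run(subtasks: list[str]) -> list[str]:
    # Synchronous entry point for callers that are not already async.
    return asyncio.run(fan_out(subtasks))


if __name__ == "__main__":
    print(split_and_run(["write tests", "update docs", "refactor parser"]))
```

The concurrency cap is an assumption added for illustration; the summary does not say how many subagents run at once.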
Websocket infrastructure for speed
New websocket connections stream tokens about 1.75x faster, and an optional fast mode raises the speedup to 2x for real-time interactions.
Unified agent harness architecture
The harness wraps the model with built-in safety controls and manages tool execution, environment setup, and long-running task evaluation, which the raw model does not handle on its own.
🔌 Plugins and Workflow Automation
Plugin architecture bundles integrations
Plugins combine skills (reusable instructions), apps (external service connections), and MCP servers into single installable workflow packages.
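One way to picture the bundle is as a single record with three lists. The sketch below is purely illustrative: the `Plugin` class, its field names, and the sample values are assumptions and do not reflect Codex's actual plugin manifest format.

```python
from dataclasses import dataclass, field


@dataclass
class Plugin:
    # Hypothetical model of a plugin bundle; not the real manifest schema.
    name: str
    skills: list[str] = field(default_factory=list)       # reusable instructions
    apps: list[str] = field(default_factory=list)         # external service connections
    mcp_servers: list[str] = field(default_factory=list)  # MCP server identifiers

    def components(self) -> dict[str, list[str]]:
        # Everything that would be installed together as one package.
        return {
            "skills": self.skills,
            "apps": self.apps,
            "mcp_servers": self.mcp_servers,
        }


if __name__ == "__main__":
    # Sample values are made up for illustration.
    triage = Plugin(
        name="inbox-triage",
        skills=["summarize-thread"],
        apps=["gmail", "slack"],
        mcp_servers=["calendar-mcp"],
    )
    print(triage.components())
```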
Scheduled background automations
Users can configure recurring tasks like daily Slack message summaries or Gmail priority filtering to run automatically without manual intervention.
Visual development tooling
Game development plugins integrate Playwright Interactive for browser-based debugging and an image generation skill for automatically producing sprites and other visual assets.
💻 Developer Experience and Scale
Parallel development via git worktrees
The Codex app natively supports git worktrees, so several features in the same project can be developed simultaneously, each in its own checkout, without context switching.
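For readers unfamiliar with the underlying git feature, here is a minimal sketch of creating one worktree per feature from a script. It relies only on the standard `git worktree add -b <branch> <path>` command; the helper names and the sibling-directory layout are assumptions for illustration and say nothing about how the Codex app manages worktrees internally.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo: Path, feature: str) -> list[str]:
    # Assumed layout: each worktree lives in a sibling directory
    # named "<repo>-<feature>".
    target = repo.parent / f"{repo.name}-{feature}"
    # "git worktree add -b <branch> <path>" creates a new branch and
    # checks it out in a separate working directory.
    return ["git", "-C", str(repo), "worktree", "add", "-b", feature, str(target)]


def create_worktree(repo: Path, feature: str) -> None:
    # Runs the command; raises CalledProcessError if git fails.
    subprocess.run(worktree_add_command(repo, feature), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for feature in ["login-form", "billing-api"]:
        print(" ".join(worktree_add_command(Path("/tmp/app"), feature)))
```

Each worktree has its own working directory and branch, so an agent working on one feature does not disturb files checked out for another.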
Native Windows sandbox support
Codex now sandboxes natively on Windows as well as macOS, making it the first agent harness to offer native sandboxing on Windows.
Massive developer adoption
Codex has grown to 3 million weekly active users, more than tripling since January, with integrations across IDEs, CLI, Slack, and GitHub.
Bottom Line
Developers should immediately adopt Codex's plugin system to bundle repetitive workflows and configure automations for routine tasks like email triage and Slack monitoring, effectively creating a 24/7 software engineering assistant that handles background maintenance while you focus on complex architecture decisions.