Building durable Agents with Workflow DevKit & AI SDK - Peter Wielander, Vercel
TL;DR
Peter Wielander demonstrates how Vercel's open-source Workflow DevKit transforms local AI agents into production-grade systems by wrapping AI SDK code in durable, retryable workflows with minimal refactoring, enabling automatic observability and resumable streams without managing queues or databases.
🏗️ Workflow Pattern & Architecture
Eliminating production boilerplate
The workflow pattern removes the need to manually wire up queues, databases, and error handling for long-running agents by providing a deterministic orchestration layer that persists state and retries failed steps automatically.
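A minimal TypeScript sketch of the "retry failed steps and persist state" idea, using an in-memory array as a stand-in for the durable store. `runStep`, `stepLog`, and the exponential-backoff policy are hypothetical illustrations, not Workflow DevKit's API or its actual retry policy:

```typescript
type StepRecord = { name: string; output: unknown };

// Hypothetical stand-in for the durable store the orchestrator persists to.
const stepLog: StepRecord[] = [];

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run one step with automatic retries; only a successful result is persisted.
async function runStep<T>(
  name: string,
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 10,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const output = await fn();
      stepLog.push({ name, output }); // persist the result so it never has to be recomputed
      return output;
    } catch (err) {
      lastError = err;
      // Exponential backoff between attempts (10 ms, 20 ms, ...); no wait after the last one.
      if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw new Error(`step "${name}" failed after ${maxAttempts} attempts: ${String(lastError)}`);
}

// A flaky tool call that fails twice (e.g. a rate limit) before succeeding.
let attempts = 0;
async function flakySearch(): Promise<string> {
  attempts++;
  if (attempts < 3) throw new Error("429 Too Many Requests");
  return "search results";
}

const searchResult = runStep("search", flakySearch);
```

The agent code that calls `runStep` never sees the two transient failures; that retry-and-record bookkeeping is what the talk says the workflow layer takes off your hands.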
Deterministic step isolation
The `use workflow` directive compiles an agent loop into isolated bundles, separating LLM calls and tool executions into discrete, replayable steps whose inputs and outputs are cached.
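To make "replayable steps with cached inputs and outputs" concrete, here is a minimal sketch of event-log replay, a common technique for durable orchestration. It assumes the orchestration function is re-run from the top and each step call is matched in order against a recorded log; `Replayer`, `StepEvent`, and `researchAgent` are hypothetical names, and this is not a description of the DevKit's internals:

```typescript
type StepEvent = { step: string; input: string; output: string };

// Hypothetical replayer: each step call is matched, in order, against the recorded event log.
class Replayer {
  private cursor = 0;
  private readonly log: StepEvent[];
  readonly executed: string[] = []; // steps that actually ran during this pass

  constructor(log: StepEvent[]) {
    this.log = log;
  }

  async step(name: string, input: string, fn: (input: string) => Promise<string>): Promise<string> {
    const index = this.cursor++;
    const recorded = this.log[index];
    if (recorded) {
      // Orchestration code must be deterministic: same steps, same order, same inputs.
      if (recorded.step !== name || recorded.input !== input) {
        throw new Error(`non-deterministic replay at event ${index}: expected ${recorded.step}, got ${name}`);
      }
      return recorded.output; // cached output; fn is not called again
    }
    const output = await fn(input);
    this.log.push({ step: name, input, output });
    this.executed.push(name);
    return output;
  }
}

// Agent loop: two LLM calls and one tool call, each isolated as a step.
async function researchAgent(r: Replayer, topic: string): Promise<string> {
  const outline = await r.step("llm:outline", topic, async (t) => `outline(${t})`);
  const notes = await r.step("tool:search", outline, async (o) => `notes(${o})`);
  return r.step("llm:draft", notes, async (n) => `draft(${n})`);
}
```

Running `researchAgent` twice against the same log executes all three steps on the first pass and none on the second, yet both passes return the same result. This is also why orchestration code must stay deterministic: a replay that reaches a different step than the log records cannot reuse the cache.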
Serverless durability
Each step marked with `use step` runs in an isolated serverless instance, so an agent can execute for hours or days across many invocations without losing context or state.
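The sketch below simulates how a long run can span several short invocations: each invocation replays completed steps from storage, runs at most `budget` new ones, and then suspends so a driver can re-invoke it. The `Map`-based `durableStore`, the fixed per-invocation budget, and the `invoke`/`runToCompletion` helpers are hypothetical simplifications, not how the DevKit schedules work:

```typescript
type StepFn = (name: string, fn: () => Promise<string>) => Promise<string>;
type StepEntry = { step: string; output: string };

// Hypothetical stand-in for durable storage that outlives any single serverless invocation.
const durableStore = new Map<string, string>();
const SUSPEND = Symbol("suspend");

// One invocation: replay recorded steps, run at most `budget` new ones, then suspend.
async function invoke(
  runId: string,
  budget: number,
  workflow: (step: StepFn) => Promise<string>,
): Promise<{ done: boolean; result?: string }> {
  const log: StepEntry[] = JSON.parse(durableStore.get(runId) ?? "[]");
  let cursor = 0;
  let newSteps = 0;
  const step: StepFn = async (name, fn) => {
    const index = cursor++;
    if (index < log.length) return log[index]!.output; // finished in an earlier invocation
    if (newSteps === budget) throw SUSPEND; // this invocation's budget is spent
    const output = await fn();
    newSteps++;
    log.push({ step: name, output });
    durableStore.set(runId, JSON.stringify(log)); // persist after every completed step
    return output;
  };
  try {
    return { done: true, result: await workflow(step) };
  } catch (err) {
    if (err === SUSPEND) return { done: false };
    throw err;
  }
}

// A five-step agent; each step appends to the running result.
async function fiveStepAgent(step: StepFn): Promise<string> {
  let acc = "";
  for (let i = 1; i <= 5; i++) {
    const prev = acc;
    acc = await step(`step-${i}`, async () => `${prev}${i}`);
  }
  return acc;
}

// Driver: keep re-invoking (as a queue would) until the run completes.
async function runToCompletion(runId: string, budget: number): Promise<{ result: string; invocations: number }> {
  for (let invocations = 1; ; invocations++) {
    const outcome = await invoke(runId, budget, fiveStepAgent);
    if (outcome.done) return { result: outcome.result ?? "", invocations };
  }
}
```

With a budget of two steps per invocation, `runToCompletion("run-1", 2)` needs three invocations to finish the five steps, and no step runs more than once because completed outputs come back from storage.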
🛠️ Implementation & Migration
Minimal code changes required
Migrate an existing AI SDK agent by adding the `use workflow` directive to the orchestration function and `use step` to each tool execution, or use the pre-built `DurableAgent` class, which handles LLM calls as steps automatically.
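A minimal sketch of where the two directives go in a hand-written agent loop, assuming the placement described above: `"use workflow"` as the first statement of the orchestration function and `"use step"` as the first statement of each function that calls the model or a tool. `callModel`, `getWeather`, and `weatherAgent` are hypothetical, and the fake model stands in for a real AI SDK call. Without the Workflow DevKit compiler, these directives are ordinary string literals that JavaScript ignores, so the sketch runs as plain async code:

```typescript
type ModelTurn = { text?: string; toolCall?: { name: string; city: string } };

// Hypothetical fake model so the sketch runs offline; a real agent would call the AI SDK here.
async function callModel(messages: string[]): Promise<ModelTurn> {
  "use step"; // each LLM call becomes its own durable step
  const hasToolResult = messages.some((m) => m.startsWith("tool:"));
  return hasToolResult
    ? { text: `Answer based on ${messages[messages.length - 1]}` }
    : { toolCall: { name: "getWeather", city: "Berlin" } };
}

// Hypothetical tool; in a real agent this would hit an external API.
async function getWeather(city: string): Promise<string> {
  "use step"; // tool executions are steps too, so they are retried and cached
  return `${city}: sunny, 18C`;
}

// Orchestration: the usual agent loop, unchanged apart from the directive.
async function weatherAgent(prompt: string): Promise<string> {
  "use workflow";
  const messages = [`user:${prompt}`];
  for (let turn = 0; turn < 5; turn++) {
    const reply = await callModel(messages);
    if (!reply.toolCall) return reply.text ?? "";
    messages.push(`tool:${await getWeather(reply.toolCall.city)}`);
  }
  throw new Error("agent exceeded 5 turns");
}
```

The point of the sketch is how little changes: the loop body is the same code you would write without durability. In a real project the workflow function would be exported and started through the DevKit, which is not shown here.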
Framework integration
The kit provides Next.js helpers (`withWorkflow`) and TypeScript compiler plugins that maintain deterministic execution while preserving compatibility with existing frontends and any cloud provider.
Stream persistence
The `getWritable` API creates durable streams that exist independently of the API handler lifecycle, allowing tools to write data packets that persist even if the client disconnects.
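The following sketch illustrates the idea behind durable streams: packets are written to a store keyed by run ID rather than to an HTTP response, so a tool keeps writing after the handler has returned. `getRunWriter`, `streamStore`, `searchTool`, and `handler` are hypothetical names loosely modelled on the concept; this is not the signature of `getWritable`:

```typescript
type Packet = { type: "progress" | "result"; data: string };

// Hypothetical durable store: streams are keyed by run ID and live outside any HTTP handler.
const streamStore = new Map<string, Packet[]>();

// Hypothetical writer accessor (illustrates the concept only, not the real getWritable API).
function getRunWriter(runId: string): (packet: Packet) => void {
  let buffer = streamStore.get(runId);
  if (!buffer) {
    buffer = [];
    streamStore.set(runId, buffer);
  }
  const target = buffer;
  return (packet) => {
    target.push(packet);
  };
}

const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

// A tool step that reports progress while it works; it never checks whether a client is listening.
async function searchTool(runId: string, query: string): Promise<void> {
  const write = getRunWriter(runId);
  for (const source of ["docs", "issues", "forum"]) {
    write({ type: "progress", data: `searched ${source} for "${query}"` });
    await tick(); // simulate I/O between packets
  }
  write({ type: "result", data: `3 sources checked for "${query}"` });
}

// Simulated API handler: starts the work, returns what exists so far, then the client disconnects.
async function handler(runId: string): Promise<Packet[]> {
  void searchTool(runId, "retry policies"); // not awaited: the tool outlives the handler
  await tick();
  return [...(streamStore.get(runId) ?? [])];
}

async function demo(): Promise<{ seenByClient: number; stored: number }> {
  const seen = await handler("run-42");
  await new Promise<void>((resolve) => setTimeout(resolve, 50)); // the tool keeps writing meanwhile
  return { seenByClient: seen.length, stored: streamStore.get("run-42")?.length ?? 0 };
}
```

`demo()` shows the client receiving only a partial stream before it goes away, while all four packets still end up in the store.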
✨ Production Features & Observability
Built-in local debugging
Running `npx workflow web` launches a local observability UI that visualizes every workflow run, step execution, input/output data, and associated events without additional instrumentation.
Resumable client sessions
Streams remain active on the backend even after a client disconnects, so users can reconnect and resume exactly where the agent left off. This is critical for long-running coding or research tasks.
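Resuming "exactly where the agent left off" reduces to the client remembering a cursor. A minimal sketch, assuming the backend keeps the full stream and can serve it from any index; `readFrom` and `ChatClient` are hypothetical names:

```typescript
// Hypothetical reader: return every chunk from `startIndex` onward.
function readFrom(stream: readonly string[], startIndex: number): string[] {
  if (startIndex < 0 || startIndex > stream.length) {
    throw new RangeError(`start index ${startIndex} is outside 0..${stream.length}`);
  }
  return stream.slice(startIndex);
}

// The client's only state is how many chunks it has already rendered.
class ChatClient {
  readonly rendered: string[] = [];
  sync(stream: readonly string[]): void {
    this.rendered.push(...readFrom(stream, this.rendered.length));
  }
}

// The agent keeps appending on the backend whether or not a client is attached.
const agentStream: string[] = [];
const chatClient = new ChatClient();

agentStream.push("plan: refactor auth module", "edited auth.ts");
chatClient.sync(agentStream); // user sees 2 chunks, then closes the laptop

agentStream.push("edited session.ts", "tests passing", "done"); // written while disconnected
chatClient.sync(agentStream); // resumes at index 2: no gaps, no duplicates
```

Because the cursor is just a count of rendered chunks, reconnecting never replays the stream from zero and never skips output produced while the client was away.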
Human-in-the-loop support
The framework supports suspending workflows at any step to await human approval via webhooks or manual resume commands, making it easy to implement safety checks for production agents.
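A minimal in-memory sketch of the human-in-the-loop pattern: the workflow creates an approval token, hands it to a reviewer, and awaits a promise that a webhook handler resolves later. `createApprovalHook`, `resumeHook`, and `deployAgent` are hypothetical names, not the DevKit's hook API. In a durable runtime the suspended workflow would consume no compute while waiting; here it is simply an unresolved promise:

```typescript
type Approval = { approved: boolean; reviewer: string };

// Hypothetical registry of pending approvals; a webhook handler would call resumeHook with the token.
const pending = new Map<string, (a: Approval) => void>();
let nextToken = 0;

function createApprovalHook(): { token: string; approval: Promise<Approval> } {
  const token = `hook-${++nextToken}`;
  const approval = new Promise<Approval>((resolve) => {
    pending.set(token, resolve);
  });
  return { token, approval };
}

function resumeHook(token: string, payload: Approval): void {
  const resolve = pending.get(token);
  if (!resolve) throw new Error(`unknown or already-used token ${token}`);
  pending.delete(token); // each token can resume the workflow exactly once
  resolve(payload);
}

async function deployAgent(notify: (token: string) => void): Promise<string> {
  const plan = "drop table staging_users"; // pretend an LLM step proposed this risky action
  const { token, approval } = createApprovalHook();
  notify(token); // e.g. send the token to a reviewer via chat or email
  const decision = await approval; // the workflow waits here until a human responds
  return decision.approved ? `executed: ${plan} (approved by ${decision.reviewer})` : `skipped: ${plan}`;
}

const outcome = deployAgent((token) => {
  // Simulate a reviewer approving through the webhook a few milliseconds later.
  setTimeout(() => resumeHook(token, { approved: true, reviewer: "alice" }), 5);
});
```

Deleting the token on first use makes approvals single-shot, so a replayed or duplicated webhook call cannot resume the same workflow twice.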
Bottom Line
Add Workflow DevKit to your AI SDK projects using simple `use workflow` and `use step` directives to automatically gain production durability, resumable streams, and step-by-step observability without rewriting agent logic or managing infrastructure.