Fast Models Need Slow Developers — Sarah Chieng, Cerebras
TL;DR
As AI coding models like Codex Spark reach 1,200 tokens per second, roughly 20x faster than the ~50 tokens/sec developers are used to, developers must abandon habits formed during the era of slow inference. This talk outlines a practical playbook for "slow development": use slower, smarter models for planning and fast models for execution, and treat AI as a real-time pair programmer that needs constant verification and strict context management.
⚡ The Infrastructure Behind the Speed Surge
Hardware breaks the memory wall
New architectures like Cerebras' wafer-scale engine use on-chip SRAM instead of off-chip HBM to eliminate memory bandwidth bottlenecks, while disaggregated inference separates compute-bound prefill from memory-bound decode onto specialized hardware.
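A back-of-the-envelope sketch of why decode is called memory-bound: if every generated token requires streaming the active weights from memory once, single-stream throughput cannot exceed bandwidth divided by bytes read per token. The model size, precision, and bandwidth figures below are hypothetical round numbers chosen for illustration, not vendor specifications.

```python
def decode_tokens_per_sec_upper_bound(
    active_params: float,
    bytes_per_param: float,
    bandwidth_bytes_per_sec: float,
) -> float:
    """Upper bound on single-stream decode speed when weight reads dominate."""
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_bytes_per_sec / bytes_per_token


# Hypothetical inputs: 10B active parameters stored at 2 bytes each.
off_chip_bound = decode_tokens_per_sec_upper_bound(10e9, 2, 2e12)    # assumed 2 TB/s
on_chip_bound = decode_tokens_per_sec_upper_bound(10e9, 2, 200e12)   # assumed 100x more

print(off_chip_bound, on_chip_bound)
```

Under these assumptions the bound scales linearly with bandwidth, which is the intuition behind moving weights closer to compute. Real systems also depend on batching, KV cache reads, and interconnect, all of which this estimate ignores.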
Stack-wide optimizations compound
Efficiency gains come simultaneously from model architectures like Mixture of Experts (MoE) and pruning, plus inference techniques like KV cache reuse that minimize redundant computations.
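To make KV cache reuse concrete, here is a toy sketch that only counts how many tokens would need fresh prefill when a new request shares a prefix with an earlier one. The `PrefixCache` class is hypothetical. Production inference servers store per-layer key/value tensors rather than a set of prefixes, so treat this as an illustration of the idea, not of any particular implementation.

```python
class PrefixCache:
    """Toy stand-in for KV cache reuse: remembers token prefixes already processed."""

    def __init__(self):
        self._seen = set()

    def tokens_to_compute(self, tokens):
        """Return how many tokens still need prefill, then record every prefix."""
        reused = 0
        # Find the longest prefix of this request that was processed before.
        for end in range(len(tokens), 0, -1):
            if tuple(tokens[:end]) in self._seen:
                reused = end
                break
        for end in range(1, len(tokens) + 1):
            self._seen.add(tuple(tokens[:end]))
        return len(tokens) - reused


cache = PrefixCache()
first = cache.tokens_to_compute([1, 2, 3, 4])          # nothing cached yet
second = cache.tokens_to_compute([1, 2, 3, 4, 5, 6])   # shares a 4-token prefix
third = cache.tokens_to_compute([9, 1, 2])             # different first token, no reuse
```

The third request shows why stable prompt prefixes matter: a change at the start of the prompt invalidates reuse for everything after it.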
The 20x danger multiplier
Without changing habits developed for 50 tokens/sec models, developers will generate massive amounts of unverified technical debt 20 times faster, turning agent swarms into instant spaghetti code.
🎯 Orchestrating the Fast and the Slow
Strategic model pairing
Use larger, more intelligent models for complex planning and long-horizon workflows, then deploy fast models like Codex Spark as pure executors for sub-tasks to maximize both quality and speed.
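The pairing can be sketched as a two-stage loop: one call to a planner, then one executor call per sub-task. `slow_planner` and `fast_executor` below are hypothetical stand-ins for real model clients. The talk summary does not specify an API, so the signatures are assumptions.

```python
from typing import Callable, List


def orchestrate(
    goal: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str], str],
) -> List[str]:
    """Ask the slow planner once, then hand each sub-task to the fast executor."""
    subtasks = plan(goal)
    return [execute(task) for task in subtasks]


# Hypothetical stand-ins; swap in actual model calls.
def slow_planner(goal: str) -> List[str]:
    return [f"{goal}: step {i}" for i in (1, 2, 3)]


def fast_executor(task: str) -> str:
    return f"done({task})"


results = orchestrate("add login form", slow_planner, fast_executor)
```

Keeping the planner and executor as plain callables makes it easy to change which model fills each role without touching the orchestration logic.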
Codify success into skills
Capture successful AI trajectories as reusable "skills" using slower planning models, then have fast agents execute these verified patterns autonomously in the background.
Cherry-picking induces taste
Use the extra speed to generate 15 to 75 variations of a UI or design element in parallel, then hand-pick the best. The human selection supplies the "taste" that models lack, without exhaustive prompt engineering.
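A minimal sketch of the generate-many-then-pick pattern, assuming a hypothetical `generate(prompt, seed)` function that returns one candidate per call. `fake_generate` is a placeholder for a real model request, and the final choice is hard-coded here only because the real selection is made by a person.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_variations(prompt, generate, n=15):
    """Request n candidates concurrently; results come back in seed order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(lambda seed: generate(prompt, seed), range(n)))


# Hypothetical placeholder for a fast-model call.
def fake_generate(prompt, seed):
    return f"{prompt} [variant {seed}]"


candidates = generate_variations("pricing page hero", fake_generate, n=15)
chosen = candidates[3]  # in practice a human reviews the candidates and picks
```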
🧠 Real-Time Collaboration & Context Discipline
Shift from batch to interactive
Treat fast models as real-time pair programmers; sit with the code, ask questions, and actively steer implementation rather than spawning agents and walking away.
Validation becomes free
At 1,200 tokens/sec, exhaustive validation—test suites, linting, diff reviews, and browser QA—should be baked into every step instead of deferred to pre-commit.
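One way to picture validation at every step is a loop that runs all checks after each change and stops at the first failure, instead of batching checks before commit. The `run_steps` helper and the toy check below are hypothetical. In a real setup the checks would presumably wrap a test runner, a linter, and a diff review.

```python
from typing import Callable, Dict, List, Tuple


def run_steps(
    steps: List[Callable[[], None]],
    checks: Dict[str, Callable[[], bool]],
) -> Tuple[int, List[str]]:
    """Apply steps one at a time and run every check after each one.

    Returns (index of the failing step, names of failed checks), or
    (len(steps), []) when every step passes.
    """
    for index, step in enumerate(steps):
        step()
        failed = [name for name, check in checks.items() if not check()]
        if failed:
            return index, failed
    return len(steps), []


# Toy state standing in for a working tree (assumption for illustration).
state = {"changed_lines": 0}


def add_line() -> None:
    state["changed_lines"] += 1


outcome = run_steps(
    steps=[add_line, add_line, add_line],
    checks={"small_diff": lambda: state["changed_lines"] <= 2},
)
```

Stopping at the first failing step keeps the failure attributable to a single change, which is harder to do once several unverified steps have piled up.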
Externalize memory immediately
With context windows filling 20x faster (compaction in 30 seconds vs. 10 minutes), break tasks into bounded goals and use persistent files (agents.md, plan.md, progress.md, verify.md) to maintain state across sessions.
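The timing claim follows from simple division, and the externalized state can be as small as one appended line per bounded goal. In the sketch below the 30,000-token budget is a hypothetical figure, chosen because it takes 10 minutes to fill at 50 tokens/sec. `record_progress` and its line format are likewise assumptions, not a format prescribed in the talk.

```python
from pathlib import Path


def seconds_to_fill(context_tokens: int, tokens_per_sec: float) -> float:
    """Time for generated tokens alone to consume a context budget."""
    return context_tokens / tokens_per_sec


def record_progress(path: Path, goal: str, status: str) -> None:
    """Append one line per bounded goal so a fresh session can resume from the file."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"- [{status}] {goal}\n")


budget = 30_000  # hypothetical token budget before compaction
slow_seconds = seconds_to_fill(budget, 50)      # 600 s, i.e. 10 minutes
fast_seconds = seconds_to_fill(budget, 1200)    # 25 s, close to the 30 s cited
```

Calling `record_progress(Path("progress.md"), "wire up login form", "done")` after each bounded goal means that when the fast session compacts or restarts, the state survives outside the context window.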
Bottom Line
Adopt a "slow developer" mindset by using fast models for execution only under tight human supervision and strict constraints, while externalizing context to prevent information loss in high-speed sessions.