Cursor's Third Era: Cloud Agents — ft. Sam Whitmore, Jonas Nelle, Cursor
TL;DR
Cursor launched Cloud Agents that provide AI models with full virtual machine access to autonomously write, test, and demonstrate code through video recordings, shifting from simple code generation to end-to-end software engineering workflows.
🖥️ Full-Computer Architecture
"Brain in a box" virtual-machine approach
Agents now run on full VMs with complete computer use (pixels in, coordinates out) rather than just reading code, enabled by an Autotab integration and a developer-experience setup that mirrors a human engineer's machine.
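The "pixels in, coordinates out" loop can be sketched minimally as below. This is an illustrative stub, not Cursor's implementation: `policy`, `take_screenshot`, `execute`-style names and the `Action` type are all assumptions.

```python
# Minimal sketch of a computer-use loop: screenshot in, action out.
# All names here (Action, policy, take_screenshot, run_agent) are
# hypothetical stand-ins, not Cursor's or Autotab's actual API.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def take_screenshot() -> bytes:
    # Stand-in for grabbing the VM's framebuffer as raw pixels.
    return b"\x00" * 32  # placeholder pixel buffer

def policy(pixels: bytes, goal: str) -> Action:
    # Stand-in for the model call: pixels + goal in, coordinates out.
    if goal == "open devtools":
        return Action(kind="click", x=120, y=48)
    return Action(kind="done")

def run_agent(goal: str, max_steps: int = 10) -> list[Action]:
    trace = []
    for _ in range(max_steps):
        action = policy(take_screenshot(), goal)
        trace.append(action)
        if action.kind == "done":
            break
        goal = ""  # pretend the click satisfied the goal
    return trace
```

The point of the sketch is the shape of the loop: the agent never reads application code to decide its next step, it only looks at rendered pixels and emits coordinates.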
Autonomous end-to-end testing on dev servers
Agents automatically start dev servers and test changes for 30+ minutes, returning with verified PRs rather than untested code suggestions, with default prompting that calibrates testing effort to the complexity of the change.
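The "boot the dev server, then verify the change" step can be sketched as below, using Python's built-in `http.server` as a stand-in for a real dev server; the health-check-then-test pattern is the point, not the specific server.

```python
# Sketch of the verify-before-PR loop: start a dev server, wait for it
# to become healthy, then run checks against the live instance.
# http.server is a stand-in dev server; a real agent would launch the
# project's own `npm run dev` / `make serve` equivalent.
import http.server
import threading
import time
import urllib.request

def start_dev_server(port: int) -> http.server.ThreadingHTTPServer:
    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        def log_message(self, *args):  # silence request logging
            pass
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def wait_until_healthy(url: str, timeout: float = 5.0) -> bool:
    # Poll the server until it answers 200 or the timeout expires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            time.sleep(0.1)
    return False
```

An agent session would run this gate before recording its demo video: if the health check never passes, there is nothing to demonstrate and the PR is not "verified."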
Multi-model synergistic base layer approach
The system leverages strengths from different model providers as base layers, creating outputs better than any single unified model tier could achieve alone.
🎥 Visual Verification System
Video demonstrations accelerate code review process
Every agent session generates a video recording showing the implemented feature in action, serving as an entry point for review that is faster than reading large diffs.
Complete VNC remote desktop environment access
Developers get full remote control of the agent's VM to hover, type, and interact with the live environment via VNC before deciding to merge or request iterations.
Zero-prompt intelligent testing strategies
Agents autonomously determine how to test changes—such as opening Chrome DevTools to inject 5,000 characters to test error limits—without explicit human instructions.
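The DevTools example boils down to an edge-case probe the agent invents on its own: push an oversized input at the feature and confirm the limit holds. A minimal sketch, where `MAX_LEN` and `validate_comment` are hypothetical stand-ins for the feature under test:

```python
# Sketch of an agent-devised limit probe: inject a 5,000-character
# string and confirm the length limit rejects it.
# MAX_LEN and validate_comment are illustrative, not from any real codebase.
MAX_LEN = 4000

def validate_comment(text: str) -> tuple[bool, str]:
    if len(text) > MAX_LEN:
        return False, f"comment exceeds {MAX_LEN} characters"
    return True, ""

def probe_limit() -> bool:
    # The agent's zero-prompt strategy: oversized input, expect rejection.
    ok, error = validate_comment("x" * 5000)
    return (not ok) and "exceeds" in error
```

In the episode's anecdote the agent ran the equivalent probe through Chrome DevTools against the live UI rather than against a function, but the test design is the same.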
🚀 Advanced Agentic Workflows
Automated bug reproduction and fix verification
The /repro command enables agents to reproduce bugs on video, fix them, and demonstrate the fix, reducing complex bug resolution from hours to 90-second review cycles.
Parallel agent swarms increase development throughput
The next major unlock involves parallelizing work through swarms of agents to dramatically increase throughput rather than just making single agents faster.
Recursive agent debugging and Datadog integration
Cloud agents can spin up sub-agents to debug themselves using Datadog MCP integration and explore logs, though recursive agent spawning is currently disabled.
Bottom Line
Start using full-computer agents with visual verification workflows immediately, as models like Claude 3.5 Sonnet and Codex 53 have crossed the threshold to autonomously handle end-to-end development including testing and bug reproduction.
More from Latent Space
🔬 There Is No AlphaFold for Materials — AI for Materials Discovery with Heather Kulik
MIT professor Heather Kulik explains how AI discovered quantum phenomena to create 4x tougher polymers and why materials science lacks an 'AlphaFold' equivalent due to missing experimental datasets, emphasizing that domain expertise remains essential to validate AI predictions in chemistry.
Dreamer: the Agent OS for Everyone — David Singleton
David Singleton introduces Dreamer as an 'Agent OS' that combines a personal AI Sidekick with a marketplace of tools and agents, enabling both non-technical users and engineers to build, customize, and deploy AI applications through natural language while maintaining privacy through centralized, OS-level architecture.
Why Anthropic Thinks AI Should Have Its Own Computer — Felix Rieseberg of Claude Cowork/Code
Anthropic's Felix Rieseberg explains why AI agents need their own virtual computers to be effective, arguing that confining Claude to chat interfaces severely limits capability. He details how this philosophy shaped Claude Cowork and why product development is shifting from lengthy planning to rapidly building multiple prototypes simultaneously.
⚡️ Monty: the ultrafast Python interpreter by Agents for Agents — Samuel Colvin, Pydantic
Samuel Colvin from Pydantic introduces Monty, a Rust-based Python interpreter designed specifically for AI agents that achieves sub-microsecond execution latency by running in-process, bridging the gap between rigid tool calling and heavy containerized sandboxes.