Building Towards Self-Driving Codebases with Long-Running, Asynchronous Agents
TL;DR
Cursor co-founder Aman traces the evolution of AI coding from autocomplete to synchronous agents, outlines the shift toward long-running asynchronous cloud agents that use multi-agent architectures to overcome context limits, and predicts a future of self-driving codebases: self-healing systems that run with minimal human intervention.
☁️ The Shift to Async Cloud Agents
Escaping local compute constraints
Cloud agents run in dedicated VMs with full desktop environments, enabling long-running tasks, resource-intensive testing, and computer use capabilities impossible on local machines.
Rapid internal adoption at Cursor
As of February 2025, 30% of merged PRs at Cursor originated from cloud agents, including complex refactors such as migrating video rendering from React to Rust for a 25x performance improvement.
Artifact-based review workflows
Engineers increasingly review artifacts such as video demos of features and research reports rather than raw code diffs, keeping iteration tractable even though agents produce 3-4x more code than synchronous workflows.
🤖 Multi-Agent Architecture
Solving the train-test time mismatch
A single agent fails on multi-million-token trajectories because RL training rarely covers such long horizons; this train-test mismatch necessitates hierarchical systems in which a main planner delegates to sub-agents that handle shorter, in-distribution tasks.
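The planner/sub-agent split described above can be sketched in a few lines. Everything here is a hypothetical illustration, not Cursor's implementation: `plan` stands in for an LLM that decomposes a goal, and each sub-agent sees only its own bounded context rather than the full long-horizon trajectory.

```python
from dataclasses import dataclass

@dataclass
class SubTask:
    description: str
    max_tokens: int  # keep each sub-trajectory short enough to stay in-distribution

def plan(goal: str) -> list[SubTask]:
    # Hypothetical planner: a real system would use an LLM to decompose the goal.
    return [SubTask(f"{goal}: step {i}", max_tokens=50_000) for i in range(1, 4)]

def run_subagent(task: SubTask) -> str:
    # Hypothetical sub-agent: it only ever sees its own short task description,
    # never the parent's multi-million-token history.
    return f"result of '{task.description}'"

def run_planner(goal: str) -> list[str]:
    # The planner holds long-horizon state; sub-agents do bounded, delegated work.
    return [run_subagent(t) for t in plan(goal)]

results = run_planner("migrate video rendering to Rust")
```

The key design point is that context flows downward only in small, task-sized pieces, so each sub-agent trajectory stays within the lengths the model was trained on.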
Model specialization by capability
Cursor's architecture uses OpenAI models for high-level orchestration while deploying Gemini and Anthropic models for multimodal tasks like computer use and UI generation.
Optimized inference for sub-tasks
Delegated sub-agent tasks are often better served by smaller, faster models than by frontier models, delivering comparable quality at significantly lower latency and cost.
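The two insights above (specialization by capability, smaller models for delegated work) amount to a routing table keyed by task type. The model names below are illustrative placeholders, not Cursor's actual configuration:

```python
# Hypothetical routing table mapping task kinds to model tiers.
ROUTING = {
    "orchestration": "frontier-planner",   # high-level planning and delegation
    "computer_use":  "multimodal-agent",   # screenshots, desktop interaction
    "ui_generation": "multimodal-agent",   # visual output benefits from multimodality
    "bulk_edit":     "small-fast-coder",   # cheap, low-latency delegated edits
}

def pick_model(task_kind: str) -> str:
    # Default to the small model: most delegated sub-tasks don't need a frontier model,
    # and the cheaper tier cuts both latency and cost.
    return ROUTING.get(task_kind, "small-fast-coder")
```

A real router might also consider context length, required tool access, or a cost budget, but the shape of the decision is the same: only orchestration defaults to the most capable model.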
🚗 The Self-Driving Codebase
Autonomous self-healing systems
Event-driven automations allow agents to fix issues from error trackers or pager alerts and merge code without human review, with the goal of agents becoming primary on-call responders.
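An event-driven automation like the one described can be sketched as a handler that turns an incoming alert into an agent task. The payload shape and the `auto_merge` policy below are assumptions for illustration, not a real error-tracker schema:

```python
# Minimal sketch of a "self-healing" trigger, assuming a hypothetical
# error-tracker event shape with "title", "stacktrace", and "severity" fields.
def handle_alert(event: dict) -> dict:
    """Convert an error-tracker or pager event into an agent work order."""
    task = (
        f"Investigate and fix: {event['title']}\n"
        f"Stack trace:\n{event.get('stacktrace', '(none)')}"
    )
    # Policy sketch: only low-severity fixes merge without human review;
    # anything more serious still routes to an on-call engineer.
    return {"task": task, "auto_merge": event.get("severity") == "low"}
```

Gating `auto_merge` on severity is one plausible way to ease into "no human review": the agent acts as first responder on everything, but unattended merges start with the lowest-risk class of fixes.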
Full-project generation capability
A one-week experiment building a functional web browser consumed billions of tokens and tens of thousands of dollars in compute, demonstrating the feasibility of zero-intervention development for complex software.
Proactive infrastructure monitoring
Agents continuously monitor ML training runs via Weights & Biases logs, catching degradation and preventing crashes before humans are even alerted.
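The monitoring half of that loop reduces to a degradation check over recent metrics. This is a generic sketch: it assumes the caller has already fetched a list of loss values (e.g. from a Weights & Biases run history), and the window and tolerance values are arbitrary:

```python
# Watchdog sketch: flag a training run whose recent loss is worse than the
# preceding window. Window size and tolerance are illustrative defaults.
def loss_degraded(losses: list[float], window: int = 5, tolerance: float = 0.05) -> bool:
    """Return True if the mean loss over the last `window` steps is more than
    `tolerance` worse than the mean over the window before it."""
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    prev = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return recent > prev * (1 + tolerance)
```

A watching agent would poll this on a schedule and, on a `True` result, open an investigation task or page the run owner before the run crashes or wastes further compute.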
Bottom Line
Engineering teams should prepare for self-driving codebases by adopting cloud-based async agents with robust multi-agent orchestration and shifting review workflows from code inspection to artifact validation.