Building Towards Self-Driving Codebases with Long-Running, Asynchronous Agents

Podcasts | April 12, 2026 | 19.8 thousand views | 37:49

TL;DR

Cursor co-founder Aman traces AI coding's evolution from autocomplete to synchronous agents, outlining the shift toward long-running async cloud agents that use multi-agent architectures to overcome context limits, and predicting a future of self-driving codebases with self-healing systems and minimal human intervention.

☁️ The Shift to Async Cloud Agents 3 insights

Escaping local compute constraints

Cloud agents run in dedicated VMs with full desktop environments, enabling long-running tasks, resource-intensive testing, and computer use capabilities impossible on local machines.

Rapid internal adoption at Cursor

As of February 2025, 30% of merged PRs at Cursor originated from cloud agents, including complex refactors like a 25x performance improvement migrating video rendering from React to Rust.

Artifact-based review workflows

Engineers increasingly review artifacts like video demos of features and research reports rather than raw code diffs, making iteration tractable despite agents producing 3-4x more code than synchronous methods.

🤖 Multi-Agent Architecture 3 insights

Solving the train-test time mismatch

Single agents fail on multi-million-token trajectories because RL training covers much shorter ones, so long tasks fall outside the training distribution. This necessitates hierarchical systems in which a main planner delegates to sub-agents that handle shorter, in-distribution tasks.
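
The delegation idea can be sketched roughly as follows. This is a minimal illustration, not Cursor's implementation: the `SubTask` type, the `MAX_SUBAGENT_TOKENS` budget, the halving strategy, and the `fake_subagent` stand-in are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical budget: the longest trajectory (in tokens) a sub-agent
# is assumed to handle reliably. The real figure is not given in the talk.
MAX_SUBAGENT_TOKENS = 200_000


@dataclass
class SubTask:
    description: str
    estimated_tokens: int


def split_task(task: SubTask, budget: int) -> List[SubTask]:
    """Recursively halve a task until every piece fits within the budget."""
    if budget < 1:
        raise ValueError("budget must be at least 1 token")
    if task.estimated_tokens <= budget:
        return [task]
    half = task.estimated_tokens // 2
    left = SubTask(f"{task.description} (part A)", half)
    right = SubTask(f"{task.description} (part B)", task.estimated_tokens - half)
    return split_task(left, budget) + split_task(right, budget)


def run_planner(
    goal: SubTask,
    run_subagent: Callable[[SubTask], str],
    budget: int = MAX_SUBAGENT_TOKENS,
) -> List[str]:
    """The planner only collects short results; each sub-agent starts fresh."""
    return [run_subagent(piece) for piece in split_task(goal, budget)]


def fake_subagent(task: SubTask) -> str:
    # Stand-in for a real model call (assumption for the example).
    return f"done: {task.description} [{task.estimated_tokens} tokens]"


if __name__ == "__main__":
    for line in run_planner(SubTask("migrate renderer", 1_000_000), fake_subagent):
        print(line)
```

A real planner would split along semantic boundaries (files, modules, milestones) rather than by halving a token estimate; the point here is only that no single sub-agent sees a trajectory longer than the budget.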

Model specialization by capability

Cursor's architecture uses OpenAI models for high-level orchestration while deploying Gemini and Anthropic models for multimodal tasks like computer use and UI generation.

Optimized inference for sub-tasks

Delegated sub-agent tasks often require smaller, faster models rather than frontier models, delivering equivalent performance with significantly reduced latency and cost.
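
One way to picture this kind of routing is a small lookup from task type to model tier. The sketch below is an assumption-laden illustration: the task kinds and the tier names (`frontier-orchestrator`, `multimodal-model`, `small-fast-model`) are hypothetical placeholders, not real model identifiers or Cursor's actual routing table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    kind: str                  # hypothetical kinds: "plan", "edit", "search", ...
    needs_vision: bool = False


# Hypothetical tier names; the talk does not give concrete routing rules.
ROUTES = {
    "plan": "frontier-orchestrator",   # high-level orchestration
    "edit": "small-fast-model",        # narrow, delegated sub-task
    "search": "small-fast-model",
}


def pick_model(task: Task) -> str:
    """Send multimodal work to a vision-capable tier, everything else by kind."""
    if task.needs_vision:
        return "multimodal-model"
    # Default to the cheaper tier for unrecognized delegated work.
    return ROUTES.get(task.kind, "small-fast-model")


if __name__ == "__main__":
    for t in (Task("plan"), Task("edit"), Task("ui_check", needs_vision=True)):
        print(t.kind, "->", pick_model(t))
```

Defaulting to the small tier reflects the claim above that delegated sub-tasks rarely need a frontier model; a production router would likely also consider task difficulty and fall back to a larger model on failure.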

🚗 The Self-Driving Codebase 3 insights

Autonomous self-healing systems

Event-driven automations allow agents to fix issues from error trackers or pager alerts and merge code without human review, with the goal of agents becoming primary on-call responders.
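
An event-driven automation of this kind could look something like the sketch below. Everything named here is hypothetical: the `Event` and `AgentJob` types, the source labels, the prompts, and in particular the `AUTO_MERGE_SOURCES` policy gate, which is an assumption added for illustration and not something described in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    source: str    # hypothetical label, e.g. "error_tracker" or "pager"
    summary: str


@dataclass
class AgentJob:
    prompt: str
    auto_merge: bool


# Hypothetical policy: which event sources may merge without human review.
AUTO_MERGE_SOURCES = {"error_tracker"}

Handler = Callable[[Event], AgentJob]
_handlers: Dict[str, Handler] = {}


def on(source: str) -> Callable[[Handler], Handler]:
    """Register a handler that turns events from `source` into agent jobs."""
    def register(fn: Handler) -> Handler:
        _handlers[source] = fn
        return fn
    return register


@on("error_tracker")
def fix_exception(event: Event) -> AgentJob:
    return AgentJob(
        prompt=f"Reproduce and fix: {event.summary}",
        auto_merge=event.source in AUTO_MERGE_SOURCES,
    )


@on("pager")
def triage_alert(event: Event) -> AgentJob:
    return AgentJob(
        prompt=f"Investigate alert and propose a fix: {event.summary}",
        auto_merge=event.source in AUTO_MERGE_SOURCES,
    )


def dispatch(events: List[Event]) -> List[AgentJob]:
    """Create one job per event that has a registered handler; skip the rest."""
    return [_handlers[e.source](e) for e in events if e.source in _handlers]
```

For example, `dispatch([Event("pager", "p99 latency high")])` yields a single job whose `auto_merge` flag is `False` under the policy assumed here.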

Full-project generation capability

A one-week experiment building a functional web browser consumed billions of tokens and tens of thousands of dollars in compute, demonstrating the feasibility of zero-intervention development for complex software.

Proactive infrastructure monitoring

Agents continuously monitor ML training runs via Weights & Biases logs to catch degradation and prevent crashes before humans are alerted.
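
The kind of check such a monitoring agent might run can be sketched as below. This is an assumed heuristic, not the method described in the talk: it takes a plain list of loss values (fetching them from a logging service is out of scope), assumes losses are positive, and uses hypothetical `window` and `tolerance` parameters.

```python
import math
from typing import Optional, Sequence


def first_degradation(
    losses: Sequence[float],
    window: int = 5,
    tolerance: float = 1.5,
) -> Optional[int]:
    """Return the index of the first suspicious step, or None if none is found.

    A step is flagged when its loss is non-finite (NaN or inf), or when it
    exceeds `tolerance` times the mean of the preceding `window` steps.
    Assumes positive loss values.
    """
    for i, loss in enumerate(losses):
        if not math.isfinite(loss):
            return i
        if i >= window:
            baseline = sum(losses[i - window:i]) / window
            if loss > tolerance * baseline:
                return i
    return None


if __name__ == "__main__":
    history = [2.0, 1.8, 1.6, 1.5, 1.4, 1.3, 3.0, 1.2]
    print(first_degradation(history))
```

An agent could run a check like this on each new batch of logged metrics and open an investigation (or pause the run) when it returns a step index.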

Bottom Line

Engineering teams should prepare for self-driving codebases by adopting cloud-based async agents with robust multi-agent orchestration and shifting review workflows from code inspection to artifact validation.
