Replacing 12K LoC with a 200 LoC Skill — David Gomes, Cursor

Podcasts | April 30, 2026 | 14.6 thousand views

TL;DR

David Gomes from Cursor describes how the team replaced 12,000 to 15,000 lines of git work tree management code with a roughly 200-line markdown skill built on existing agent primitives. The change sharply reduced maintenance and enabled multi-repo support and flexible model comparisons, but it requires new approaches to keeping agents isolated.

📝 The Markdown Refactor 3 insights

12,000 to 15,000 lines reduced to 200

Cursor deleted a 12,000 to 15,000 line implementation that managed git work trees and replaced it with a roughly 200-line markdown skill built on the existing 'agent skills' and 'sub-agents' primitives.

Prompts as executable logic

The new 'best-of-n' comparison feature shrank from 4,000 lines of code to just 40 lines of markdown instructions that orchestrate sub-agents.

Backend-controlled iteration

Features were implemented as slash commands rather than skills to allow server-side prompt updates without requiring users to update their Cursor client.

⚙️ Technical Architecture 3 insights

Sub-agent orchestration

The parent agent spawns isolated sub-agents in separate git work trees purely through instructions, enabling parallel execution across multiple models like GPT, Claude, and Grok simultaneously.
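
As a rough illustration of the setup the parent agent is instructed to carry out, the sketch below builds one `git worktree add` command per model. The function name, the directory layout, and the branch naming scheme are assumptions made for illustration, not Cursor's actual conventions. The commands are only constructed here, not executed.

```python
from pathlib import PurePosixPath


def worktree_commands(repo_root: str, task: str, models: list[str]) -> list[list[str]]:
    """Build one `git worktree add` invocation per model (hypothetical layout)."""
    commands = []
    for model in models:
        # Hypothetical naming: one sibling directory and one branch per model.
        slug = f"{task}-{model}".lower().replace(" ", "-")
        path = PurePosixPath(repo_root).parent / "worktrees" / slug
        # `git -C <repo> worktree add -b <branch> <path>` creates a new branch
        # checked out in its own directory, separate from the main checkout.
        commands.append(
            ["git", "-C", repo_root, "worktree", "add", "-b", slug, str(path)]
        )
    return commands


if __name__ == "__main__":
    for cmd in worktree_commands("/home/dev/app", "fix-login", ["gpt", "claude", "grok"]):
        print(" ".join(cmd))
```

In the skill-based design described in the talk, these steps are carried out by the agent following markdown instructions rather than by application code, so a script like this is only a way to picture the resulting layout.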

Cross-platform prompt engineering

The skill includes specific instructions for Windows, Linux, and macOS, handles user-configured setup scripts, and uses aggressive prompting to keep agents from escaping their work tree directory.
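
One way to picture platform-specific instructions is a small prompt builder that picks a note per operating system. This is a minimal sketch under stated assumptions: the wording of every instruction, the `OS_NOTES` table, and the `build_instructions` helper are hypothetical and are not taken from the actual skill.

```python
import platform
from typing import Optional

# Hypothetical per-OS notes; the real skill's wording is not reproduced here.
OS_NOTES = {
    "Windows": "Use PowerShell syntax and backslash-separated paths.",
    "Linux": "Use POSIX shell syntax.",
    "Darwin": "Use POSIX shell syntax; the default shell is zsh.",
}


def build_instructions(
    worktree: str,
    setup_script: Optional[str] = None,
    system: Optional[str] = None,
) -> str:
    """Assemble instruction text for one sub-agent (illustrative only)."""
    # platform.system() returns "Windows", "Linux", or "Darwin" (macOS).
    system = system or platform.system()
    lines = [
        f"Work ONLY inside {worktree}. Never read or write files outside it.",
        OS_NOTES.get(system, OS_NOTES["Linux"]),
    ]
    if setup_script:
        lines.append(f"Before editing, run the user's setup script: {setup_script}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_instructions("/tmp/wt", setup_script="./setup.sh", system="Windows"))
```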

Dynamic context loading

Unlike hardcoded features, these commands load prompts into context only when invoked, making them lightweight and modular.
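
The load-on-invocation idea can be sketched as a registry that maps a slash command to a function returning its prompt. The command names, the prompt text, and the `build_context` helper are assumptions for illustration. In the setup described in the talk the prompt would come from the backend, whereas here it is inlined.

```python
from typing import Callable

# Hypothetical registry: command name -> function that returns the prompt text.
COMMANDS: dict[str, Callable[[], str]] = {
    "/worktree": lambda: "Create a work tree and do all further edits inside it.",
    "/best-of-n": lambda: "Spawn one sub-agent per model and compare their results.",
}


def build_context(history: list[str], message: str) -> list[str]:
    """Return this turn's context, loading a command prompt only if invoked."""
    context = list(history)
    # The first whitespace-separated token is treated as the command name.
    name = message.split(maxsplit=1)[0] if message.strip() else ""
    loader = COMMANDS.get(name)
    if loader is not None:
        # The prompt is fetched at invocation time, not at startup.
        context.append(loader())
    context.append(message)
    return context


if __name__ == "__main__":
    print(build_context(["earlier message"], "/worktree fix the login bug"))
```

A conversation that never uses a command pays no context cost for it, and a command typed halfway through a chat simply adds its prompt at that point.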

⚖️ Capabilities vs. Trade-offs 4 insights

New multi-repo support

The skill-based approach enables work trees across multiple repositories simultaneously, which was technically impossible in the previous hardcoded implementation.

Mid-chat flexibility

Users can now switch to work tree mode halfway through a conversation using slash commands, whereas the previous UI-based approach required setting the mode before starting.

Vibes-based isolation risks

Without physical isolation, models occasionally 'escape' their work trees during long sessions, with smaller models like Haiku deviating more frequently than Composer or Grok.

Discoverability cost

Moving from a prominent UI dropdown to slash commands makes the advanced feature harder to find, though this is acceptable given that it targets power users.

🚀 Future Improvements 3 insights

Eval-driven refinement

The team is building automated evals using Braintrust to test agent isolation, revealing performance differences between models and informing prompt improvements.
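
An isolation eval needs some way to score whether an agent stayed inside its work tree. The sketch below shows one possible scorer based on the file paths an agent touched. It is an assumption about how such a check could work, it does not use or represent the Braintrust API, and the function names are hypothetical.

```python
from pathlib import Path


def escaped_paths(worktree: str, touched: list[str]) -> list[str]:
    """Return the touched paths that resolve outside the work tree root."""
    root = Path(worktree).resolve()
    outside = []
    for raw in touched:
        # Relative paths are interpreted relative to the work tree root;
        # absolute paths replace the root when joined.
        resolved = (root / raw).resolve()
        if not resolved.is_relative_to(root):  # requires Python 3.9+
            outside.append(raw)
    return outside


def isolation_score(worktree: str, touched: list[str]) -> float:
    """Fraction of touched paths inside the work tree (1.0 means no escapes)."""
    if not touched:
        return 1.0
    return 1.0 - len(escaped_paths(worktree, touched)) / len(touched)


if __name__ == "__main__":
    paths = ["src/a.py", "../main/b.py", "/etc/hosts"]
    print(escaped_paths("/tmp/wt", paths))
    print(isolation_score("/tmp/wt", paths))
```

Averaging a score like this per model over many runs would be one way to surface the kind of per-model differences the talk mentions.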

RL training for Composer

Work tree isolation tasks are being added to the reinforcement learning pipeline for future Composer model versions to improve adherence to directory constraints.

Cursor 3.0 native integration

A proper native work trees implementation is planned for the new Cursor 3.0 agent window, alongside exploration of non-git parallelization primitives to reduce disk usage.

Bottom Line

Complex software features can be replaced with lightweight LLM prompts using existing primitives, but this architecture requires robust evaluation frameworks and model-specific training to ensure reliability without traditional code-based guardrails.
