Replacing 12K LoC with a 200 LoC Skill — David Gomes, Cursor
TL;DR
David Gomes from Cursor details how they replaced a 12,000-15,000 line implementation of complex git work tree management with a ~200-line markdown skill built on agent primitives, drastically reducing maintenance while enabling multi-repo support and flexible model comparisons, though the change required new approaches to keep agents isolated.
📝 The Markdown Refactor
12,000-15,000 lines reduced to ~200
Cursor deleted a 12,000-15,000 line implementation managing git work trees and replaced it with a ~200 line markdown skill built on the existing 'Agent skills' and 'sub-agents' primitives.
Prompts as executable logic
The new 'best-of-n' comparison feature shrank from 4,000 lines of code to just 40 lines of markdown instructions that orchestrate sub-agents.
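The orchestration pattern behind best-of-n can be sketched in a few lines of Python. This is an illustrative shape only, with placeholder model callables and a pluggable judge, not Cursor's implementation (which lives in markdown instructions, not code):

```python
from concurrent.futures import ThreadPoolExecutor

def best_of_n(prompt, models, judge):
    """Fan the same prompt out to several models and let a judge pick a winner.

    `models` maps a model name to a callable that takes a prompt and returns a
    candidate; `judge` scores a candidate. Both are hypothetical placeholders.
    """
    with ThreadPoolExecutor() as pool:
        # Run all models concurrently on the same prompt.
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        candidates = {name: f.result() for name, f in futures.items()}
    # Return the (name, candidate) pair with the highest judge score.
    return max(candidates.items(), key=lambda kv: judge(kv[1]))
```

In the markdown-skill version, the fan-out and the judging step are expressed as instructions to sub-agents rather than as code like this.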
Backend-controlled iteration
Features were implemented as slash commands rather than skills to allow server-side prompt updates without requiring users to update their Cursor client.
⚙️ Technical Architecture
Sub-agent orchestration
The parent agent spawns isolated sub-agents in separate git work trees purely through instructions, enabling parallel execution across multiple models like GPT, Claude, and Grok simultaneously.
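The mechanics the sub-agents rely on are just ordinary `git worktree` commands. A minimal sketch in Python, assuming a hypothetical per-agent task callable (this is not Cursor's code; the actual orchestration is driven by prompt instructions):

```python
import pathlib
import subprocess
from concurrent.futures import ThreadPoolExecutor

def add_worktree(repo: str, branch: str) -> pathlib.Path:
    """Create an isolated git work tree on its own branch for one sub-agent."""
    path = pathlib.Path(repo).parent / branch
    subprocess.run(
        ["git", "worktree", "add", "-b", branch, str(path)],
        cwd=repo, check=True, capture_output=True,
    )
    return path

def fan_out(repo: str, agents: dict):
    """Give each agent its own work tree, then run all agents in parallel.

    `agents` maps an agent name to a callable that receives its work tree path;
    the callables stand in for whatever each sub-agent actually does.
    """
    trees = {name: add_worktree(repo, name) for name in agents}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(agents[name], trees[name]) for name in agents}
        return {name: f.result() for name, f in futures.items()}
```

Because each agent works on its own branch in its own directory, parallel edits never collide; the parent agent only has to merge or compare the resulting branches afterwards.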
Cross-platform prompt engineering
The skill includes specific instructions for Windows, Linux, and macOS while handling user-configured setup scripts and aggressive prompting to prevent directory escaping.
Dynamic context loading
Unlike hardcoded features, these commands load prompts into context only when invoked, making them lightweight and modular.
⚖️ Capabilities vs. Trade-offs
New multi-repo support
The skill-based approach enables work trees across multiple repositories simultaneously—a feature that was technically impossible in the previous hardcoded implementation.
Mid-chat flexibility
Users can now switch to work tree mode halfway through a conversation using slash commands, whereas the previous UI-based approach required setting the mode before starting.
Vibes-based isolation risks
Without physical isolation, models occasionally 'escape' their work trees during long sessions, with smaller models like Haiku deviating more frequently than Composer or Grok.
Discoverability cost
Moving from a prominent UI dropdown to slash commands makes the advanced feature harder to find, though acceptable given its power-user target audience.
🚀 Future Improvements
Eval-driven refinement
The team is building automated evals using Braintrust to test agent isolation, revealing performance differences between models and informing prompt improvements.
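The core isolation check such an eval needs can be quite small: resolve every path an agent touched and flag anything that lands outside its work tree. A hypothetical helper along those lines (the talk summary does not show Cursor's actual Braintrust harness):

```python
import pathlib

def escaped_paths(worktree: str, touched: list[str]) -> list[str]:
    """Return any paths an agent touched that resolve outside its work tree."""
    root = pathlib.Path(worktree).resolve()
    leaks = []
    for raw in touched:
        p = pathlib.Path(raw)
        # Relative paths are interpreted from the work tree root; resolving
        # collapses any `..` components that would climb out of it.
        resolved = (p if p.is_absolute() else root / p).resolve()
        if not resolved.is_relative_to(root):
            leaks.append(raw)
    return leaks
```

Run over the file list from each eval session, a check like this turns "vibes-based isolation" into a measurable pass/fail signal per model.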
RL training for Composer
Work tree isolation tasks are being added to the reinforcement learning pipeline for future Composer model versions to improve adherence to directory constraints.
Cursor 3.0 native integration
A proper native work trees implementation is planned for the new Cursor 3.0 agent window, alongside exploration of non-git parallelization primitives to reduce disk usage.
Bottom Line
Complex software features can be replaced with lightweight LLM prompts using existing primitives, but this architecture requires robust evaluation frameworks and model-specific training to ensure reliability without traditional code-based guardrails.
More from AI Engineer
Human-in-the-Loop Automation with n8n — Liam McGarrigle
Liam McGarrigle demonstrates building AI agents in n8n using visual workflows, emphasizing transparent orchestration over black-box automation through configurable memory, chat triggers, and tool integration for practical business applications.
Mastering AI Pricing: Flexible & Agile Monetization — Mayank Pant, Stripe
AI companies are growing three times faster than traditional SaaS but face unique pricing challenges due to unpredictable compute costs and razor-thin margins, requiring a shift from static subscription models to flexible hybrid pricing that prioritizes rapid iteration and customer-perceived value over technical metrics.
Shipping complex AI applications — Braintrust & Trainline
This workshop demonstrates how to bridge the gap between AI prototypes and production systems using Braintrust's observability platform, featuring Trainline's experience deploying multi-agent AI applications serving 27 million users.
Building Conversational Agents — Thor Schaeff and Philipp Schmid, Google DeepMind
Google DeepMind engineers Thor Schaeff and Philipp Schmid demonstrate building conversational agents using the new Gemini Interactions API, a unified interface that supports both direct model inference and complex autonomous agents like Deep Research with server-side state management and asynchronous execution.