Mergeable by default: Building the context engine to save time and tokens — Peter Werry, Unblocked
TL;DR
Peter Werry argues that as AI agents move toward autonomous 'YOLO mode' execution, simple RAG and MCP connections fail to provide adequate organizational context, creating bottlenecks and 'satisfaction of search' failures, where agents stop at superficial answers instead of understanding the historical 'why' behind code decisions.
🔄 The Human Bottleneck Problem
Humans become cognitive bottlenecks
With parallel agents and YOLO mode execution, engineers cannot manage the context switching required to manually feed information to multiple simultaneous background processes.
Organizational knowledge requires battle scars
True context includes institutional memory of incidents, outages, and historical decisions—not just current code state—enabling agents to understand why systems work the way they do.
Background agents are inevitable
As code intelligence improves exponentially, the limiting factor becomes context delivery, requiring engines that can operate autonomously without human intermediaries.
❌ Three Myths of Context Provision
Naive RAG causes satisfaction of search
Simple vector search leads agents to stop at first findings (like radiologists missing secondary issues), overlooking critical context buried in Slack threads or incident reports.
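A minimal sketch of this failure mode, assuming a toy corpus, a bag-of-words stand-in for a real embedding model, and a stop-at-first-hit policy; none of this is Unblocked's implementation, just an illustration of why the superficial answer wins.

```python
# Naive RAG: rank by similarity, accept the first chunk that clears a
# threshold, and never look at the contradicting source buried below it.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; stands in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "README: use the retry helper for flaky network calls",           # superficial answer
    "Slack thread: retry helper caused the 2022 outage, do not use",  # the buried 'why'
]

def naive_rag(query: str, threshold: float = 0.3) -> str:
    q = embed(query)
    for doc in sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True):
        if cosine(q, embed(doc)) >= threshold:
            return doc  # satisfaction of search: stops here, never sees the incident report
    return "no answer"

print(naive_rag("how should I handle flaky network calls with the retry helper"))
```

Here the README chunk outscores the incident thread on surface similarity, so the agent confidently returns the one answer the organization already knows is wrong.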
MCP connections lack understanding
Wiring up data sources provides access but fails to reveal relationships between systems, historical motivations for changes, or the reasons behind architectural decisions.
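A hedged sketch of the gap: the artifacts, identifiers, and link structure below are invented for illustration. Wiring up sources gives isolated per-system lookups; the 'why' lives in the edges between records, which raw access never expresses and a context engine must build.

```python
from dataclasses import dataclass, field

@dataclass
class Artifact:
    source: str
    text: str
    links: list["Artifact"] = field(default_factory=list)  # relationships the engine must build

# What wiring up sources gives you: disconnected records per system.
commit = Artifact("git", "commit 9f2c: remove connection pooling")
pr     = Artifact("github", "PR: pooling exhausted DB connections under load")
slack  = Artifact("slack", "thread: outage traced to pool saturation")

# What a context engine adds: the edges that carry historical motivation.
commit.links = [pr]
pr.links = [slack]

def why(artifact: Artifact) -> list[str]:
    """Walk relationship edges to recover the motivation behind a change."""
    out, stack = [], [artifact]
    while stack:
        node = stack.pop()
        out.append(f"[{node.source}] {node.text}")
        stack.extend(node.links)
    return out

print("\n".join(why(commit)))  # commit -> PR discussion -> incident thread
```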
Larger context windows don't solve reasoning
Even million-token windows cannot fit entire organizational contexts, and size doesn't help agents determine truth versus outdated information or reason across disparate sources.
🏗️ Building a True Context Engine
Prevent satisfaction of search
Engines must surface previously rejected solutions, analyze deletion history, and understand user intent rather than stopping at the first compiling code solution.
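One way this could look in practice, as a minimal sketch: before accepting the first solution that compiles, the engine consults a log of deletions and rejected attempts. The record format and the rejection log are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Rejection:
    pattern: str   # code idiom that was tried before
    reason: str    # why it was deleted or turned down

REJECTION_LOG = [
    Rejection("time.sleep", "busy-wait retry deleted in 2023; replaced by backoff queue"),
    Rejection("global cache", "rejected in review: caused stale reads across regions"),
]

def vet_proposal(proposed_code: str) -> list[str]:
    """Return warnings for any idiom the organization already tried and rejected."""
    return [
        f"previously rejected: {r.reason}"
        for r in REJECTION_LOG
        if r.pattern in proposed_code
    ]

proposal = "def fetch():\n    time.sleep(1)  # retry\n    ..."
for warning in vet_proposal(proposal):
    print(warning)  # the engine surfaces the scar tissue instead of staying silent
```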
Resolve conflicts beyond recency
Truth determination requires identifying contradictions between documentation and code, recognizing that the main branch isn't always the future source of truth, and learning from user corrections.
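A minimal sketch of truth scoring that goes beyond 'newest wins'. The weights and signals (source authority, recency decay, user corrections) are assumed for illustration; a real engine would tune or learn them from feedback.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    source: str       # e.g. "docs", "main-branch", "feature-branch"
    age_days: int
    corrections: int  # times users corrected answers based on this claim

# Illustrative authority weights: a feature branch about to merge can
# outrank main, since main isn't always the future source of truth.
AUTHORITY = {"main-branch": 0.8, "feature-branch": 0.9, "docs": 0.5}

def trust(claim: Claim) -> float:
    recency = 1.0 / (1.0 + claim.age_days / 90)  # decays, but never dominates
    penalty = 0.2 * claim.corrections            # learn from user corrections
    return AUTHORITY.get(claim.source, 0.4) + 0.3 * recency - penalty

a = Claim("auth uses API keys", source="docs", age_days=400, corrections=2)
b = Claim("auth migrated to OAuth", source="feature-branch", age_days=3, corrections=0)

winner = max([a, b], key=trust)
print(f"conflict resolved in favor of: {winner.text!r} from {winner.source}")
```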
Enforce access controls at the core
Context engines must respect permissions like private Slack channels, ensuring sensitive information only surfaces for authorized users while maintaining strict privacy boundaries.
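A sketch of enforcing permissions at the retrieval layer rather than as a post-filter: documents invisible to the requesting user are excluded before ranking, so nothing private can leak into the ranked context. The ACL model here (allowed principals per document) is an assumption for illustration.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    allowed: frozenset[str]  # principals permitted to read, e.g. Slack channel members

INDEX = [
    Doc("public runbook for deploys", frozenset({"everyone"})),
    Doc("private #security thread on the token leak", frozenset({"alice", "bob"})),
]

def retrieve(query: str, user: str) -> list[str]:
    """Rank only documents the user could read at the source system."""
    visible = [
        d for d in INDEX
        if user in d.allowed or "everyone" in d.allowed
    ]
    # (ranking stub: a real engine would score `visible` against `query` here)
    return [d.text for d in visible]

print(retrieve("token leak", user="carol"))  # private thread never surfaces
print(retrieve("token leak", user="alice"))  # authorized user sees it
```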
Bottom Line
Organizations must build context engines that resolve conflicts between data sources, preserve institutional knowledge of past failures, and enforce access controls to prepare for autonomous background agents that understand the 'why' behind decisions, not just the 'what' of current code.
More from AI Engineer
Training an LLM from Scratch, Locally — Angelos Perivolaropoulos, ElevenLabs
Angelos Perivolaropoulos from ElevenLabs demonstrates how to train a GPT-2 style language model from scratch using only PyTorch and minimal dependencies, revealing that modern LLM development is roughly 80% training methodology and optimization rather than architectural novelty.
Skill Issue: How We Used AI to Make Agents Actually Good at Supabase — Pedro Rodrigues, Supabase
Pedro Rodrigues from Supabase details how structured 'skills'—markdown-based instruction sets with progressive disclosure—dramatically improve AI agent performance with complex products, distinguishing them from MCP tools and establishing an evaluation-driven development framework for systematic testing.
Ralph Loops: Build Dumb AI Loops That Ship — Chris Parsons, Cherrypick
Chris Parsons introduces 'Ralph Loops'—a minimalist automation approach where repeatedly prompting an AI agent with the same task outperforms complex orchestration workflows, leveraging the model's self-correction to ship better code with less maintenance.
TLMs: Tiny LLMs and Agents on Edge Devices with LiteRT-LM — Cormac Brick, Google
Cormac Brick from Google AI Edge introduces Tiny LLMs (TLMs) and on-device agent capabilities powered by LiteRT-LM and the new Gemma 4 models, demonstrating how fine-tuned small models (100M-4B parameters) can now deliver sophisticated AI experiences—including multimodal reasoning and tool use—directly on mobile phones, laptops, and even Raspberry Pis without cloud dependency.