Mergeable by default: Building the context engine to save time and tokens — Peter Werry, Unblocked
TL;DR
Peter Werry argues that as AI agents move toward autonomous 'YOLO mode' execution, simple RAG and MCP connections fail to provide adequate organizational context, creating bottlenecks and 'satisfaction of search' failures, where agents stop at superficial answers instead of understanding the historical 'why' behind code decisions.
🔄 The Human Bottleneck Problem
Humans become cognitive bottlenecks
With parallel agents and YOLO mode execution, engineers cannot manage the context switching required to manually feed information to multiple simultaneous background processes.
Organizational knowledge requires battle scars
True context includes institutional memory of incidents, outages, and historical decisions—not just current code state—enabling agents to understand why systems work the way they do.
Background agents are inevitable
As code intelligence improves exponentially, the limiting factor becomes context delivery, requiring engines that can operate autonomously without human intermediaries.
❌ Three Myths of Context Provision
Naive RAG causes satisfaction of search
Simple vector search leads agents to stop at first findings (like radiologists missing secondary issues), overlooking critical context buried in Slack threads or incident reports.
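A minimal sketch of this failure mode, assuming a toy corpus, a bag-of-words stand-in for a real embedding model, and a stop-at-first-hit policy; none of this is Unblocked's implementation, just an illustration of why the superficial answer wins.

```python
# Naive RAG: rank by similarity, accept the first chunk that clears a
# threshold, and never look at the contradicting source buried below it.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; stands in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "README: use the retry helper for flaky network calls",           # superficial answer
    "Slack thread: retry helper caused the 2022 outage, do not use",  # the buried 'why'
]

def naive_rag(query: str, threshold: float = 0.3) -> str:
    q = embed(query)
    for doc in sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True):
        if cosine(q, embed(doc)) >= threshold:
            return doc  # satisfaction of search: stops here, never sees the incident report
    return "no answer"

print(naive_rag("how should I handle flaky network calls with the retry helper"))
```

Here the README chunk outscores the incident thread on surface similarity, so the agent confidently returns the one answer the organization already knows is wrong.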
MCP connections lack understanding
Wiring up data sources provides access but fails to reveal relationships between systems, historical motivations for changes, or the reasons behind architectural decisions.
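A hedged sketch of the gap: the artifacts, identifiers, and link structure below are invented for illustration. Wiring up sources gives isolated per-system lookups; the 'why' lives in the edges between records, which raw access never expresses and a context engine must build.

```python
from dataclasses import dataclass, field

@dataclass
class Artifact:
    source: str
    text: str
    links: list["Artifact"] = field(default_factory=list)  # relationships the engine must build

# What wiring up sources gives you: disconnected records per system.
commit = Artifact("git", "commit 9f2c: remove connection pooling")
pr     = Artifact("github", "PR: pooling exhausted DB connections under load")
slack  = Artifact("slack", "thread: outage traced to pool saturation")

# What a context engine adds: the edges that carry historical motivation.
commit.links = [pr]
pr.links = [slack]

def why(artifact: Artifact) -> list[str]:
    """Walk relationship edges to recover the motivation behind a change."""
    out, stack = [], [artifact]
    while stack:
        node = stack.pop()
        out.append(f"[{node.source}] {node.text}")
        stack.extend(node.links)
    return out

print("\n".join(why(commit)))  # commit -> PR discussion -> incident thread
```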
Larger context windows don't solve reasoning
Even million-token windows cannot fit entire organizational contexts, and size doesn't help agents determine truth versus outdated information or reason across disparate sources.
🏗️ Building a True Context Engine
Prevent satisfaction of search
Engines must surface previously rejected solutions, analyze deletion history, and understand user intent rather than stopping at the first compiling code solution.
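One way this could look in practice, as a minimal sketch: before accepting the first solution that compiles, the engine consults a log of deletions and rejected attempts. The record format and the rejection log are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Rejection:
    pattern: str   # code idiom that was tried before
    reason: str    # why it was deleted or turned down

REJECTION_LOG = [
    Rejection("time.sleep", "busy-wait retry deleted in 2023; replaced by backoff queue"),
    Rejection("global cache", "rejected in review: caused stale reads across regions"),
]

def vet_proposal(proposed_code: str) -> list[str]:
    """Return warnings for any idiom the organization already tried and rejected."""
    return [
        f"previously rejected: {r.reason}"
        for r in REJECTION_LOG
        if r.pattern in proposed_code
    ]

proposal = "def fetch():\n    time.sleep(1)  # retry\n    ..."
for warning in vet_proposal(proposal):
    print(warning)  # the engine surfaces the scar tissue instead of staying silent
```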
Resolve conflicts beyond recency
Truth determination requires identifying contradictions between documentation and code, recognizing that the main branch isn't always the future source of truth, and learning from user corrections.
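A minimal sketch of truth scoring that goes beyond 'newest wins'. The weights and signals (source authority, recency decay, user corrections) are assumed for illustration; a real engine would tune or learn them from feedback.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    source: str       # e.g. "docs", "main-branch", "feature-branch"
    age_days: int
    corrections: int  # times users corrected answers based on this claim

# Illustrative authority weights: a feature branch about to merge can
# outrank main, since main isn't always the future source of truth.
AUTHORITY = {"main-branch": 0.8, "feature-branch": 0.9, "docs": 0.5}

def trust(claim: Claim) -> float:
    recency = 1.0 / (1.0 + claim.age_days / 90)  # decays, but never dominates
    penalty = 0.2 * claim.corrections            # learn from user corrections
    return AUTHORITY.get(claim.source, 0.4) + 0.3 * recency - penalty

a = Claim("auth uses API keys", source="docs", age_days=400, corrections=2)
b = Claim("auth migrated to OAuth", source="feature-branch", age_days=3, corrections=0)

winner = max([a, b], key=trust)
print(f"conflict resolved in favor of: {winner.text!r} from {winner.source}")
```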
Enforce access controls at the core
Context engines must respect permissions like private Slack channels, ensuring sensitive information only surfaces for authorized users while maintaining strict privacy boundaries.
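A sketch of enforcing permissions at the retrieval layer rather than as a post-filter: documents invisible to the requesting user are excluded before ranking, so nothing private can leak into the ranked context. The ACL model here (allowed principals per document) is an assumption for illustration.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    allowed: frozenset[str]  # principals permitted to read, e.g. Slack channel members

INDEX = [
    Doc("public runbook for deploys", frozenset({"everyone"})),
    Doc("private #security thread on the token leak", frozenset({"alice", "bob"})),
]

def retrieve(query: str, user: str) -> list[str]:
    """Rank only documents the user could read at the source system."""
    visible = [
        d for d in INDEX
        if user in d.allowed or "everyone" in d.allowed
    ]
    # (ranking stub: a real engine would score `visible` against `query` here)
    return [d.text for d in visible]

print(retrieve("token leak", user="carol"))  # private thread never surfaces
print(retrieve("token leak", user="alice"))  # authorized user sees it
```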
Bottom Line
Organizations must build context engines that resolve conflicts between data sources, preserve institutional knowledge of past failures, and enforce access controls to prepare for autonomous background agents that understand the 'why' behind decisions, not just the 'what' of current code.
More from AI Engineer
Training an LLM from Scratch, Locally — Angelos Perivolaropoulos, ElevenLabs
Angelos Perivolaropoulos from ElevenLabs demonstrates how to train a GPT-2 style language model from scratch using only PyTorch and minimal dependencies, revealing that modern LLM development is roughly 80% training methodology and optimization rather than architectural novelty.
Skill Issue: How We Used AI to Make Agents Actually Good at Supabase — Pedro Rodrigues, Supabase
Pedro Rodrigues from Supabase details how structured 'skills'—markdown-based instruction sets with progressive disclosure—dramatically improve AI agent performance with complex products, distinguishing them from MCP tools and establishing an evaluation-driven development framework for systematic testing.
Ralph Loops: Build Dumb AI Loops That Ship — Chris Parsons, Cherrypick
Chris Parsons introduces 'Ralph Loops'—a minimalist automation approach where repeatedly prompting an AI agent with the same task outperforms complex orchestration workflows, leveraging the model's self-correction to ship better code with less maintenance.
TLMs: Tiny LLMs and Agents on Edge Devices with LiteRT-LM — Cormac Brick, Google
Cormac Brick from Google AI Edge introduces Tiny LLMs (TLMs) and on-device agent capabilities powered by LiteRT-LM and the new Gemma 4 models, demonstrating how fine-tuned small models (100M-4B parameters) can now deliver sophisticated AI experiences—including multimodal reasoning and tool use—directly on mobile phones, laptops, and even Raspberry Pis without cloud dependency.