CI/CD Breaks at AI Speed: Tangle, Graphite Stacks, Pro-Model PR Review — Mikhail Parakhin, Shopify
TL;DR
Shopify CTO Mikhail Parakhin reveals that AI agents have reached nearly 100% daily adoption among developers, driving a 30% month-over-month surge in PR merges that is breaking traditional CI/CD pipelines. He argues that organizations must shift from parallel token-burning agents to high-latency, critique-loop architectures that use expensive pro-level models for code review.
🧠 Agent Architecture & Token Strategy
Quality trumps quantity in token consumption
Parakhin defends Jensen Huang's stance on high token budgets but emphasizes that the real anti-pattern is running many parallel agents that don't communicate. Instead, fewer agents using structured critique loops (one writes, another critiques with models like o1 or GPT-4.5) produce higher-quality code.
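The writer/critic pattern described above can be sketched in a few lines. This is a minimal illustration, not Shopify's implementation: `call_model` is a stand-in for a real LLM client (here a deterministic stub so the loop is runnable), and the model names are placeholders.

```python
def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real LLM client; deterministic for illustration."""
    if model == "critic-model":
        # The stub critic approves only drafts that mention tests.
        return "LGTM" if "tested" in prompt else "Please add tests."
    if "add tests" in prompt.lower():
        return "def add(a, b): return a + b  # tested"
    return "def add(a, b): return a + b"

def critique_loop(task: str, writer="writer-model", critic="critic-model", rounds=3):
    """One agent drafts; a stronger critic reviews until it approves or rounds run out."""
    draft = call_model(writer, f"Write code for: {task}")
    for _ in range(rounds):
        critique = call_model(critic, f"Critique this code:\n{draft}")
        if "LGTM" in critique:  # critic signals acceptance
            break
        draft = call_model(writer, f"Revise per critique:\n{critique}\n\nCode:\n{draft}")
    return draft
```

The sequential structure is the point: each revision sees the previous critique, which is what parallel, non-communicating agents cannot do.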
Accept latency for better outcomes
While critique loops increase latency and force developers to wait for agent 'debates,' the resulting code quality significantly outweighs the cost of fast but low-quality generation.
Control model tiers from the bottom up
Shopify provides unlimited tokens but mandates minimum model quality (o1/4.6 level or higher), discouraging use of weaker models to ensure high-quality output.
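A quality floor like this amounts to routing policy rather than budget policy. The sketch below is a hypothetical illustration of that idea; the tier table and model names are invented, not Shopify's actual configuration.

```python
# Hypothetical model-tier table: higher number = stronger (and pricier) model.
MODEL_TIER = {"mini-model": 1, "standard-model": 2, "pro-model": 3}
MIN_TIER = 3  # policy floor: only top-tier models allowed

def route_model(requested: str) -> str:
    """Return the requested model if it meets the floor, else upgrade it
    to the cheapest model that does."""
    if MODEL_TIER.get(requested, 0) >= MIN_TIER:
        return requested
    return min((m for m, t in MODEL_TIER.items() if t >= MIN_TIER),
               key=MODEL_TIER.get)
```

Note the inversion of the usual cost control: tokens are uncapped, but requests for weak models are silently upgraded.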
📈 Adoption Metrics & Developer Behavior
December 2025 inflection point
Internal data shows daily active usage of AI tools approaching 100% of developers, with a 'phase transition' occurring in December 2025 when models became capable enough to trigger exponential growth.
CLI tools outpacing IDE assistants
Developers are shifting from IDE-based tools (GitHub Copilot, Cursor) toward CLI-based agents (Codex, Claude Code, internal 'River' agent) that enable development without directly viewing code.
Skewed consumption distribution
Token usage is growing exponentially with distribution becoming increasingly skewed toward power users, raising concerns about whether the gap between high and low consumers will continue widening indefinitely.
🚧 The CI/CD Bottleneck Crisis
PR velocity overwhelming systems
Month-over-month growth in PR merges has jumped from 10% to 30%, causing CI/CD pipelines to 'creak' under the volume and increasing the probability of test failures and deployment rollbacks.
Pro-level models for rigorous review
Shopify uses expensive 'pro' models (o1, Gemini Deep Think) for automated PR review, accepting one-to-two-hour review times because it remains faster than human delays and reduces overall time-to-deploy by catching bugs pre-merge.
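A merge gate built around a slow reviewer looks different from a typical fast-checks pipeline: it must tolerate verdicts that arrive an hour or two later. The sketch below is a generic illustration, not Shopify's pipeline; `poll_review` is a hypothetical callback for querying whatever service runs the pro-model review.

```python
import time

def wait_for_pro_review(pr_id: int, poll_review, timeout_s=2 * 3600, interval_s=60):
    """Block the merge until the pro-model review verdict arrives or we time out.

    poll_review(pr_id) returns "approve", "request_changes", or None (pending).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        verdict = poll_review(pr_id)
        if verdict is not None:
            return verdict == "approve"
        time.sleep(interval_s)
    return False  # fail closed: no verdict means no merge
```

Failing closed is the key design choice: a one-to-two-hour wait is acceptable precisely because the alternative (merging unreviewed agent code) costs more in post-merge rollbacks.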
The git mutex problem
The current PR/merge model acts as a 'global mutex' that becomes a critical bottleneck when machines write code at machine speed rather than human speed.
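The mutex framing can be made concrete with back-of-the-envelope arithmetic: if every PR must re-validate against the freshly advanced main before it can merge, total merge time grows linearly with PR count no matter how fast agents write code. The numbers below are illustrative, not Shopify's.

```python
def merge_queue_hours(num_prs: int, ci_minutes_per_pr: float) -> float:
    """Serialized merges: each PR re-runs CI against the new tip of main,
    so total wall-clock cost is simply PR count times CI time."""
    return num_prs * ci_minutes_per_pr / 60

# Example: 200 agent-authored PRs/day at 15 CI minutes each is 50 hours of
# serialized validation -- more than a 24-hour day can absorb.
```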
🏗️ Future Infrastructure & Tooling
Internal tools over commercial solutions
Shopify built internal solutions 'Tangle' and 'QMD' for agent memory and context management, finding no commercial PR review tools adequate for pro-model critique loops that require sequential, high-latency reasoning.
Microservices architecture reconsidered
Parakhin suggests microservices may make a comeback to allow independent shipping and avoid merge conflicts, arguing that AI can now manage the complexity that previously made microservices problematic.
Graphite for stacked workflows
The team uses Graphite for stacked PRs to manage high change volume, though Parakhin believes entirely new metaphors beyond git and traditional PRs are needed for the agentic era.
Bottom Line
Organizations must redesign CI/CD for agentic speed by investing heavily in high-latency, high-quality automated PR review using expensive reasoning models, while abandoning parallel token-burning in favor of structured critique loops.
More from Latent Space
🔬 Training Transformers to solve 95% failure rate of Cancer Trials — Ron Alfa & Daniel Bear, Noetik
Noetik is tackling the 95% failure rate of cancer clinical trials by training transformers on proprietary multimodal patient tumor data to identify hidden biological subtypes and match therapies to responsive populations, moving beyond simplistic biomarkers and outdated cell lines.
Notion’s Sarah Sachs & Simon Last on Custom Agents, Evals, and the Future of Work
Notion's AI leads Sarah Sachs and Simon Last detail their three-year journey to launch custom agents, revealing how they navigated premature model capabilities, built a culture of radical iteration, and balance immediate utility with forward-looking bets on software factories and MCP integration.
⚡️ The best engineers don't write the most code. They delete the most code. — Stay SaaSy
The Stay SaaSy crew explains how AI consumption-based pricing is forcing companies to manage individual employee token budgets like departmental budgets, creating complex ROI calculations and flipping traditional build-vs-buy economics as engineering costs shift from headcount to compute.
Extreme Harness Engineering for the 1B token/day Dark Factory — Ryan Lopopolo, OpenAI Frontier
Ryan Lopopolo reveals how OpenAI's Frontier team built a 'Dark Factory' processing 1 billion tokens daily, generating over 1 million lines of code from zero human-written code in 5 months. By treating human attention as the only scarce resource and enforcing strict constraints like sub-minute builds, the team shifted from manual coding to autonomous agents that write, review, and merge their own code.