CI/CD Breaks at AI Speed: Tangle, Graphite Stacks, Pro-Model PR Review — Mikhail Parakhin, Shopify

| Podcasts | April 22, 2026 | 4.28K views | 1:14:30

TL;DR

Shopify CTO Mikhail Parakhin reveals that AI agents have reached nearly 100% daily adoption among developers, driving a 30% month-over-month surge in PR merges that is breaking traditional CI/CD pipelines. He argues that organizations must shift from parallel token-burning agents to high-latency, critique-loop architectures that use expensive pro-level models for code review.

🧠 Agent Architecture & Token Strategy (3 insights)

Quality trumps quantity in token consumption

Parakhin defends Jensen Huang's stance on high token budgets but emphasizes that the real anti-pattern is running many parallel agents that don't communicate. Instead, fewer agents using structured critique loops — one writes, another critiques with models like o1 or GPT-4.5 — produce higher-quality code.
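The write/critique pattern described above can be sketched in a few lines. Everything here is a hypothetical stand-in — the `complete` callable, the model names, and the APPROVED convention are placeholders for whatever LLM client and prompts a team actually uses, not anything from the episode:

```python
# Minimal sketch of a write/critique loop: a "writer" model drafts code, a
# stronger "critic" model reviews it, and the writer revises until the critic
# approves or a round limit is hit. The `complete` callable and model names
# are hypothetical stand-ins for a real LLM client.

def critique_loop(task, complete, writer="writer-model",
                  critic="critic-model", max_rounds=3):
    # First pass: the cheaper writer model produces a draft.
    draft = complete(writer, f"Write code for this task:\n{task}")
    for _ in range(max_rounds):
        # The pricier critic model reviews the current draft.
        review = complete(critic,
                          "Critique this code; reply APPROVED if correct:\n"
                          + draft)
        if "APPROVED" in review:
            break  # critic signed off; stop iterating
        # Otherwise the writer revises against the critique.
        draft = complete(writer,
                         f"Revise the code to address this review:\n{review}"
                         f"\n\nCode:\n{draft}")
    return draft
```

Injecting the `complete` callable keeps the loop testable and makes it trivial to route the critic to a more expensive model tier than the writer — which is the whole point of the pattern.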

Accept latency for better outcomes

While critique loops increase latency and force developers to wait for agents to 'debate,' the resulting quality gain outweighs the speed advantage of fast but low-quality generation.

Control model tiers from the bottom up

Shopify provides unlimited tokens but mandates minimum model quality (o1/4.6 level or higher), discouraging use of weaker models to ensure high-quality output.

📈 Adoption Metrics & Developer Behavior (3 insights)

December 2025 inflection point

Internal data shows daily active usage of AI tools approaching 100% of developers, with a 'phase transition' occurring in December 2025 when models became capable enough to trigger exponential growth.

CLI tools outpacing IDE assistants

Developers are shifting from IDE-based tools (GitHub Copilot, Cursor) toward CLI-based agents (Codex, Claude Code, internal 'River' agent) that enable development without directly viewing code.

Skewed consumption distribution

Token usage is growing exponentially with distribution becoming increasingly skewed toward power users, raising concerns about whether the gap between high and low consumers will continue widening indefinitely.

🚧 The CI/CD Bottleneck Crisis (3 insights)

PR velocity overwhelming systems

Month-over-month growth in PR merges has jumped from 10% to 30%, causing CI/CD pipelines to 'creak' under the volume and raising the probability of test failures and deployment rollbacks.

Pro-level models for rigorous review

Shopify uses expensive 'pro' models (o1, Gemini Deep Think) for automated PR review, accepting one-to-two-hour review times because it remains faster than human delays and reduces overall time-to-deploy by catching bugs pre-merge.

The git mutex problem

The current PR/merge model acts as a 'global mutex' that becomes a critical bottleneck when machines write code at machine speed rather than human speed.
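The 'global mutex' framing can be made concrete with a toy simulation (the numbers, sleep duration, and helper names are illustrative, not from the episode): however many agents open PRs in parallel, merge throughput is capped by the serialized merge-plus-CI step.

```python
import threading
import time

# Toy model of the "global mutex": agents open PRs concurrently, but the
# merge step is serialized behind a single lock (CI validates one merge
# commit at a time, as a merge queue does).

merge_lock = threading.Lock()
merged = []

def merge_pr(pr_id, ci_seconds=0.01):
    with merge_lock:            # only one PR can merge at a time
        time.sleep(ci_seconds)  # stand-in for a full CI run on the merge commit
        merged.append(pr_id)

def agent(agent_id, prs_per_agent=5):
    # Each agent produces PRs at "machine speed"...
    for i in range(prs_per_agent):
        merge_pr(f"agent{agent_id}-pr{i}")

threads = [threading.Thread(target=agent, args=(a,)) for a in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# ...yet all 20 PRs land one by one: total wall time is roughly
# 20 * ci_seconds no matter how many agents run in parallel, because
# throughput is capped by the mutex, not by PR production.
```

Adding agents here raises contention, not throughput — which is why Parakhin argues the merge model itself, not just CI capacity, has to change.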

🏗️ Future Infrastructure & Tooling (3 insights)

Internal tools over commercial solutions

Shopify built internal solutions 'Tangle' and 'QMD' for agent memory and context management, finding no commercial PR review tools adequate for pro-model critique loops that require sequential, high-latency reasoning.

Microservices architecture reconsidered

Parakhin suggests microservices may make a comeback to allow independent shipping and avoid merge conflicts, arguing that AI can now manage the complexity that previously made microservices problematic.

Graphite for stacked workflows

The team uses Graphite for stacked PRs to manage high change volume, though Parakhin believes entirely new metaphors beyond git and traditional PRs are needed for the agentic era.

Bottom Line

Organizations must redesign CI/CD for agentic speed by investing heavily in high-latency, high-quality automated PR review using expensive reasoning models, while abandoning parallel token-burning in favor of structured critique loops.

More from Latent Space

Extreme Harness Engineering for the 1B token/day Dark Factory — Ryan Lopopolo, OpenAI Frontier (1:17:54)

Ryan Lopopolo reveals how OpenAI's Frontier team built a 'Dark Factory' processing 1 billion tokens daily, generating over 1 million lines of code from zero human-written code in 5 months. By treating human attention as the only scarce resource and enforcing strict constraints like sub-minute builds, the team shifted from manual coding to autonomous agents that write, review, and merge their own code.

17 days ago · 10 points