CI/CD Breaks at AI Speed: Tangle, Graphite Stacks, Pro-Model PR Review — Mikhail Parakhin, Shopify
TL;DR
Shopify CTO Mikhail Parakhin reveals that AI agents have reached nearly 100% daily adoption among developers, driving a 30% month-over-month surge in PR merges that is breaking traditional CI/CD pipelines. He argues that organizations must shift from parallel token-burning agents to high-latency, critique-loop architectures that use expensive pro-level models for code review.
🧠 Agent Architecture & Token Strategy
Quality trumps quantity in token consumption
Parakhin defends Jensen Huang's stance on high token budgets but emphasizes that the real anti-pattern is running many parallel agents that don't communicate. Instead, fewer agents using structured critique loops (one writes, another critiques with models like o1 or GPT-4.5) produce higher-quality code.
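The writer/critic pattern described above can be sketched in a few lines. This is a minimal illustration, not Shopify's implementation: `call_model` is a stand-in for a real LLM client (here a deterministic stub so the loop is runnable), and the model names are placeholders.

```python
def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real LLM client; deterministic for illustration."""
    if model == "critic-model":
        # The stub critic approves only drafts that mention tests.
        return "LGTM" if "tested" in prompt else "Please add tests."
    if "add tests" in prompt.lower():
        return "def add(a, b): return a + b  # tested"
    return "def add(a, b): return a + b"

def critique_loop(task: str, writer="writer-model", critic="critic-model", rounds=3):
    """One agent drafts; a stronger critic reviews until it approves or rounds run out."""
    draft = call_model(writer, f"Write code for: {task}")
    for _ in range(rounds):
        critique = call_model(critic, f"Critique this code:\n{draft}")
        if "LGTM" in critique:  # critic signals acceptance
            break
        draft = call_model(writer, f"Revise per critique:\n{critique}\n\nCode:\n{draft}")
    return draft
```

The sequential structure is the point: each revision sees the previous critique, which is what parallel, non-communicating agents cannot do.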
Accept latency for better outcomes
While critique loops increase latency and force developers to wait for agent 'debates,' the resulting code quality significantly outweighs the cost of fast but low-quality generation.
Control model tiers from the bottom up
Shopify provides unlimited tokens but mandates minimum model quality (o1/4.6 level or higher), discouraging use of weaker models to ensure high-quality output.
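A quality floor like this amounts to routing policy rather than budget policy. The sketch below is a hypothetical illustration of that idea; the tier table and model names are invented, not Shopify's actual configuration.

```python
# Hypothetical model-tier table: higher number = stronger (and pricier) model.
MODEL_TIER = {"mini-model": 1, "standard-model": 2, "pro-model": 3}
MIN_TIER = 3  # policy floor: only top-tier models allowed

def route_model(requested: str) -> str:
    """Return the requested model if it meets the floor, else upgrade it
    to the cheapest model that does."""
    if MODEL_TIER.get(requested, 0) >= MIN_TIER:
        return requested
    return min((m for m, t in MODEL_TIER.items() if t >= MIN_TIER),
               key=MODEL_TIER.get)
```

Note the inversion of the usual cost control: tokens are uncapped, but requests for weak models are silently upgraded.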
📈 Adoption Metrics & Developer Behavior
December 2025 inflection point
Internal data shows daily active usage of AI tools approaching 100% of developers, with a 'phase transition' occurring in December 2025 when models became capable enough to trigger exponential growth.
CLI tools outpacing IDE assistants
Developers are shifting from IDE-based tools (GitHub Copilot, Cursor) toward CLI-based agents (Codex, Claude Code, internal 'River' agent) that enable development without directly viewing code.
Skewed consumption distribution
Token usage is growing exponentially with distribution becoming increasingly skewed toward power users, raising concerns about whether the gap between high and low consumers will continue widening indefinitely.
🚧 The CI/CD Bottleneck Crisis
PR velocity overwhelming systems
Month-over-month growth in PR merges has jumped from 10% to 30%, causing CI/CD pipelines to 'creak' under the volume and increasing the probability of test failures and deployment rollbacks.
Pro-level models for rigorous review
Shopify uses expensive 'pro' models (o1, Gemini Deep Think) for automated PR review, accepting one-to-two-hour review times because it remains faster than human delays and reduces overall time-to-deploy by catching bugs pre-merge.
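A merge gate built around a slow reviewer looks different from a typical fast-checks pipeline: it must tolerate verdicts that arrive an hour or two later. The sketch below is a generic illustration, not Shopify's pipeline; `poll_review` is a hypothetical callback for querying whatever service runs the pro-model review.

```python
import time

def wait_for_pro_review(pr_id: int, poll_review, timeout_s=2 * 3600, interval_s=60):
    """Block the merge until the pro-model review verdict arrives or we time out.

    poll_review(pr_id) returns "approve", "request_changes", or None (pending).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        verdict = poll_review(pr_id)
        if verdict is not None:
            return verdict == "approve"
        time.sleep(interval_s)
    return False  # fail closed: no verdict means no merge
```

Failing closed is the key design choice: a one-to-two-hour wait is acceptable precisely because the alternative (merging unreviewed agent code) costs more in post-merge rollbacks.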
The git mutex problem
The current PR/merge model acts as a 'global mutex' that becomes a critical bottleneck when machines write code at machine speed rather than human speed.
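The mutex framing can be made concrete with back-of-the-envelope arithmetic: if every PR must re-validate against the freshly advanced main before it can merge, total merge time grows linearly with PR count no matter how fast agents write code. The numbers below are illustrative, not Shopify's.

```python
def merge_queue_hours(num_prs: int, ci_minutes_per_pr: float) -> float:
    """Serialized merges: each PR re-runs CI against the new tip of main,
    so total wall-clock cost is simply PR count times CI time."""
    return num_prs * ci_minutes_per_pr / 60

# Example: 200 agent-authored PRs/day at 15 CI minutes each is 50 hours of
# serialized validation -- more than a 24-hour day can absorb.
```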
🏗️ Future Infrastructure & Tooling
Internal tools over commercial solutions
Shopify built internal solutions 'Tangle' and 'QMD' for agent memory and context management, finding no commercial PR review tools adequate for pro-model critique loops that require sequential, high-latency reasoning.
Microservices architecture reconsidered
Parakhin suggests microservices may make a comeback to allow independent shipping and avoid merge conflicts, arguing that AI can now manage the complexity that previously made microservices problematic.
Graphite for stacked workflows
The team uses Graphite for stacked PRs to manage high change volume, though Parakhin believes entirely new metaphors beyond git and traditional PRs are needed for the agentic era.
Bottom Line
Organizations must redesign CI/CD for agentic speed by investing heavily in high-latency, high-quality automated PR review using expensive reasoning models, while abandoning parallel token-burning in favor of structured critique loops.
More from Latent Space
🔬 Training Transformers to solve 95% failure rate of Cancer Trials — Ron Alfa & Daniel Bear, Noetik
Noetik is tackling the 95% failure rate of cancer clinical trials by training transformers on proprietary multimodal patient tumor data to identify hidden biological subtypes and match therapies to responsive populations, moving beyond simplistic biomarkers and outdated cell lines.
Notion’s Sarah Sachs & Simon Last on Custom Agents, Evals, and the Future of Work
Notion's AI leads Sarah Sachs and Simon Last detail their three-year journey to launch custom agents, revealing how they navigated premature model capabilities, built a culture of radical iteration, and balance immediate utility with forward-looking bets on software factories and MCP integration.
⚡️ The best engineers don't write the most code. They delete the most code. — Stay SaaSy
The Stay SaaSy crew explains how AI consumption-based pricing is forcing companies to manage individual employee token budgets like departmental budgets, creating complex ROI calculations and flipping traditional build-vs-buy economics as engineering costs shift from headcount to compute.
Extreme Harness Engineering for the 1B token/day Dark Factory — Ryan Lopopolo, OpenAI Frontier
Ryan Lopopolo reveals how OpenAI's Frontier team built a 'Dark Factory' processing 1 billion tokens daily, generating over 1 million lines of code from zero human-written code in 5 months. By treating human attention as the only scarce resource and enforcing strict constraints like sub-minute builds, the team shifted from manual coding to autonomous agents that write, review, and merge their own code.