Beyond Bigger Models: Recursion As The Next Scaling Law In AI
TL;DR
Recursion at inference time, rather than simply scaling model size, may be the next breakthrough in AI reasoning. Recent research on Hierarchical Reasoning Models (HRM) and Tiny Recursive Models (TRM) suggests that recursive architectures with shared weights can solve hard reasoning benchmarks such as ARC Prize with very few parameters, outperforming much larger conventional LLMs.
⚠️ The Fundamental Flaw in Modern LLMs
One-shot processing hits theoretical limits
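A Transformer processes its input in parallel and performs a fixed amount of sequential computation per forward pass, with no iterative compression of an internal state. Problems whose solution cannot be shortcut, such as sorting or Sudoku, therefore cannot in general be solved in a single pass: the required work grows with input size (comparison sorting, for example, needs on the order of n log n comparisons), while the depth of one pass stays fixed.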
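The sorting bound mentioned above can be made concrete with a few lines of arithmetic. The sketch below computes the classic information-theoretic lower bound for comparison sorting, ceil(log2(n!)); the function name `min_comparisons` is made up for this illustration, and how the bound transfers to a specific Transformer is the talk's argument, not something this code demonstrates.

```python
import math

def min_comparisons(n: int) -> int:
    """Lower bound on worst-case comparisons needed to sort n distinct items.

    There are n! possible orderings and each comparison yields one bit,
    so at least ceil(log2(n!)) comparisons are required.
    """
    return math.ceil(math.log2(math.factorial(n)))

if __name__ == "__main__":
    for n in (4, 16, 64, 256, 1024):
        # The bound keeps growing with n, roughly like n * log2(n).
        print(n, min_comparisons(n))
```

The point of the illustration is only that the required number of decisions grows with n, whereas the number of layers applied in one forward pass does not.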
Memory without compression
Unlike RNNs, which compress information into a fixed-size hidden state, LLMs must retain the entire input context (potentially the length of a full novel) for every token they generate, and they lack the latent reasoning that recursive models carry out in that hidden state.
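The contrast in memory footprint can be sketched with a toy example. Both functions below are hypothetical stand-ins written for this illustration (the update rule is arbitrary and not taken from any real model): one folds a token stream into a fixed-size state, the other keeps every token it has seen.

```python
def recurrent_memory(tokens, state_size=4):
    """Fold a token stream into a fixed-size state (toy stand-in for an RNN hidden state)."""
    state = [0.0] * state_size
    for tok in tokens:
        # Each update overwrites the state: older information is compressed, not stored.
        state = [0.5 * s + 0.5 * ((tok + i) % 7) for i, s in enumerate(state)]
    return state

def context_memory(tokens):
    """Record how many tokens are held at each step when the full context is kept."""
    context, sizes = [], []
    for tok in tokens:
        context.append(tok)
        sizes.append(len(context))
    return sizes

if __name__ == "__main__":
    print(len(recurrent_memory(range(1000))))  # state size is independent of input length
    print(context_memory(range(1000))[-1])     # retained context grows with input length
```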
🧩 Hierarchical Reasoning Models (HRM)
Tiny model beats massive LLMs on reasoning
HRM has only 27 million parameters and is trained on roughly 1,000 ARC Prize puzzles with no pre-training, reportedly reaching about 70% accuracy on ARC Prize 1, where OpenAI's o3 scored 0%.
Three-level hierarchical recursion
The architecture applies identical weights recursively across three nested loops: low-level high-frequency processing, high-level low-frequency processing, and outer refinement steps, inspired by brain wave patterns operating at different frequencies.
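The nested-loop structure described above can be sketched as follows. This is a scalar toy under stated assumptions, not the HRM implementation: `step`, `nested_recursion`, the two weights `w_low` and `w_high`, and the loop counts are all invented for illustration. What it shows is that the same small set of parameters is reused at every iteration, so the effective depth comes from looping rather than from adding layers.

```python
import math

def step(w, state, inp):
    """One update with a single shared weight, reused at every iteration."""
    return math.tanh(w * state + inp)

def nested_recursion(x, w_low=0.5, w_high=0.8,
                     outer_steps=3, high_steps=2, low_steps=4):
    """Three nested loops: fast low-level updates, slower high-level updates,
    and an outer refinement loop that carries both states forward."""
    z_low, z_high = 0.0, 0.0
    calls = 0
    for _ in range(outer_steps):          # outer refinement
        for _ in range(high_steps):       # low-frequency, high-level updates
            for _ in range(low_steps):    # high-frequency, low-level updates
                z_low = step(w_low, z_low, z_high + x)
                calls += 1
            z_high = step(w_high, z_high, z_low)
            calls += 1
    return z_high, calls

if __name__ == "__main__":
    answer, calls = nested_recursion(x=0.3)
    print(answer, calls)  # two parameters applied across many sequential steps
```

With the default loop counts, the two weights are applied in 3 × 2 × (4 + 1) = 30 sequential updates.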
⚡ Training Breakthroughs and TRM
Bypassing backprop through time
HRM uses Deep Equilibrium (DEQ) learning with stop gradients, treating different hidden state 'carry' values as separate batch samples rather than backpropagating through all recursive steps, avoiding the vanishing gradient problems that plagued traditional RNNs.
TRM challenges the DEQ assumption
The Tiny Recursive Models paper reveals that HRM's DEQ math is insufficient to explain its performance, demonstrating that backpropagating through the full deep recursion actually improves results significantly.
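The difference between the two training strategies can be illustrated on a scalar recurrence. The sketch below is a toy, not code from either paper: it assumes the update z_{t+1} = tanh(w·z_t + x) and a squared-error loss on the final state, and all function names are hypothetical. `one_step_grad` treats the previous state as a constant (a stand-in for the stop-gradient approach), while `full_grad` differentiates through every recursive step.

```python
import math

def forward(w, x, steps):
    """Run the recurrence z_{t+1} = tanh(w * z_t + x) from z_0 = 0; return all states."""
    zs = [0.0]
    for _ in range(steps):
        zs.append(math.tanh(w * zs[-1] + x))
    return zs

def loss(w, x, y, steps):
    """Squared error between the final state and the target y."""
    return 0.5 * (forward(w, x, steps)[-1] - y) ** 2

def full_grad(w, x, y, steps):
    """dLoss/dw with the derivative propagated through every step."""
    z, dz = 0.0, 0.0
    for _ in range(steps):
        z_next = math.tanh(w * z + x)
        # Chain rule: z_next depends on w directly and through the previous state z.
        dz = (1.0 - z_next ** 2) * (z + w * dz)
        z = z_next
    return (z - y) * dz

def one_step_grad(w, x, y, steps):
    """dLoss/dw with the previous state treated as a constant (stop-gradient)."""
    zs = forward(w, x, steps)
    z_prev, z_last = zs[-2], zs[-1]
    return (z_last - y) * (1.0 - z_last ** 2) * z_prev

if __name__ == "__main__":
    w, x, y, steps = 0.9, 0.5, 0.0, 8
    print("full gradient:    ", full_grad(w, x, y, steps))
    print("one-step gradient:", one_step_grad(w, x, y, steps))
```

In this toy the one-step estimate is cheaper, since it needs no stored history, but it drops the contribution of earlier steps and so differs from the true gradient. Whether that gap matters in practice is the empirical question the TRM result speaks to.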
🧬 Bio-Inspiration vs. Computational Reality
Brain waves inspire architecture
HRM's hierarchical frequency approach draws from neuroscience observations that different brain regions operate at different frequencies, though the actual optimization mechanism may differ from biological processes.
Bio-plausibility as inspiration, not constraint
While biological analogies help generate ideas, computationally efficient solutions often diverge from bio-plausible mechanisms, as seen in the evolution from AlexNet's bio-inspired features to simpler, deeper architectures that run better on GPUs.
Bottom Line
Prioritize inference-time recursion and hierarchical reasoning over raw parameter count: shared-weight iterative processing with variable scoping can achieve superior reasoning performance with minimal training data and compute.