Beyond Bigger Models: Recursion as the Next Scaling Law in AI
TL;DR
Recursion at inference time, rather than simply scaling model size, may be the next breakthrough in AI reasoning. Recent research on Hierarchical Reasoning Models (HRM) and Tiny Recursive Models (TRM) demonstrates that recursive architectures using shared weights can solve complex reasoning benchmarks like ARC-AGI with minimal parameters, outperforming massive traditional LLMs.
⚠️ The Fundamental Flaw in Modern LLMs
One-shot processing hits theoretical limits
Transformers process inputs in parallel through a fixed number of layers, with no iterative compression. That makes them theoretically incapable of solving, in a single forward pass, problems with sequential lower bounds, such as Sudoku or comparison sorting with its n log n lower bound on comparisons.
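As a toy analogy for this limit (an illustration, not from the research itself): a single left-to-right compare-and-swap sweep plays the role of one fixed layer, and it cannot sort an arbitrary list; repeating the *same* sweep, the analogue of shared-weight recursion, eventually can.

```python
def sweep(xs):
    """One pass of local compare-and-swap: the analogue of a single layer."""
    xs = list(xs)
    for i in range(len(xs) - 1):
        if xs[i] > xs[i + 1]:
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs

data = [5, 1, 4, 2, 3]
print(sweep(data))        # one pass: still not sorted

xs = data
for _ in range(len(data)):  # recursing with the SAME sweep (shared "weights")
    xs = sweep(xs)
print(xs)                 # now sorted
```

One pass leaves inversions behind; only iteration, with no new parameters, finishes the job. That is the intuition behind trading depth-in-parameters for depth-in-recursion.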
Memory without compression
Unlike RNNs, which compress information into a fixed-size hidden state, LLMs must retain the entire input context (the equivalent of a full novel) for every token they generate, and so lack the latent, compressed reasoning state inherent to recursive models.
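The memory contrast can be sketched in toy form (hypothetical code, not either architecture): an RNN folds each token into a constant-size state, while attention-style memory must keep every past token around.

```python
import math

def rnn_summary(tokens):
    """Fold tokens into one fixed-size hidden state: O(1) memory."""
    h = 0.0
    for t in tokens:
        h = math.tanh(0.5 * h + t)  # compress the token in, then discard it
    return h

def attention_context(tokens):
    """Keep every past token available for attention: O(n) memory."""
    return list(tokens)

novel = [0.1] * 100_000
summary = rnn_summary(novel)        # a single number, however long the input
context = attention_context(novel)  # all 100,000 entries retained
```

However long the "novel" grows, `rnn_summary` stays one value, while `attention_context` grows with the input.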
🧩 Hierarchical Reasoning Models (HRM)
Tiny model beats massive LLMs on reasoning
HRM uses only 27 million parameters, trained on just 1,000 ARC puzzles with no pre-training, achieving approximately 70% accuracy on ARC-AGI-1 tasks where OpenAI's o3 scored 0%.
Three-level hierarchical recursion
The architecture applies identical weights recursively across three nested loops: low-level, high-frequency processing; high-level, low-frequency processing; and outer refinement steps, inspired by brain-wave patterns operating at different frequencies.
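The loop structure can be sketched as follows. This is a minimal toy with scalar states and two shared scalar "weights" (`W_LOW`, `W_HIGH`, `step`, and `hrm_sketch` are hypothetical stand-ins); it shows only the nesting and weight reuse, not HRM's actual modules.

```python
import math

# The SAME two "weights" are reused at every step of every loop,
# mirroring HRM's weight sharing across the whole recursion.
W_LOW, W_HIGH = 0.7, 0.3

def step(w, state, ctx):
    """One recurrent update of a toy scalar hidden state."""
    return math.tanh(w * (state + ctx))

def hrm_sketch(x, outer_steps=3, high_steps=4, low_steps=8):
    z_high = 0.0  # slow, low-frequency state
    z_low = 0.0   # fast, high-frequency state
    for _ in range(outer_steps):        # outer refinement loop
        for _ in range(high_steps):     # high-level, low-frequency loop
            for _ in range(low_steps):  # low-level, high-frequency loop
                z_low = step(W_LOW, z_low, z_high + x)
            z_high = step(W_HIGH, z_high, z_low)  # updated 8x less often
    return z_high
```

The fast state updates on every innermost tick; the slow state only once per inner loop, which is the "different frequencies" idea in miniature.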
⚡ Training Breakthroughs and TRM
Bypassing backprop through time
HRM uses Deep Equilibrium (DEQ)-style learning with stop gradients: the recursive hidden-state 'carry' is detached, so different carry values are treated as separate batch samples rather than one long differentiable chain. This avoids backpropagating through all recursive steps and the vanishing-gradient problems that plagued traditional RNNs.
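A minimal sketch of the idea, using a toy scalar cell and a finite-difference gradient in place of autodiff (`f`, `loss`, and all values here are hypothetical): the recursion runs with the carry detached, and only the final step contributes to the weight gradient.

```python
import math

def f(z, x, w):
    """Toy recurrent cell; w plays the role of the shared weights."""
    return math.tanh(w * z + x)

def loss(z, target=0.5):
    return (z - target) ** 2

def one_step_grad(x, w, n_steps=50, eps=1e-6):
    z = 0.0
    for _ in range(n_steps - 1):
        z = f(z, x, w)  # "stop gradient": the carry z is treated as plain data
    # Differentiate w through ONLY the last step (finite difference here):
    return (loss(f(z, x, w + eps)) - loss(f(z, x, w))) / eps
```

Because the loop body never enters the gradient path, memory and gradient depth stay constant no matter how many recursive steps run.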
TRM challenges the DEQ assumption
The Tiny Recursive Models paper argues that HRM's DEQ justification is insufficient to explain its performance, and demonstrates that backpropagating through the full deep recursion actually improves results significantly.
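By contrast, TRM's choice can be sketched as differentiating through the entire unrolled recursion. Again a toy scalar cell with a finite-difference gradient standing in for autodiff; every name here is hypothetical.

```python
import math

def f(z, x, w):
    """Toy recurrent cell; w plays the role of the shared weights."""
    return math.tanh(w * z + x)

def unrolled_loss(x, w, n_steps=16, target=0.5):
    z = 0.0
    for _ in range(n_steps):
        z = f(z, x, w)  # every step stays on the gradient path
    return (z - target) ** 2

def full_bptt_grad(x, w, n_steps=16, eps=1e-6):
    # The weight perturbation now flows through ALL recursive steps,
    # unlike the one-step / stop-gradient approximation.
    return (unrolled_loss(x, w + eps, n_steps)
            - unrolled_loss(x, w, n_steps)) / eps
```

This is the more expensive but, per the TRM results, more accurate credit assignment: the cost of the backward pass grows with recursion depth, but so does the gradient's fidelity.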
🧬 Bio-Inspiration vs. Computational Reality
Brain waves inspire architecture
HRM's hierarchical frequency approach draws from neuroscience observations that different brain regions operate at different frequencies, though the actual optimization mechanism may differ from biological processes.
Bio-plausibility as inspiration, not constraint
While biological analogies help generate ideas, computationally efficient solutions often diverge from bio-plausible mechanisms, as seen in the evolution from AlexNet's bio-inspired features to simpler, deeper architectures that run better on GPUs.
Bottom Line
Prioritize inference-time recursion and hierarchical reasoning over raw parameter count: shared-weight iterative refinement with variable recursion depth achieves superior reasoning performance with minimal training data and compute.