⚡️ Reverse Engineering OpenAI's Training Data — Pratyush Maini, Datology
TL;DR
Pratyush Maini from Datology demonstrates how the 'seahorse emoji' query acts as a diagnostic probe to reverse-engineer when frontier labs began injecting reasoning traces into mid-training data, revealing that self-correction capabilities have shifted from post-training additions to core foundation model ingredients.
🔍 The Seahorse Emoji Investigation
Simple query exposes training data evolution
Asking models 'Is there a seahorse emoji?' triggers endless yes/no self-correction loops in GPT-4.1 and later (and in Olmo 3.1), but produces short, definitive answers from earlier models, making the question an unexpected probe for when reasoning data entered training.
Behavior timeline tracks o1 influence
The recursive self-correction phenomenon emerged between December 2024 and May 2025, approximately four months after OpenAI's o1 release, indicating rapid incorporation of reasoning traces into the training pipelines of non-thinking models.
Mandela effect triggers model uncertainty
The seahorse emoji question specifically elicits this behavior because internet discourse contains conflicting answers, creating the ambiguity necessary to trigger self-reflection capabilities baked into the model weights.
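The flip-flop behavior described above can be measured mechanically from a model transcript. A minimal sketch (the `count_flip_flops` helper and the yes/no stance markers are illustrative assumptions, not from the talk):

```python
import re

# Illustrative stance markers; real transcripts would need richer patterns.
YES = re.compile(r"\b(yes|there is one|it exists)\b", re.IGNORECASE)
NO = re.compile(r"\b(no|nope|does not exist)\b", re.IGNORECASE)

def count_flip_flops(transcript: str) -> int:
    """Count how many times the model reverses its yes/no stance.

    Splits the transcript into rough sentences and tracks stance changes;
    a high count signals the recursive self-correction loop."""
    stance, flips = None, 0
    for sentence in re.split(r"[.!?]\s+", transcript):
        if YES.search(sentence):
            current = "yes"
        elif NO.search(sentence):
            current = "no"
        else:
            continue  # sentence takes no stance
        if stance is not None and current != stance:
            flips += 1
        stance = current
    return flips
```

On the seahorse question, earlier models would score 0 flips, while the post-o1 generation described above scores high.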
🧠 Reasoning in Foundation Models
Mid-training now includes thinking traces
Analyzing open-weight models with OlmoTrace confirms that instruct variants, which receive no reasoning-focused post-training, still exhibit self-correction because thinking traces are deliberately added during mid-training phases.
Capabilities shift from cosmetic to core
The investigation reveals a fundamental shift: capabilities like self-reflection are now embedded directly into foundation models rather than added during post-training fine-tuning.
Single backbone requires foundation reasoning
Frontier labs now prefer unified backbones where foundation models possess reasoning ingredients necessary for downstream fine-tuning, eliminating the traditional strict separation between general pre-training and specialized post-training.
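A mid-training document carrying a thinking trace might look like the sketch below. The `<think>` delimiters and the Q/A layout are assumptions for illustration; the actual formats frontier labs use are not public.

```python
def format_midtraining_doc(question: str, trace: str, answer: str) -> str:
    """Wrap a reasoning trace into a hypothetical mid-training document.

    The <think>...</think> delimiters follow a common open-model
    convention; real lab formats are unknown."""
    return f"Q: {question}\n<think>\n{trace}\n</think>\nA: {answer}\n"
```

Mixing documents like this into mid-training data is what would leave self-correction patterns in the base weights, even before any post-training.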
⚠️ Memorization and Benchmark Leakage
Models regurgitate exam questions verbatim
Multiple frontier models reproduce exact JEE exam questions verbatim from just their first two words, indicating severe overfitting on benchmark data seen for multiple epochs during late-stage training.
Memorization scales with size and recency
Larger models memorize more strongly, and recent models exhibit at 20B active parameters the memorization that previously required 72B, suggesting MoE architectures may route such queries to experts that memorized the data.
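The two-word completion probe can be scored with a simple verbatim-overlap metric. A sketch under assumed details (the function name, the word-level matching, and the two-word prefix default are illustrative):

```python
def verbatim_overlap(reference: str, completion: str, prefix_words: int = 2) -> float:
    """Fraction of the reference question (past the prompt prefix) that the
    model reproduces verbatim, measured as the longest common word prefix.

    A score near 1.0 on benchmark questions suggests the question
    appeared in the training data."""
    ref_tail = reference.split()[prefix_words:]  # words the model must supply
    comp = completion.split()
    match = 0
    for r, c in zip(ref_tail, comp):
        if r != c:
            break
        match += 1
    return match / len(ref_tail) if ref_tail else 0.0
```

Prompting a model with only the first two words of each question and averaging this score across a benchmark gives a rough leakage estimate per model.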
Bottom Line
Foundation model training has fundamentally shifted to bake reasoning and self-correction capabilities into mid-training data, making these behaviors core to the base architecture rather than post-training overlays.
More from Latent Space
🔬 There Is No AlphaFold for Materials — AI for Materials Discovery with Heather Kulik
MIT professor Heather Kulik explains how AI discovered quantum phenomena to create 4x tougher polymers and why materials science lacks an 'AlphaFold' equivalent due to missing experimental datasets, emphasizing that domain expertise remains essential to validate AI predictions in chemistry.
Dreamer: the Agent OS for Everyone — David Singleton
David Singleton introduces Dreamer as an 'Agent OS' that combines a personal AI Sidekick with a marketplace of tools and agents, enabling both non-technical users and engineers to build, customize, and deploy AI applications through natural language while maintaining privacy through centralized, OS-level architecture.
Why Anthropic Thinks AI Should Have Its Own Computer — Felix Rieseberg of Claude Cowork/Code
Anthropic's Felix Rieseberg explains why AI agents need their own virtual computers to be effective, arguing that confining Claude to chat interfaces severely limits capability. He details how this philosophy shaped Claude Cowork and why product development is shifting from lengthy planning to rapidly building multiple prototypes simultaneously.
⚡️ Monty: the ultrafast Python interpreter by Agents for Agents — Samuel Colvin, Pydantic
Samuel Colvin from Pydantic introduces Monty, a Rust-based Python interpreter designed specifically for AI agents that achieves sub-microsecond execution latency by running in-process, bridging the gap between rigid tool calling and heavy containerized sandboxes.