Cooking with OpenAI’s Research Chief: AGI, o1, Evals, and Scaling Laws — Mark Chen
TL;DR
OpenAI Chief Research Officer Mark Chen discusses the company's research philosophy while cooking Korean tofu stew, emphasizing that scaling laws remain robust, reinforcement learning excels in objective domains, and successful research organizations balance top-down vision with bottom-up conviction.
🎯 Breaking Into AI Research
Replicate papers to build research taste
Chen advises aspiring researchers to replicate papers they admire down to exact training curves and loss metrics, as this reveals hidden techniques and develops intuition better than formal credentials.
Trading skills transfer to research
Chen's quantitative trading background gave him experience in "brutal hard optimization" and the attention to detail required in environments where you cannot cheat the metrics.
PhDs are optional
OpenAI prioritizes creative problem-solving over formal credentials, actively hiring researchers without traditional ML backgrounds and training them internally.
🧪 RL, Evaluation, and Superhuman Capabilities
RL thrives on objective truth
Reinforcement learning excels in domains with clear right/wrong answers like math and coding but struggles in subjective fields like creative writing where evaluation criteria vary between experts.
Moving beyond human benchmarks
As models saturate competition-style benchmarks like the IMO, evaluation shifts toward real-world research capabilities, novel theorem discovery, and generating cross-domain insights that exceed top human experts.
Coding tests real-world capability
Chen identifies software engineering as the critical domain for testing long-horizon, high-context learning in practical settings.
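To make the "objective truth" point concrete, here is a minimal sketch of a verifiable reward for a math-style task. The function name `verifiable_reward` and its normalization rules are illustrative assumptions, not anything described in the episode. It only shows why a domain with one checkable answer yields an unambiguous training signal, while a task like creative writing has no equivalent reference to compare against.

```python
def verifiable_reward(model_answer: str, reference: str) -> float:
    """Return 1.0 if the answer matches the reference, else 0.0.

    Hypothetical helper: the normalization below (trim, lowercase,
    drop spaces) is an assumption chosen for illustration.
    """
    def normalize(text: str) -> str:
        return text.strip().lower().replace(" ", "")

    return 1.0 if normalize(model_answer) == normalize(reference) else 0.0


# A math answer is either right or wrong, so the reward is unambiguous.
print(verifiable_reward(" 42 ", "42"))  # matching answer
print(verifiable_reward("41", "42"))    # non-matching answer
```

For subjective tasks there is no single `reference` string, which is one way to read the claim that RL struggles where expert evaluations disagree.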
📊 Scaling Laws and Research Conviction
Scaling laws remain valid
Chen strongly rejects "pre-training is dead" narratives, noting that scaling laws have persisted across nearly ten orders of magnitude and will continue through improved engineering and data techniques.
Betting on reasoning required conviction
The o1 effort faced internal resistance because the existing pre-training plus post-training paradigm was already effective, so pivoting toward reasoning models required significant vision and conviction.
Stable roadmap, flexible execution
OpenAI maintains consistent high-level research pillars (pre-training, RL, alignment) but reassesses implementation details and compute allocation every one to two months.
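As a rough illustration of what a scaling law persisting across orders of magnitude means, the sketch below fits a power law, loss ≈ a · compute^(−b), to synthetic data spanning ten orders of magnitude. The constants `TRUE_A` and `TRUE_B` and the data points are made up for the example and are not figures from the episode or from OpenAI. The pure power-law form is also an assumption; published fits sometimes add a constant floor term.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of loss ≈ a * compute**(-b) in log-log space.

    A power law is a straight line after taking logs, so ordinary
    linear regression on (log10 compute, log10 loss) recovers a and b.
    """
    xs = [math.log10(c) for c in compute]
    ys = [math.log10(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** intercept, -slope


# Synthetic, noise-free data (assumed constants, for illustration only).
TRUE_A, TRUE_B = 5.0, 0.05
compute = [10.0 ** k for k in range(0, 11)]  # 1e0 .. 1e10
loss = [TRUE_A * c ** (-TRUE_B) for c in compute]

a, b = fit_power_law(compute, loss)
print(f"fitted a={a:.3f}, b={b:.3f}")
```

If the same straight line keeps describing new points as compute grows, the law "persists"; a bend in the log-log plot would be the signal that it has broken down.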
🏗️ Managing Research Teams
Meritocratic leadership
Research managers are selected from the organization's best researchers, enabling effective top-down steering through earned credibility rather than hierarchical authority.
Embrace being wrong
The culture encourages bottom-up innovation where any researcher can redirect the roadmap by presenting compelling evidence that challenges leadership assumptions.
Directive compute allocation
The company focuses resources on three to five major bets per organization while preserving flexible compute pools for emergent ideas from individual researchers.
Bottom Line
Research organizations should hold firm conviction in high-level scaling principles while staying flexible enough to change execution strategy when researchers present compelling evidence.