🔬 The Bitter Lesson is Coming for Proteins - Alex Rives, BioHub
TL;DR
Alex Rives shows how the 'bitter lesson' of AI scaling applies to protein biology: massive transformer models trained on billions of evolutionary sequences develop emergent world models that can predict structure and function and design novel antibodies without hand-coded biological knowledge.
📈 Scaling Laws & The Bitter Lesson 3 insights
Scaling drives capability emergence in protein models
Training transformers on billions of protein sequences follows the same empirical scaling laws as language models, with new biological prediction capabilities emerging as model size increases by orders of magnitude.
Evolution provides the training signal
The model learns by predicting masked amino acids across evolutionary sequences, capturing the constraints that govern protein structure and function without explicit biological priors.
Proteins follow distributional patterns
Like words in language, amino acids appear in characteristic contexts determined by underlying biological constraints, so statistical pattern recognition can reveal the hidden variables of protein biology.
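The masked-prediction objective described above can be sketched in a few lines. This is a minimal illustration, not the model's actual training code: the mask symbol, the 15% mask rate, the example sequence, and the `uniform_model` stand-in are all assumptions made for the sketch. A real model would replace `uniform_model` with a transformer that conditions on the unmasked residues.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "#"  # hypothetical mask symbol, chosen for this sketch


def mask_sequence(seq, mask_rate=0.15, seed=0):
    """Hide a random subset of residues; return masked tokens and hidden targets."""
    rng = random.Random(seed)
    n_masked = max(1, round(mask_rate * len(seq)))
    positions = sorted(rng.sample(range(len(seq)), n_masked))
    tokens = list(seq)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]  # remember the true residue
        tokens[pos] = MASK          # hide it from the model
    return tokens, targets


def uniform_model(tokens, pos):
    """Stand-in for a trained network: equal probability for every residue."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def masked_loss(tokens, targets, model):
    """Average negative log-likelihood of the true residue at masked positions."""
    total = 0.0
    for pos, true_aa in targets.items():
        probs = model(tokens, pos)
        total -= math.log(probs[true_aa])
    return total / len(targets)


seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
tokens, targets = mask_sequence(seq)
loss = masked_loss(tokens, targets, uniform_model)
```

With no knowledge of context, the uniform stand-in scores a loss of ln(20), roughly 3.0. Any model that has learned evolutionary constraints should score lower, and that gap is the training signal.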
🧬 ESM Cambrian (ESMC) Architecture 3 insights
6B parameter world model open-sourced
ESMC is a fourth-generation protein language model released under the MIT license. It was trained on 6.8 billion sequences and used to predict structures for 1.1 billion representative protein clusters.
Inverse design enables protein engineering
The model functions as a searchable world model that can be inverted to design novel proteins, including antibodies and single-chain variable fragments (scFvs), that meet specific functional criteria.
Comprehensive coverage of protein space
The structure database clusters sequences at 70% identity and resolves a structure for each cluster, covering the full diversity of known proteins and adding hundreds of millions of new structural predictions to scientific knowledge.
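To make "clustering at 70% identity" concrete, here is a toy sketch of greedy clustering by sequence identity. It rests on strong assumptions: the sequences are short, invented, and already the same length, so identity is a simple position-by-position match rate. Production pipelines use alignment-based tools for this step, and the source does not say which one was used.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(seqs, threshold=0.7):
    """Assign each sequence to the first representative within the threshold."""
    clusters = {}  # representative sequence -> list of members
    for s in seqs:
        for rep in clusters:
            if identity(s, rep) >= threshold:
                clusters[rep].append(s)
                break
        else:
            # No existing representative is close enough: start a new cluster.
            clusters[s] = [s]
    return clusters


seqs = [
    "MKTAYIAKQR",  # becomes the first representative
    "MKTAYIAKQK",  # 9 of 10 positions match the first
    "MKTAYLAKQK",  # 8 of 10 positions match the first
    "GSHWPEDLNC",  # unrelated, starts its own cluster
]
clusters = greedy_cluster(seqs)
```

Here four sequences collapse to two representatives. Predicting one structure per representative rather than per sequence is what makes coverage of a very large sequence space tractable.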
🔍 Emergent Biological Understanding 3 insights
Hierarchical features mirror biological knowledge
Sparse autoencoder analysis reveals the model developed a hierarchical feature space corresponding to biochemical properties, structural motifs, and functional themes discovered through centuries of biological research.
Cross-evolutionary pattern recognition
The model recognizes shared functional motifs such as the nucleophilic elbow across evolutionarily distant protein families, using shared latent variables to represent convergent biological solutions.
Compression drives biological insight
To solve the complex task of predicting masked amino acids across diverse contexts, the model develops hidden variables representing fundamental constraints like structural contacts and functional requirements.
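The sparse autoencoder analysis mentioned in this section can be illustrated with a minimal forward pass. This is a sketch under stated assumptions: the weights are hand-picked rather than trained, the dimensions are tiny, and the loss form (reconstruction error plus an L1 sparsity penalty) is a common choice rather than a confirmed detail of the analysis described in the talk.

```python
def relu(v):
    """Zero out negative entries."""
    return [max(0.0, x) for x in v]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sae_forward(x, w_enc, b_enc, w_dec, l1_coeff=0.01):
    """Encode x into sparse features, decode it back, and compute the loss."""
    pre = [p + b for p, b in zip(matvec(w_enc, x), b_enc)]
    features = relu(pre)                    # overcomplete, mostly zero
    recon = matvec(w_dec, features)         # reconstruction of the input
    mse = sum((r - t) ** 2 for r, t in zip(recon, x)) / len(x)
    sparsity = sum(abs(f) for f in features)
    return features, recon, mse + l1_coeff * sparsity


# Hand-picked weights: 2 input dimensions expanded into 4 features.
w_enc = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b_enc = [0.0, 0.0, 0.0, 0.0]
w_dec = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]

x = [0.5, -2.0]  # stand-in for one hidden activation vector
features, recon, loss = sae_forward(x, w_enc, b_enc, w_dec)
```

Only two of the four features fire for this input, and the input is reconstructed exactly. In an interpretability setting, each feature that fires consistently on a recognizable pattern, such as a structural motif, becomes a candidate for a human-readable label.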
Bottom Line
Train massive transformer models on evolutionary protein sequence data at scale to unlock emergent world models that predict structure and function and design novel therapeutic proteins without requiring handcrafted biological priors.