🔬 "The Most Innovative Diffusion Research Is Happening in Drug Discovery, Not Image Generation"
TL;DR
Evan Fineberg and Sergey Udov of Genesis Molecular AI discuss how the frontier of diffusion-model research has moved from image generation to 3D protein structure prediction. They detail how their Pearl model applies LLM-style scaling strategies, including synthetic physics-based training data and inference-time 'thinking', to the historically intractable problem of predicting how small molecules bind to proteins.
🧬 Diffusion's Pivot from Images to Drug Discovery
Drug discovery drives diffusion innovation
The most cutting-edge diffusion research now occurs in 3D molecular structure prediction rather than image generation, establishing drug discovery as the new frontier for generative AI primitives.
GANs failed for molecular applications
Generative adversarial networks showed promise for images in 2017-2018 but proved ineffective for proteins and drug discovery; diffusion models supplied the computational primitive the field needed.
⚡ Scaling Molecular AI Like Large Language Models
Three-stage scaling adapted from LLMs
Genesis applies pre-training scaling, post-training refinement, and inference-time compute to molecular models, mirroring the development path of Llama and other language models.
Synthetic data via physics simulations
Unlike proteins, small molecules can be accurately modeled with physics simulations, allowing Genesis to generate effectively unlimited synthetic training data rather than relying only on the 200,000 structures in public databases.
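The idea of labeling training examples with a physics calculation instead of an experiment can be sketched in a few lines. The summary does not say which simulations Genesis uses, so this toy substitutes the textbook Lennard-Jones 12-6 pair potential as a stand-in; `lennard_jones`, `synthetic_dataset`, and all parameter values are hypothetical names and choices made for illustration only.

```python
import random

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy for two particles at distance r (toy stand-in physics)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def synthetic_dataset(n, r_min=0.9, r_max=3.0, seed=0):
    """Sample n random distances and label each one with its computed energy.

    Every label comes from the physics function, so the dataset size is
    bounded only by compute, not by the number of measured structures.
    """
    rng = random.Random(seed)
    distances = [rng.uniform(r_min, r_max) for _ in range(n)]
    return [(r, lennard_jones(r)) for r in distances]

# Usage: generate 1,000 (distance, energy) training pairs.
data = synthetic_dataset(1000)
print(len(data), "synthetic examples; first:", data[0])
```

A real pipeline would replace the pair potential with a far more expensive simulation of a full molecule, but the shape is the same: sample an input, compute the label, and repeat as often as the compute budget allows.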
Inference-time thinking in structural space
Pearl uses 'thinking tokens' adapted to molecular coordinates, allowing the model to iteratively refine crystal structure representations with physics-based guidance during inference.
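Iterative, physics-guided refinement of coordinates can be illustrated with a toy loop. The summary does not describe Pearl's actual sampler, so the sketch below is not that method: it is plain noisy gradient descent with a noise schedule that anneals to zero, steered by a harmonic bond-length energy. The functions `bond_energy`, `bond_energy_grad`, and `refine`, the three-atom example, and every numeric setting are assumptions made for illustration.

```python
import math
import random

def bond_energy(coords, bonds, k=1.0):
    """Harmonic energy: sum of k * (distance - ideal)^2 over bonded atom pairs."""
    return sum(k * (math.dist(coords[i], coords[j]) - ideal) ** 2
               for i, j, ideal in bonds)

def bond_energy_grad(coords, bonds, k=1.0):
    """Gradient of bond_energy with respect to every atom coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, ideal in bonds:
        d = math.dist(coords[i], coords[j])
        if d == 0.0:
            continue  # direction undefined for coincident atoms; skip this pair
        scale = 2.0 * k * (d - ideal) / d
        for axis in range(3):
            diff = coords[i][axis] - coords[j][axis]
            grad[i][axis] += scale * diff
            grad[j][axis] -= scale * diff
    return grad

def refine(coords, bonds, steps=200, step_size=0.05, start_noise=0.1, seed=0):
    """Noisy gradient descent on the energy; the noise shrinks linearly to zero."""
    rng = random.Random(seed)
    coords = [list(p) for p in coords]  # copy so the caller's input is untouched
    for t in range(steps):
        noise = start_noise * (1.0 - (t + 1) / steps)  # exactly 0 on the last step
        grad = bond_energy_grad(coords, bonds)
        for p, g in zip(coords, grad):
            for axis in range(3):
                p[axis] += -step_size * g[axis] + noise * rng.gauss(0.0, 1.0)
    return coords

# Usage: three atoms whose two bonds start at 3.0 but should be 1.5 long.
start = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 3.0, 0.0)]
bonds = [(0, 1, 1.5), (1, 2, 1.5)]
refined = refine(start, bonds)
print("energy before:", bond_energy(start, bonds))
print("energy after: ", bond_energy(refined, bonds))
```

Early steps are noisy enough to explore nearby geometries, while later steps are dominated by the energy gradient, which is the general intuition behind spending extra inference-time compute on refinement. In a learned system the hand-written gradient would be combined with a trained denoiser rather than used alone.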
🔑 Solving the Protein-Ligand Binding Challenge
Cracking the 'key and lock' problem
Pearl predicts 3D coordinates of protein-small molecule complexes with sufficient accuracy to determine binding potency, overcoming a barrier that previously required expensive, months-long laboratory experiments.
Navigating 10^60 molecular possibilities
The model addresses the vast search space of drug-like molecules by using diffusion to move beyond pattern matching toward generalizable physical predictions for previously undruggable targets.
Bottom Line
LLM scaling strategies, specifically synthetic data generation and inference-time compute, transfer to domain-specific physics problems: Genesis pairs diffusion models with physics-guided 'thinking' to predict molecular binding and open up new drug discovery targets.