Training Transformers to solve 95% failure rate of Cancer Trials - Ron Alfa & Daniel Bear, Noetik
TL;DR
Noetik is tackling the 95% failure rate of cancer clinical trials by training transformers on proprietary multimodal patient tumor data to identify hidden biological subtypes and match therapies to responsive populations, moving beyond simplistic biomarkers and outdated cell lines.
The Patient Selection Problem (3 insights)
Cancer trials fail due to patient mismatch, not drug quality
Ron Alfa argues that 90-95% of oncology drug failures stem from an inability to identify responsive patient subpopulations rather than from pharmacological flaws in the molecules themselves.
Traditional biomarkers are too simplistic for complex biology
Current clinical methods rely on single mutations or protein stains that miss the rich multimodal patterns determining therapeutic response.
True cancer subtypes remain largely unknown
Pathology-based classifications like 'lung cancer' mask multiple distinct functional subtypes that can only be identified properly with data-driven approaches.
Data Generation Strategy (3 insights)
Building proprietary datasets from fresh human tumors
Noetik built an in-house lab to process thousands of patient samples into spatial arrays, rejecting decades-old immortalized cell lines that fail to reflect actual human tumor biology.
Intentional dataset design beats brute force collection
Following the example set by the Protein Data Bank and ImageNet, they curate high-quality, multimodal data at scale rather than cobbling together existing public repositories.
Imaging enables scalable, information-dense profiling
Spatial array imaging captures many patients per slide, providing rich visual biological data at significantly lower cost than sequencing runs.
AI Architecture & Applications (3 insights)
Self-supervised transformers learn biological subtypes without preconceptions
Models identify therapeutically relevant cancer subtypes directly from patient data without preconceptions about whether drivers are genetic, immune, or spatial.
Dual-use models for discovery and trial rescue
The same architecture enables reverse translation for target discovery and retrospective analysis of failed Phase 2/3 trials to design better patient cohorts.
Scaling toward generalizable cancer models
With several hundred patients per major indication, models generalize across cancer types, though Daniel Bear notes biology remains complex and requires continued data expansion.
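The retrospective trial analysis mentioned above can be illustrated with a small arithmetic sketch. It assumes a model has already assigned each patient a subtype label, then compares the overall response rate with the rate inside each subtype. The records, the subtype names "A" and "B", and the function names are all hypothetical and made up for illustration; none of this comes from Noetik or from any real trial.

```python
from collections import defaultdict

# Toy, made-up records: (patient_id, model_assigned_subtype, responded).
# Illustrative values only; not data from Noetik or any real trial.
TOY_TRIAL = [
    ("p01", "A", True),
    ("p02", "A", True),
    ("p03", "A", False),
    ("p04", "B", False),
    ("p05", "B", False),
    ("p06", "B", False),
    ("p07", "B", False),
    ("p08", "B", False),
    ("p09", "B", False),
    ("p10", "B", False),
]


def overall_rate(records):
    """Fraction of all patients who responded."""
    responders = sum(1 for _pid, _subtype, responded in records if responded)
    return responders / len(records)


def response_rate_by_subtype(records):
    """Return {subtype: (responders, total, rate)} for each subtype."""
    counts = defaultdict(lambda: [0, 0])  # subtype -> [responders, total]
    for _pid, subtype, responded in records:
        counts[subtype][1] += 1
        if responded:
            counts[subtype][0] += 1
    return {s: (r, n, r / n) for s, (r, n) in counts.items()}


if __name__ == "__main__":
    print(f"overall: {overall_rate(TOY_TRIAL):.2f}")
    for subtype, (r, n, rate) in sorted(response_rate_by_subtype(TOY_TRIAL).items()):
        print(f"subtype {subtype}: {r}/{n} = {rate:.2f}")
```

In this toy cohort, the trial as a whole looks weak while one subtype responds at a much higher rate, which is the pattern the patient selection argument rests on. A subgroup found after the fact like this would still need to be confirmed in a new cohort.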
Bottom Line
Pharmaceutical companies must abandon outdated cell lines and simplistic single-biomarker approaches in favor of training transformers on large-scale, multimodal patient tumor datasets to identify true biological subtypes and match therapies to responsive populations before initiating trials.