🔬 Training Transformers to solve 95% failure rate of Cancer Trials — Ron Alfa & Daniel Bear, Noetik
TL;DR
Noetik is tackling the 95% failure rate of cancer clinical trials by training transformers on proprietary multimodal patient tumor data to identify hidden biological subtypes and match therapies to responsive populations, moving beyond simplistic biomarkers and outdated cell lines.
🎯 The Patient Selection Problem
Cancer trials fail due to patient mismatch, not drug quality
Ron Alfa argues that 90-95% of oncology drug failures stem from an inability to identify responsive patient subpopulations rather than from pharmacological flaws in the molecules themselves.
Traditional biomarkers are too simplistic for complex biology
Current clinical methods rely on single mutations or protein stains that miss the rich multimodal patterns determining therapeutic response.
True cancer subtypes remain largely unknown
Pathology-based classifications like 'lung cancer' mask multiple distinct functional subtypes that require data-driven approaches to identify properly.
🔬 Data Generation Strategy
Building proprietary datasets from fresh human tumors
Noetik built an in-house lab to process thousands of patient samples into spatial arrays, rejecting decades-old immortalized cell lines that fail to reflect actual human tumor biology.
Intentional dataset design beats brute force collection
Following the example of the Protein Data Bank and ImageNet, Noetik curates high-quality, multimodal data at scale rather than cobbling together existing public repositories.
Imaging enables scalable, information-dense profiling
Spatial array imaging captures many patients per slide, providing rich visual biological data at significantly lower cost than sequencing runs.
🤖 AI Architecture & Applications
Self-supervised transformers learn biological subtypes without prior assumptions
Models identify therapeutically relevant cancer subtypes directly from patient data without preconceptions about whether drivers are genetic, immune, or spatial.
Dual-use models for discovery and trial rescue
The same architecture enables reverse translation for target discovery and retrospective analysis of failed Phase 2/3 trials to design better patient cohorts.
Scaling toward generalizable cancer models
With several hundred patients per major indication, models generalize across cancer types, though Daniel Bear notes that the biology remains complex and that continued data expansion is needed.
Bottom Line
Pharmaceutical companies must abandon outdated cell lines and simplistic single-biomarker approaches in favor of training transformers on large-scale, multimodal patient tumor datasets to identify true biological subtypes and match therapies to responsive populations before initiating trials.