🔬 Training Transformers to solve 95% failure rate of Cancer Trials — Ron Alfa & Daniel Bear, Noetik

| Podcasts | April 20, 2026 | 1.12 Thousand views | 1:25:22

TL;DR

Noetik is tackling the 95% failure rate of cancer clinical trials by training transformers on proprietary multimodal patient tumor data to identify hidden biological subtypes and match therapies to responsive populations, moving beyond simplistic biomarkers and outdated cell lines.

🎯 The Patient Selection Problem 3 insights

Cancer trials fail due to patient mismatch, not drug quality

Ron Alfa argues that 90-95% of oncology drug failures stem from an inability to identify responsive patient subpopulations rather than from pharmacological flaws in the molecules themselves.

Traditional biomarkers are too simplistic for complex biology

Current clinical methods rely on single mutations or protein stains that miss the rich multimodal patterns determining therapeutic response.

True cancer subtypes remain largely unknown

Pathology-based classifications like 'lung cancer' mask multiple distinct functional subtypes that require data-driven approaches to identify properly.

🔬 Data Generation Strategy 3 insights

Building proprietary datasets from fresh human tumors

Noetik built an in-house lab to process thousands of patient samples into spatial arrays, rejecting decades-old immortalized cell lines that fail to reflect actual human tumor biology.

Intentional dataset design beats brute force collection

Following the examples set by the Protein Data Bank and ImageNet, they curate high-quality, multimodal data at scale rather than cobbling together existing public repositories.

Imaging enables scalable, information-dense profiling

Spatial array imaging captures many patients per slide, providing rich visual biological data at significantly lower cost than sequencing runs.

🤖 AI Architecture & Applications 3 insights

Self-supervised transformers learn biological subtypes without prior assumptions

Models identify therapeutically relevant cancer subtypes directly from patient data without preconceptions about whether drivers are genetic, immune, or spatial.

Dual-use models for discovery and trial rescue

The same architecture enables reverse translation for target discovery and retrospective analysis of failed Phase 2/3 trials to design better patient cohorts.

Scaling toward generalizable cancer models

With several hundred patients per major indication, the models generalize across cancer types, though Daniel Bear notes that the biology remains complex and that continued data expansion is required.

Bottom Line

Pharmaceutical companies must abandon outdated cell lines and simplistic single-biomarker approaches in favor of training transformers on large-scale, multimodal patient tumor datasets to identify true biological subtypes and match therapies to responsive populations before initiating trials.
