🔬 Training Transformers to solve 95% failure rate of Cancer Trials — Ron Alfa & Daniel Bear, Noetik

Latent Space

| Podcasts | April 20, 2026 | 1.12 Thousand views | 1:25:22

TL;DR

Noetik is tackling the 95% failure rate of cancer clinical trials by training transformers on proprietary multimodal patient tumor data to identify hidden biological subtypes and match therapies to responsive populations, moving beyond simplistic biomarkers and outdated cell lines.

🎯 The Patient Selection Problem

Cancer trials fail due to patient mismatch, not drug quality

Ron Alfa argues that 90-95% of oncology drug failures stem from an inability to identify responsive patient subpopulations rather than from pharmacological flaws in the molecules themselves.

Traditional biomarkers are too simplistic for complex biology

Current clinical methods rely on single mutations or protein stains that miss the rich multimodal patterns determining therapeutic response.

True cancer subtypes remain largely unknown

Pathology-based classifications like 'lung cancer' mask multiple distinct functional subtypes that require data-driven approaches to identify properly.

🔬 Data Generation Strategy

Building proprietary datasets from fresh human tumors

Noetik built an in-house lab to process thousands of patient samples into spatial arrays, rejecting decades-old immortalized cell lines that fail to reflect actual human tumor biology.

Intentional dataset design beats brute force collection

Following the example of the Protein Data Bank and ImageNet, Noetik curates high-quality, multimodal data at scale rather than cobbling together existing public repositories.

Imaging enables scalable, information-dense profiling

Spatial array imaging captures many patients per slide, providing rich visual biological data at significantly lower cost than sequencing runs.

🤖 AI Architecture & Applications

Self-supervised transformers learn biological subtypes without built-in bias

Models identify therapeutically relevant cancer subtypes directly from patient data without preconceptions about whether drivers are genetic, immune, or spatial.
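The summary does not say which training objective Noetik uses. As a minimal sketch of what "self-supervised" can mean in this setting, the Python below assumes a masked-reconstruction setup: part of a sample's data is hidden and the model's training target is the hidden part, so no response label or predefined subtype is needed. The token names, the `<mask>` placeholder, and the `make_masked_example` helper are all hypothetical and invented for illustration.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token, not from the episode


def make_masked_example(tokens, mask_rate=0.3, seed=0):
    """Hide a random subset of tokens and return (masked_input, targets).

    `targets` maps each hidden position to the original token that a model
    would be trained to reconstruct. No outcome label is involved.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


# Invented multimodal "tokens" for one tumor sample (illustrative only).
sample = [
    "gene:KRAS_mut",
    "protein:PD-L1_high",
    "cell:T_cell",
    "cell:tumor",
    "region:stroma",
    "cell:macrophage",
]
masked, targets = make_masked_example(sample)
```

Because the target is the data itself, the same recipe applies whether the informative signal turns out to be genetic, immune, or spatial, which is the sense in which the approach carries no preconception about the driver.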

Dual-use models for discovery and trial rescue

The same architecture enables reverse translation for target discovery and retrospective analysis of failed Phase 2/3 trials to design better patient cohorts.
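To make the trial-rescue argument concrete, here is a small worked example in Python using an entirely synthetic cohort (the counts, the subtype labels "A" and "B", and the `response_rates` helper are assumptions, not figures from the episode). It shows how a weak overall response rate can conceal a subgroup that responds well.

```python
from collections import defaultdict


def response_rates(patients):
    """Return (overall_rate, {subtype: rate}) from (subtype, responded) records."""
    counts = defaultdict(lambda: [0, 0])  # subtype -> [responders, total]
    for subtype, responded in patients:
        counts[subtype][0] += int(responded)
        counts[subtype][1] += 1
    total = sum(n for _, n in counts.values())
    responders = sum(r for r, _ in counts.values())
    by_subtype = {s: r / n for s, (r, n) in counts.items()}
    return responders / total, by_subtype


# Synthetic cohort: 20 patients in subtype "A" (12 respond),
# 80 patients in subtype "B" (4 respond).
cohort = (
    [("A", True)] * 12
    + [("A", False)] * 8
    + [("B", True)] * 4
    + [("B", False)] * 76
)
overall, by_subtype = response_rates(cohort)
```

In this made-up cohort the overall response rate is 16%, while subtype A responds at 60% and subtype B at 5%. A trial enrolling everyone would look like a failure, whereas one restricted to subtype A would not. The hard part, which this sketch takes as given, is knowing the subtype labels in advance.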

Scaling toward generalizable cancer models

With several hundred patients per major indication, models generalize across cancer types, though Daniel Bear notes biology remains complex and requires continued data expansion.

Bottom Line

Pharmaceutical companies must abandon outdated cell lines and simplistic single-biomarker approaches in favor of training transformers on large-scale, multimodal patient tumor datasets to identify true biological subtypes and match therapies to responsive populations before initiating trials.
