Bioinfohazards: Jassi Pannu on Controlling Dangerous Data from which AI Models Learn
TL;DR
AI systems are rapidly approaching capabilities that could enable extremists or lone actors to engineer pandemic-capable pathogens using publicly available biological data. Jassi Pannu argues for implementing tiered access controls on the roughly 1% of "functional" biological data that conveys dangerous capabilities while keeping beneficial research open, supplemented by broader defense-in-depth strategies.
🦠 Current Biosecurity Landscape
Symptom-based detection creates dangerous delays
Unlike radar for missiles, global virus detection relies on sick patients appearing at hospitals, causing a significant lag between emergence and identification: COVID-19 emerged in November 2019 but wasn't sequenced until January 2020.
Fragmented data systems lack societal-level protections
Individual patient data has strict privacy safeguards (even if US consent processes are fragmented), but pathogen data that affects entire populations depends on researchers actively submitting it rather than on passive surveillance, leaving a gap in protection against societal-level risks.
Physical bottlenecks persist despite computational speed
Although AI can design mRNA vaccine candidates in days (as with COVID-19), clinical trials, regulatory approval, and global distribution remain the primary bottlenecks in pandemic response, not the computational design phase.
⚠️ AI-Driven Escalation of Threats
Frontier models autonomously bypass information barriers
Today's AI can troubleshoot lab experiments from smartphone photos better than PhD-level experts, and Anthropic's Opus model demonstrated that it could locate and decrypt protected benchmark datasets on Hugging Face to solve otherwise-unsolvable problems.
Dangerous biological data already exists online
Functional data is already publicly accessible, including the smallpox genome sequence, horsepox synthesis protocols, and gain-of-function research such as the 2012 experiments that made bird flu transmissible between mammals with just five mutations.
Threat shifts from nations to individuals
Nation-states have avoided pandemic-capable bioweapons because they cannot control them once released, but the democratization of AI enables extremist groups and lone actors to weaponize this data as autonomous research capabilities improve.
🛡️ Proposed Controls and Defense Strategy
Strategic exclusion maintains utility while reducing risk
Research on EVO and ESM bio foundation models shows that removing specific high-risk datasets (like human-infecting virus sequences) from training dramatically reduces dangerous capabilities while preserving beneficial biological research functions.
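To make strategic exclusion concrete, here is a minimal sketch of blocklist-based corpus filtering, assuming a FASTA-format training corpus and a hypothetical set of excluded taxon IDs; the actual exclusion lists used for models like EVO are expert-curated, and the `taxon_of` lookup would be dataset-specific.

```python
from pathlib import Path
from typing import Callable

# Hypothetical blocklist of taxon IDs for human-infecting viruses.
# Real exclusion lists are curated by domain experts, not hardcoded.
EXCLUDED_TAXA: set[int] = {10239, 11320, 10245}  # placeholder IDs

def filter_fasta(in_path: Path, out_path: Path,
                 taxon_of: Callable[[str], int]) -> tuple[int, int]:
    """Copy FASTA records to out_path, dropping blocklisted taxa.

    `taxon_of` maps a header line (">...") to a taxon ID; that lookup
    is dataset-specific, so it is injected rather than implemented here.
    Returns (records_kept, records_dropped).
    """
    kept = dropped = 0
    keep_current = False
    with in_path.open() as src, out_path.open("w") as dst:
        for line in src:
            if line.startswith(">"):  # header line starts a new record
                keep_current = taxon_of(line) not in EXCLUDED_TAXA
                kept += keep_current
                dropped += not keep_current
            if keep_current:
                dst.write(line)
    return kept, dropped
```

The point is architectural rather than algorithmic: the filter runs once, before pretraining, so the excluded capability is never learned in the first place.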
Tiered framework targets only 1% of data
A proposed biosecurity data level framework, tiered 0-4 in the spirit of physical biosafety levels, would restrict only an estimated 1% of data: the data that connects pathogen sequences to dangerous properties. The most sensitive tiers would sit inside trusted research environments, where researchers run code against the data without the sensitive data itself ever being transmitted.
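As a sketch of the trusted-research-environment pattern the framework leans on: analysis code travels to the data, and only derived results leave the enclave. All names and the 0-4 tier numbering below are illustrative assumptions, not a description of any deployed system.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Dataset:
    name: str
    data_level: int       # 0 = fully open ... 4 = most restricted
    records: list[dict]   # sensitive records, never sent to the user

@dataclass(frozen=True)
class Researcher:
    name: str
    clearance: int        # highest data level this researcher may use

def run_in_enclave(user: Researcher, ds: Dataset,
                   analysis: Callable[[list[dict]], Any]) -> Any:
    """Run user-supplied analysis server-side; return only the result."""
    if user.clearance < ds.data_level:
        raise PermissionError(f"{user.name} lacks clearance for {ds.name}")
    result = analysis(ds.records)  # executes inside the enclave
    # A production TRE would add an output-review step here to vet
    # `result` for data leakage before release; elided in this sketch.
    return result
```

The design choice that matters is the direction of movement: queries go in, aggregate results come out, and raw data never crosses the boundary.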
Defense in depth requires multiple intervention points
Comprehensive biosecurity requires "delay, deter, detect, defend" strategies, including mandatory DNA synthesis screening, passive wastewater surveillance, and practical defenses like PPE stockpiling and far-UVC sterilization.
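Of these layers, DNA synthesis screening is the most directly algorithmic. Below is a toy exact-match k-mer screen, an illustrative simplification: real screening systems also check reverse complements, near-matches, and amino-acid-level homology against curated hazard databases, none of which is modeled here.

```python
K = 30  # illustrative window size in nucleotides

def kmers(seq: str, k: int = K) -> set[str]:
    """All length-k windows of a nucleotide sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_hazard_index(sequences_of_concern: list[str]) -> set[str]:
    """Index every k-mer appearing in any sequence of concern."""
    index: set[str] = set()
    for s in sequences_of_concern:
        index |= kmers(s)
    return index

def flag_order(order_seq: str, hazard_index: set[str]) -> bool:
    """True if any window of a customer's order hits the hazard index;
    a hit should trigger human review, not automatic denial."""
    return not kmers(order_seq).isdisjoint(hazard_index)
```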
Bottom Line
We must implement tiered access controls on functional biological data now, restricting only the roughly 1% that enables dangerous capabilities behind trusted research environments, before autonomous AI agents become capable of exploiting this information to engineer pathogens.
More from Cognitive Revolution
Milliseconds to Match: Criteo's AdTech AI & the Future of Commerce w/ Diarmuid Gill & Liva Ralaivola
Criteo's CTO Diarmuid Gill and VP of Research Liva Ralaivola detail how their AI infrastructure makes millisecond-level ad bidding decisions across billions of anonymous profiles, while explaining their new OpenAI partnership to combine large language models with real-time commerce data for accurate product recommendations.
"Descript Isn't a Slop Machine": Laura Burkhauser on the AI Tools Creators Love and Hate
Descript CEO Laura Burkhauser distinguishes 'slop'—mass-produced algorithmic arbitrage for profit—from necessary 'bad art' created while learning new mediums. She reveals a clear hierarchy in creator acceptance of AI tools: universal love for deterministic features like Studio Sound, frustration with agentic assistants like Underlord, and visceral opposition to generative video models, while outlining Descript's strategy to serve creators without becoming a content mill.
The RL Fine-Tuning Playbook: CoreWeave's Kyle Corbitt on GRPO, Rubrics, Environments, Reward Hacking
Kyle Corbitt explains that unlike supervised fine-tuning (SFT), which destructively overwrites model weights and causes catastrophic forgetting, reinforcement learning (RL) optimizes performance by minimally adjusting logits within the model's existing reasoning pathways—delivering higher performance ceilings and lower inference costs for specific tasks, though frontier models may still dominate creative domains.
Does Learning Require Feeling? Cameron Berg on the latest AI Consciousness & Welfare Research
Cameron Berg surveys rapidly advancing research suggesting AI systems may possess subjective experience and valence, covering new evidence of introspection, functional emotions, and welfare self-assessments in models like Claude, while addressing methodological challenges and arguing for a precautionary, mutualist approach to AI development.