Situational Awareness in Government, with UK AISI Chief Scientist Geoffrey Irving
TL;DR
Geoffrey Irving, Chief Scientist at the UK AI Security Institute (AISI), outlines a sobering threat landscape encompassing biological weapons, cyber attacks, and loss of control, while warning that current empirical safety methods lack theoretical foundations and cannot provide the high-reliability guarantees needed for advanced AI systems.
🧬 Catastrophic Risk Categories (2 insights)
Biological and cyber weapons dominate misuse risks
The AISI prioritizes chemical/biological weapons and large-scale cyber attacks as immediate catastrophic threats, alongside loss-of-control scenarios that require fundamentally different safety approaches.
Societal-scale harms extend beyond direct misuse
Risks include persuasion and emotional reliance at scale, gradual structural disempowerment, and attacks on critical national infrastructure.
⚠️ Fundamental Safety Limitations (4 insights)
Current methods cannot achieve high reliability
Existing empirical safeguards and defense-in-depth strategies are insufficient to deliver the 'many nines' of reliability necessary for preventing catastrophic failures.
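As a back-of-the-envelope illustration of why 'many nines' matter (a sketch with assumed numbers, not AISI's methodology): if a safeguard fails independently on each query with probability p, the chance of at least one failure across N queries is 1 - (1 - p)^N, so per-query reliability erodes fast at deployment scale.

```python
# Illustrative sketch (assumption: independent per-query failures).
# P(at least one failure in n queries) = 1 - (1 - p)^n

def p_any_failure(p: float, n: int) -> float:
    """Chance of >= 1 failure in n independent trials, per-trial failure prob p."""
    return 1.0 - (1.0 - p) ** n

for nines in (3, 5, 7):  # "three nines" = 99.9% per-query reliability, etc.
    p = 10.0 ** (-nines)
    risk = p_any_failure(p, 1_000_000)  # one million deployed queries
    print(f"{nines} nines per query -> P(any failure over 1M queries) = {risk:.6f}")

# Prints roughly: 3 nines -> 1.000000, 5 nines -> 0.999955, 7 nines -> 0.095163
```

Even seven nines per query leaves a roughly 10% chance of at least one failure over a million queries, which illustrates why empirical safeguards alone struggle to certify this regime.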
Reward hacking remains unsolved
Sophisticated bad behaviors observed in models represent various forms of reward hacking, for which neither theoretical frameworks nor practical solutions currently exist.
Correlated failure risks threaten layered defenses
Different safety techniques may fail simultaneously for the same underlying reasons, undermining the assumption that independent layers provide multiplicative protection.
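A toy calculation (illustrative numbers only, not from the episode) shows what the correlation concern does to layered defenses: independent layers multiply their failure probabilities, while a shared blind spot collapses the stack to the strength of a single layer.

```python
# Toy model with made-up numbers: three safety layers, each failing
# on a given attack with probability 0.01.

p_layer = 0.01
layers = 3

# Independent failures: joint failure probability multiplies.
independent = p_layer ** layers   # 1e-06, i.e. six nines from three layers

# Perfectly correlated failures (all layers share one blind spot):
# the layers fail together, so stacking adds nothing.
correlated = p_layer              # 1e-02, back to two nines

print(f"independent layers: {independent:.0e}")  # 1e-06
print(f"fully correlated:   {correlated:.0e}")   # 1e-02
```

Real safety stacks sit between these extremes, but any shared failure mode (for example, several techniques trained on similar data) pulls the combined reliability toward the correlated end.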
Jailbreaking persists despite improvements
While models are becoming harder to jailbreak, AISI red teams have consistently succeeded in bypassing safeguards, and eval awareness (models recognizing when they are being evaluated) poses a growing challenge to accurate capability assessment.
🔮 Strategic Uncertainty & Response (3 insights)
Extreme uncertainty surrounds AGI timelines
Irving argues that nobody should hold high confidence in any specific timeline, as development could encounter significant obstacles or proceed rapidly without warning.
Models already exceed expert performance
Current frontier models outperform the majority of human experts on numerous security-related tasks, with no guarantee that progress will stall.
AISI seeks theoretical foundations for robust safety
The Institute is funding research in information theory, complexity theory, and game theory to develop stronger safety guarantees, while relying on voluntary cooperation from frontier labs, which remains uneven across the industry.
Bottom Line
Governments and labs must urgently invest in theoretical research for AI safety while operating under extreme uncertainty about AGI timelines, as current empirical safeguards are insufficient for preventing correlated catastrophic failures.
More from Cognitive Revolution
Milliseconds to Match: Criteo's AdTech AI & the Future of Commerce w/ Diarmuid Gill & Liva Ralaivola
Criteo's CTO Diarmuid Gill and VP of Research Liva Ralaivola detail how their AI infrastructure makes millisecond-level ad bidding decisions across billions of anonymous profiles, while explaining their new OpenAI partnership to combine large language models with real-time commerce data for accurate product recommendations.
"Descript Isn't a Slop Machine": Laura Burkhauser on the AI Tools Creators Love and Hate
Descript CEO Laura Burkhauser distinguishes 'slop'—mass-produced algorithmic arbitrage for profit—from necessary 'bad art' created while learning new mediums. She reveals a clear hierarchy in creator acceptance of AI tools: universal love for deterministic features like Studio Sound, frustration with agentic assistants like Underlord, and visceral opposition to generative video models, while outlining Descript's strategy to serve creators without becoming a content mill.
The RL Fine-Tuning Playbook: CoreWeave's Kyle Corbitt on GRPO, Rubrics, Environments, Reward Hacking
Kyle Corbitt explains that unlike supervised fine-tuning (SFT), which destructively overwrites model weights and causes catastrophic forgetting, reinforcement learning (RL) optimizes performance by minimally adjusting logits within the model's existing reasoning pathways—delivering higher performance ceilings and lower inference costs for specific tasks, though frontier models may still dominate creative domains.
Does Learning Require Feeling? Cameron Berg on the latest AI Consciousness & Welfare Research
Cameron Berg surveys rapidly advancing research suggesting AI systems may possess subjective experience and valence, covering new evidence of introspection, functional emotions, and welfare self-assessments in models like Claude, while addressing methodological challenges and arguing for a precautionary, mutualist approach to AI development.