[State of MechInterp] SAEs in Production, Circuit Tracing, AI4Science, "Pragmatic" Interp — Goodfire

Latent Space

| Podcasts | December 31, 2025 | 980 views | 21:48

TL;DR

Goodfire researchers discuss how mechanistic interpretability has evolved from pure research to practical deployment in 2025, highlighting production applications like PII detection and scientific discovery while navigating the field's pivot toward 'pragmatic' tools that prioritize real-world utility over complete mechanistic understanding.

🏭 Production Deployments & Enterprise Use 3 insights

PII detection via feature probing

Rakuten deploys a 'sidecar' interpretability model that detects when PII-related features fire in customer chats, achieving higher recall than LLM judges at 1/500th the cost.
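
As a rough illustration of what feature probing for runtime monitoring can look like, the sketch below scores each token's sparse feature activations with a small linear probe and flags a message when any token crosses a threshold. Everything here is an assumption made for illustration: the feature indices, weights, bias, threshold, and the names `token_score` and `flag_pii` are hypothetical and do not describe the deployed system.

```python
import math

# Hypothetical probe over a few sparse-autoencoder feature activations.
# Feature indices, weights, bias, and threshold are invented for illustration.
PROBE_WEIGHTS = {0: 2.5, 3: 1.8, 7: 3.1}  # feature index -> probe weight
PROBE_BIAS = -4.0
THRESHOLD = 0.5


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def token_score(feature_acts: dict) -> float:
    """Probability-like PII score for one token, given {feature index: activation}."""
    logit = PROBE_BIAS + sum(
        weight * feature_acts.get(idx, 0.0) for idx, weight in PROBE_WEIGHTS.items()
    )
    return sigmoid(logit)


def flag_pii(message_acts: list, threshold: float = THRESHOLD) -> bool:
    """Flag a message if any token's score reaches the threshold."""
    return any(token_score(acts) >= threshold for acts in message_acts)


# Usage: one dict of active features per token in the message.
benign_message = [{1: 0.9}, {}]
suspicious_message = [{1: 0.2}, {0: 1.0, 7: 1.0}]
benign_flag = flag_pii(benign_message)
suspicious_flag = flag_pii(suspicious_message)
```

A probe like this only reads activations the main model already computes, which is one plausible reason such a sidecar can be far cheaper than calling a separate LLM judge on every message.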

Interpretability enters model cards

Major labs now integrate interpretability into evaluation workflows, with techniques appearing in Gemini 3 and Claude 4 model cards and red teaming processes.

AI for scientific discovery

Researchers apply interpretability to superhuman biological models in genomics and proteomics to identify novel disease biomarkers from opaque 'base pairs in, base pairs out' systems.

🔬 Unsupervised Techniques & Model Control 3 insights

Direct latent manipulation

Goodfire's paint.goodfire.ai tool lets users manipulate Stable Diffusion's internal feature space directly, dragging and positioning unsupervised concepts such as animals without text prompts.

Memorization follows a spectrum

Recent research demonstrates that memorization ranges from rote storage of repeated documents to logical reasoning capabilities, with factual recall existing between these extremes and proving difficult to disentangle from core cognition.

Cross-layer circuit tracing

The Anthropic paper introduces cross-layer transcoders that construct attribution graphs mapping model computations across layers, though Goodfire's replication efforts confirm these remain computationally intensive to scale.
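
To give a feel for what an attribution graph records, here is a minimal sketch on a toy two-layer linear model, where each edge carries "activation times weight" and the edges into a node sum exactly to that node's value. This is a deliberate simplification rather than the paper's method: real cross-layer transcoders are learned replacements for a model's MLP layers, and all names and numbers below are invented.

```python
# Toy linear model: hidden = W1 @ x, output = w2 . hidden.
# All values are invented for illustration.
x = [1.0, 2.0]                    # input feature activations
W1 = [[0.5, -1.0], [2.0, 0.25]]   # W1[j][i]: weight from input i to hidden j
w2 = [1.5, -0.5]                  # weight from hidden j to the output


def attribution_edges(x, W1, w2):
    """Return (input->hidden edges, hidden->output edges) as contribution dicts."""
    n_in, n_hidden = len(x), len(W1)
    hidden = [sum(W1[j][i] * x[i] for i in range(n_in)) for j in range(n_hidden)]
    # Edge (i, j): how much input i contributes to hidden unit j.
    in_to_hidden = {
        (i, j): W1[j][i] * x[i] for j in range(n_hidden) for i in range(n_in)
    }
    # Edge j: how much hidden unit j contributes to the output.
    hidden_to_out = {j: w2[j] * hidden[j] for j in range(n_hidden)}
    return in_to_hidden, hidden_to_out


in_edges, out_edges = attribution_edges(x, W1, w2)
output = sum(out_edges.values())  # edges into the output sum to the output itself
```

Even in this toy case the number of edges grows with the product of layer widths, which hints at why tracing full graphs across many layers of a large model is expensive.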

🎯 The Pragmatic Turn in Interpretability 3 insights

Beyond alignment science

DeepMind's pivot to 'pragmatic interpretability' signals an industry shift toward tools that solve immediate deployment challenges rather than pursuing complete mechanistic understanding of model internals.

Technique-specific tooling

Effective deployment requires matching methods to use cases, such as feature probing for runtime monitoring, circuit tracing for alignment verification, and specialized approaches for post-training model diffing.

Limits of knowledge editing

Updating specific facts remains intractable because knowledge is entangled with reasoning capabilities, making true 'unlearning' impossible and fact editing risky because of its broader cognitive side effects.

Bottom Line

Mechanistic interpretability is transitioning from academic research to practical engineering, with 2025 marking the shift toward production deployments in privacy, science, and monitoring—but success depends on selecting the right interpretability technique for each specific use case rather than pursuing universal mechanistic understanding.
