Notion’s Sarah Sachs & Simon Last on Custom Agents, Evals, and the Future of Work
TL;DR
Notion's AI leads Sarah Sachs and Simon Last detail their three-year journey to launching custom agents, revealing how they navigated premature model capabilities, built a culture of radical iteration, and balanced immediate utility against forward-looking bets on software factories and MCP integration.
🚀 The Long Road to Custom Agents
Four complete rebuilds since 2022
Notion attempted to build AI agents immediately after gaining GPT-4 access in late 2022, but abandoned early attempts involving fine-tuned function calling because models lacked native tool support and sufficient context windows.
The Sonnet 3.6/3.7 unlock
The project only became production-viable once Anthropic's Sonnet 3.6 and 3.7 models arrived (late 2024 and early 2025), finally providing the reliability needed for background agents that operate without human supervision.
Learning to swim with the current
The team developed crucial intuition for distinguishing between 'swimming upstream' against model limitations versus preparing infrastructure early for anticipated capability improvements.
🔄 Engineering Culture & Iteration
Radical low-ego codebase
Sachs emphasizes a low-ego culture where engineers happily delete their own code and don't write design docs to score political points, letting the team abandon approaches the moment model capabilities shift.
The Simon vortex methodology
Last operates a high-velocity prototyping 'vortex' where direction changes daily, staffed by senior engineers who rotate between experimental work and productionization without strict management boundaries.
Decentralized ideation over hierarchy
Leadership focuses on clarifying objectives rather than gatekeeping ideas, allowing engineers who identify user pain points to prototype solutions without seeking hierarchical approval from product leadership.
🏗️ Architecture & Evaluation
MCP for narrow permissioned tools
They view Model Context Protocol as ideal for lightweight agents requiring strict permissions, contrasting with full coding agents that need compute runtimes and sandboxed environments.
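One way to read "narrow, permissioned tools" is that each tool does exactly one thing and enforces its own access check before acting, rather than handing the agent a general-purpose runtime. The sketch below is plain Python, not the MCP SDK; the permission model, grants, and function names are invented for illustration.

```python
# Illustrative sketch of a narrowly scoped, permission-checked tool of the
# kind MCP is well suited to exposing. This is generic Python, not the MCP
# SDK; the permission grants and names are hypothetical.

PERMISSIONS = {"alice": {"pages:read"}, "bob": set()}  # hypothetical grants

def read_page(user: str, page_id: str) -> str:
    """A single-purpose tool: read one page, nothing else."""
    if "pages:read" not in PERMISSIONS.get(user, set()):
        raise PermissionError(f"{user} lacks pages:read")
    return f"contents of {page_id}"  # stand-in for a real fetch

print(read_page("alice", "page-123"))  # allowed: alice holds pages:read
```

The contrast with a full coding agent is the scope of the interface: a sandboxed runtime can do anything its environment allows, while a tool like this can only ever perform the one action its permission gate admits.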
P99 transcript analysis ritual
Every Friday, the team examines the most token-intensive failed agent transcripts to identify specific failure modes and ruthlessly cut tasks that exceed current model capabilities.
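The ritual above amounts to a simple ranking: collect failed agent transcripts, sort by token count, and review the extreme tail. The sketch below illustrates that triage under an assumed schema; the field names and function are not Notion's actual tooling.

```python
# Illustrative P99-style review queue for failed agent transcripts.
# The schema (id / tokens / succeeded) is assumed for the sketch,
# not Notion's real data model.

def p99_failures(transcripts, quantile=0.99):
    """Return failed transcripts at or above the given token quantile."""
    failed = [t for t in transcripts if not t["succeeded"]]
    if not failed:
        return []
    ranked = sorted(failed, key=lambda t: t["tokens"])
    cutoff = int(len(ranked) * quantile)
    return ranked[cutoff:]

transcripts = [
    {"id": "a", "tokens": 1_200, "succeeded": True},
    {"id": "b", "tokens": 98_000, "succeeded": False},
    {"id": "c", "tokens": 3_400, "succeeded": False},
]
print([t["id"] for t in p99_failures(transcripts)])  # ['b']
```

Reviewing only the most token-intensive failures concentrates attention where the model burned the most compute without delivering, which is exactly where tasks beyond current capabilities tend to show up.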
Solving enterprise permission intersections
Building for enterprise required solving complex access control where agents shared in Slack channels must respect the intersection between channel memberships and document-level permissions.
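The intersection logic can be sketched with sets: an agent's effective audience is the overlap of channel membership and document-level access, and a conservative agent only surfaces a document every channel member could already read. Names and the policy itself are illustrative, not Notion's implementation.

```python
# Hedged sketch of the access-control intersection described above.
# Field names and the citation policy are illustrative only.

def effective_readers(channel_members: set[str], doc_readers: set[str]) -> set[str]:
    """Users who may see agent output about this document in this channel."""
    return channel_members & doc_readers

def agent_can_cite(doc_readers: set[str], channel_members: set[str]) -> bool:
    # Conservative policy: cite a document only if every channel member
    # could read it directly, so the agent never leaks restricted content.
    return channel_members <= doc_readers

channel = {"alice", "bob", "carol"}
doc = {"alice", "bob"}
print(effective_readers(channel, doc))  # {'alice', 'bob'}
print(agent_can_cite(doc, channel))     # False: carol lacks doc access
```

The hard part in practice is that both sets change independently (channel membership in Slack, document permissions in Notion), so the intersection must be recomputed at answer time rather than cached at share time.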
🔮 Future of Work & AI
The software factory vision
Notion is exploring automated workflows where multiple agents collaboratively develop, debug, merge, and maintain codebases with minimal human intervention, effectively bootstrapping their own capabilities.
Coding agents as the AGI kernel
They view coding agents as the fundamental 'kernel' for AGI, where agents that can write and maintain software effectively create their own tools and extend their own functionality.
The Datadog analogy
Like Datadog provides observability expertise atop AWS infrastructure, Notion aims to be the expert collaboration layer atop raw AI capabilities, focused specifically on how enterprise teams actually work together.
Bottom Line
Successfully deploying AI agents requires the organizational discipline to abandon approaches that fight current model limitations while maintaining the cultural flexibility to rebuild completely when capabilities inevitably improve.
More from Latent Space
⚡️ The best engineers don't write the most code. They delete the most code. — Stay SaaSy
The Stay SaaSy crew explains how AI consumption-based pricing is forcing companies to manage individual employee token budgets like departmental budgets, creating complex ROI calculations and flipping traditional build-vs-buy economics as engineering costs shift from headcount to compute.
Extreme Harness Engineering for the 1B token/day Dark Factory — Ryan Lopopolo, OpenAI Frontier
Ryan Lopopolo reveals how OpenAI's Frontier team built a 'Dark Factory' processing 1 billion tokens daily, generating over 1 million lines of code from zero human-written code in 5 months. By treating human attention as the only scarce resource and enforcing strict constraints like sub-minute builds, the team shifted from manual coding to autonomous agents that write, review, and merge their own code.
Marc Andreessen introspects on Death of the Browser, Pi + OpenClaw, and Why "This Time Is Different"
Marc Andreessen frames artificial intelligence as an '80-year overnight success,' arguing that while the field has cycled through boom-bust periods since 1943, the current convergence of LLMs, reasoning models, agents, and recursive self-improvement represents a permanent inflection point where the technology finally 'works' at scale, justifying the view that 'this time is different' for builders and investors.
Moonlake: Multimodal, Interactive, and Efficient World Models — with Fan-yun Sun and Chris Manning
Moonlake founders Fan-yun Sun and Chris Manning argue that true world models require action-conditioned symbolic reasoning about physics and consequences, not just pixel prediction, enabling spatial intelligence with orders of magnitude less data than pure scaling approaches.