Notion’s Sarah Sachs & Simon Last on Custom Agents, Evals, and the Future of Work
TL;DR
Notion's AI leads Sarah Sachs and Simon Last detail their three-year journey to launch custom agents, revealing how they navigated immature model capabilities, built a culture of radical iteration, and balanced immediate utility with forward-looking bets on software factories and MCP integration.
🚀 The Long Road to Custom Agents 3 insights
Four complete rebuilds since 2022
Notion attempted to build AI agents immediately after gaining GPT-4 access in late 2022, but abandoned early attempts involving fine-tuned function calling because models lacked native tool support and sufficient context windows.
The Sonnet 3.6/3.7 unlock
The project only became production-viable after Anthropic's Sonnet 3.6 and 3.7 emerged early last year, finally providing the reliability necessary for background-running agents that operate without human supervision.
Learning to swim with the current
The team developed crucial intuition for distinguishing between 'swimming upstream' against model limitations and preparing infrastructure early for anticipated capability improvements.
🔄 Engineering Culture & Iteration 3 insights
Radical low-ego codebase
Sachs emphasizes a culture where engineers happily delete their own code and avoid writing design docs for political advancement, enabling the team to abandon approaches immediately when model capabilities shift.
The Simon vortex methodology
Last operates a high-velocity prototyping 'vortex' where direction changes daily, staffed by senior engineers who rotate between experimental work and productionization without strict management boundaries.
Decentralized ideation over hierarchy
Leadership focuses on clarifying objectives rather than gatekeeping ideas, allowing engineers who identify user pain points to prototype solutions without seeking hierarchical approval from product leadership.
🏗️ Architecture & Evaluation 3 insights
MCP for narrow permissioned tools
Sachs and Last view the Model Context Protocol (MCP) as ideal for lightweight agents that require strict permissions, in contrast to full coding agents, which need compute runtimes and sandboxed environments.
P99 transcript analysis ritual
Every Friday, the team examines the most token-intensive failed agent transcripts to identify specific failure modes and ruthlessly cut tasks that exceed current model capabilities.
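As a rough illustration of how such a review queue might be assembled, the sketch below selects failed runs whose token usage sits at or above the 99th percentile of all runs. The `Transcript` fields, the function name, and the reading of "P99" as a nearest-rank percentile over token counts are assumptions made for this example, not details confirmed by Notion.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    # Hypothetical record shape; real agent logs will differ.
    transcript_id: str
    total_tokens: int
    failed: bool


def p99_failed_transcripts(transcripts):
    """Return failed transcripts at or above the P99 token count, heaviest first."""
    if not transcripts:
        return []
    token_counts = sorted(t.total_tokens for t in transcripts)
    # Nearest-rank percentile using integer math: rank = ceil(0.99 * n).
    rank = (99 * len(token_counts) + 99) // 100
    threshold = token_counts[rank - 1]
    heavy_failures = [
        t for t in transcripts if t.failed and t.total_tokens >= threshold
    ]
    return sorted(heavy_failures, key=lambda t: t.total_tokens, reverse=True)
```

Sorting heaviest first puts the most expensive failures at the top of the review list, which is where a task that exceeds current model capabilities is most likely to show up.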
Solving enterprise permission intersections
Building for enterprise required solving complex access control: an agent shared in a Slack channel must respect the intersection of channel membership and document-level permissions.
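One possible reading of that intersection rule, sketched under stated assumptions: an agent answering in a shared channel may draw only on documents that every channel member is allowed to read, so a reply never exposes content to someone without access. The function name and the data shapes (a list of member IDs and a mapping from document ID to its readers) are hypothetical, and Notion's actual permission model is not described in the source.

```python
def effective_readable_docs(channel_members, doc_acl):
    """Return the IDs of documents that every channel member may read.

    channel_members: iterable of user IDs in the channel (hypothetical shape).
    doc_acl: dict mapping document ID to the users allowed to read it.
    """
    members = set(channel_members)
    if not members:
        # An empty channel grants nothing, rather than everything.
        return set()
    return {
        doc_id
        for doc_id, readers in doc_acl.items()
        if members <= set(readers)  # every member must be a reader
    }
```

For example, with `doc_acl = {"roadmap": {"ana", "ben"}, "salaries": {"ana"}}`, a channel containing `ana` and `ben` yields only `{"roadmap"}`, because `ben` cannot read `salaries`.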
🔮 Future of Work & AI 3 insights
The software factory vision
Notion is exploring automated workflows where multiple agents collaboratively develop, debug, merge, and maintain codebases with minimal human intervention, effectively bootstrapping their own capabilities.
Coding agents as the AGI kernel
Sachs and Last see coding agents as the fundamental 'kernel' for AGI: an agent that can write and maintain software can effectively create its own tools and extend its own functionality.
The Datadog analogy
Just as Datadog provides observability expertise atop AWS infrastructure, Notion aims to be the expert collaboration layer atop raw AI capabilities, focused specifically on how enterprise teams actually work together.
Bottom Line
Successfully deploying AI agents requires the organizational discipline to abandon approaches that fight current model limitations while maintaining the cultural flexibility to rebuild completely when capabilities inevitably improve.