Notion’s Sarah Sachs & Simon Last on Custom Agents, Evals, and the Future of Work
TL;DR
Notion's AI leads Sarah Sachs and Simon Last detail their three-year journey to launching custom agents, revealing how they navigated premature model capabilities, built a culture of radical iteration, and balanced immediate utility against forward-looking bets on software factories and MCP integration.
🚀 The Long Road to Custom Agents
Four complete rebuilds since 2022
Notion attempted to build AI agents immediately after gaining GPT-4 access in late 2022, but abandoned early attempts involving fine-tuned function calling because models lacked native tool support and sufficient context windows.
The Sonnet 3.6/3.7 unlock
The project only became production-viable once Anthropic's Sonnet 3.6 and 3.7 models arrived (late 2024 and early 2025), finally providing the reliability needed for background agents that operate without human supervision.
Learning to swim with the current
The team developed crucial intuition for distinguishing between 'swimming upstream' against model limitations versus preparing infrastructure early for anticipated capability improvements.
🔄 Engineering Culture & Iteration
Radical low-ego codebase
Sachs emphasizes a low-ego culture where engineers happily delete their own code and don't write design docs to score political points, letting the team abandon approaches the moment model capabilities shift.
The Simon vortex methodology
Last operates a high-velocity prototyping 'vortex' where direction changes daily, staffed by senior engineers who rotate between experimental work and productionization without strict management boundaries.
Decentralized ideation over hierarchy
Leadership focuses on clarifying objectives rather than gatekeeping ideas, allowing engineers who identify user pain points to prototype solutions without seeking hierarchical approval from product leadership.
🏗️ Architecture & Evaluation
MCP for narrow permissioned tools
They view Model Context Protocol as ideal for lightweight agents requiring strict permissions, contrasting with full coding agents that need compute runtimes and sandboxed environments.
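One way to read "narrow, permissioned tools" is that each tool does exactly one thing and enforces its own access check before acting, rather than handing the agent a general-purpose runtime. The sketch below is plain Python, not the MCP SDK; the permission model, grants, and function names are invented for illustration.

```python
# Illustrative sketch of a narrowly scoped, permission-checked tool of the
# kind MCP is well suited to exposing. This is generic Python, not the MCP
# SDK; the permission grants and names are hypothetical.

PERMISSIONS = {"alice": {"pages:read"}, "bob": set()}  # hypothetical grants

def read_page(user: str, page_id: str) -> str:
    """A single-purpose tool: read one page, nothing else."""
    if "pages:read" not in PERMISSIONS.get(user, set()):
        raise PermissionError(f"{user} lacks pages:read")
    return f"contents of {page_id}"  # stand-in for a real fetch

print(read_page("alice", "page-123"))  # allowed: alice holds pages:read
```

The contrast with a full coding agent is the scope of the interface: a sandboxed runtime can do anything its environment allows, while a tool like this can only ever perform the one action its permission gate admits.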
P99 transcript analysis ritual
Every Friday, the team examines the most token-intensive failed agent transcripts to identify specific failure modes and ruthlessly cut tasks that exceed current model capabilities.
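The ritual above amounts to a simple ranking: collect failed agent transcripts, sort by token count, and review the extreme tail. The sketch below illustrates that triage under an assumed schema; the field names and function are not Notion's actual tooling.

```python
# Illustrative P99-style review queue for failed agent transcripts.
# The schema (id / tokens / succeeded) is assumed for the sketch,
# not Notion's real data model.

def p99_failures(transcripts, quantile=0.99):
    """Return failed transcripts at or above the given token quantile."""
    failed = [t for t in transcripts if not t["succeeded"]]
    if not failed:
        return []
    ranked = sorted(failed, key=lambda t: t["tokens"])
    cutoff = int(len(ranked) * quantile)
    return ranked[cutoff:]

transcripts = [
    {"id": "a", "tokens": 1_200, "succeeded": True},
    {"id": "b", "tokens": 98_000, "succeeded": False},
    {"id": "c", "tokens": 3_400, "succeeded": False},
]
print([t["id"] for t in p99_failures(transcripts)])  # ['b']
```

Reviewing only the most token-intensive failures concentrates attention where the model burned the most compute without delivering, which is exactly where tasks beyond current capabilities tend to show up.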
Solving enterprise permission intersections
Building for enterprise required solving complex access control where agents shared in Slack channels must respect the intersection between channel memberships and document-level permissions.
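The intersection logic can be sketched with sets: an agent's effective audience is the overlap of channel membership and document-level access, and a conservative agent only surfaces a document every channel member could already read. Names and the policy itself are illustrative, not Notion's implementation.

```python
# Hedged sketch of the access-control intersection described above.
# Field names and the citation policy are illustrative only.

def effective_readers(channel_members: set[str], doc_readers: set[str]) -> set[str]:
    """Users who may see agent output about this document in this channel."""
    return channel_members & doc_readers

def agent_can_cite(doc_readers: set[str], channel_members: set[str]) -> bool:
    # Conservative policy: cite a document only if every channel member
    # could read it directly, so the agent never leaks restricted content.
    return channel_members <= doc_readers

channel = {"alice", "bob", "carol"}
doc = {"alice", "bob"}
print(effective_readers(channel, doc))  # {'alice', 'bob'}
print(agent_can_cite(doc, channel))     # False: carol lacks doc access
```

The hard part in practice is that both sets change independently (channel membership in Slack, document permissions in Notion), so the intersection must be recomputed at answer time rather than cached at share time.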
🔮 Future of Work & AI
The software factory vision
Notion is exploring automated workflows where multiple agents collaboratively develop, debug, merge, and maintain codebases with minimal human intervention, effectively bootstrapping their own capabilities.
Coding agents as the AGI kernel
They view coding agents as the fundamental 'kernel' for AGI, where agents that can write and maintain software effectively create their own tools and extend their own functionality.
The Datadog analogy
Like Datadog provides observability expertise atop AWS infrastructure, Notion aims to be the expert collaboration layer atop raw AI capabilities, focused specifically on how enterprise teams actually work together.
Bottom Line
Successfully deploying AI agents requires the organizational discipline to abandon approaches that fight current model limitations while maintaining the cultural flexibility to rebuild completely when capabilities inevitably improve.
More from Latent Space
⚡️ The best engineers don't write the most code. They delete the most code. — Stay SaaSy
The Stay SaaSy crew explains how AI consumption-based pricing is forcing companies to manage individual employee token budgets like departmental budgets, creating complex ROI calculations and flipping traditional build-vs-buy economics as engineering costs shift from headcount to compute.
Extreme Harness Engineering for the 1B token/day Dark Factory — Ryan Lopopolo, OpenAI Frontier
Ryan Lopopolo reveals how OpenAI's Frontier team built a 'Dark Factory' processing 1 billion tokens daily, generating over 1 million lines of code from zero human-written code in 5 months. By treating human attention as the only scarce resource and enforcing strict constraints like sub-minute builds, the team shifted from manual coding to autonomous agents that write, review, and merge their own code.
Marc Andreessen introspects on Death of the Browser, Pi + OpenClaw, and Why "This Time Is Different"
Marc Andreessen frames artificial intelligence as an '80-year overnight success,' arguing that while the field has cycled through boom-bust periods since 1943, the current convergence of LLMs, reasoning models, agents, and recursive self-improvement represents a permanent inflection point where the technology finally 'works' at scale, justifying the view that 'this time is different' for builders and investors.
Moonlake: Multimodal, Interactive, and Efficient World Models — with Fan-yun Sun and Chris Manning
Moonlake founders Fan-yun Sun and Chris Manning argue that true world models require action-conditioned symbolic reasoning about physics and consequences, not just pixel prediction, enabling spatial intelligence with orders of magnitude less data than pure scaling approaches.