You can't just one shot it — Mehedi Hassan, Granola
TL;DR
Mehedi Hassan explains why adding AI features with a single prompt ('one-shotting') fails in production. He advocates tight feedback loops instead: custom tracing infrastructure and rapid iteration tooling that refine LLM behavior for specific use cases.
💥 The Limits of 'One-Shot' AI Integration
Generic chatbots misunderstand nuanced context
Simple chat implementations fail at nuanced queries, such as telling 'coach' as a sports role apart from 'coach' as a business mentor, and return irrelevant outputs as a result.
Web search tools hide exploding costs
Adding web search can look as simple as one line of code, but the extra token usage can cost around 10 pence per chat, which becomes economically infeasible across millions of users.
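The arithmetic behind that claim is easy to write out. The following is a minimal sketch: the 10 pence per chat figure comes from the summary above, while the user count and the chats-per-user rate are hypothetical values chosen only for illustration.

```typescript
// Per-chat cost quoted in the talk summary, in pence.
const PENCE_PER_CHAT = 10;

// Projects monthly spend in pounds.
// `users` and `chatsPerUserPerMonth` are illustrative assumptions,
// not figures from the talk.
function monthlyCostPounds(
  users: number,
  chatsPerUserPerMonth: number,
  pencePerChat: number = PENCE_PER_CHAT,
): number {
  const totalPence = users * chatsPerUserPerMonth * pencePerChat;
  return totalPence / 100; // 100 pence to the pound
}

// Hypothetical scenario: one million users, 20 chats each per month.
const projectedPounds = monthlyCostPounds(1000000, 20);
console.log(`Projected spend: ${projectedPounds} pounds per month`);
```

Under those assumed numbers the projection is 2,000,000 pounds a month, which is why a feature that costs one line of code can still be too expensive to ship as is.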
Single prompts cannot serve diverse user roles
Sales teams need deal-focused outputs while engineers require action items and Linear tickets, making universal prompts ineffective across different personas.
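One way to picture the alternative to a universal prompt is a small table of role-specific instructions. This is only a sketch: the `Role` type, the instruction wording, and `buildPrompt` are hypothetical and are not taken from Granola's implementation.

```typescript
// Hypothetical set of user roles; a real product would likely have more.
type Role = "sales" | "engineering";

// Illustrative per-role instructions. The wording is an assumption.
const ROLE_INSTRUCTIONS: Record<Role, string> = {
  sales: "Summarise the meeting around the deal: stage, blockers, next steps.",
  engineering: "List action items and draft a Linear ticket for each one.",
};

// Combines the role-specific instruction with the shared meeting notes.
function buildPrompt(role: Role, notes: string): string {
  return `${ROLE_INSTRUCTIONS[role]}\n\nMeeting notes:\n${notes}`;
}

console.log(buildPrompt("engineering", "Discussed the login bug and release date."));
```

Keeping the instructions in data rather than in one long prompt means each persona can be iterated on separately.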
🔍 Building Transparency into the Black Box
Custom tracing tools reveal LLM decision-making
Granola judged off-the-shelf SaaS solutions insufficient for its needs and built internal visibility tools that track tool calls, reasoning steps, and costs from the start of a chat to the end.
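A trace of this kind can be thought of as an ordered list of structured steps per chat. The sketch below assumes a hypothetical `ChatTrace` class and step shape; it is not Granola's tool, only an illustration of recording tool calls, reasoning steps, and costs so they can be summarised later.

```typescript
type StepKind = "llm_call" | "tool_call" | "reasoning";

// Hypothetical shape of one recorded step.
interface TraceStep {
  kind: StepKind;
  name: string;      // for example, the tool that was called
  costPence: number; // cost attributed to this step
  error?: string;    // present only when the step failed
}

class ChatTrace {
  readonly chatId: string;
  private readonly steps: TraceStep[] = [];

  constructor(chatId: string) {
    this.chatId = chatId;
  }

  // Called by the application after each step of the chat completes.
  add(step: TraceStep): void {
    this.steps.push(step);
  }

  // Structured summary that non-engineers can read without log queries.
  summary(): { chatId: string; stepCount: number; totalCostPence: number; failedSteps: string[] } {
    return {
      chatId: this.chatId,
      stepCount: this.steps.length,
      totalCostPence: this.steps.reduce((sum, s) => sum + s.costPence, 0),
      failedSteps: this.steps.filter((s) => s.error !== undefined).map((s) => s.name),
    };
  }
}

// Usage with made-up values.
const trace = new ChatTrace("chat-123");
trace.add({ kind: "reasoning", name: "plan", costPence: 1 });
trace.add({ kind: "tool_call", name: "web_search", costPence: 7, error: "timeout" });
trace.add({ kind: "llm_call", name: "final_answer", costPence: 2 });
console.log(trace.summary());
```

Because every step carries its own cost and error field, the same data answers both "why did this chat fail?" and "which step is expensive?".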
Structured data enables cross-functional debugging
The tracing interface serves not just engineers but also product, data, and CX teams, eliminating the need for complex CloudWatch queries to identify failures.
LLMs accelerate internal tooling, not user features
Unlike user-facing features, internal tools like tracing systems can be effectively 'one-shotted' with LLMs, allowing rapid development of custom observability infrastructure.
🚀 Engineering for Rapid Iteration
Abstracting Electron to web standards
Granola transformed their desktop app's frontend into a web shell deployable online, enabling CI-generated preview links for parallel feature testing without local dependency friction.
AI-powered self-verification of code changes
Cursor automatically tests pull requests and uploads screenshots to PRs, drastically speeding up the review process without manual testing environments.
Desktop constraints require creative solutions
Because Granola runs as a single-instance desktop app, they made the render process environment-agnostic by abstracting IPC APIs to fall back to web standards when needed.
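The fallback pattern can be sketched as a small factory that picks an implementation based on whether a desktop bridge is present. Everything named here (`DesktopIpc`, `WebApis`, `Platform`, the `open-external` channel) is a hypothetical illustration, not Granola's API; both backends are passed in so the renderer code never touches an environment global directly.

```typescript
// Hypothetical bridge a desktop shell might expose to the renderer.
interface DesktopIpc {
  send(channel: string, payload: string): void;
}

// Hypothetical wrapper around the web-standard equivalents.
interface WebApis {
  open(url: string): void;
}

// The only interface the renderer code depends on.
interface Platform {
  kind: "desktop" | "web";
  openExternal(url: string): void;
}

// Uses IPC when a desktop bridge exists, otherwise falls back to web APIs.
function createPlatform(ipc: DesktopIpc | undefined, web: WebApis): Platform {
  if (ipc !== undefined) {
    const desktop: DesktopIpc = ipc;
    return {
      kind: "desktop",
      // "open-external" is an assumed channel name.
      openExternal: (url) => desktop.send("open-external", url),
    };
  }
  return { kind: "web", openExternal: (url) => web.open(url) };
}

// Usage with in-memory fakes standing in for the real environments.
const calls: string[] = [];
const fakeIpc: DesktopIpc = { send: (channel, payload) => { calls.push(`ipc:${channel}:${payload}`); } };
const fakeWeb: WebApis = { open: (url) => { calls.push(`web:${url}`); } };

createPlatform(fakeIpc, fakeWeb).openExternal("https://example.com");
createPlatform(undefined, fakeWeb).openExternal("https://example.com");
console.log(calls);
```

In a browser preview build, `web.open` could wrap `window.open`; in the desktop build, the `ipc` object would typically come from the shell's preload layer. The renderer sees the same `Platform` interface in both cases.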
Bottom Line
Stop trying to perfect AI features with better single prompts; instead, build infrastructure that lets you rapidly test, trace, and iterate with your LLM like a game of tennis until the output feels like magic.