Playground in Prod - Optimising Agents in Production Environments — Samuel Colvin, Pydantic

AI Engineer

Podcasts | May 07, 2026 | 2.52 thousand views

TL;DR

Samuel Colvin demonstrates how to optimize AI agent prompts in production with GEPA, a genetic prompt optimization library that breeds new prompt variations from the best performers. He pairs it with Logfire's managed variables for structured configuration and with deterministic evaluation against golden datasets.

🧬 GEPA: Genetic Prompt Optimization

Pareto Frontier Breeding Strategy

GEPA optimizes prompts with a genetic algorithm that breeds new candidates selectively from the Pareto frontier of best performers, much like selective racehorse breeding, rather than relying on random mutation.
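The Pareto-frontier selection described above can be sketched in a few lines. This is a minimal illustration, not the library's actual API: the candidate names, the per-example score lists, and the helper functions `dominates`, `pareto_frontier`, and `pick_parents` are all hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical data shape: candidate name -> score on each evaluation example
# (higher is better).
Scores = Dict[str, List[float]]


def dominates(a: List[float], b: List[float]) -> bool:
    """True if a is at least as good as b on every example and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(scores: Scores) -> List[str]:
    """Return the candidates that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for o, other in scores.items() if o != name)
    ]


def pick_parents(scores: Scores, rng: random.Random, k: int = 2) -> List[str]:
    """Sample up to k distinct parents, drawing from the frontier only."""
    frontier = pareto_frontier(scores)
    return rng.sample(frontier, min(k, len(frontier)))


scores = {
    "prompt_a": [1.0, 0.0, 1.0],
    "prompt_b": [0.0, 1.0, 1.0],
    "prompt_c": [0.0, 0.0, 1.0],  # dominated by both prompt_a and prompt_b
}
frontier = pareto_frontier(scores)
parents = pick_parents(scores, random.Random(0))
```

Here `prompt_a` and `prompt_b` each win on a different example, so both stay on the frontier, while `prompt_c` is never chosen as a parent. Keeping every non-dominated candidate, rather than only the one with the best average, preserves prompts that are good at different things.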

String Optimization Beyond Text

The library optimizes any string value, whether simple text prompts or JSON data containing complex structured configurations.
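One way to picture this: a structured configuration is serialized to a JSON string, the optimizer treats that string as the candidate, and the result is parsed and validated again before use. The sketch below is an assumption-laden illustration. `AgentConfig`, its fields, and the temperature range check are hypothetical, and a standard-library dataclass stands in for a Pydantic model so the example has no dependencies.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AgentConfig:
    """Hypothetical structured configuration for an agent."""

    instructions: str
    temperature: float
    max_retries: int


def to_candidate(config: AgentConfig) -> str:
    """Serialize the config into the single string an optimizer would mutate."""
    return json.dumps(asdict(config), sort_keys=True)


def from_candidate(candidate: str) -> AgentConfig:
    """Parse a candidate string back into a config, rejecting invalid values."""
    config = AgentConfig(**json.loads(candidate))
    if not 0.0 <= config.temperature <= 2.0:  # illustrative range, an assumption
        raise ValueError("temperature out of range")
    return config


seed = AgentConfig(
    instructions="List the MP's political ancestors.",
    temperature=0.2,
    max_retries=2,
)
candidate = to_candidate(seed)
restored = from_candidate(candidate)
```

Validating on the way back in matters because a mutated string may no longer describe a usable configuration; rejected candidates can simply be scored as failures.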

⚙️ Managed Variables and Production Infrastructure

Structured Configuration Management

Logfire's managed variables support any object definable by a Pydantic model, extending beyond simple text prompts to enable management of complex structured agent parameters.

AI Observability as Feature Not Category

Colvin argues that AI observability will eventually be absorbed by general observability platforms or AI frameworks, serving as a feature rather than a standalone category.

Autonomous Optimization Pipeline

The platform is evolving toward autonomous agent optimization where variables are tuned directly from the observability interface without manual intervention.

📊 Deterministic Evaluation Methodology

Golden Dataset Over LLM Judges

Deterministic evaluators comparing outputs against verified golden datasets provide more reliable benchmarks than LLM-as-judge approaches, which Colvin describes as 'lunatics running the asylum'.
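A deterministic evaluator of this kind can be as simple as a set comparison against the golden answers. The following is a minimal sketch under stated assumptions: the dataset shape (an id mapped to a set of expected labels), the F1 metric, and the names `score_case` and `evaluate` are illustrative choices, not taken from the talk.

```python
from typing import Dict, Set


def score_case(expected: Set[str], actual: Set[str]) -> float:
    """F1 between expected and predicted sets; 1.0 when both are empty."""
    if not expected and not actual:
        return 1.0
    true_pos = len(expected & actual)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(actual)
    recall = true_pos / len(expected)
    return 2 * precision * recall / (precision + recall)


def evaluate(golden: Dict[str, Set[str]], predictions: Dict[str, Set[str]]) -> float:
    """Mean score over all golden cases; a missing prediction counts as empty."""
    if not golden:
        raise ValueError("golden dataset is empty")
    scores = [score_case(exp, predictions.get(key, set())) for key, exp in golden.items()]
    return sum(scores) / len(scores)


# Hypothetical golden dataset and model output.
golden = {
    "mp_1": {"father"},
    "mp_2": set(),
    "mp_3": {"mother", "grandfather"},
}
predictions = {
    "mp_1": {"father"},
    "mp_2": {"spouse"},  # false positive, scores 0.0
    "mp_3": {"mother"},  # one of two found, scores 2/3
}
mean_score = evaluate(golden, predictions)
```

The same inputs always produce the same score, so a change in the number reflects a change in the prompt rather than variance in a judge model.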

Political Dynasty Extraction Demo

The demonstration uses Pydantic AI with structured outputs to analyze Wikipedia data for UK MPs, specifically optimizing prompts to identify ancestral political relationships while filtering out spouses and siblings.
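The shape of such a structured output, and the filter that keeps only ancestors, might look like the sketch below. Everything here is hypothetical: the `Relation` labels, the `PoliticalRelative` record, and the `ancestors_only` helper are illustrative, and a dataclass again stands in for the Pydantic model the demo would use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Relation(str, Enum):
    """Hypothetical relationship labels the model is asked to emit."""

    PARENT = "parent"
    GRANDPARENT = "grandparent"
    SPOUSE = "spouse"
    SIBLING = "sibling"


# Only these labels count as ancestral for the dynasty check.
ANCESTRAL = frozenset({Relation.PARENT, Relation.GRANDPARENT})


@dataclass(frozen=True)
class PoliticalRelative:
    name: str
    relation: Relation


def ancestors_only(relatives: List[PoliticalRelative]) -> List[PoliticalRelative]:
    """Drop spouses and siblings, keeping ancestors in their original order."""
    return [r for r in relatives if r.relation in ANCESTRAL]


extracted = [
    PoliticalRelative("Relative A", Relation.PARENT),
    PoliticalRelative("Relative B", Relation.SPOUSE),
    PoliticalRelative("Relative C", Relation.GRANDPARENT),
    PoliticalRelative("Relative D", Relation.SIBLING),
]
ancestors = ancestors_only(extracted)
```

Constraining the relation to a closed set of labels is what makes the output easy to compare exactly against a golden dataset.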

Pydantic Gateway for Model Access

The gateway service provides unified API access to multiple model providers with built-in observability, caching, and fallback capabilities for production environments.
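The fallback and caching behavior can be illustrated with a small stand-in. This is not the Pydantic Gateway API: `FallbackClient`, its `complete` method, the in-memory cache, and the `log` list (a crude placeholder for observability) are all hypothetical, and providers are modeled as plain functions.

```python
from typing import Callable, Dict, List, Tuple

# A provider is modeled as a function from prompt to completion (an assumption).
Provider = Callable[[str], str]


class FallbackClient:
    """Tries providers in order, caches successes, and records what happened."""

    def __init__(self, providers: List[Tuple[str, Provider]]) -> None:
        self._providers = providers
        self._cache: Dict[str, str] = {}
        self.log: List[str] = []

    def complete(self, prompt: str) -> str:
        if prompt in self._cache:
            self.log.append("cache:hit")
            return self._cache[prompt]
        errors: List[str] = []
        for name, provider in self._providers:
            try:
                result = provider(prompt)
            except Exception as exc:  # any provider failure triggers fallback
                self.log.append(f"{name}:error")
                errors.append(f"{name}: {exc}")
                continue
            self.log.append(f"{name}:ok")
            self._cache[prompt] = result
            return result
        raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def backup(prompt: str) -> str:
    return f"echo: {prompt}"


client = FallbackClient([("primary", flaky), ("backup", backup)])
first = client.complete("hi")
second = client.complete("hi")
```

The first call falls through from the failing primary to the backup; the second is served from the cache without touching either provider.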

Bottom Line

Use deterministic evaluations against golden datasets rather than LLM-as-judge for reliable agent benchmarking, and implement prompt optimization through genetic algorithms that breed high-performing variations rather than relying on random mutation.
