Build a Prompt Learning Loop - SallyAnn DeLucia & Fuad Ali, Arize

AI Engineer

Podcasts | January 06, 2026 | 11.2 thousand views | 52:08

TL;DR

SallyAnn DeLucia and Fuad Ali from Arize demonstrate how iterative "prompt learning"—combining automated evaluations with human explanatory feedback—can improve AI agent performance by 15% without fine-tuning, outperforming traditional optimization methods while reducing costs significantly.

🔧 Root Causes of Agent Failure

Agent failures stem from instruction quality

Most agent breakdowns occur due to weak environment setup, static planning, and poor context engineering rather than inadequate foundation models.

Expertise silos hinder prompt optimization

A disconnect between technical developers and domain experts creates gaps, as subject matter experts possess critical user experience insights but often lack access to prompt engineering workflows.

🧠 The Prompt Learning Methodology

Human explanations outperform scalar scores

Prompt learning uses detailed text feedback that explains why a response failed, whereas reinforcement learning and metaprompting rely on scalar rewards alone.
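As a rough illustration of what "text feedback" could look like in practice, the sketch below collects failure explanations into a meta-prompt that asks an LLM to revise the system prompt. The `Feedback` record, the `build_meta_prompt` helper, and the prompt wording are all hypothetical and are not taken from the talk or from any Arize API.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """One evaluated agent output (hypothetical record shape)."""
    task: str
    output: str
    passed: bool
    explanation: str  # free-text reason from an eval or a human reviewer


def build_meta_prompt(system_prompt: str, feedback: list[Feedback]) -> str:
    """Assemble a prompt asking an LLM to revise the system prompt.

    Only failed examples are included, each paired with its explanation,
    so the optimizer sees *why* something went wrong, not just a score.
    """
    failures = [f for f in feedback if not f.passed]
    lines = [
        "You are improving the system prompt of an agent.",
        "Current system prompt:",
        system_prompt,
        "",
        "Failed examples, each with an explanation of what went wrong:",
    ]
    for i, f in enumerate(failures, start=1):
        lines.append(f"{i}. Task: {f.task}")
        lines.append(f"   Output: {f.output}")
        lines.append(f"   Why it failed: {f.explanation}")
    lines.append("")
    lines.append("Propose new or revised rules that would prevent these failures.")
    return "\n".join(lines)
```

The returned string would then be sent to whichever model acts as the optimizer; that call is left out here because it depends on the provider.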

Continuous adaptation replaces static prompts

The methodology treats optimization as an ongoing loop where "overfitting" to domain data is reframed as developing expertise, using train-test splits to ensure rule generalization.
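One way to picture the train-test check is sketched below: rules are learned from the training examples, and a candidate prompt is kept only if it beats the current prompt on held-out examples. The function names, the 30% test fraction, and the strict-improvement acceptance rule are assumptions made for illustration, and `run_and_grade` is a hypothetical callback standing in for running the agent and evaluating its output.

```python
import random
from typing import Callable, Sequence

# Hypothetical callback: runs the agent with `prompt` on one example
# and returns True if the output passes evaluation.
Grader = Callable[[str, object], bool]


def split_examples(examples: Sequence, test_fraction: float = 0.3, seed: int = 0):
    """Shuffle deterministically and split into (train, test) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def pass_rate(prompt: str, examples: Sequence, run_and_grade: Grader) -> float:
    """Fraction of examples the agent passes when using `prompt`."""
    if not examples:
        return 0.0
    return sum(bool(run_and_grade(prompt, ex)) for ex in examples) / len(examples)


def accept_if_generalizes(old_prompt: str, new_prompt: str,
                          test_set: Sequence, run_and_grade: Grader) -> str:
    """Keep the candidate only if it strictly improves on held-out examples."""
    old_score = pass_rate(old_prompt, test_set, run_and_grade)
    new_score = pass_rate(new_prompt, test_set, run_and_grade)
    return new_prompt if new_score > old_score else old_prompt
```

Requiring a strict improvement is a conservative choice; a looser loop might also accept ties so that equally good but shorter prompts can replace longer ones.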

📈 Quantified Performance Gains

15% gains achieved through rules alone

Adding explicit coding standards and error-handling rules to system prompts improved SWE-bench Lite scores by 15%, enabling Claude 3.5 Sonnet to match Claude 3 Opus performance at 66% lower cost.

Superior efficiency versus evolutionary methods

Benchmarks against DSPy's GEPA optimizer showed prompt learning reaching better performance in fewer optimization loops. The comparison also underscored the need for high-quality LLM-as-a-Judge evaluators.

⚙️ Critical Implementation Factors

Eval prompt quality determines success

The reliability of automated evaluation prompts is as critical as that of the agent prompts, and they need the same optimization rigor to provide a trustworthy signal.
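A simple way to check whether an eval prompt can be trusted is to compare its verdicts with human labels on the same outputs. The sketch below is one possible check, not a method described in the talk; `judge_agreement` and the metric names are hypothetical.

```python
def judge_agreement(human_labels: list[bool], judge_labels: list[bool]) -> dict:
    """Compare LLM-judge verdicts against human pass/fail labels.

    Returns the overall agreement rate plus counts of the two error types:
    false_pass (judge passed what a human failed) and
    false_fail (judge failed what a human passed).
    """
    if len(human_labels) != len(judge_labels):
        raise ValueError("label lists must have the same length")
    if not human_labels:
        raise ValueError("need at least one labelled example")

    pairs = list(zip(human_labels, judge_labels))
    matches = sum(1 for h, j in pairs if h == j)
    false_pass = sum(1 for h, j in pairs if not h and j)
    false_fail = sum(1 for h, j in pairs if h and not j)
    return {
        "agreement": matches / len(pairs),
        "false_pass": false_pass,
        "false_fail": false_fail,
    }
```

A judge with many false passes hides failures from the optimization loop, so low agreement is a reason to revise the eval prompt before revising the agent prompt.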

Explicit rules replace vague instructions

Converting generic system prompts into specific, enforceable rules—such as mandatory testing protocols and error-handling procedures—delivers immediate reliability improvements without architectural changes.
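To make the idea concrete, here is a minimal sketch of appending numbered, enforceable rules to an existing system prompt. The `add_rules` helper and the example rules in the usage note are illustrative assumptions, not the rules used in the talk.

```python
def add_rules(base_prompt: str, rules: list[str]) -> str:
    """Append a numbered rules section to a system prompt.

    Blank rules are dropped and duplicates (ignoring case and surrounding
    whitespace) are kept only once, in first-seen order.
    """
    seen: set[str] = set()
    unique: list[str] = []
    for rule in rules:
        text = rule.strip()
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            unique.append(text)

    if not unique:
        return base_prompt

    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(unique, start=1))
    return f"{base_prompt.rstrip()}\n\nRules (follow all of them):\n{numbered}"
```

For example, `add_rules("You are a coding agent.", ["Run the existing test suite before declaring a task complete.", "Report the reason for any failed file or network operation."])` yields the base prompt followed by a two-item rules section.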

Bottom Line

Implement a continuous prompt learning loop where domain experts provide explanatory text feedback on failures alongside automated evals, iteratively refining system instructions to build domain expertise without fine-tuning.
