Skill Issue: How We Used AI to Make Agents Actually Good at Supabase — Pedro Rodrigues, Supabase

| Podcasts | May 04, 2026 | 3.2K views | 1:18:41

TL;DR

Pedro Rodrigues of Supabase explains how structured 'skills' (markdown-based instruction sets with progressive disclosure) dramatically improve AI agent performance on complex products, how skills differ from MCP tools, and how to test them systematically with an evaluation-driven development framework.

📁 Skills Architecture & Progressive Disclosure (3 insights)

Skills function as structured knowledge books

A skill consists of a skill.md file containing front matter (name and description) that acts as an 'index on steroids,' plus optional reference files and scripts stored in a references folder.
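A minimal sketch of that layout, with hypothetical file names and contents chosen to match the description above:

```markdown
my-skill/
├── skill.md           <!-- front matter + top-level instructions -->
└── references/
    ├── views.md       <!-- detailed workflow, fetched only when needed -->
    └── fix_errors.sh  <!-- optional helper script -->

<!-- skill.md begins with front matter, the 'index on steroids': -->
---
name: supabase-views
description: Guides the agent through diagnosing and fixing broken database views.
---
```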

Progressive disclosure minimizes context bloat

Unlike loading all documentation immediately, skills use progressive disclosure where only the skill.md front matter loads initially, and the agent fetches additional reference files only when specifically needed.
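A rough sketch of that two-phase loading, with invented helper names and file layout (not the actual agent-runtime implementation described in the talk):

```python
from pathlib import Path

def load_front_matter(skill_md: str) -> dict:
    """Parse only the front matter between the leading '---' fences.

    This is the cheap part that loads into context up front; the body and
    reference files stay on disk until the agent asks for them.
    """
    lines = skill_md.splitlines()
    assert lines[0].strip() == "---"
    end = lines[1:].index("---") + 1
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def load_reference(skill_dir: Path, name: str) -> str:
    """Fetch a reference file only when the agent decides it needs it."""
    return (skill_dir / "references" / name).read_text()
```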

Reference files create navigable knowledge graphs

Skills support nested references where files can point to other files, creating graph-like structures that allow agents to traverse complex information hierarchies efficiently.
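The traversal the agent performs over nested references can be sketched as a breadth-first walk. The link syntax and file names below are assumptions for illustration (markdown-style links pointing into the references folder):

```python
import re

def collect_references(files: dict[str, str], start: str) -> list[str]:
    """Breadth-first walk of reference links, visiting each file once."""
    seen, queue, order = {start}, [start], []
    while queue:
        name = queue.pop(0)
        order.append(name)
        # Find markdown-style links such as (references/foo.md)
        for target in re.findall(r"\((references/[\w./-]+)\)", files[name]):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

# Hypothetical skill whose references point to further references.
files = {
    "skill.md": "See [views](references/views.md).",
    "references/views.md": "Also read [errors](references/errors.md).",
    "references/errors.md": "Leaf file.",
}
```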

⚖️ Skills vs MCP Tools (2 insights)

Complementary roles for integrations and context

MCP tools handle external integrations and remote server-side execution, while skills provide progressive context disclosure and detailed workflows that exceed tool description character limits.

Local execution environment requirements

Unlike MCP tools that run remotely, skill scripts execute locally in the user's environment, requiring OS-specific compatibility (Linux, macOS, Windows) and bash access.

🧪 Evaluation-Driven Development (3 insights)

Nondeterministic testing via evaluations

Traditional unit tests break down against nondeterministic LLM outputs; evaluations (evals) instead assess agent reasoning steps, tool-usage patterns, and behavioral outcomes rather than exact output matching.

Iterative eval-driven development cycle

The framework involves defining success metrics, creating the skill, running controlled test scenarios with inputs/expected outputs, grading agent behavior, and iterating—similar to TDD but accounting for LLM variability.
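The cycle above can be sketched in miniature. The scenario, tool names, and grader here are all hypothetical, and a real setup would run on an eval platform; the key idea is grading the agent's behavior (which tools it called) instead of exact-matching its output:

```python
def grade_tool_usage(transcript: list[dict], expected_tools: set[str]) -> float:
    """Score = fraction of expected tools the agent actually invoked."""
    used = {step["tool"] for step in transcript if "tool" in step}
    return len(used & expected_tools) / len(expected_tools)

# One controlled test scenario: an input plus the expected behavior.
scenario = {
    "input": "The performance review page errors on missing reviewer data.",
    "expected_tools": {"list_tables", "execute_sql"},
}

# A stubbed agent trace; a real run would capture this from the live agent.
transcript = [
    {"tool": "list_tables"},
    {"tool": "execute_sql", "args": {"query": "CREATE VIEW ..."}},
]

score = grade_tool_usage(transcript, scenario["expected_tools"])
# Iterate on the skill until the grade clears your success threshold.
```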

Observability platforms track agent behavior

Platforms like Braintrust provide systematic evaluation execution with full observability into agent decision-making during controlled test scenarios.

💻 Practical Implementation at Supabase (2 insights)

Prioritizing DAX over traditional DX

Supabase focuses on 'Developer Experience for Agents' (DAX) rather than just human DX, optimizing how AI agents interact with their backend-as-a-service platform and Postgres databases.

Performance review application demonstration

The workshop demonstrates building a skill to guide agents in fixing database errors within a performance review app containing four employees, specifically creating SQL views to resolve data issues.
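A hedged sketch of the fix pattern the demo uses: a SQL view that papers over a data issue so the app reads consistent rows. The schema, names, and data here are invented for illustration (and SQLite stands in for the workshop's Postgres-via-Supabase setup):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE reviews (employee_id INTEGER, rating INTEGER);
INSERT INTO employees VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Alan'), (4, 'Edsger');
INSERT INTO reviews VALUES (1, 5), (2, 4);
-- The view resolves the data issue: employees without reviews still appear.
CREATE VIEW performance_overview AS
SELECT e.name, COALESCE(r.rating, 0) AS rating
FROM employees e LEFT JOIN reviews r ON r.employee_id = e.id;
""")
rows = conn.execute(
    "SELECT name, rating FROM performance_overview ORDER BY name"
).fetchall()
```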

Bottom Line

Combine MCP tools for external integrations with skills for progressive context disclosure, then systematically test using evaluation-driven development cycles that assess agent reasoning rather than deterministic outputs.
