Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI
TL;DR
Maxime Labonne explains that small language models (350M–24B parameters) built for edge deployment face architectural and training challenges distinct from simply scaling down large models. They call for specialized solutions such as short convolutions, massive over-training, and targeted reinforcement learning to overcome memory constraints and "doom looping," while excelling at agentic tool use.
💾 Architecture for Memory-Bound Devices
Embedding layer inefficiency in small models
Mainstream small models like Gemma 3 270M and Qwen 3.5 0.8B waste 63% and 29% of parameters respectively on massive embedding layers due to distillation from large-vocabulary teachers, leaving fewer effective parameters for reasoning.
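The arithmetic behind this claim is easy to reproduce. A minimal sketch, assuming Gemma 3 270M's published configuration (vocabulary of ~262k tokens, hidden size 640); treat the figures as an approximation:

```python
# Back-of-envelope: fraction of total parameters consumed by the embedding
# table alone (vocab_size x hidden_size weights).
def embedding_fraction(vocab_size: int, hidden_size: int, total_params: int) -> float:
    embedding_params = vocab_size * hidden_size
    return embedding_params / total_params

# Assumed config: vocab ~262k, hidden size 640, 270M total parameters.
frac = embedding_fraction(262_144, 640, 270_000_000)
print(f"{frac:.0%}")  # prints "62%", in line with the ~63% cited above
```

At that vocabulary size, roughly two of every three parameters sit in the lookup table rather than in transformer blocks that do the reasoning.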
Short convolutions maximize throughput
LFM2 uses gated short convolutions that, in on-device profiling, run significantly faster than sliding-window attention and GQA on both AMD Ryzen CPUs and the Samsung Galaxy S25 Ultra, while using less memory.
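To make the operator family concrete, here is a minimal NumPy sketch of a gated short convolution: a causal depthwise convolution with a tiny kernel, modulated by a sigmoid gate. This illustrates the general idea only, not LFM2's exact block; the weight shapes and gating placement are assumptions.

```python
import numpy as np

def gated_short_conv(seq, conv_w, gate_w):
    """seq: (T, D) activations; conv_w: (K, D) depthwise kernel, K small (e.g. 3);
    gate_w: (D, D) gate projection. A short kernel needs only K-1 cached rows
    of state at inference time, unlike attention's growing KV cache."""
    T, D = seq.shape
    K = conv_w.shape[0]
    padded = np.vstack([np.zeros((K - 1, D)), seq])        # causal left-padding
    conv = np.stack([(padded[t:t + K] * conv_w).sum(axis=0) for t in range(T)])
    gate = 1.0 / (1.0 + np.exp(-(seq @ gate_w)))           # sigmoid gate per channel
    return gate * conv

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
out = gated_short_conv(x, rng.standard_normal((3, 4)), rng.standard_normal((4, 4)))
```

Because the kernel is short and depthwise, the cost per token is O(K·D) with constant state, which is what makes the operator attractive on memory-bound devices.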
Hardware-first optimization
Effective edge architecture requires on-device profiling on target hardware rather than theoretical optimization, identifying operators that minimize latency under strict memory constraints.
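The profiling step itself can be as simple as timing candidate operators on the target device. A hypothetical micro-benchmark sketch (the `profile_op` helper is illustrative, not a tool named in the talk):

```python
import time

def profile_op(op, *args, warmup=3, iters=20):
    """Time a callable on the actual hardware instead of trusting FLOP counts."""
    for _ in range(warmup):      # warm caches before timing
        op(*args)
    start = time.perf_counter()
    for _ in range(iters):
        op(*args)
    return (time.perf_counter() - start) / iters  # mean seconds per call

latency = profile_op(sum, range(10_000))
```

Run on the deployment target (a phone SoC or laptop CPU), this kind of measurement surfaces memory-bandwidth effects that theoretical FLOP comparisons miss.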
📈 Training Beyond Traditional Scaling Laws
Massive over-training improves small models
Liquid AI trains its 350M parameter model on 28 trillion tokens, violating Chinchilla scaling laws but aligning with newer test-time scaling laws and delivering significant gains across knowledge and tool-use benchmarks.
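The scale of this over-training is clearer as a tokens-per-parameter ratio, using the commonly cited Chinchilla-optimal figure of roughly 20 tokens per parameter:

```python
# 350M parameters trained on 28T tokens vs. the Chinchilla-optimal budget
# of ~20 tokens per parameter.
params = 350e6
tokens = 28e12

tokens_per_param = tokens / params          # 80,000 tokens per parameter
chinchilla_budget = 20 * params             # ~7B tokens would be "optimal"
overtraining_factor = tokens / chinchilla_budget  # ~4,000x that budget
```

A 4,000x over-trained model is far past the compute-optimal point for training cost, but the extra tokens keep improving quality per deployed parameter, which is the metric that matters on-device.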
RL efficiency at small scale
Reinforcement learning is extremely effective for small models but requires diverse task environments and careful curation of cold-start SFT data to prevent training instability.
Narrow capability focus
Small models should specialize in specific high-value tasks like data extraction and function calling rather than competing as general-purpose chatbots or math solvers.
🔄 Solving Doom Looping
The doom looping phenomenon
Small reasoning models frequently enter infinite repetition loops on complex tasks, with Qwen 3.5 0.8B experiencing over 50% doom loops compared to LFM 2.5's near-zero rate after optimization.
Preference alignment with diverse rollouts
Generating five temperature-sampled rollouts versus one greedy rollout during on-policy DPO data generation allows an LLM jury to identify and reject looping sequences as negative examples.
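The pipeline above can be sketched in a few lines. This is a hedged illustration, not Liquid AI's actual code: `has_ngram_loop` stands in for the LLM jury's loop judgment, and the rollouts are toy strings rather than model samples.

```python
def has_ngram_loop(tokens, n=3, max_repeats=3):
    """Flag a sequence whose final n-gram repeats back-to-back many times."""
    if len(tokens) < n * max_repeats:
        return False
    tail = tokens[-n:]
    repeats, i = 0, len(tokens) - n
    while i >= 0 and tokens[i:i + n] == tail:
        repeats += 1
        i -= n
    return repeats >= max_repeats

def build_dpo_pair(rollouts):
    """Pair a non-looping rollout (chosen) with a looping one (rejected)."""
    clean = [r for r in rollouts if not has_ngram_loop(r)]
    loopy = [r for r in rollouts if has_ngram_loop(r)]
    return (clean[0], loopy[0]) if clean and loopy else None

# Toy rollouts: one healthy completion, one stuck repeating itself.
rollouts = [
    "the integral evaluates to pi over two".split(),
    ("the answer is " * 6).split(),
]
pair = build_dpo_pair(rollouts)
```

Sampling several rollouts at temperature > 0 is what makes this work: a single greedy rollout rarely exposes the looping failure mode, so there is nothing to reject.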
RL with verifiable rewards
Combining reinforcement learning with verifiable rewards and n-gram repetition penalties reduced LFM 2.5 1.2B's doom loop ratio from 16% to almost zero.
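One plausible form of such a shaped reward, offered as an assumed sketch rather than LFM's exact recipe: a binary verifiable reward minus a penalty proportional to the fraction of duplicated n-grams in the completion.

```python
def distinct_ngram_ratio(tokens, n=3):
    """1.0 means every n-gram is unique; values near 0 indicate heavy looping."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

def shaped_reward(tokens, is_correct, penalty_weight=0.5, n=3):
    """Verifiable reward (1 if the answer checks out) minus a repetition penalty."""
    repetition = 1.0 - distinct_ngram_ratio(tokens, n)
    return (1.0 if is_correct else 0.0) - penalty_weight * repetition
```

Under this shaping, a correct but heavily looping completion earns noticeably less than a correct, non-repetitive one, so the policy gradient pushes away from loops even when the final answer verifies.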
🤖 Agentic Deployment Strategy
Tools compensate for knowledge gaps
Small models overcome low parametric knowledge capacity through web search and Python tools, making them highly capable agents for latency-sensitive, offline, or privacy-critical environments like automotive and healthcare.
Recursive environments bypass context limits
Long-context limitations can be solved through recursive agentic workflows and code execution rather than attempting to scale context windows architecturally.
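The recursive pattern can be sketched as a map-reduce over chunks: process pieces that fit the window, concatenate the outputs, and recurse until the result fits. Here `summarize` is a stand-in for a model call (it just truncates), so this shows only the control flow, not real summarization.

```python
def summarize(text, limit=64):
    """Placeholder for a model call; a real agent would call the LLM here."""
    return text[:limit]

def recursive_reduce(text, chunk_size=256, limit=64):
    """Split, summarize each chunk, then recurse on the merged summaries."""
    if len(text) <= chunk_size:
        return summarize(text, limit)
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    merged = " ".join(summarize(c, limit) for c in chunks)
    return recursive_reduce(merged, chunk_size, limit)

result = recursive_reduce("lorem ipsum " * 500)
```

Each level of recursion shrinks the input by roughly `chunk_size / limit`, so arbitrarily long inputs terminate in a logarithmic number of passes without ever needing a longer context window.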
Bottom Line
Treat small models as specialized, agentic systems that leverage external tools and extensive pre-training rather than attempting to build miniature general-purpose chatbots.