Teach AI to Code in Every Language with NVIDIA NeMo | NVIDIA GTC
TL;DR
NVIDIA researchers demonstrate training a multilingual code generation model from scratch using 43x less data than typical foundation models, achieving 38.87% accuracy on HumanEval+ while supporting English/Spanish and Python/Rust through efficient data curation and checkpoint merging.
🧠 LLM Architecture & Reasoning (3 insights)
Autoregressive token generation mechanics
Code LLMs operate by predicting the next token sequentially, feeding each generated token back into the input context until reaching maximum length or a completion signal.
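The loop described above can be sketched in a few lines. This is a toy illustration, not the actual model: `predict_next` is a hypothetical stand-in for a real LLM forward pass, and the token values are arbitrary.

```python
EOS = -1  # hypothetical end-of-sequence token id

def predict_next(tokens):
    # Stand-in for a real model: a code LLM would run a forward pass over
    # the full context and pick the next token; here we echo a pattern.
    return tokens[-1] + 1 if tokens[-1] < 4 else EOS

def generate(prompt_tokens, max_len=16):
    tokens = list(prompt_tokens)
    while len(tokens) < max_len:
        nxt = predict_next(tokens)  # predict next token from full context
        if nxt == EOS:              # completion signal ends generation
            break
        tokens.append(nxt)          # feed the token back into the context
    return tokens
```

The two stopping conditions in the loop (maximum length, completion signal) are exactly the termination cases described above.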
Reasoning traces improve output quality
Models forced to generate step-by-step reasoning between <think> tags before coding produce more accurate results because the intermediate tokens provide additional beneficial context.
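When a model emits its reasoning inside `<think>` tags, the serving side typically strips the trace before returning the final code. A minimal sketch of that post-processing step (the tag convention is from the talk; the function name is an assumption):

```python
def strip_reasoning(output: str) -> str:
    # Drop everything up to and including the closing </think> tag,
    # keeping only the code/answer that follows the reasoning trace.
    marker = "</think>"
    idx = output.find(marker)
    return output[idx + len(marker):].lstrip() if idx != -1 else output
```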
Agent-based tool orchestration
Advanced code LLMs can decompose user intent, invoke external tools like calculators or APIs, and synthesize inputs from specialized sub-agents such as bug trackers or documentation systems.
🌍 Multilingual Training Results (3 insights)
Non-English prompts cause language confusion
Standard code models frequently generate incorrect programming languages or buggy code when prompted in Spanish or other non-English languages, despite understanding the underlying intent.
Resource-efficient training scale
The team trained a 1.7B-parameter Qwen 3 model on only 0.88 trillion tokens using 32 DGX A100 servers for 34 hours, versus the 35 trillion tokens used to train the original model.
Checkpoint merging technique
Averaging weights from separate checkpoints optimized for HumanEval and MBPP benchmarks yielded 38.87% accuracy on HumanEval+, effectively combining specialized capabilities into a single model.
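Checkpoint merging by weight averaging can be sketched as below. This is a simplified illustration using plain dicts of floats rather than real tensor checkpoints; the function name and `alpha` parameter are assumptions, not the team's actual tooling.

```python
def merge_checkpoints(ckpt_a, ckpt_b, alpha=0.5):
    # Element-wise weighted average of two checkpoints' parameters.
    # ckpt_a / ckpt_b: dicts mapping parameter names to lists of floats.
    assert ckpt_a.keys() == ckpt_b.keys(), "checkpoints must share parameters"
    return {
        name: [alpha * a + (1 - alpha) * b
               for a, b in zip(ckpt_a[name], ckpt_b[name])]
        for name in ckpt_a
    }
```

With `alpha=0.5` this is the plain average described above; in practice the same idea applies tensor-by-tensor over full model state dicts.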
⚙️ Data Pipeline & Optimization (3 insights)
Four-phase data preparation
Quality training data requires cleaning and filtering, deduplication (lexical and semantic), strategic blending (71% code, 9% math, multilingual text), and tokenization to ensure balanced capability coverage.
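The blending phase can be sketched as weighted sampling across sources. The 71%/9% split is from the talk; the function, the remaining 20% going to multilingual text, and the example source names are illustrative assumptions.

```python
import random

def blend_samples(sources, weights, n, seed=0):
    # Draw n training examples from named sources according to blend
    # weights, e.g. 71% code, 9% math, remainder multilingual text.
    rng = random.Random(seed)          # seeded for reproducible mixing
    names = list(sources)
    w = [weights[name] for name in names]
    return [rng.choice(sources[rng.choices(names, weights=w)[0]])
            for _ in range(n)]
```

Real pipelines blend at the shard or token level rather than per example, but the proportion-driven sampling is the same idea.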
Pre-training with SFT data injection
Including supervised fine-tuning datasets during the pre-training phase produces empirical gains that cannot be recovered during post-training alone, justifying the 0.5 trillion token blend approach.
Adaptive learning rate scheduling
Resetting learning rates between the pre-training (33K iterations) and post-training (15K iterations) phases prevents stagnation and allows continued improvement rather than converging at suboptimal points.
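A two-phase schedule with a reset at the phase boundary can be sketched as follows. The 33K/15K iteration counts are from the talk; the linear decay shape and the peak/floor learning-rate values are assumptions for illustration.

```python
def lr_at(step, pretrain_iters=33_000, posttrain_iters=15_000,
          peak=3e-4, floor=3e-5):
    # Linear decay within each phase; the rate resets to its peak at the
    # start of post-training instead of continuing to decay from pre-training.
    if step < pretrain_iters:
        frac = step / pretrain_iters
    else:
        frac = (step - pretrain_iters) / posttrain_iters  # reset at boundary
    return peak - (peak - floor) * min(frac, 1.0)
```

The reset means step 33,000 sees the full peak rate again, which is what allows the post-training phase to keep improving rather than stalling at the pre-training minimum.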
Bottom Line
Organizations can train specialized multilingual code models from scratch using less than 1 trillion tokens and modest GPU infrastructure by prioritizing high-quality data curation and strategic checkpoint merging over raw compute scaling.
More from NVIDIA AI Podcast
Building Towards Self-Driving Codebases with Long-Running, Asynchronous Agents
Cursor co-founder Aman traces AI coding's evolution from autocomplete to synchronous agents, outlining the shift toward long-running async cloud agents that use multi-agent architectures to overcome context limits, and predicting a future of self-driving codebases with self-healing systems and minimal human intervention.
Accelerate AI through Open Source Inference | NVIDIA GTC
Industry leaders from NVIDIA, Hugging Face, Mistral AI, Black Forest Labs, and Lightricks discuss how open-source inference optimization—spanning quantization, latent compression, and Mixture of Experts architectures—is enabling both massive trillion-parameter models and efficient edge deployment while driving the shift toward sovereign AI and local data control.
Reinforcement Learning at Scale: Engineering the Next Generation of Intelligence
Former OpenAI researchers now leading frontier startups explain how reinforcement learning has evolved from game-playing agents to powering enterprise automation and scientific discovery, requiring new scaling paradigms focused on inference compute and long-horizon reasoning rather than just pre-training FLOPs.
Advancing to AI's Next Frontier: Insights From Jeff Dean and Bill Dally
Google's Jeff Dean and NVIDIA's Bill Dally discuss the rapid evolution toward autonomous AI agents capable of multi-day tasks and self-improvement, while detailing the radical hardware shifts—toward 'speed of light' latency and specialized inference chips—required to power this next frontier.