Teach AI to Code in Every Language with NVIDIA NeMo | NVIDIA GTC
TL;DR
NVIDIA researchers demonstrate training a multilingual code generation model from scratch using 43x less data than typical foundation models, achieving 38.87% accuracy on HumanEval+ while supporting English/Spanish and Python/Rust through efficient data curation and checkpoint merging.
🧠 LLM Architecture & Reasoning (3 insights)
Autoregressive token generation mechanics
Code LLMs operate by predicting the next token sequentially, feeding each generated token back into the input context until reaching maximum length or a completion signal.
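The loop described above can be sketched in a few lines. This is a toy illustration, not the actual model: `predict_next` is a hypothetical stand-in for a real LLM forward pass, and the token values are arbitrary.

```python
EOS = -1  # hypothetical end-of-sequence token id

def predict_next(tokens):
    # Stand-in for a real model: a code LLM would run a forward pass over
    # the full context and pick the next token; here we echo a pattern.
    return tokens[-1] + 1 if tokens[-1] < 4 else EOS

def generate(prompt_tokens, max_len=16):
    tokens = list(prompt_tokens)
    while len(tokens) < max_len:
        nxt = predict_next(tokens)  # predict next token from full context
        if nxt == EOS:              # completion signal ends generation
            break
        tokens.append(nxt)          # feed the token back into the context
    return tokens
```

The two stopping conditions in the loop (maximum length, completion signal) are exactly the termination cases described above.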
Reasoning traces improve output quality
Models forced to generate step-by-step reasoning between <think> tags before coding produce more accurate results because the intermediate tokens provide additional beneficial context.
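When a model emits its reasoning inside `<think>` tags, the serving side typically strips the trace before returning the final code. A minimal sketch of that post-processing step (the tag convention is from the talk; the function name is an assumption):

```python
def strip_reasoning(output: str) -> str:
    # Drop everything up to and including the closing </think> tag,
    # keeping only the code/answer that follows the reasoning trace.
    marker = "</think>"
    idx = output.find(marker)
    return output[idx + len(marker):].lstrip() if idx != -1 else output
```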
Agent-based tool orchestration
Advanced code LLMs can decompose user intent, invoke external tools like calculators or APIs, and synthesize inputs from specialized sub-agents such as bug trackers or documentation systems.
🌍 Multilingual Training Results (3 insights)
Non-English prompts cause language confusion
Standard code models frequently generate incorrect programming languages or buggy code when prompted in Spanish or other non-English languages, despite understanding the underlying intent.
Resource-efficient training scale
The team trained a 1.7B-parameter Qwen 3 model on only 0.88 trillion tokens using 32 DGX A100 servers for 34 hours, versus the 35 trillion tokens used to train the original model.
Checkpoint merging technique
Averaging weights from separate checkpoints optimized for HumanEval and MBPP benchmarks yielded 38.87% accuracy on HumanEval+, effectively combining specialized capabilities into a single model.
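Checkpoint merging by weight averaging can be sketched as below. This is a simplified illustration using plain dicts of floats rather than real tensor checkpoints; the function name and `alpha` parameter are assumptions, not the team's actual tooling.

```python
def merge_checkpoints(ckpt_a, ckpt_b, alpha=0.5):
    # Element-wise weighted average of two checkpoints' parameters.
    # ckpt_a / ckpt_b: dicts mapping parameter names to lists of floats.
    assert ckpt_a.keys() == ckpt_b.keys(), "checkpoints must share parameters"
    return {
        name: [alpha * a + (1 - alpha) * b
               for a, b in zip(ckpt_a[name], ckpt_b[name])]
        for name in ckpt_a
    }
```

With `alpha=0.5` this is the plain average described above; in practice the same idea applies tensor-by-tensor over full model state dicts.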
⚙️ Data Pipeline & Optimization (3 insights)
Four-phase data preparation
Quality training data requires cleaning and filtering, deduplication (lexical and semantic), strategic blending (71% code, 9% math, multilingual text), and tokenization to ensure balanced capability coverage.
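The blending phase can be sketched as weighted sampling across sources. The 71%/9% split is from the talk; the function, the remaining 20% going to multilingual text, and the example source names are illustrative assumptions.

```python
import random

def blend_samples(sources, weights, n, seed=0):
    # Draw n training examples from named sources according to blend
    # weights, e.g. 71% code, 9% math, remainder multilingual text.
    rng = random.Random(seed)          # seeded for reproducible mixing
    names = list(sources)
    w = [weights[name] for name in names]
    return [rng.choice(sources[rng.choices(names, weights=w)[0]])
            for _ in range(n)]
```

Real pipelines blend at the shard or token level rather than per example, but the proportion-driven sampling is the same idea.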
Pre-training with SFT data injection
Including supervised fine-tuning datasets during the pre-training phase produces empirical gains that cannot be recovered during post-training alone, justifying the 0.5 trillion token blend approach.
Adaptive learning rate scheduling
Resetting learning rates between the pre-training (33K iterations) and post-training (15K iterations) phases prevents stagnation and allows continued improvement rather than converging at suboptimal points.
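A two-phase schedule with a reset at the phase boundary can be sketched as follows. The 33K/15K iteration counts are from the talk; the linear decay shape and the peak/floor learning-rate values are assumptions for illustration.

```python
def lr_at(step, pretrain_iters=33_000, posttrain_iters=15_000,
          peak=3e-4, floor=3e-5):
    # Linear decay within each phase; the rate resets to its peak at the
    # start of post-training instead of continuing to decay from pre-training.
    if step < pretrain_iters:
        frac = step / pretrain_iters
    else:
        frac = (step - pretrain_iters) / posttrain_iters  # reset at boundary
    return peak - (peak - floor) * min(frac, 1.0)
```

The reset means step 33,000 sees the full peak rate again, which is what allows the post-training phase to keep improving rather than stalling at the pre-training minimum.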
Bottom Line
Organizations can train specialized multilingual code models from scratch using less than 1 trillion tokens and modest GPU infrastructure by prioritizing high-quality data curation and strategic checkpoint merging over raw compute scaling.
More from NVIDIA AI Podcast
Building Towards Self-Driving Codebases with Long-Running, Asynchronous Agents
Cursor co-founder Aman traces AI coding's evolution from autocomplete to synchronous agents, outlining the shift toward long-running async cloud agents that use multi-agent architectures to overcome context limits, and predicting a future of self-driving codebases with self-healing systems and minimal human intervention.
Accelerate AI through Open Source Inference | NVIDIA GTC
Industry leaders from NVIDIA, Hugging Face, Mistral AI, Black Forest Labs, and Lightricks discuss how open-source inference optimization—spanning quantization, latent compression, and Mixture of Experts architectures—is enabling both massive trillion-parameter models and efficient edge deployment while driving the shift toward sovereign AI and local data control.
Reinforcement Learning at Scale: Engineering the Next Generation of Intelligence
Former OpenAI researchers now leading frontier startups explain how reinforcement learning has evolved from game-playing agents to powering enterprise automation and scientific discovery, requiring new scaling paradigms focused on inference compute and long-horizon reasoning rather than just pre-training FLOPs.
Advancing to AI's Next Frontier: Insights From Jeff Dean and Bill Dally
Google's Jeff Dean and NVIDIA's Bill Dally discuss the rapid evolution toward autonomous AI agents capable of multi-day tasks and self-improvement, while detailing the radical hardware shifts—toward 'speed of light' latency and specialized inference chips—required to power this next frontier.