CUDA: New Features and Beyond | NVIDIA GTC

Podcasts | March 31, 2026 | 6.94 thousand views | 44:27

TL;DR

This presentation outlines CUDA's evolution toward 'guaranteed asymmetric parallelism,' introducing Green Contexts to enable dynamic GPU resource partitioning for disaggregated AI inference workloads, while previewing future multi-node CUDA graphs that will orchestrate computations across entire data centers.

🔀 The Shift from Symmetric to Asymmetric Parallelism

Traditional symmetric execution limits utilization

A conventional CUDA grid launch spreads one kernel's identical work across all SMs of the GPU (160 on the Blackwell GPU used as the example), and launches execute one after another, so different tasks have no guarantee of running simultaneously.

AI inference phases have opposing resource needs

Prefill is compute-bound and dominated by large matrix operations, while decode is memory-bandwidth-bound, so provisioning both phases with the same resources is inefficient.
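
The compute-bound versus bandwidth-bound split can be checked with a back-of-the-envelope roofline calculation. The sketch below is illustrative only: the hidden size, token counts, and hardware peak numbers are hypothetical values chosen for round arithmetic, not figures from the talk, and it models a single FP16 matrix multiply rather than a full transformer layer.

```python
def gemm_arithmetic_intensity(tokens, d_in, d_out, bytes_per_elem=2):
    """FLOPs per byte moved for a (tokens x d_in) @ (d_in x d_out) matmul."""
    flops = 2 * tokens * d_in * d_out  # one multiply and one add per term
    bytes_moved = bytes_per_elem * (
        tokens * d_in      # read activations
        + d_in * d_out     # read weights
        + tokens * d_out   # write outputs
    )
    return flops / bytes_moved


def classify(intensity, peak_flops, peak_bandwidth):
    """Roofline rule: compare intensity against the machine balance point."""
    machine_balance = peak_flops / peak_bandwidth  # FLOPs per byte
    return "compute-bound" if intensity >= machine_balance else "memory-bound"


# Hypothetical hardware numbers, chosen only for illustration.
PEAK_FLOPS = 1.0e15      # 1 PFLOP/s
PEAK_BANDWIDTH = 4.0e12  # 4 TB/s, giving a balance of 250 FLOPs/byte

D_MODEL = 8192  # hypothetical hidden size

# Prefill processes the whole prompt at once; decode handles one new token.
prefill = gemm_arithmetic_intensity(tokens=4096, d_in=D_MODEL, d_out=D_MODEL)
decode = gemm_arithmetic_intensity(tokens=1, d_in=D_MODEL, d_out=D_MODEL)

prefill_kind = classify(prefill, PEAK_FLOPS, PEAK_BANDWIDTH)
decode_kind = classify(decode, PEAK_FLOPS, PEAK_BANDWIDTH)

print(f"prefill: {prefill:.1f} FLOPs/byte -> {prefill_kind}")
print(f"decode:  {decode:.1f} FLOPs/byte -> {decode_kind}")
```

Under these assumptions the prefill matmul reuses each weight across thousands of tokens and lands far above the balance point, while the single-token decode step reads every weight for roughly one FLOP per byte and lands far below it. Batching decode requests raises its intensity, which is one reason the right split between the two phases shifts with the workload.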

Disaggregation delivers a reported 10x performance gain

Running prefill and decode workers on separately configured GPU partitions eliminates resource starvation and right-sizes hardware for each phase.

Dynamic orchestration manages unpredictable workloads

NVIDIA Dynamo orchestrates these disaggregated systems, dynamically balancing resources between context-heavy queries and token-generation-heavy reasoning tasks.

🟢 Green Contexts: Dynamic GPU Partitioning

Green contexts bridge streams and MPS

This new mechanism sits between CUDA streams (too opportunistic) and Multi-Process Service (too static) to enable guaranteed asymmetric parallelism within a single process.

Sandboxed resource allocation without code changes

Developers create resource descriptors that partition the GPU's SMs (for example, dividing Blackwell's 160 SMs into groups), and kernels run unmodified, unaware of the sandbox that constrains them, so different workloads can share the GPU concurrently.
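
The partitioning idea can be illustrated with a small plain-Python model. This is not the CUDA API: real green contexts are created through the CUDA driver API (functions such as `cuDevSmResourceSplitByCount` and `cuGreenCtxCreate`), and the `SmPartition` class, the `split_sms` helper, the 100/50 split, and the granularity of 8 are all hypothetical, used here only to show how disjoint SM groups give each workload a guaranteed share.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmPartition:
    """Hypothetical stand-in for one sandbox: a name and the SMs it owns."""
    name: str
    sm_ids: range


def split_sms(total_sms, requests, granularity=8):
    """Carve disjoint SM ranges, rounding each request up to `granularity`.

    The granularity models the fact that hardware hands out SMs in groups;
    the value 8 is an assumption for this sketch.
    """
    partitions = []
    next_sm = 0
    for name, wanted in requests.items():
        count = -(-wanted // granularity) * granularity  # ceiling to a multiple
        if next_sm + count > total_sms:
            raise ValueError(f"not enough SMs left for partition {name!r}")
        partitions.append(SmPartition(name, range(next_sm, next_sm + count)))
        next_sm += count
    remaining = range(next_sm, total_sms)  # SMs not assigned to any partition
    return partitions, remaining


# Hypothetical split of a 160-SM GPU between the two inference phases.
parts, remaining = split_sms(160, {"prefill": 100, "decode": 50})

for p in parts:
    print(f"{p.name}: {len(p.sm_ids)} SMs ({p.sm_ids.start}..{p.sm_ids.stop - 1})")
print(f"unassigned: {len(remaining)} SMs")
```

Because the ranges never overlap, work submitted to one partition cannot take SMs from the other, which is the guarantee that distinguishes this approach from opportunistic sharing through streams. In this model the kernels themselves need no knowledge of the split; only the code that sets up the partitions does.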

Graphs span multiple green contexts

CUDA graphs can now capture workflows targeting different green contexts, allowing single-launch orchestration of heterogeneous tasks across partitioned GPU resources.

Enables dynamic reconfiguration patterns

Green contexts support nested hierarchies, low-latency reservations, and dynamic repartitioning to respond to changing workload demands without restarting applications.

🌐 Future: Data Center-Scale CUDA

Graphs will span racks and data centers

NVIDIA aims to extend CUDA graphs beyond single nodes to orchestrate work across NVL72 racks and eventually data centers of 100,000+ GPUs, treating each as a unified compute fabric.

System-level naming and topology required

Multi-node execution requires CUDA to provide consistent naming conventions and topology awareness across complex dragonfly networks so all nodes agree on resource locations.

Centralized control without centralized bottlenecks

Future CUDA will enable single-controller orchestration of disaggregated workloads while maintaining fine-grained guarantees about where and when computations execute across the infrastructure.

Bottom Line

Adopt Green Contexts now to dynamically partition GPUs for asymmetric inference workloads, positioning applications for future data center-scale CUDA orchestration.
