Build Custom Large-Scale Generative AI Models | NVIDIA GTC

NVIDIA AI Podcast

Podcasts | April 08, 2026 | 39:02

TL;DR

Adobe's CTO explains why the company chose to build proprietary generative AI models from scratch to ensure legal compliance and creative control, then describes how the team discovered that naive scaling left GPUs idle 60-70% of the time because of coordination bottlenecks.

🎯 Strategic Decision to Build

Professional creatives reject prompt roulette

Off-the-shelf models produced random outputs unsuitable for Adobe's customers, who needed precise, iterative control rather than gambling on text prompts to achieve their specific vision.

Legal liability blocked enterprise adoption

Enterprise legal departments refused to adopt available models because of copyright and intellectual-property risks in their training data, so Adobe trained on fully licensed, human-moderated datasets with complete provenance tracking.

Differentiation justified massive investment

Adobe concluded that control and legal compliance offered enough competitive differentiation to justify building custom frontier models, despite the millions of dollars in GPU infrastructure required.

🛠️ Initial Technical Architecture

Naive scaling with PyTorch Lightning

The team initially treated large-scale training as a matter of adding more GPUs to a standard training loop, using PyTorch Lightning, AWS S3 storage, and thousands of NVIDIA A100 GPUs.

Simple data parallelism strategy

They implemented straightforward data parallelism: petabytes of training data were split across the GPUs, each GPU processed its shard independently, and a manager node then collected and merged the resulting updates.
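As a rough illustration of the manager-node scheme, the single-process Python sketch below shards a toy dataset, lets each simulated worker compute a gradient on its own shard, and has a manager merge the results before anyone can take the next step. The one-parameter linear model and the names `shard`, `local_gradient`, `manager_merge`, and `train` are hypothetical; this is not Adobe's code.

```python
"""Toy data parallelism with a central manager node (hypothetical sketch).

Assumption: a 1-D linear model y = w * x trained with mean-squared error.
"""
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def shard(data: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Split the dataset round-robin so each worker gets its own shard."""
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w: float, samples: List[Sample]) -> Tuple[float, int]:
    """One worker: summed MSE gradient over its shard, plus the sample count."""
    grad = sum(2.0 * (w * x - y) * x for x, y in samples)
    return grad, len(samples)


def manager_merge(results: List[Tuple[float, int]]) -> float:
    """Manager: gather every worker's result and average over all samples."""
    total_grad = sum(g for g, _ in results)
    total_n = sum(n for _, n in results)
    return total_grad / total_n


def train(data: List[Sample], num_workers: int, lr: float = 0.01, steps: int = 200) -> float:
    w = 0.0
    shards = shard(data, num_workers)
    for _ in range(steps):
        # On a real cluster these would run concurrently on separate GPUs.
        results = [local_gradient(w, s) for s in shards]
        # Synchronization point: no worker proceeds until the manager merges.
        w -= lr * manager_merge(results)
    return w
```

With `data = [(float(x), 3.0 * x) for x in range(1, 11)]`, `train(data, 4)` converges to roughly 3.0, matching the single-worker result. The merge inside the loop is the step at which every worker must wait on one node.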

Early validation masked inefficiency

The first working model validated the technical approach, but it showed typical early-generation artifacts, and its success hid severe resource underutilization that threatened the project's cost viability.

⚠️ The Utilization Crisis

GPUs idle 60-70% of training time

Profiling revealed that GPUs sat idle roughly two-thirds of the time, waiting for a manager node to gather and merge model updates, which effectively wasted about $600,000 of every $1 million spent.
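The dollar figure follows directly from the idle fraction: wasted spend is total spend multiplied by the fraction of time the GPUs sit idle, and the effective price of each useful GPU-hour is the list price divided by the busy fraction. A minimal sketch; the function names are hypothetical.

```python
def idle_spend(total_spend: float, idle_fraction: float) -> float:
    """Dollars paid for GPU time during which the GPUs did no useful work."""
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle_fraction must be between 0 and 1")
    return total_spend * idle_fraction


def cost_per_useful_hour(hourly_rate: float, idle_fraction: float) -> float:
    """Effective price of one GPU-hour of real computation: list price / busy fraction."""
    if not 0.0 <= idle_fraction < 1.0:
        raise ValueError("idle_fraction must be in [0, 1)")
    return hourly_rate / (1.0 - idle_fraction)


if __name__ == "__main__":
    for idle in (0.60, 0.70):
        wasted = idle_spend(1_000_000, idle)
        multiplier = cost_per_useful_hour(1.0, idle)
        print(f"{idle:.0%} idle: ${wasted:,.0f} of $1,000,000 wasted; "
              f"useful GPU-hours cost {multiplier:.2f}x list price")
```

At 60% idle this reports $600,000 wasted and a 2.50x multiplier; at 70% idle, $700,000 and 3.33x.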

Data parallelism fails beyond 16 GPUs

Straightforward data parallelism with a central manager node creates increasingly severe coordination bottlenecks beyond roughly 16 GPUs, making it unsuitable for frontier model training.
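A toy cost model shows why a central manager stops scaling. The sketch assumes, hypothetically, that the manager receives every worker's update over one network link in sequence and then sends the merged update back the same way, while each worker's compute time per step stays fixed. The function name and all parameter values are illustrative, not measurements from the talk.

```python
def step_utilization(num_workers: int, compute_s: float,
                     update_gb: float, manager_gb_per_s: float) -> float:
    """Fraction of a training step that each worker spends computing.

    Hypothetical model: workers compute in parallel for `compute_s` seconds,
    then the manager receives all `num_workers` updates one after another over
    a single link and sends the merged update back the same way. Workers sit
    idle for the whole gather-and-broadcast phase.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    transfer_s = update_gb / manager_gb_per_s   # one update over the manager's link
    wait_s = 2 * num_workers * transfer_s       # gather N updates, then send N copies back
    return compute_s / (compute_s + wait_s)


if __name__ == "__main__":
    # Illustrative values only: 1 s of compute, 1 GB update, 25 GB/s manager link.
    for n in (1, 4, 16, 64, 256):
        print(f"{n:>4} workers: {step_utilization(n, 1.0, 1.0, 25.0):.0%} busy")
```

In this model every added worker adds the same amount of waiting to each step, so utilization keeps falling as the cluster grows. Where the drop becomes unacceptable depends on model size, network bandwidth, and compute per step.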

Storage and checkpointing bottlenecks

Loading petabytes of data from distributed storage and periodically saving massive checkpoints as insurance against failures caused frequent stalls, compounded by slow CPU-side preprocessing and unnecessary GPU synchronization.
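One common mitigation for storage stalls is to overlap data loading with computation. The sketch below is a generic background-thread prefetcher written for illustration; the names `prefetch`, `depth`, and `slow_batches` are hypothetical and not taken from the talk or from any specific framework. It keeps up to `depth` batches loaded ahead of the training loop.

```python
import queue
import threading
import time
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the stream


def prefetch(source: Iterable[T], depth: int = 4) -> Iterator[T]:
    """Yield items from `source`, with a background thread reading up to `depth` ahead."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    buffer: "queue.Queue[object]" = queue.Queue(maxsize=depth)
    errors: List[BaseException] = []

    def producer() -> None:
        try:
            for item in source:
                buffer.put(item)  # blocks while the buffer is full
        except BaseException as exc:  # hand loader failures to the consumer
            errors.append(exc)
        finally:
            buffer.put(_DONE)

    # Because this is a generator, the thread starts on the first next() call.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            if errors:
                raise errors[0]
            return
        yield item  # type: ignore[misc]


def slow_batches(n: int, load_s: float) -> Iterator[int]:
    """Hypothetical stand-in for reading training shards from remote storage."""
    for i in range(n):
        time.sleep(load_s)
        yield i


if __name__ == "__main__":
    n, load_s, compute_s = 10, 0.02, 0.02
    for label, batches in (
        ("sequential", slow_batches(n, load_s)),
        ("prefetched", prefetch(slow_batches(n, load_s))),
    ):
        start = time.perf_counter()
        for _ in batches:
            time.sleep(compute_s)  # stand-in for the GPU step
        print(f"{label}: {time.perf_counter() - start:.2f}s")
```

Real data loaders typically add multiple worker processes, pinned host memory, and cancellation. This sketch leaves the producer thread blocked if the consumer stops early, which is tolerable here only because the thread is a daemon.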

Bottom Line

Scaling AI training requires fundamental changes to the pipeline architecture that eliminate coordination overhead and storage bottlenecks, not simply more GPUs, because standard data parallelism becomes increasingly inefficient beyond small clusters.

