Build a Document Intelligence Pipeline With Nemotron RAG | Nemotron Labs
TL;DR
This video demonstrates how to build a multimodal RAG pipeline using NVIDIA's Nemotron models to process complex enterprise documents, solving the 'linearization loss' problem by jointly embedding text and images for more accurate document Q&A.
📄 The Linearization Problem
Traditional RAG destroys document structure
Standard PDF extractors convert tables, charts, and figures into plain text, causing 'linearization loss' where critical visual relationships and structural context are permanently lost.
Real documents require visual understanding
Enterprise documents feature complex layouts with two-column text, pie charts, and bar graphs that require human-like visual parsing, beyond simple text extraction, to interpret correctly.
🔄 Multimodal Pipeline Architecture
Four-stage intelligent document processing
The pipeline combines extraction (NeMo Retriever/NV-Ingest), multimodal embedding (Nemotron Embed), cross-encoder re-ranking, and generation (Nemotron Super 49B).
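As a rough sketch, the four stages can be wired together as below. Every function body is an illustrative stand-in, not the actual NeMo Retriever or Nemotron API: the real pipeline calls NV-Ingest for extraction, Nemotron Embed for embedding, the Nemotron reranker for scoring, and Nemotron Super 49B for generation.

```python
def extract(pdf_path):
    # Stage 1: pull both text chunks and page images out of the PDF
    # (NV-Ingest does this in the real pipeline; dummy chunks here).
    return [
        {"type": "text", "content": "Q3 revenue grew 12% year over year."},
        {"type": "image", "content": "<bar chart: revenue by quarter>"},
    ]

def embed(chunks):
    # Stage 2: project every chunk, text or image, into one shared
    # vector space (dummy 3-d vectors stand in for real embeddings).
    return [(chunk, [0.1, 0.2, 0.3]) for chunk in chunks]

def rerank(query, candidates, top_k=2):
    # Stage 3: a cross-encoder rescores each (query, chunk) pair;
    # here every pair gets the same placeholder score.
    scored = [(chunk, 1.0) for chunk, _vec in candidates]
    return [chunk for chunk, _score in scored[:top_k]]

def generate(query, context):
    # Stage 4: the reasoning model answers from the reranked context.
    return f"Answer to {query!r} grounded in {len(context)} chunks"

query = "How did revenue change?"
context = rerank(query, embed(extract("report.pdf")))
print(generate(query, context))
```

The point of the structure is that images survive end to end: stage 1 emits them as first-class chunks, and stages 2 and 3 treat them exactly like text.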
Joint embedding space unifies text and images
Nemotron Embed projects both text and visual elements into the same vector space, allowing semantic similarity search across modalities where images and descriptive text cluster together.
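A toy illustration of why a shared vector space matters: once a chart image and its caption land on nearby vectors, plain cosine similarity retrieves them together. The 3-d vectors below are made up for illustration; real multimodal embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine(a, b):
    # Plain cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Made-up vectors standing in for outputs of a joint text+image embedder.
chart_image = [0.90, 0.10, 0.40]      # embedding of a revenue bar chart
chart_caption = [0.85, 0.15, 0.45]    # text: "quarterly revenue by region"
legal_boilerplate = [0.10, 0.90, 0.00]

print(cosine(chart_image, chart_caption))      # high: image and caption cluster
print(cosine(chart_image, legal_boilerplate))  # low: unrelated content
```

Because both modalities share one space, a text query can directly retrieve an image chunk without any intermediate captioning step.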
Re-ranking improves retrieval precision
A cross-encoder re-ranker performs fine-grained relevance scoring on the top retrieved chunks to verify contextual accuracy before sending context to the reasoning model.
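The two-stage pattern can be sketched as follows. `cross_encoder_score` is a toy token-overlap stand-in for the real cross-encoder, which reads the query and the chunk jointly to produce a fine-grained relevance score.

```python
def cross_encoder_score(query, chunk):
    # Toy relevance: fraction of query tokens present in the chunk.
    # The real reranker runs both texts through one model jointly.
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def rerank(query, candidates, top_k=2):
    # Keep only the top_k chunks after fine-grained rescoring.
    ranked = sorted(candidates,
                    key=lambda ch: cross_encoder_score(query, ch),
                    reverse=True)
    return ranked[:top_k]

candidates = [
    "Table 3: revenue by region, fiscal 2024",
    "Appendix B: legal disclaimers and trademarks",
    "Figure 2: quarterly revenue bar chart",
]
print(rerank("quarterly revenue chart", candidates))
```

Even with this crude scorer, the chart figure outranks the legal appendix; the real cross-encoder makes the same kind of precision cut before context reaches the reasoning model.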
⚙️ Implementation Details
Open source with commercial licensing
All components, including the NeMo Retriever extraction library, the embedding models, the reranker, and Nemotron Super, are open source and free for commercial use via Hugging Face.
Flexible deployment options
The pipeline runs on both T4 and H100 GPUs, falling back to standard attention where FlashAttention is unavailable, and offers a library mode for development and a container mode for horizontal enterprise scaling.
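The fallback mechanism can be sketched as below, patterned on the Hugging Face Transformers `attn_implementation` convention. The `load_config` helper and the model id are illustrative assumptions, not the exact API from the video.

```python
def load_config(model_id):
    # Prefer FlashAttention on GPUs that support it (e.g. H100);
    # fall back to the default eager attention otherwise (e.g. T4).
    try:
        import flash_attn  # noqa: F401 -- importable only where installed/supported
        backend = "flash_attention_2"
    except ImportError:
        backend = "eager"
    return {"model": model_id, "attn_implementation": backend}

cfg = load_config("nvidia/nemotron-super-49b")  # illustrative model id
print(cfg["attn_implementation"])
```

The try/except import probe keeps one code path working across both GPU generations, which is what lets the same notebook run in library mode on a T4 and in a container on an H100.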
Bottom Line
Adopt multimodal RAG pipelines that process documents as both text and images to eliminate linearization loss and achieve accurate retrieval from complex enterprise documents containing tables and charts.