Build Video Analytics AI Agents with Skills
TL;DR
NVIDIA introduces the Video Search and Summarization (VSS) blueprint for building vision AI agents that process billions of camera streams using vision language models and a new 'skills' framework, enabling deep video search and summarization 60x faster than manual review.
🏗️ VSS Blueprint Architecture
Three-layer vision AI pipeline
Real-time feature extraction using VLMs and CV models feeds into a metadata database for offline agentic analytics like search and summarization.
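To make that layering concrete, here is a minimal Python sketch of the pipeline's shape; the class names and fields (`FrameMetadata`, `MetadataStore`) are illustrative assumptions, not the blueprint's actual schema:

```python
# Layer 1: real-time feature extraction (VLM captions + CV detections),
# Layer 2: a metadata store, Layer 3: offline analytics over that store.
from dataclasses import dataclass, field

@dataclass
class FrameMetadata:
    stream_id: str
    timestamp: float
    caption: str           # VLM-generated dense caption (assumed field)
    detections: list[str]  # CV-model object labels (assumed field)

@dataclass
class MetadataStore:
    rows: list[FrameMetadata] = field(default_factory=list)

    def ingest(self, meta: FrameMetadata) -> None:
        # Real-time layer writes extracted features here
        self.rows.append(meta)

    def search(self, term: str) -> list[FrameMetadata]:
        # Offline agentic analytics query the accumulated metadata
        return [r for r in self.rows if term in r.caption or term in r.detections]

store = MetadataStore()
store.ingest(FrameMetadata("cam-01", 12.5, "a forklift crosses the loading bay", ["forklift"]))
print(store.search("forklift"))
```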
Modular microservices design
Components can integrate into existing agents or applications, with training scripts provided for fine-tuning default models on custom data.
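As a rough illustration of invoking one such microservice from an existing application, the sketch below posts a summarization request over REST; the URL, endpoint path, and payload fields are assumptions, not the documented VSS API:

```python
# Hypothetical call to a locally deployed VSS summarization microservice.
import requests

VSS_URL = "http://localhost:8100"  # assumed local deployment address

def summarize(video_id: str, prompt: str) -> str:
    resp = requests.post(
        f"{VSS_URL}/summarize",              # assumed endpoint name
        json={"id": video_id, "prompt": prompt},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["summary"]

# print(summarize("warehouse-cam-01.mp4", "Summarize safety incidents"))
```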
Multimodal processing support
As of June 1st, Nemotron Omni models enable single-model processing of video, audio, and text modalities within the same architecture.
🛠️ Agent Skills Framework
Pre-built workflow skills
Reference implementations for search, summarization, alerting, and reporting allow external agents (OpenClaw, Codex) to invoke VSS capabilities via standardized APIs.
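A hedged sketch of that idea, assuming a simple registry through which an external agent dispatches skill calls; the skill names and dispatch shape are illustrative, not the actual API:

```python
# Toy skill registry: external agents invoke registered VSS skills by name.
from typing import Callable

SKILLS: dict[str, Callable[..., str]] = {}

def skill(name: str):
    def register(fn: Callable[..., str]):
        SKILLS[name] = fn
        return fn
    return register

@skill("search")
def search_skill(query: str) -> str:
    return f"(stub) top clips matching: {query}"

@skill("summarize")
def summarize_skill(video_id: str) -> str:
    return f"(stub) summary of {video_id}"

def invoke(name: str, **kwargs) -> str:
    # An external agent would cross a standardized API boundary here
    return SKILLS[name](**kwargs)

print(invoke("search", query="person entering restricted zone"))
```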
Natural language deployment
An upcoming 'build a vision agent' skill will generate Docker Compose configurations from plain English descriptions to automate deployment packaging.
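A speculative sketch of how such a skill might work, assuming an LLM callable that returns YAML; the prompt wording and file handling below are illustrative, not the shipped skill:

```python
# Turn a plain-English description into a Docker Compose config via an LLM.
def build_vision_agent(description: str, llm) -> str:
    prompt = (
        "Generate a docker-compose.yaml for an NVIDIA VSS deployment.\n"
        f"Requirements: {description}\n"
        "Output only valid YAML."
    )
    compose_yaml = llm(prompt)  # any chat-completion callable
    with open("docker-compose.yaml", "w") as f:
        f.write(compose_yaml)
    return compose_yaml

# build_vision_agent("summarize two RTSP streams and alert on forklifts", my_llm)
```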
Advanced agentic search with critique
The system decomposes queries, fuses results from multiple embedding domains, and applies VLM-based critique to verify match accuracy before returning results.
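A rough Python sketch of that decompose, retrieve, fuse, and critique loop; the additive score fusion and the `decompose`, `retrievers`, and `vlm_critique` callables are assumptions standing in for the real components:

```python
# Agentic search: break the query down, gather candidates from several
# embedding domains, fuse scores, then have a VLM verify the matches.
def agentic_search(query: str, decompose, retrievers, vlm_critique, top_k=5):
    candidates: dict[str, float] = {}
    for sub_query in decompose(query):            # query decomposition
        for retrieve in retrievers:               # multiple embedding domains
            for clip_id, score in retrieve(sub_query):
                # Simple additive fusion across domains and sub-queries
                candidates[clip_id] = candidates.get(clip_id, 0.0) + score
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    # VLM-based critique: keep only clips the model verifies as true matches
    verified = [c for c in ranked[: top_k * 2] if vlm_critique(query, c)]
    return verified[:top_k]
```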
⚡ Performance & Edge Deployment
Real-time processing benchmarks
Deep agentic search with VLM critique completes in under 5 seconds, alert verification in under 3 seconds, and video summarization achieves 60x speedup over manual review.
Flexible hardware deployment
Edge deployment is supported on AGX, IGX, and DGX Spark for offline operation; individual components can run on 32GB GPUs, while fully local deployments require 80GB GPUs.
Open source availability
VSS is free on GitHub, with the complete skills codebase releasing June 1st at GTC Taipei, including new capabilities for 3D tracking and third-party system integration.
Bottom Line
Developers can rapidly deploy production-ready video analytics AI agents using NVIDIA's open-source VSS blueprint and skills framework, eliminating the need to build vision AI infrastructure from scratch while maintaining full customization capabilities.
More from NVIDIA AI Podcast
Ask the Experts: Nemotron 3 Nano Omni | Nemotron Labs
NVIDIA researchers detail the development of Nemotron 3 Nano Omni, explaining how they evolved a text-only model into a multimodal system capable of processing vision, audio, and video through progressive training stages while maintaining the hybrid Mamba-Transformer architecture.
Apr 14 - Jetson AI Lab Research Group Call - TensorRT Edge LLM on Jetson & Culture
NVIDIA researchers Lynn Chai and Luc introduce TensorRT Edge LLM, a purpose-built inference engine for deploying large language models on Jetson edge devices, showcasing NVFP4 quantization and speculative decoding techniques that achieve up to 7x faster prefill speeds and 500 tokens per second generation while previewing a simplified vLLM-style Python API coming soon.
March 10 - Jetson AI Lab Research Group Call - Lightning talks
This Jetson AI Lab Research Group call features lightning talks on open-source hardware for remote Jetson access, a real-time emotional AI engine for robots running entirely on Jetson Nano, and updates to the Jetson AI Lab model repository with new performance benchmarks and deployment guides.
Feb 10 - Jetson AI Lab Research Group Call - Drones on Jetson & Isaac Lab on DGX Spark
Cameron Rose presents 'Operation Squirrel,' an autonomous drone project using Jetson Orin Nano for real-time target tracking and dynamic payload delivery. The system uses a modular C++ software stack with TensorRT-optimized YOLO and OSNet running at 21 FPS, communicating via UART with a flight controller to maintain following distance through velocity commands.