Securing Long-Running AI Agents: From Setup to Sandboxing

Podcasts | July 02, 2026 | 662 views | 45:01

TL;DR

NVIDIA describes the shift toward autonomous 'long-running' AI agents that can execute independently for hours, and introduces the NVIDIA Agent Toolkit, which combines open Nemotron models, CUDA-X libraries packaged as skills, and runtime security for scalable enterprise deployment.

🚀 The Evolution to Autonomous Agents (2 insights)

Three inflection points in AI utility

The industry has progressed from the ChatGPT moment (content generation), through the DeepSeek moment (reasoning models with 10x token growth), to the current 'Claw moment', in which agents execute tasks autonomously without a human in the loop.

Agents combine models with harnesses

Long-running agents pair a reasoning model with a 'harness' (the orchestration layer, APIs, and data sources) that lets them execute code and make decisions independently over hours, given only an outcome-based prompt.
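The model-plus-harness split can be sketched in a few lines. This is a minimal illustration, not NVIDIA's implementation: the `Harness` class, the `Action` tuple, the `word_count` tool, and the scripted stand-in for a reasoning model are all hypothetical names introduced here. The harness owns the tools and a step budget; the model only decides the next action.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical type for illustration: (tool name, argument) or ("finish", answer).
Action = Tuple[str, str]


@dataclass
class Harness:
    """Orchestration layer: owns the tools and drives the model step by step."""
    model: Callable[[str, List[str]], Action]
    tools: Dict[str, Callable[[str], str]]
    max_steps: int = 10  # hard budget so a long-running agent cannot loop forever
    history: List[str] = field(default_factory=list)

    def run(self, goal: str) -> str:
        for _ in range(self.max_steps):
            name, arg = self.model(goal, self.history)
            if name == "finish":
                return arg
            if name not in self.tools:
                # Report the mistake back to the model instead of crashing.
                self.history.append(f"error: unknown tool {name!r}")
                continue
            self.history.append(f"{name}({arg!r}) -> {self.tools[name](arg)}")
        raise RuntimeError("step budget exhausted before the goal was reached")


def scripted_model(goal: str, history: List[str]) -> Action:
    """Stand-in for a reasoning model: calls one tool, then finishes."""
    if not history:
        return ("word_count", goal)
    return ("finish", history[-1])


harness = Harness(
    model=scripted_model,
    tools={"word_count": lambda text: str(len(text.split()))},
)
result = harness.run("summarize the open tickets")
```

In a real deployment the scripted function would be replaced by a call to a reasoning model, and the step budget would typically sit alongside time and cost limits.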

🛠️ NVIDIA Agent Toolkit Architecture (3 insights)

Nemotron Ultra trained for tool use

Nemotron Ultra, part of the open Nemotron 3 family launching soon, was trained specifically on agent harnesses such as OpenClaw and Hermes to excel at orchestration, delivering 5x the speed and 30% lower inference cost than comparable models.

Skills convert libraries into agent tools

NVIDIA packaged its CUDA-X libraries as 'skills': natural-language instruction manuals that explain what each tool can do. Agents can then understand and use complex software without loading entire codebases into their context window.
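One way to picture a skill is as a small manifest whose description, not its source code, is what reaches the model. The sketch below assumes a hypothetical `Skill` record and `build_context` helper; the skill names and entrypoints are invented for illustration and do not reflect NVIDIA's actual skill format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Skill:
    name: str
    description: str  # natural-language "manual" shown to the model
    entrypoint: str   # how the harness would invoke the tool (hypothetical)


def build_context(skills: List[Skill]) -> str:
    """Render only names and descriptions, never the library source."""
    lines = ["Available skills:"]
    lines += [f"- {s.name}: {s.description}" for s in skills]
    return "\n".join(lines)


skills = [
    Skill("route_optimizer", "Solves vehicle routing problems on a GPU.",
          "python -m route_optimizer"),
    Skill("dataframe_ops", "Runs dataframe joins and aggregations on a GPU.",
          "python -m dataframe_ops"),
]
context = build_context(skills)
print(context)
```

The context cost then grows with the number of skills rather than with the size of the libraries behind them, and the entrypoint stays on the harness side.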

AIQ employs specialized multi-agent teams

The AIQ deep research skill uses a system of models: Nemotron Nano routes intent, frontier models like GPT manage orchestration, and specialized Nemotron 3 sub-agents handle specific disciplines such as fact-gathering, critique, and synthesis.
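The router, orchestrator, and sub-agent roles described above can be sketched as plain functions. Everything here is a hypothetical stand-in: each function represents a model call, the word-count routing rule is a toy heuristic, and the fixed gather, critique, synthesize order is an assumption rather than the documented AIQ pipeline.

```python
from typing import Callable, Dict


def route_intent(query: str) -> str:
    """Small router model (stand-in): decide whether deep research is needed."""
    return "deep_research" if len(query.split()) > 3 else "quick_answer"


# Specialized sub-agents (stand-ins): each wraps its input to show the data flow.
SUB_AGENTS: Dict[str, Callable[[str], str]] = {
    "gather": lambda text: f"facts({text})",
    "critique": lambda text: f"critique({text})",
    "synthesize": lambda text: f"report({text})",
}


def orchestrate(query: str) -> str:
    """Orchestrator (stand-in): run sub-agents in order, piping output forward."""
    result = query
    for stage in ("gather", "critique", "synthesize"):
        result = SUB_AGENTS[stage](result)
    return result


def answer(query: str) -> str:
    if route_intent(query) == "deep_research":
        return orchestrate(query)
    return f"direct({query})"


print(answer("hi there"))
print(answer("compare GPU sandboxing options for agents"))
```

Keeping the router small and cheap means only the queries that need it pay for the full multi-agent pipeline.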

🔒 Enterprise Security and Deployment (2 insights)

ServiceNow autonomously resolves 90% of L1 tickets

Using NVIDIA's agent blueprint, ServiceNow deployed specialized triage and resolution agents that independently research past fixes and resolve 90% of Level 1 support tickets without human escalation.

Verified skills ensure runtime security

NVIDIA's verified skills program scans tools for vulnerabilities, evaluates cross-compatibility across different harnesses, and cryptographically signs skills so enterprises can enforce at runtime that only authorized, secure code executes.
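Runtime enforcement of signed skills can be illustrated with a verify-before-load check. This is a sketch under stated assumptions: a shared secret with HMAC-SHA256 from the Python standard library stands in for whatever signing scheme the verified skills program actually uses (a production system would more likely rely on asymmetric signatures), and `sign_skill` and `load_skill` are hypothetical helper names.

```python
import hashlib
import hmac

# Assumption: demo shared secret; real deployments would keep keys in a
# secrets manager and likely use public-key signatures instead of HMAC.
SIGNING_KEY = b"demo-key-not-for-production"


def sign_skill(code: bytes, key: bytes = SIGNING_KEY) -> str:
    """Publisher side: compute a signature over the skill's bytes."""
    return hmac.new(key, code, hashlib.sha256).hexdigest()


def load_skill(code: bytes, signature: str, key: bytes = SIGNING_KEY) -> bytes:
    """Runtime side: refuse to hand back code whose signature does not match."""
    expected = sign_skill(code, key)
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(expected, signature):
        raise PermissionError("skill signature mismatch; refusing to load")
    return code


skill_code = b"def run(x):\n    return x * 2\n"
signature = sign_skill(skill_code)

verified = load_skill(skill_code, signature)  # passes: bytes are unmodified
```

Calling `load_skill(skill_code + b"#", signature)` raises `PermissionError`, because any change to the skill's bytes invalidates the signature.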

Bottom Line

Enterprises should deploy autonomous AI by adopting model-agnostic agent harnesses with verified skill marketplaces and runtime sandboxing to safely enable long-running agents that independently execute complex business processes.
