NVIDIA Nemotron 3 Super: Open-Weight 120B-Parameter AI Model for Agents

NVIDIA’s Nemotron 3 Super Aims to Streamline Complex AI Workflows

NVIDIA today launched Nemotron 3 Super, a 120-billion-parameter open model designed to accelerate the development and deployment of “agentic AI” systems: applications that use autonomous agents to perform complex tasks. The model, which has 12 billion active parameters, is designed to improve both the efficiency and accuracy of these systems by addressing key challenges such as context management and computational cost. It is available now through platforms including build.nvidia.com, Perplexity, OpenRouter and Hugging Face, and Dell Technologies is integrating it into the Dell AI Factory for on-premises deployment.

The Challenge of Multi-Agent Systems

As AI applications move beyond simple chatbots and into more sophisticated multi-agent workflows, developers face two primary hurdles. The first is “context explosion.” Multi-agent systems, where multiple AI agents collaborate, require significantly more data processing than traditional chat applications. According to NVIDIA, these workflows can generate up to 15 times more tokens – the basic units of text processing – because each interaction necessitates resending complete histories, including tool outputs and reasoning steps. This increased context volume drives up costs and can lead to “goal drift,” where agents lose focus on the original objective.
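To see why resending complete histories inflates token counts, here is a rough sketch of the arithmetic. The step count and per-step token sizes are invented for illustration; they are not NVIDIA measurements and are not meant to reproduce the 15x figure.

```python
def tokens_processed(step_sizes, resend_history=True):
    """Total input tokens read across a workflow.

    step_sizes: new tokens produced at each step (messages, tool outputs,
    reasoning). With resend_history=True, every step re-reads everything
    produced so far, so the total grows roughly quadratically.
    """
    total = 0
    history = 0
    for new_tokens in step_sizes:
        history += new_tokens
        total += history if resend_history else new_tokens
    return total


# Hypothetical workflow: 10 steps, each adding 1,000 tokens.
steps = [1_000] * 10
with_resend = tokens_processed(steps, resend_history=True)
without_resend = tokens_processed(steps, resend_history=False)
print(with_resend, without_resend, with_resend / without_resend)
# prints: 55000 10000 5.5
```

In this toy case the multiplier grows with the number of steps, which is why long-running agent workflows are hit harder than short chats.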

The second challenge is the “thinking tax.” Complex agents need to reason at every step, but relying on large models for every subtask can make multi-agent applications prohibitively expensive and slow. Nemotron 3 Super attempts to mitigate both of these issues.

How Nemotron 3 Super Works: A Hybrid Approach

Nemotron 3 Super tackles these challenges with a hybrid design built on three key features. First, a 1-million-token context window lets agents keep far more workflow state in context, which is intended to reduce goal drift. Second, a mixture-of-experts (MoE) architecture activates only 12 billion of the model’s 120 billion parameters during inference, reducing computational load. Third, the model uses Mamba layers, which NVIDIA claims deliver four times higher memory and compute efficiency than traditional transformer layers while still enabling advanced reasoning.
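The general idea behind mixture-of-experts routing can be sketched in a few lines. This is a generic top-k router with toy scalar "experts", not Nemotron 3 Super's actual implementation; the expert count, the value of k, and the router scores are all assumptions chosen for illustration.

```python
import math


def top_k_route(scores, k):
    """Return (expert_index, weight) pairs for the k highest router scores.

    Weights are a softmax over the selected scores only, so they sum to 1.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_scores, k):
    """Run only the k selected experts and mix their outputs."""
    routed = top_k_route(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routed)
    return output, [i for i, _ in routed]


# Eight hypothetical experts; each just scales its input differently.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
scores = [0.2, 1.7, -0.5, 0.9, 2.3, -1.1, 0.0, 0.4]  # made-up router scores

y, used = moe_forward(2.0, experts, scores, k=2)
print(used, len(used) / len(experts))  # prints: [4, 1] 0.25

# The article's figures: 12B of 120B parameters active per inference step.
ACTIVE_FRACTION = 12e9 / 120e9
print(f"{ACTIVE_FRACTION:.0%}")  # prints: 10%
```

The point of the design is that compute scales with the experts actually run, not with the total parameter count.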

A novel technique called “Latent MoE” further improves accuracy by, according to NVIDIA, activating four specialist experts for the cost of one when generating the next token. Finally, multi-token prediction lets the model predict several future tokens at once, resulting in a reported 3x increase in inference speed.
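One way to reason about how multi-token prediction can speed up generation is the standard expected-length estimate used for draft-and-verify decoding. The sketch below assumes that the extra predicted tokens are checked in order and that each is accepted independently with probability p. The article does not say that Nemotron 3 Super works this way, and the values of n and p are illustrative, so this is not an explanation of NVIDIA's reported figure.

```python
def expected_tokens_per_step(n_extra, p_accept):
    """Expected tokens emitted per decoding step.

    One token is always produced. Each of the n_extra additional predicted
    tokens counts only if it and all earlier extras were accepted, giving
    1 + p + p^2 + ... + p^n_extra.
    """
    return sum(p_accept ** i for i in range(n_extra + 1))


for p in (0.5, 0.8, 0.95):
    print(p, round(expected_tokens_per_step(3, p), 3))
# prints:
# 0.5 1.875
# 0.8 2.952
# 0.95 3.71
```

Under these assumptions the gain depends heavily on how often the extra predictions are accepted, so real-world speedups vary by workload.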

Performance and Efficiency Gains

NVIDIA claims up to 5x higher throughput and 2x higher accuracy compared with the previous Nemotron Super model. Testing by Artificial Analysis shows Nemotron 3 Super achieving 11% higher throughput per NVIDIA B200 GPU than gpt-oss-120b. The model is optimized to run on the NVIDIA Blackwell platform using NVFP4 precision, which NVIDIA says reduces memory requirements and accelerates inference by up to 4x compared with FP8 on NVIDIA Hopper, without sacrificing accuracy.
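The memory side of lower-precision formats is simple arithmetic. The sketch below counts weight storage only, using the article's 120-billion-parameter figure. It ignores scaling-factor metadata, the KV cache, and activations, so actual deployments will need more memory than these numbers suggest.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 120e9  # total parameter count from the article

for name, bits in [("16-bit", 16), ("8-bit (FP8)", 8), ("4-bit (NVFP4)", 4)]:
    print(f"{name}: {weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
# prints:
# 16-bit: 240 GB
# 8-bit (FP8): 120 GB
# 4-bit (NVFP4): 60 GB
```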

Applications Across Industries

NVIDIA is positioning Nemotron 3 Super for a wide range of applications. AI-native companies like Perplexity are integrating the model into their search offerings, while software development firms such as CodeRabbit, Factory, and Greptile are leveraging it to improve the accuracy and cost-effectiveness of their AI agents. Life sciences organizations, including Edison Scientific and Lila Sciences, plan to use the model for deep literature research, data science, and molecular understanding.

Enterprise adoption is also a key focus. Industry leaders like Amdocs, Palantir, Cadence, Dassault Systèmes, and Siemens are deploying and customizing the model to automate workflows in sectors like telecom, cybersecurity, semiconductor design, and manufacturing. Specifically, Siemens is utilizing the model within its EDA AI System.

The model’s ability to handle large contexts makes it particularly well-suited for tasks like loading entire codebases for end-to-end code generation and debugging, or processing thousands of pages of financial reports for analysis. Its high-accuracy tool calling is also crucial for autonomous agents navigating complex function libraries in security orchestration scenarios.
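Before loading a whole codebase or a stack of reports into a single prompt, it helps to estimate whether it fits. The following is a minimal sketch: the 4-characters-per-token ratio is a rough heuristic rather than the model's real tokenizer, and the function names and output reserve are hypothetical.

```python
CONTEXT_WINDOW = 1_000_000  # context size cited in the article, in tokens
CHARS_PER_TOKEN = 4         # assumption: crude average, not the real tokenizer


def estimate_tokens(text):
    """Estimate token count by ceiling-dividing the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(files, reserve_for_output=50_000):
    """Return (estimated_tokens, fits) for a mapping of path -> file text.

    reserve_for_output holds back room for the model's own response.
    """
    used = sum(estimate_tokens(text) for text in files.values())
    return used, used + reserve_for_output <= CONTEXT_WINDOW


# Hypothetical inputs standing in for real source files.
small_repo = {"app.py": "x" * 400, "util.py": "y" * 1_000}
print(fits_in_context(small_repo))  # prints: (350, True)

huge_dump = {"all_reports.txt": "z" * 4_000_000}
print(fits_in_context(huge_dump))   # prints: (1000000, False)
```

For a real deployment, the estimate should come from the model's own tokenizer, since character-based heuristics can be far off for code and non-English text.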

Open Weights and Accessibility

NVIDIA is releasing Nemotron 3 Super with open weights under a permissive license, allowing developers to deploy and customize it on various platforms, from workstations to the cloud. The model was trained on synthetic data generated using advanced reasoning models, and NVIDIA is publishing the complete methodology, including over 10 trillion tokens of pre- and post-training datasets, along with 15 training environments and evaluation recipes. Researchers can further fine-tune the model using the NVIDIA NeMo platform.

Broad Ecosystem Support

Beyond direct access through NVIDIA’s platforms, Nemotron 3 Super is being integrated into a growing ecosystem of cloud service providers, including Google Cloud’s Vertex AI and Oracle Cloud Infrastructure, with Amazon Web Services (through Amazon Bedrock) and Microsoft Azure to follow soon. NVIDIA cloud partners such as CoreWeave, Crusoe, and Together AI are also offering access, as are inference service providers such as Baseten, Cloudflare, DeepInfra, Fireworks AI, Inference.net, Lightning AI, and FriendliAI. Data platforms and services such as Distyl, Dataiku, DataRobot, Deloitte, EY, and Tata Consultancy Services are also supporting the model. The model is packaged as an NVIDIA NIM microservice for streamlined deployment.

The release of Nemotron 3 Super represents a significant step towards more efficient and scalable agentic AI systems. Further development and adoption will likely focus on refining the model’s performance, expanding its application across diverse industries, and addressing the ongoing challenges of context management and computational cost in complex AI workflows. The open-weight nature of the model should encourage community contributions and accelerate innovation in the field.