NVIDIA Nemotron 3: Agentic AI with Open Hybrid Models
NVIDIA Nemotron 3: Agentic AI models that activate only 10% of their parameters to combine speed, accuracy, and cost control.
Agentic artificial intelligence has accelerated sharply in recent months. Against this backdrop, NVIDIA makes a strong statement with Nemotron 3, a collection of models that challenges established standards. More than a simple technological update, the initiative marks a strategic turning point for the GPU giant. By combining an innovative architecture, far-reaching transparency, and accessible tools, NVIDIA Nemotron aims to democratize the development of intelligent agents capable of reasoning, planning, and acting autonomously.
NVIDIA Nemotron Changes the Foundation Model Paradigm
NVIDIA Nemotron is much more than a family of large language models. The range stands out for a holistic approach that provides the models, the training datasets, and the complete development recipes together. This transparency lets businesses and developers understand exactly how the models were built, an essential condition for confident adoption in a professional environment.
The underlying architecture is the most significant innovation in NVIDIA Nemotron. Engineers have designed a hybrid structure that combines three technologies: Transformer attention layers, the Mamba 2 architecture derived from state-space models, and a sparse mixture-of-experts mechanism. This combination means that only about 10% of the total parameters are activated during inference, which drastically reduces compute requirements without sacrificing accuracy. For example, NVIDIA Nemotron 3 Nano has 30 billion parameters but activates only about 3 billion for each token it processes, thereby improving the performance-to-cost ratio.
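To make the idea of sparse activation concrete, here is a minimal Python sketch of top-k expert routing, the general mechanism behind mixture-of-experts layers. It is an illustration under stated assumptions, not NVIDIA's implementation: the number of experts (10), the value of k (1), the random router scores, and the helper names `top_k_route` and `active_fraction` are all hypothetical. Only the 30 billion and 3 billion figures come from the article.

```python
import math
import random


def top_k_route(scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(scores[i] - peak) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]


def active_fraction(total_params, active_params):
    """Share of the parameters that take part in processing one token."""
    return active_params / total_params


# Hypothetical layer: 10 experts, 1 selected per token (illustrative values only).
random.seed(0)
router_scores = [random.gauss(0.0, 1.0) for _ in range(10)]
experts, weights = top_k_route(router_scores, k=1)

# Figures quoted in the article for Nemotron 3 Nano.
nano_fraction = active_fraction(30e9, 3e9)
print(f"experts used: {experts}, active share: {nano_fraction:.0%}")
```

Only the selected experts run for a given token, so the compute cost follows the active parameter count rather than the total.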

The context window, extended to 1 million tokens, is another major asset of NVIDIA Nemotron. This capacity is well above what most models offer and makes it possible to process voluminous documents, long conversations, or complex multi-document analyses. For agentic AI, this extended memory is crucial because agents must keep track of their actions over long periods.
A Nemotron Range to Meet All Agentic Needs
The NVIDIA Nemotron collection comes in three main versions adapted to different usage scenarios. NVIDIA Nemotron 3 Nano, with its 30 billion parameters, targets applications where efficiency is paramount. Four times faster than its predecessor, this model generates nearly 384 tokens per second, a remarkable performance for a model of this size. It is well suited to agents that need to execute targeted tasks with minimal latency.
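As a rough illustration of what a 1-million-token window means in practice, the Python sketch below checks whether a set of documents fits in the context budget. It rests on assumptions: the 4-characters-per-token ratio is a common rule of thumb rather than a property of the Nemotron tokenizer, and the output reserve and the function names `estimate_tokens` and `fits_in_context` are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # figure quoted in the article
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the character count (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents, reserved_for_output=8_000):
    """Return (fits, tokens_used), keeping a reserve for the model's answer."""
    used = sum(estimate_tokens(doc) for doc in documents)
    budget = CONTEXT_WINDOW_TOKENS - reserved_for_output
    return used <= budget, used


# Two placeholder documents of 400,000 and 1,200,000 characters.
docs = ["a" * 400_000, "b" * 1_200_000]
ok, used = fits_in_context(docs)
print(f"estimated tokens: {used}, fits: {ok}")
```

In a real pipeline the estimate would be replaced by the count returned by the model's own tokenizer.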
NVIDIA Nemotron 3 Super scales up with 120 billion parameters, of which 12 billion are activated. This configuration targets complex multi-agent environments that require higher accuracy. Finally, NVIDIA Nemotron Ultra, the flagship of the range, boasts 253 billion parameters. Designed for the most demanding enterprise workflows, this model prioritizes maximum accuracy for critical applications such as customer service automation or sophisticated supply chain management.
Beyond pure reasoning, NVIDIA Nemotron offers specialized models for vision, retrieval-augmented generation, and safety. NVIDIA Nemotron RAG models excel at extracting structured multimodal information, while safety guardrail models protect against harmful content and jailbreak attempts.
Decisive Advantages for Modern Agentic Systems
NVIDIA Nemotron is designed as a tailored response to the challenges of agentic AI. Intelligent agents must chain together complex reasoning, tool calls, and prolonged interactions. The hybrid architecture of NVIDIA Nemotron, which combines computational efficiency with accuracy, is built to meet these requirements. The multi-step reasoning capability, refined by reinforcement learning, allows agents to break down complex problems into coherent sub-tasks.
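The chaining of sub-tasks and tool calls described above can be sketched as a simple loop. This is a generic, minimal Python illustration and not code from NVIDIA: `scripted_planner` is a hypothetical stand-in for the model that returns a fixed plan, and the two arithmetic tools are placeholders for real tools such as search or database queries.

```python
from typing import Callable, Dict, List, Tuple


def add(a: float, b: float) -> float:
    return a + b


def multiply(a: float, b: float) -> float:
    return a * b


# Registry of tools the agent is allowed to call.
TOOLS: Dict[str, Callable[[float, float], float]] = {"add": add, "multiply": multiply}


def scripted_planner(task: str) -> List[Tuple[str, tuple]]:
    """Hypothetical stand-in for the model: returns a fixed plan of sub-tasks.

    The marker "PREV" is replaced by the result of the previous step.
    """
    return [("add", (2, 3)), ("multiply", ("PREV", 4))]


def run_agent(task: str, max_steps: int = 10):
    """Execute the planned tool calls in order, feeding each result forward."""
    history = []
    previous = None
    for tool_name, args in scripted_planner(task)[:max_steps]:
        resolved = tuple(previous if arg == "PREV" else arg for arg in args)
        previous = TOOLS[tool_name](*resolved)
        history.append((tool_name, resolved, previous))
    return previous, history


result, trace = run_agent("compute (2 + 3) * 4")
print(result)
```

In a real agent the plan would come from the model one step at a time, with each tool result appended to the context before the next decision, which is where a long context window matters.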
The complete openness of the components also facilitates customization. Companies can fine-tune NVIDIA Nemotron on their proprietary data using the NeMo Gym and NeMo RL tools provided under an Apache license. This flexibility allows for adapting models to specific domains such as finance, healthcare, or engineering, without starting from scratch.
What to Remember About NVIDIA Nemotron Models
NVIDIA Nemotron marks a turning point in the recent history of generative artificial intelligence. By simultaneously focusing on technical performance, radical openness, and tool accessibility, NVIDIA offers an alternative vision to closed proprietary models.
This strategy directly addresses the needs of companies that want to control their AI infrastructure while benefiting from cutting-edge technologies. It remains to be seen whether this approach will lead to the broad adoption NVIDIA is hoping for. The era of customizable agentic AI is likely just beginning.
[1] NVIDIA. (2025). NVIDIA Nemotron 3: Efficient and open intelligence. arXiv. https://doi.org/10.48550/arXiv.2512.20856