NVIDIA AI-Q Achieves Top Rank in DeepResearch Benchmarks: Core Technologies and Prospects
NVIDIA AI-Q recently achieved the top rank on two major benchmarks for deep research agents: DeepResearch Bench (55.95) and DeepResearch Bench II (54.50). This is a significant advance for open, portable deep research: it shows that developers can build state-of-the-art research agents with accessible models and tools. NVIDIA AI-Q also provides an open design blueprint for building AI agents over enterprise and web data, letting enterprises customize, review, configure, and own their use cases.
The core of AI-Q is the 'Deep Researcher,' one of several workflows in a larger design blueprint that also includes intent routing and query clarification. The deep researcher uses a multi-agent architecture of planner, researcher, and orchestrator built on the NVIDIA NeMo Agent Toolkit, and runs on optimized NVIDIA Nemotron 3 Super models. AI-Q's benchmark results demonstrate the synergy of these components.
The Significance of Deep Research Benchmark Victory
DeepResearch Bench I and II evaluate research agents in complementary ways. DeepResearch Bench scores generated reports against reference reports on completeness, depth of insight, instruction adherence, and readability. DeepResearch Bench II uses over 70 detailed binary rubrics per task to verify whether agents retrieve accurate information (Information Recall), synthesize it into higher-level analyses (Analysis), and present results clearly (Presentation). By scoring highly on both benchmarks, NVIDIA AI-Q shows that it does not merely produce well-formatted reports but combines accurate information retrieval with in-depth analysis.
Overview of the AI-Q Deep Research Agent Architecture
The NVIDIA AI-Q deep research agent architecture is built around three core components: an orchestrator (driving the research loop), a planner (mapping the information landscape and designing evidence-based research plans), and a researcher (dispatching expert sub-agents to collect and synthesize evidence from different analytical perspectives). Each agent can be powered by a different LLM. For maximum report quality and information coverage, an ensemble that runs multiple agents in parallel and selects among them can be used. This architecture highlights the flexibility and scalability of NVIDIA AI-Q.
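The orchestrator/planner/researcher split can be sketched as a small Python program. All class and function names below are illustrative stand-ins, not the AI-Q API; a real implementation would back each role with an LLM call.

```python
# Minimal sketch of the orchestrator -> planner -> researcher flow.
# Every name here is hypothetical; LLM calls are stubbed with strings.

from dataclasses import dataclass

@dataclass
class Plan:
    # Research angles the planner derives from the question.
    angles: list[str]

def planner(question: str) -> Plan:
    # A real planner would map the information landscape with an LLM;
    # here we fabricate two fixed angles.
    return Plan(angles=[f"background: {question}", f"evidence: {question}"])

def researcher(angle: str) -> str:
    # A real researcher would run web searches and synthesize evidence;
    # here we return a stub finding.
    return f"findings for [{angle}]"

def orchestrator(question: str) -> str:
    # Drives the plan -> collect -> synthesize loop and assembles the report.
    plan = planner(question)
    findings = [researcher(angle) for angle in plan.angles]
    return "\n".join(findings)

report = orchestrator("impact of quantization on LLM accuracy")
```

Because each role is just a function boundary in this sketch, swapping in a different LLM per agent, as AI-Q allows, amounts to changing the implementation behind one function.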
Core Technology Stack: NVIDIA and Deep Research
Both leaderboard submissions use the same basic stack, which includes the following core components:
- NVIDIA NeMo Agent Toolkit: workflow wiring, function registration, evaluation
- LangChain DeepAgents: multi-step planner-researcher-orchestrator flow
- NVIDIA Nemotron 3 LLM: powers the agent pipeline
This technology stack forms a critical foundation supporting the efficiency and performance of NVIDIA AI-Q.
Key Components of AI-Q
The key components that have contributed to the success of NVIDIA AI-Q include:
- Multi-agent architecture for evidence-based planning and expert research
- Fine-tuned NVIDIA Nemotron 3 Super model trained on real-world search and synthesis trajectories
- Custom middleware for long-term reliability
- Ensemble research and report refiner (optional) for maximum report quality
These elements play a crucial role in maximizing the performance of NVIDIA AI-Q.
Fine-tuned NVIDIA Nemotron 3 Super: Data and Training
One of the key factors in NVIDIA AI-Q's success is the fine-tuned NVIDIA Nemotron-3-Super-120B-A12B model. The model is well suited to multi-step agentic reasoning, tool use, and citation-backed reporting, and fine-tuning on real-world search and synthesis trajectories makes it effective in the planner, researcher, and orchestrator roles.
To generate trajectories, research questions were collected from OpenScholar, ResearchQA, and Fathom-DeepResearch-SFT, yielding approximately 17k, 21k, and 2,457 questions respectively. Approximately 80k trajectories covering the full workflow were then generated with the GPT-OSS-120B model. These trajectories include actual web search results from the Tavily and Serper APIs, so the model learns multi-step search and synthesis on real-world data. Of these, 67k trajectories were used for training.
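The trajectory-generation step described above can be sketched as follows. The record layout and function names are assumptions for illustration, not the published pipeline; the teacher model and search APIs are stubbed out.

```python
# Hypothetical sketch of trajectory generation: a teacher model answers each
# research question with tool calls, and the full interaction (including the
# raw search results) is saved as a supervised fine-tuning record.
# All names and the record layout are illustrative assumptions.

import json

def teacher_rollout(question: str) -> list[dict]:
    # Stand-in for GPT-OSS-120B driving Tavily/Serper web searches.
    return [
        {"role": "user", "content": question},
        {"role": "tool", "name": "web_search",
         "content": f"search results for: {question}"},
        {"role": "assistant", "content": f"synthesis of: {question}"},
    ]

def build_trajectories(questions: list[str]) -> list[str]:
    # One JSON-lines record per question, ready for fine-tuning.
    return [
        json.dumps({"question": q, "messages": teacher_rollout(q)})
        for q in questions
    ]

records = build_trajectories(["what is speculative decoding?"])
```

Keeping the raw tool responses inside each record is what lets the fine-tuned model learn to search and synthesize over realistic, noisy web data rather than clean curated text.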
AI-Q Deep Researcher
The AI-Q deep researcher adopts a multi-agent architecture of orchestrator, planner, and researcher, built around an iterative plan→collect→synthesize loop, citation management, and custom middleware for long-run reliability. Ensemble and refiner layers can be enabled for maximum report quality. The multi-agent design also serves as a long-context strategy: each sub-agent operates within its own context window and returns synthesized output, so the orchestrator never sees the raw tool responses. This keeps the orchestrator's context focused and prevents verbose, noisy search results from degrading its reasoning.
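The long-context strategy above can be illustrated with a short sketch: raw tool output lives only inside the sub-agent's scope, and only a condensed synthesis crosses back to the orchestrator. Names are hypothetical.

```python
# Sketch of context isolation between sub-agents and the orchestrator.
# Function names are illustrative, not the AI-Q implementation.

def run_subagent(task: str) -> str:
    # Raw, verbose search results exist only inside this function's scope,
    # i.e. inside the sub-agent's own context window...
    raw_results = [f"verbose noisy result {i} for {task}" for i in range(50)]
    # ...and only the condensed synthesis escapes to the caller.
    return f"summary of {len(raw_results)} results for {task}"

def orchestrator_context(tasks: list[str]) -> list[str]:
    # The orchestrator accumulates only synthesized outputs,
    # so its context stays small and focused.
    return [run_subagent(task) for task in tasks]

context = orchestrator_context(["topic A", "topic B"])
```

The design choice mirrors the article's point: the orchestrator's effective context grows with the number of summaries, not with the volume of raw search results each sub-agent processed.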
Conclusion and Future Prospects
NVIDIA AI-Q has achieved the top rank on DeepResearch Bench I and II, built on a multi-agent deep researcher, the NVIDIA NeMo Agent Toolkit, a fine-tuned NVIDIA Nemotron 3 model, and custom middleware. AI-Q provides an open, reproducible, and configurable stack that delivers state-of-the-art results without compromising transparency and control. Its success is a significant indicator for the future of AI agent technology, and solutions like AI-Q are likely to play a critical role as enterprises adopt AI agents across industries to boost productivity and create new business opportunities.
Original Source: How NVIDIA AI-Q Reached #1 on DeepResearch Bench I and II