AI RESEARCH ANALYSIS
SciMaster: Towards General-Purpose Scientific AI Agents
This groundbreaking research introduces X-Master, a novel tool-augmented AI agent, and X-Masters, a scattered-and-stacked agentic workflow. This system sets a new state-of-the-art on Humanity's Last Exam (HLE) with 32.1%, surpassing all previous closed-source models and becoming the first to cross the 30% threshold. By leveraging flexible tool interaction and a multi-agent cognitive process, X-Masters paves the way for accelerating scientific discovery and tackling complex, frontier-level problems.
Executive Impact: Pioneering Scientific AI
X-Masters represents a significant leap in AI capabilities, demonstrating that an open-source system can lead in complex scientific reasoning. Its performance on Humanity's Last Exam suggests that tool-augmented agents can meaningfully accelerate research and development work across many industries.
Deep Analysis & Enterprise Applications
Foundation: X-Master's Design Philosophy
X-Master serves as the foundational tool-augmented reasoning agent, designed to emulate human researchers. Its core strength lies in its ability to flexibly interact with external tools during its deep thinking process. The architecture provides the groundwork for enhancing general AI agent capabilities.
A key innovation is the conceptualization of code as an interaction language. When faced with problems it cannot solve internally, X-Master formulates precise plans as Python code blocks. These are then executed to leverage various resources, from built-in libraries like NumPy and SciPy to custom-designed tools for web searches and data extraction. The execution results are seamlessly integrated back into the agent's context, enriching its understanding and informing subsequent reasoning.
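The code-as-interaction-language loop can be sketched as follows. This is a minimal sketch that assumes the model wraps each plan in a fenced block tagged `python`, and that results are returned inside a hypothetical `<execution_results>` tag. The function names are illustrative and are not taken from the paper. Running `exec` on model output is unsafe outside a sandbox, so a real deployment would isolate execution.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # built indirectly so this source line is not itself a fence
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def execute_code_blocks(model_output: str, namespace: dict) -> list[str]:
    """Run each Python block found in the model output and capture what it prints."""
    results = []
    for code in CODE_BLOCK.findall(model_output):
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                # One shared namespace: variables and functions defined in earlier
                # blocks (including agent-written helper tools) stay available.
                exec(code, namespace)
        except Exception as exc:
            # Errors are reported back as text so the agent can revise its plan.
            buffer.write(f"{type(exc).__name__}: {exc}\n")
        results.append(buffer.getvalue())
    return results


def append_results(context: str, results: list[str]) -> str:
    """Fold execution output back into the context (the tag name is hypothetical)."""
    for out in results:
        context += f"\n<execution_results>\n{out}</execution_results>\n"
    return context
```

The namespace persists across calls, so a helper the agent defines in one block can be called in later ones. This is one simple way to let an agent create new tools on the fly. Pre-seeding the namespace with entries such as `{"web_search": my_search}` (where `my_search` is a hypothetical wrapper you supply) would expose custom tools in the same way.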
Furthermore, Initial Reasoning Guidance, carefully crafted guiding text inserted immediately after the model's opening thinking token, enables even models that are not inherently agentic to spontaneously generate and execute code. These models then function as capable agents without any fine-tuning for agentic behavior.
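Below is a sketch of how the guidance text might be spliced in. It assumes the model opens its reasoning with a `<think>` token and that the prompt has already been rendered through the model's chat template up to the start of the assistant turn. The guidance wording and the function name are hypothetical; the paper's actual text is not reproduced here.

```python
# Hypothetical guidance text; the wording used in the paper is not reproduced here.
GUIDANCE = (
    "I can answer by writing Python code. Tools such as web_search and web_parse "
    "are available, and I will read the execution results before continuing."
)


def add_initial_guidance(rendered_prompt: str, think_token: str = "<think>") -> str:
    """Open the model's reasoning with the think token followed by the guidance,
    so generation continues as if the model had written the guidance itself.

    `rendered_prompt` is assumed to end exactly where the assistant turn begins;
    `think_token` depends on the model and is an assumption here.
    """
    return f"{rendered_prompt}{think_token}\n{GUIDANCE}\n"
```

The model sees the guidance as the first lines of its own reasoning. This is what nudges it toward writing code, without any change to its weights.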
Scaling Intelligence: The X-Masters Workflow
To unlock the full potential of X-Master, the research introduces X-Masters, a scattered-and-stacked agentic workflow. This multi-agent cognitive process systematically enhances both the breadth and depth of reasoning at inference time. It orchestrates instances of X-Master into specialized roles:
- Solver: Generates five diverse initial solutions in parallel, fostering broad exploration.
- Critic: Evaluates and refines each initial solution, identifying flaws and providing corrections.
- Rewriter: Synthesizes all preceding outputs into five new, superior solutions, enhancing depth and quality.
- Selector: Adjudicates and selects the single best answer from the rewritten solutions.
This "scattered" phase explores diverse solution paths by leveraging stochastic decoding, while the "stacked" phase aggregates insights and iteratively refines solutions, ensuring a robust and high-quality final output. This structured thinking significantly scales the agent's intelligence for complex problem-solving.
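The four roles can be sketched as one orchestration function. This sketch assumes a hypothetical `agent(role, prompt, seed)` callable that stands in for a full tool-using X-Master run. Different seeds stand in for the diversity that stochastic decoding provides. The paper runs the solvers in parallel; for clarity, the sketch calls them sequentially.

```python
from typing import Callable

# Hypothetical agent interface: (role, prompt, seed) -> response text.
# In X-Masters each call would be a full tool-using X-Master run; here it is abstract.
Agent = Callable[[str, str, int], str]


def x_masters(question: str, agent: Agent, n: int = 5) -> str:
    # Scattered phase: n independent solver runs for broad exploration.
    solutions = [agent("solver", question, seed) for seed in range(n)]

    # Critic: review and correct each initial solution individually.
    critiques = [
        agent("critic", f"{question}\n\nSolution:\n{sol}", i)
        for i, sol in enumerate(solutions)
    ]

    # Stacked phase: every rewriter sees all preceding outputs and writes a new solution.
    history = "\n\n---\n\n".join(solutions + critiques)
    rewrites = [
        agent("rewriter", f"{question}\n\nPrevious work:\n{history}", seed)
        for seed in range(n)
    ]

    # Selector: choose the single best answer among the rewritten solutions.
    candidates = "\n\n---\n\n".join(rewrites)
    return agent("selector", f"{question}\n\nCandidates:\n{candidates}", 0)
```

Running the solver and rewriter calls concurrently (for example with `concurrent.futures.ThreadPoolExecutor`) would not change the structure. Only the list comprehensions would become parallel map calls.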
Dynamic Tool Integration & Web Interaction
X-Master's flexible tool integration is a cornerstone of its intelligence. By expressing interactions as Python code, the agent can leverage a wide spectrum of tools, from existing Python libraries to custom-built modules, and even dynamically generate new tools as needed.
For information-seeking tasks, two core custom-built tools are emphasized:
- Web Search: Leverages search engines to identify relevant webpages. It provides concise summaries, related search queries, and can extract entity-related facts from knowledge graphs, allowing the agent to rapidly comprehend context and prioritize deeper exploration.
- Web Parse: Offers two strategies. General webpage parsing extracts the primary content, identifies query-relevant segments, and surfaces subpage links for deeper exploration. Scientific paper parsing retrieves HTML versions from arXiv or downloads PDF documents and extracts information directly related to the query.
These tools enable X-Master to emulate human-like online information-seeking behavior, iteratively acquiring novel information or validating existing conclusions, thereby transcending the limitations of its inherent knowledge base.
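The general webpage-parsing strategy can be sketched with the standard-library `html.parser`. The names `PageExtractor` and `parse_page` are hypothetical. The source does not say how relevance to the query is judged, so a simple keyword-overlap score is swapped in here purely for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collects visible text chunks and outgoing links, skipping script/style content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.chunks: list[str] = []
        self.links: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links so they can be parsed as subpages later.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def parse_page(html: str, base_url: str, query: str, top_k: int = 3) -> dict:
    """Return the text chunks sharing the most words with the query, plus subpage links."""
    extractor = PageExtractor(base_url)
    extractor.feed(html)
    terms = set(query.lower().split())
    ranked = sorted(
        extractor.chunks,
        key=lambda chunk: len(terms & set(chunk.lower().split())),
        reverse=True,  # stable sort: ties keep page order
    )
    return {"relevant": ranked[:top_k], "links": extractor.links}
```

In a full agent, the returned links would feed further `parse_page` calls. This mirrors the iterative, human-like browsing described above.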
Benchmark Performance: Leading Humanity's Last Exam
X-Masters achieves a new state-of-the-art on Humanity's Last Exam (HLE), a critical touchstone for evaluating scientific AI agents. Its performance demonstrates superior reasoning capabilities and the effectiveness of its tool-augmented agentic workflow.
- HLE Score: Achieves 32.1%, surpassing the leading closed-source deep-research systems from OpenAI (OpenAI Deep Research, 26.6%) and Google (Gemini Deep Research, 26.9%), and becomes the first system to cross the 30% threshold.
- Biology/Medicine Category (HLE): Attains 27.6% accuracy, outperforming specialized scientific agents like STELLA (26%) and Biomni (17.3%) on text-only questions, highlighting its advanced capabilities in complex biomedical problems.
- TRQA-lit (choice): Achieves 67.4% on this specialized biology benchmark, outperforming other models like DeepSeek-R1-0528 (62.1%) and OriGene (60.1%), further cementing its efficacy in research tasks involving therapeutic target identification and biomedical mechanisms.
This leading performance underscores the potential of X-Masters to accelerate scientific discovery and provides valuable insights for future AI advancements.
Humanity's Last Exam: Accuracy by System
| System | HLE Score (%) |
|---|---|
| X-Masters | 32.1 |
| Kimi-Researcher | 26.9 |
| Gemini Deep Research | 26.9 |
| OpenAI Deep Research | 26.6 |
| Gemini 2.5 Pro | 21.6 |
| DeepSeek-R1-0528 | 17.7 |
| Claude Opus | 10.7 |
Case Study: Dynamic Problem Solving with X-Master
Problem: A user queries, "Fourteen circles of radius one are tightly packed into another circle. What is the radius of this circle up to 4 significant digits?" This requires precise numerical calculation and access to specialized knowledge.
X-Master's Approach:
1. Initial Web Search: X-Master initiates by using its web_search tool with keywords like "circle packing 14 equal circles inside a circle". It identifies relevant links, including Wikipedia.
2. Information Extraction: It then employs the web_parse tool on the Wikipedia page, successfully extracting the specific detail: "The minimal radius for 14 unit circles is ≈ 4.328".
3. Cross-Verification & Refinement: To ensure accuracy and precision, X-Master then attempts to find higher precision data. It tries parsing another academic site and, when that fails to yield a numeric value, flexibly shifts strategy. It then uses web_parse on an "Engineering Toolbox" calculator page to cross-check the diameter, verifying the previously found radius. This demonstrates iterative refinement and tool-adjustment capabilities.
4. Final Answer: Based on the verified information, X-Master concludes that 4.328 is the accepted value, already expressed to four significant digits, and provides this as the final answer.
This case study exemplifies X-Master's ability to combine internal reasoning with flexible, iterative external tool usage, adjusting strategies when encountering unexpected results, and validating conclusions for complex, knowledge-intensive problems.
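Cheap mathematical sanity checks can complement the cross-verification in step 3. The sketch below is an addition for illustration and is not part of X-Master's described behavior. It brackets the optimal radius in two ways. The area argument gives R ≥ √n for unit circles. Placing all n circles on a single ring is a valid, if suboptimal, packing with R = 1 + 1/sin(π/n). For n = 14 these bounds are roughly 3.74 and 5.49, so 4.328 is plausible. The helper names are hypothetical. `consistent` mirrors the radius-versus-diameter cross-check; in practice, the diameter passed to it would come from the second source.

```python
import math


def radius_bounds(n: int) -> tuple[float, float]:
    """Bounds on the smallest container radius R for n unit circles (n >= 2).

    Lower: total area n*pi must fit in pi*R**2, so R >= sqrt(n).
    Upper: n circles centered on one ring, adjacent ones touching, form a valid
    packing, so the optimum is at most 1 + 1/sin(pi/n).
    """
    return math.sqrt(n), 1.0 + 1.0 / math.sin(math.pi / n)


def consistent(radius: float, diameter: float, rel_tol: float = 1e-3) -> bool:
    """Cross-check a radius reported by one source against a diameter from another."""
    return math.isclose(radius, diameter / 2, rel_tol=rel_tol)


def to_sig_figs(x: float, digits: int = 4) -> str:
    """Format x with the requested number of significant digits."""
    return f"{x:.{digits}g}"


lower, upper = radius_bounds(14)
claimed = 4.328  # value extracted from Wikipedia in the case study
print(lower <= claimed <= upper, to_sig_figs(claimed))
```

These bounds only rule out gross errors; they cannot confirm the fourth digit. That is why the agent's independent source check remains necessary.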
Your Path to Advanced AI Implementation
Implementing sophisticated AI agents like X-Masters requires a strategic, phased approach to ensure seamless integration and maximum impact.
Phase 01: Strategic Assessment & Pilot
Conduct a deep dive into existing workflows, identify high-impact automation opportunities, and design a pilot program for X-Master integration within a critical scientific or research domain.
Phase 02: Customization & Tool Integration
Tailor X-Master's tool-augmented reasoning to your specific enterprise data sources and scientific tools. Develop custom connectors and refine the agentic workflow to optimize for your research challenges.
Phase 03: Scaled Deployment & Training
Expand X-Masters across your enterprise. Implement robust monitoring and security protocols, and provide comprehensive training to your research teams to maximize adoption and make full use of its capabilities.
Phase 04: Continuous Optimization & Innovation
Establish a feedback loop for continuous improvement. Adapt X-Masters to evolving research frontiers, integrate new scientific breakthroughs, and foster a culture of AI-driven innovation.