Enterprise AI Analysis
AgentSelect: Benchmark for Narrative Query-to-Agent Recommendation
This paper introduces AGENTSELECT, a new benchmark for narrative query-to-agent recommendation. It standardizes heterogeneous evaluation artifacts into query-conditioned supervision for learning to rank compositional agent configurations. Comprising 111,179 queries and 107,721 deployable agents from 40+ sources, it covers LLM-only, toolkit-only, and compositional agents. The analysis reveals a shift from dense head reuse to sparse, long-tail supervision, where content-aware capability matching is crucial. Models trained on AGENTSELECT show strong performance, transfer well to real-world marketplaces, and induce capability-sensitive behavior, providing a unified infrastructure to accelerate the agent ecosystem.
Executive Impact
The rise of AI agents for task automation presents a selection dilemma due to the explosion of possible configurations. AGENTSELECT addresses this by providing the first unified benchmark for ranking agents based on natural-language queries and their capability profiles (models M, toolkits T). This enables learning to recommend end-to-end compositional agent configurations, a critical step toward democratizing task automation. The benchmark's structure supports robust, content-aware matching, and its insights transfer to real-world agent marketplaces.
Deep Analysis & Enterprise Applications
AGENTSELECT unifies model-only, tool-only, and compositional settings into a single positive-only query-agent interaction benchmark. It converts heterogeneous evaluation artifacts into structured interaction data, providing a consistent training and evaluation interface for learning rankers at scale.
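To make the unified interface concrete, here is a minimal sketch of what a standardized catalog entry and positive-only interaction record might look like. The field names (`agent_id`, `models`, `toolkits`, `query`) are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Agent:
    """A deployable agent: a capability profile of models (M) and toolkits (T)."""
    agent_id: str
    models: frozenset    # M: backbone LLMs the agent can invoke
    toolkits: frozenset  # T: external tools/APIs wired into the agent

@dataclass(frozen=True)
class Interaction:
    """One positive-only query-agent interaction (implicit feedback)."""
    query: str     # narrative natural-language task description
    agent_id: str  # agent observed to satisfy the query

# Toy catalog spanning the three settings the benchmark unifies.
catalog = [
    Agent("a1", frozenset({"llm-x"}), frozenset()),                       # LLM-only
    Agent("a2", frozenset(), frozenset({"web-search"})),                  # toolkit-only
    Agent("a3", frozenset({"llm-x"}), frozenset({"web-search", "code"})), # compositional
]
interactions = [Interaction("Summarize today's AI news and draft a brief.", "a3")]
```

A ranker trained on such records sees only positives, so negatives must be sampled from the rest of the catalog at training time.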
Enterprise Process Flow
The benchmark consists of 111,179 narrative queries, an agent catalog of 107,721 deployable agents, and 251,103 positive query-agent interactions. This massive scale, combined with its positive-only implicit feedback design, positions AGENTSELECT as a robust foundation for agent recommendation research.
| Method Family | Key Findings |
|---|---|
| CF/GNN methods | Rely on dense popularity signals; become fragile under long-tail, near one-off supervision |
| Content-aware DNN/two-tower models | Match queries to agent capability content; remain effective where interaction data is sparse |
| Generative recommenders (OneRec) | Evaluated alongside the above families on the same positive-only interaction data |
The evaluation reveals a significant regime shift from dense head reuse to long-tail, near one-off supervision. This means traditional popularity-based methods (CF/GNN) become fragile, and content-aware capability matching becomes essential for effective agent selection.
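The content-aware matching idea can be sketched as follows, using token-set overlap as a deterministic stand-in for the similarity of two learned tower embeddings. The scoring function and agent descriptions are illustrative assumptions, not the paper's models or data.

```python
def score(query: str, agent_description: str) -> float:
    """Content-aware match score: Jaccard overlap of token sets, standing in
    for cosine similarity between learned query and agent embeddings."""
    q = set(query.lower().split())
    a = set(agent_description.lower().split())
    return len(q & a) / len(q | a) if q | a else 0.0

def rank(query: str, agents: list[str]) -> list[str]:
    """Rank agent descriptions by content-aware match, best first."""
    return sorted(agents, key=lambda d: score(query, d), reverse=True)

agents = [
    "LLM-only chat assistant",
    "web search and scraping toolkit agent",
    "compositional agent: llm with web search and code execution",
]
top = rank("search the web and write code to analyze results", agents)[0]
```

Unlike a popularity-based CF model, this scorer needs no interaction history for an agent, which is why content-aware methods survive the shift to sparse, long-tail supervision.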
MuleRun Marketplace Transfer
Models trained on AGENTSELECT consistently outperform untuned baselines on an external MuleRun agent marketplace. This demonstrates transferable supervision and practical significance for real-world agent retrieval on unseen catalogs.
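Transfer to an unseen catalog is typically measured with top-k retrieval metrics. Below is a minimal recall@k sketch; the metric choice and toy data are assumptions for illustration, not figures from the MuleRun evaluation.

```python
def recall_at_k(ranked_agent_ids: list[str], relevant_ids: set[str], k: int = 10) -> float:
    """Fraction of relevant agents retrieved within the top-k of a ranking."""
    hits = sum(1 for a in ranked_agent_ids[:k] if a in relevant_ids)
    return hits / len(relevant_ids) if relevant_ids else 0.0

# Hypothetical ranking over an external catalog for one query.
ranked = ["a7", "a3", "a9", "a1"]
relevant = {"a3", "a1"}
```

Averaging this per-query score over a held-out query set gives a single number to compare tuned rankers against untuned baselines on the external marketplace.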
Counterfactual capability edits show that learned rankers shift preferences in expected directions, indicating capability-sensitive behavior. Furthermore, the synthesized interactions in Part III of the benchmark are learnable and improve coverage of realistic compositions.
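The counterfactual probe can be illustrated with a toy scorer: edit an agent's capability profile and check that the score moves in the expected direction. The scoring function, query, and capability names below are assumptions for illustration, not the paper's learned ranker.

```python
def score(query: str, capabilities: set[str]) -> float:
    """Toy capability-matching score: fraction of query tokens covered by the
    agent's declared capabilities (a stand-in for a learned ranker's score)."""
    q = set(query.lower().split())
    caps = {c.lower() for c in capabilities}
    return len(q & caps) / len(q) if q else 0.0

query = "search the web and run code"
base_caps = {"web", "search", "code"}
baseline = score(query, base_caps)

# Counterfactual edit 1: remove a capability the query needs -> score should drop.
ablated = score(query, base_caps - {"code"})
# Counterfactual edit 2: add an irrelevant capability -> score should not drop.
augmented = score(query, base_caps | {"image-editing"})
```

A capability-sensitive ranker should satisfy `ablated < baseline` and `augmented >= baseline`; a popularity-only model, ignoring content, would leave both unchanged.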
Advanced ROI Calculator
This calculator estimates potential annual savings and hours reclaimed by optimizing AI agent selection in your enterprise. By efficiently matching narrative queries to the most suitable agents, organizations can reduce trial-and-error costs, accelerate task completion, and improve resource allocation. The efficiency gain and cost multiplier are adjusted based on industry specifics.
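The calculation behind such an estimate can be sketched as below. The formula, default rates, and parameter names are illustrative assumptions, not figures from the calculator or the paper.

```python
def roi_estimate(tasks_per_year: int, hours_per_task: float, hourly_cost: float,
                 efficiency_gain: float = 0.20, industry_multiplier: float = 1.0):
    """Illustrative ROI sketch: hours reclaimed and annual savings from better
    agent selection. All parameters and defaults are assumed for this example.

    efficiency_gain: fraction of task time saved by picking the right agent first
    industry_multiplier: scales cost to reflect industry-specific labor rates
    """
    hours_reclaimed = tasks_per_year * hours_per_task * efficiency_gain
    annual_savings = hours_reclaimed * hourly_cost * industry_multiplier
    return hours_reclaimed, annual_savings

# Example: 10,000 tasks/year at 2 hours each, $60/hour, 15% gain, 1.2x industry rate.
hours, savings = roi_estimate(tasks_per_year=10_000, hours_per_task=2,
                              hourly_cost=60, efficiency_gain=0.15,
                              industry_multiplier=1.2)
```

Under these assumed inputs the model attributes all savings to reduced trial-and-error time; a fuller estimate would also account for integration and maintenance costs.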
Implementation Roadmap
A typical phased approach to integrating intelligent agent selection within your enterprise:
Phase 1: Initial Assessment
Analyze existing agent usage, identify key pain points, and define core capabilities needed. Baseline current agent selection efficiency.
Phase 2: Benchmark Integration
Integrate AGENTSELECT benchmark data and train initial agent recommender models using enterprise-specific interaction logs (if available).
Phase 3: Pilot Deployment & Refinement
Deploy the agent recommender in a pilot environment. Collect feedback, monitor performance, and iteratively refine models for optimal matching.
Phase 4: Full-Scale Rollout
Expand the recommender to all relevant agent ecosystems. Implement continuous learning mechanisms and ensure seamless integration with existing workflows.
Ready to Optimize Your AI Agent Ecosystem?
Book a free consultation to explore how AGENTSELECT can transform your enterprise AI strategy.