Research Article

TOOL-CURE: Tool Selection via Curriculum-Enhanced Reinforcement Learning with Sample Screening for LLMs

Large language models (LLMs) are increasingly deployed as intelligent agents capable of executing complex real-world tasks through external tool interactions, but effective tool selection remains challenging due to the inherent limitations of real-world training data. These datasets suffer from severe tool imbalance following long-tail distributions, data scarcity for specialized tools, logic conflicts between user queries and available tools, rapidly evolving toolsets, and the presence of subpar samples including partially correct and dirty examples. Existing supervised fine-tuning (SFT) approaches struggle with these multifaceted challenges as they require abundant high-quality data, treat all labeled examples as ground truth regardless of quality, and lack the flexibility to generalize beyond specific query-tool pairings seen during training. While reinforcement learning (RL) offers a promising alternative through outcome-based learning, vanilla approaches like Group Relative Policy Optimization (GRPO) suffer from training instability due to conflicting reward signals and inefficient learning from weak signals. To address these issues, we propose TOOL-CURE, a novel method with two key improvements to GRPO: Proficiency-Scaled Curriculum Learning (PSCL), which organizes training into a two-stage curriculum that builds foundational skills on easier samples before progressing to harder ones, and Online Policy Guarding via Sample Screening (OPGSS), which continuously assesses roll-out quality and masks dirty samples to prevent noisy gradients from destabilizing policy updates. Our approach enables stable and efficient learning from heterogeneous real-world data, resulting in a robust tool-selection agent that demonstrates significant improvements in accuracy and generalization capability. Our code is available at: https://github.com/einnullnull/TOOL-CURE.git.
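
The "weak signals" problem mentioned above can be made concrete with the group-relative advantage that GRPO computes: each roll-out's reward is normalized against the other roll-outs for the same prompt. The sketch below is illustrative only and is not taken from the TOOL-CURE code; the function name, the use of the population standard deviation, and the `eps` constant are assumptions.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each roll-out's reward against its own group (same prompt).

    Assumption: population standard deviation plus a small eps for stability.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Mixed outcomes: correct roll-outs get a positive advantage, wrong ones negative.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Identical outcomes: every advantage is zero, so the group yields no gradient.
flat = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
```

When every roll-out in a group receives the same reward, the advantages vanish and the sample contributes nothing to the update, which is one way a training signal can be weak.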

Executive Impact & Key Metrics

TOOL-CURE addresses key challenges in LLM tool selection, such as data imbalance, logical conflicts, and evolving toolsets. By integrating Proficiency-Scaled Curriculum Learning (PSCL) and Online Policy Guarding via Sample Screening (OPGSS) into a GRPO framework, it lets LLMs learn robustly from imperfect real-world data. The experiments show significant improvements in both in-domain accuracy and out-of-domain generalization, which supports its use for building LLM agents that must adapt to complex, changing real-world tasks.

Deep Analysis & Enterprise Applications

The paper introduces TOOL-CURE, a training strategy that improves the tool selection capabilities of Large Language Models (LLMs). It addresses challenges such as tool imbalance, logic conflicts, and evolving toolsets by combining curriculum learning and online policy guarding within a reinforcement learning framework, with the goal of improving accuracy and generalization in real-world applications.

Enterprise Process Flow

Initial Policy Model → Proficiency Assessment → PSCL Sampler (Easy/Hard Sets) → GRPO Training Loop → OPGSS Guard (Sample Screening) → Robust Policy Update
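
The proficiency assessment and easy/hard split in this flow could look roughly like the sketch below. It is a hedged illustration, not the paper's procedure: the per-sample pass rate as a proficiency measure, the `threshold` of 0.5, the function names, and the choice to train the second stage on hard samples only are all assumptions made for the example.

```python
def split_by_proficiency(pass_rates, threshold=0.5):
    """Partition sample ids into an easy set and a hard set.

    pass_rates maps a sample id to the fraction of roll-outs the current
    policy answered correctly (a hypothetical proficiency measure).
    """
    easy = [sid for sid, rate in pass_rates.items() if rate >= threshold]
    hard = [sid for sid, rate in pass_rates.items() if rate < threshold]
    return easy, hard

def two_stage_schedule(pass_rates, threshold=0.5):
    """Return the training order: stage 1 on easy samples, stage 2 on hard ones."""
    easy, hard = split_by_proficiency(pass_rates, threshold)
    return [("stage_1_easy", easy), ("stage_2_hard", hard)]

# Hypothetical pass rates measured with the initial policy model.
rates = {"q1": 0.875, "q2": 0.125, "q3": 0.5, "q4": 0.0}
schedule = two_stage_schedule(rates)
```

Here `q1` and `q3` land in the first stage and `q2` and `q4` in the second, so the policy builds foundational skill on samples it can already partly solve before it meets the harder ones.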

73.40% Improvement in Out-of-Domain F1 (MetaTool Benchmark)

Feature	Supervised Fine-Tuning (SFT)	TOOL-CURE (Reinforcement Learning)
Data Quality Handling	Treats all labels as ground truth, even flawed ones. Vulnerable to noisy/subpar samples.	Learns from outcome-based rewards, not just imitation. PSCL and OPGSS actively manage data quality and curriculum.
Generalization Capability	Struggles to generalize beyond seen query-tool pairings. Prone to overfitting on specific data distributions.	Achieves significant out-of-domain generalization. More adaptable to evolving toolsets and new tasks.
Training Stability	Relatively stable with high-quality data. Performance can collapse with noisy or imbalanced data.	Designed for stable and efficient learning. OPGSS prevents noisy gradients from destabilizing updates.

Mitigating Subpar Sample Impact on RL Training

Section 4.4.1 of the paper shows that even a small portion of subpar samples can cause standard reinforcement learning (RL) methods such as vanilla GRPO to collapse on out-of-domain tasks, with performance falling below zero-shot baselines. TOOL-CURE, by contrast, remains resilient and generalizes better: its PSCL and OPGSS mechanisms defend against harmful data, so training on imperfect real-world data stays productive.
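
One way to picture sample screening is as a 0/1 mask applied to the per-sample loss, so that flagged samples contribute no gradient. The sketch below is only an illustration under stated assumptions: the screening rule (drop a sample when none of its roll-outs reaches `min_best_reward`) is a simple stand-in, not the OPGSS criterion from the paper, and all names and values are hypothetical.

```python
def screen_samples(group_rewards, min_best_reward=0.5):
    """Return a 0/1 mask per sample; 0 drops the sample from the update.

    Stand-in rule (assumption): a sample is flagged when even its best
    roll-out scores below min_best_reward.
    """
    return [1 if max(rewards) >= min_best_reward else 0 for rewards in group_rewards]

def masked_mean_loss(per_sample_loss, mask):
    """Average the loss over kept samples only; masked samples add nothing."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return sum(loss * m for loss, m in zip(per_sample_loss, mask)) / kept

# Three samples, three roll-outs each; the second never matches its label.
rewards = [[1.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.5, 1.0, 0.0]]
mask = screen_samples(rewards)
loss = masked_mean_loss([0.25, 4.0, 0.75], mask)
```

The second sample's large loss (4.0) is excluded, so the batch loss is the mean of the two kept samples. A real screening rule would need to tell dirty samples apart from merely hard ones, which this stand-in does not attempt.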
