AGENTHER: HINDSIGHT EXPERIENCE REPLAY FOR LLM AGENT TRAJECTORY RELABELING
Unlock Hidden Potential in Failed LLM Agent Trajectories
LLM agents often fail on real-world tasks, and the failed trajectories are usually discarded, wasting potential training data. AgentHER adapts Hindsight Experience Replay (HER) to natural-language agent trajectories: it relabels failed attempts with alternative goals that the trajectory did achieve, turning discarded failures into usable training signals. The approach improves performance and data efficiency across multiple LLM models and benchmarks.
Key Impact Metrics
AgentHER delivers measurable improvements by maximizing the utility of every agent interaction, even failures.
Deep Analysis & Enterprise Applications
AgentHER introduces a four-stage pipeline: (i) a Failure Detector classifies failures, (ii) an Outcome Extractor summarizes what the agent actually achieved, (iii) a Prompt Relabeler generates new goals, checked by multi-judge verification, and (iv) a Data Augmenter converts the relabeled trajectories into training data. The verification step is there to keep relabeled goals accurate, so that more of the collected data becomes usable for training.
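The four stages above can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the AgentHER implementation: the `Trajectory` fields, all function names, the placeholder outcome summary (the last recorded step), the goal template, and the rule that every judge must approve a relabeled goal are hypothetical. In practice each stage would likely be backed by an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A judge takes (candidate_goal, outcome) and returns True if the goal is supported.
Judge = Callable[[str, str], bool]


@dataclass
class Trajectory:
    goal: str          # goal the agent was originally given
    steps: List[str]   # recorded actions/observations
    success: bool      # whether the original goal was met


def detect_failure(traj: Trajectory) -> bool:
    """Stage (i): flag trajectories that missed their original goal."""
    return not traj.success


def extract_outcome(traj: Trajectory) -> str:
    """Stage (ii): summarize what was achieved (placeholder: the last step)."""
    return traj.steps[-1] if traj.steps else ""


def relabel_goal(outcome: str, judges: List[Judge]) -> Optional[str]:
    """Stage (iii): propose a goal the outcome satisfies; keep it only if
    every judge approves (unanimity is an assumption of this sketch)."""
    if not outcome:
        return None
    candidate = f"Report the following: {outcome}"
    return candidate if all(judge(candidate, outcome) for judge in judges) else None


def augment(traj: Trajectory, new_goal: str) -> Dict[str, str]:
    """Stage (iv): convert the relabeled trajectory into a training example."""
    return {"prompt": new_goal, "completion": "\n".join(traj.steps)}


def agent_her(trajs: List[Trajectory], judges: List[Judge]) -> List[Dict[str, str]]:
    """Run all four stages over a batch of trajectories."""
    examples = []
    for traj in trajs:
        if not detect_failure(traj):
            continue  # successful runs need no relabeling
        new_goal = relabel_goal(extract_outcome(traj), judges)
        if new_goal is not None:
            examples.append(augment(traj, new_goal))
    return examples
```

Trajectories whose relabeled goal is rejected by any judge are dropped rather than added to the training set, which is one simple way to trade recall for lower label noise.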
Enterprise Process Flow
| Feature | Conventional Training | AgentHER |
|---|---|---|
| Data Used | | |
| Failure Handling | | |
| Goal Modification | | |
| Label Noise Reduction | | |
AgentHER improves LLM agent performance across diverse benchmarks and model scales. It raises success rates while using data more efficiently; smaller models benefit in particular, and the gains generalize beyond memorization of specific tasks.
Example Relabeling: Finding Copper Wire Suppliers
Scenario: An agent was tasked with "Find copper wire suppliers with prices under $5/kg and MOQ below 100 kg." The agent failed the original goal: the supplier it identified, MicroMetals ($5.30/kg, MOQ 10 kg), was just above the price cap.
AgentHER Outcome: AgentHER relabels this failed trajectory as a success for the new goal: "Compare copper wire suppliers by price per kg and MOQ. Identify the option with the lowest MOQ and report its price." This turns the failure into a valid demonstration for a related, achievable goal.
Key Takeaway: Failed trajectories often contain valuable, correct intermediate steps that can be reframed into successful demonstrations for alternative goals, significantly expanding the training dataset's utility.
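The copper wire example can be written as two goal checks applied to the same observed outcome. This is a hedged sketch: the `Offer` type and both checker functions are hypothetical names introduced for illustration, and only the single MicroMetals offer quoted in the scenario is used as data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Offer:
    supplier: str
    price_per_kg: float  # USD per kg
    moq_kg: float        # minimum order quantity in kg


def meets_original_goal(offer: Offer, max_price: float = 5.0, max_moq: float = 100.0) -> bool:
    """Original goal: price under $5/kg and MOQ below 100 kg."""
    return offer.price_per_kg < max_price and offer.moq_kg < max_moq


def meets_relabeled_goal(offers: List[Offer], reported: Offer) -> bool:
    """Relabeled goal: identify the lowest-MOQ option and report its price."""
    if not offers:
        return False
    lowest = min(offers, key=lambda o: o.moq_kg)
    return (reported.supplier == lowest.supplier
            and reported.price_per_kg == lowest.price_per_kg)


found = Offer("MicroMetals", price_per_kg=5.30, moq_kg=10)

print(meets_original_goal(found))            # False: $5.30/kg exceeds the $5/kg cap
print(meets_relabeled_goal([found], found))  # True: lowest MOQ found, price reported
```

The trajectory is unchanged between the two checks; only the goal it is scored against differs.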
The framework scales across model sizes and maintains high precision through its verification mechanisms. Iterative redeployment compounds the gains further, which supports its use in ongoing agent development.
Figure 4: Model-size scaling on WebArena (Qwen2.5 family). Gains are consistent at every scale (1.5B-72B), peaking at 14B (+9.2 pp with MJ). Even the 1.5B model achieves >2× gain over its SFT-Success baseline.
Your AgentHER Implementation Roadmap
A typical phased approach to integrating AgentHER into your existing LLM agent development lifecycle for maximum impact.
Phase 1: Discovery & Data Audit
Analyze existing LLM agent failure logs, identify common failure types, and assess data quality and quantity. Establish baseline performance metrics.
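A data audit of this kind can start from a simple tally. The sketch below assumes each log record is a dict with a boolean `success` field and an optional `failure_type` label; both field names and the example label are hypothetical, not part of AgentHER.

```python
from collections import Counter
from typing import Any, Dict, List


def audit_failures(logs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count failure types and compute the baseline success rate."""
    total = len(logs)
    failures = [rec for rec in logs if not rec["success"]]
    # Records without a label are grouped under "unlabeled".
    by_type = Counter(rec.get("failure_type", "unlabeled") for rec in failures)
    baseline = (total - len(failures)) / total if total else 0.0
    return {
        "total": total,
        "baseline_success_rate": baseline,
        "failure_types": by_type.most_common(),  # most frequent first
    }


# Hypothetical log records, for illustration only.
example_logs = [
    {"success": True},
    {"success": False, "failure_type": "constraint_violation"},
    {"success": False, "failure_type": "constraint_violation"},
    {"success": False},
]
print(audit_failures(example_logs))
```

The baseline success rate recorded here is the number that Phase 3 evaluations are later compared against.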
Phase 2: AgentHER Integration & Pilot
Integrate the AgentHER pipeline components (Failure Detector, Outcome Extractor, Prompt Relabeler, Data Augmenter) into a pilot project. Begin relabeling failed trajectories and generate initial training datasets.
Phase 3: Model Retraining & Evaluation
Retrain your LLM agents using AgentHER-augmented data. Conduct rigorous A/B testing and performance evaluation against previous baselines.
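One common way to compare a retrained agent against its baseline is a two-proportion z-test on task success counts. The source does not specify an evaluation method, so this is an assumed choice, and the function name is hypothetical. It uses only the Python standard library.

```python
import math
from typing import Tuple


def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> Tuple[float, float, float]:
    """Compare success rates of baseline (a) and retrained (b) agents.

    Returns (rate difference b - a, z statistic, two-sided p-value).
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # All runs succeeded or all failed in both arms: no evidence of a difference.
        return p_b - p_a, 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value
```

A call such as `two_proportion_z(50, 100, 70, 100)` (hypothetical counts) returns the difference in success rate together with its z statistic and p-value. The normal approximation is only reasonable when each arm has enough runs, so small pilots may need an exact test instead.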
Phase 4: Iterative Optimization & Scaling
Implement iterative redeployment cycles to continuously improve agent performance. Expand AgentHER to cover more agent tasks and model families across your enterprise.
Ready to Transform Your LLM Agents?
Stop wasting valuable data. Harness the power of AgentHER to turn failures into formidable training opportunities.