Revolutionizing Financial AI with FinTrace
A New Benchmark for Trajectory-Level Tool Calling in Complex Financial Tasks
Unlocking Advanced Financial Intelligence
FinTrace is not just another benchmark; it's a foundational step towards AI agents that can truly understand and execute complex financial workflows. By moving beyond call-level metrics to holistic, trajectory-level evaluation, we identify critical gaps in current LLM capabilities and provide the means to train more robust, reliable financial AI.
Deep Analysis & Enterprise Applications
Core Contributions to Financial AI
FinTrace provides a new standard for evaluating and improving LLMs in complex financial tasks. Our key contributions enable more realistic assessment and targeted training.
We propose FinTrace, a benchmark for evaluating LLMs' tool-calling capabilities in long-horizon financial tasks. It comprises 800 expert-annotated examples spanning 34 diverse financial scenarios across multiple difficulty levels and varied tool-calling patterns, providing substantially broader coverage of real-world financial use cases than existing benchmarks.
We introduce a rubric-based evaluation protocol that assesses tool-calling trajectories along four complementary axes: action correctness, execution efficiency, process quality, and output quality. This moves beyond the single-metric evaluations prevalent in prior work and captures the full spectrum of agent capabilities required in practical financial applications.
We construct FinTrace-Training, a preference dataset of 8,196 real financial tool-calling trajectories, including a notable number of long-horizon trajectories. It is designed to support supervised fine-tuning and reinforcement-learning-based post-training that improve LLMs' general tool-calling capabilities across diverse financial service tasks.
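To make the four-axis rubric and the idea of a preference record concrete, here is a minimal sketch. Only the four axis names come from the text above. The 0-to-1 score scale, the equal default weights, the `RubricScore` class, and the `to_preference_pair` helper are illustrative assumptions, not the benchmark's actual scoring formula or data schema.

```python
from dataclasses import dataclass

# Axis names are taken from the text; the 0-1 scale and weighting are assumptions.
AXES = (
    "action_correctness",
    "execution_efficiency",
    "process_quality",
    "output_quality",
)


@dataclass(frozen=True)
class RubricScore:
    """Hypothetical container for one trajectory's per-axis scores in [0, 1]."""
    action_correctness: float
    execution_efficiency: float
    process_quality: float
    output_quality: float

    def overall(self, weights=None):
        # Weighted mean over the four axes; equal weights unless told otherwise.
        weights = weights or {axis: 1.0 for axis in AXES}
        total = sum(weights[axis] for axis in AXES)
        return sum(weights[axis] * getattr(self, axis) for axis in AXES) / total


def to_preference_pair(prompt, scored_trajectories):
    """Build one (chosen, rejected) record from (trajectory, RubricScore) pairs.

    Assumption: the highest-scoring trajectory is preferred over the lowest.
    """
    ranked = sorted(
        scored_trajectories, key=lambda item: item[1].overall(), reverse=True
    )
    return {"prompt": prompt, "chosen": ranked[0][0], "rejected": ranked[-1][0]}
```

Keeping the per-axis scores separate until the final aggregation step makes it possible to report where a model fails (for example, strong action correctness but weak process quality) rather than only a single number.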
FinTrace Benchmark Construction Pipeline
| Rank | Model | Overall Score |
|---|---|---|
| 1 | Claude-Opus-4.6 | 0.788 |
| 2 | Claude-Sonnet-4.6 | 0.750 |
| 3 | GPT-5.4 | 0.737 |
| 4 | Gemini-3.1-Pro | 0.688 |
| 5 | KiMi-2.5 | 0.643 |
The Trajectory-Level Bottleneck
Our evaluation reveals that even frontier models struggle with trajectory-level reasoning, particularly on process quality and information utilization. This gap highlights a fundamental challenge: while LLMs can learn which tools to call, they struggle to reason over the results in the multi-step, numerically demanding context of financial tasks. Even after post-training on FinTrace-Training, end-to-end answer quality remains a bottleneck.
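The gap between choosing the right tools and reasoning correctly over their results can be illustrated with a small sketch. The tool names, the exact-match comparison, and the numeric tolerance below are hypothetical and are not the benchmark's actual metrics. The point is only that a call-level score can be perfect while the trajectory as a whole still fails.

```python
def call_level_accuracy(predicted_calls, reference_calls):
    """Fraction of reference positions where the predicted tool name matches."""
    if not reference_calls:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted_calls, reference_calls))
    return matches / len(reference_calls)


def trajectory_success(predicted_calls, reference_calls, answer, expected, tol=1e-6):
    """True only if every call matches AND the final numeric answer is within tol."""
    calls_ok = list(predicted_calls) == list(reference_calls)
    return calls_ok and abs(answer - expected) <= tol


# Hypothetical tool names, for illustration only.
reference = ["get_price_history", "get_fx_rate", "compute_return"]
predicted = ["get_price_history", "get_fx_rate", "compute_return"]

acc = call_level_accuracy(predicted, reference)  # every call is correct: 1.0
ok = trajectory_success(predicted, reference, answer=0.12, expected=0.08)  # False
```

Here the agent calls every tool correctly, yet misuses the returned values and reports the wrong figure, so the call-level metric alone would overstate its reliability.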
Your AI Implementation Roadmap
A phased approach to integrate advanced AI capabilities into your financial operations, ensuring seamless adoption and maximum impact.
Phase 01: Discovery & Strategy
Comprehensive analysis of your existing workflows, identifying key opportunities for AI integration and defining success metrics. We work closely with your team to align AI solutions with your strategic objectives.
Phase 02: Pilot & Proof-of-Concept
Develop and deploy a targeted AI pilot program, demonstrating tangible value and refining the solution based on real-world performance. This phase includes initial model training and tool integration.
Phase 03: Full-Scale Integration
Seamlessly integrate the proven AI solution across your enterprise, providing extensive training and support for your teams. We ensure robust security, scalability, and ongoing performance optimization.
Phase 04: Continuous Optimization
Implement monitoring, feedback loops, and iterative enhancements to ensure your AI agents evolve with your business needs and market changes. Regular performance reviews and updates are included.
Ready to Transform Your Financial Operations with AI?
Schedule a personalized consultation to discuss how FinTrace-driven AI solutions can enhance your efficiency, accuracy, and decision-making.