AI BENCHMARKING REPORT
FINCH: Benchmarking AI for Finance Workflows
This analysis explores FINCH, a new benchmark that evaluates AI agents on real-world finance and accounting workflows involving complex, multimodal enterprise data.
Executive Impact Summary
FINCH reveals significant gaps in current AI capabilities for complex, long-horizon finance & accounting tasks, highlighting areas for strategic investment.
Deep Analysis & Enterprise Applications
FINCH Benchmark Overview
The FINCH benchmark addresses the critical need for evaluating AI agents on real-world, enterprise-grade professional workflows in finance and accounting. Unlike curated toy tasks, FINCH uses authentic data from sources like Enron and other financial institutions, capturing the inherent messiness and complexity of daily enterprise work.
Its workflows require multi-step reasoning across domains such as budgeting, trading, and asset management, which makes them a demanding test for current AI systems.
Workflow Construction
FINCH workflows are meticulously constructed via a novel pipeline involving LLM-assisted discovery and expert annotation. This process derives workflows from email threads, version histories of spreadsheets, and high-quality financial artifacts, ensuring authenticity and depth.
The resulting benchmark comprises 172 composite workflows containing 384 tasks and draws on over 1,710 spreadsheets, PDFs, and other multimodal artifacts. Building it required over 700 hours of domain-expert effort.
Key Findings & AI Performance
Evaluations of frontier AI systems, including GPT 5.1 Pro and Claude Sonnet 4.5, reveal significant performance gaps. GPT 5.1 Pro passed only 38.4% of workflows, and Claude Sonnet 4.5 passed 25.0%. These results show how difficult real-world enterprise workflows remain for current AI agents, particularly where the work involves compositional tasks, messy data, and multimodal reasoning.
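To put these percentages in absolute terms, the short Python sketch below converts each reported pass rate into an approximate workflow count. It assumes the rates are measured over all 172 composite workflows, which the figures suggest but this summary does not state outright; the function name implied_passes is illustrative only.

```python
# Assumption: pass rates are measured over all 172 composite workflows.
TOTAL_WORKFLOWS = 172

# Pass rates as reported in the findings above.
reported_pass_rates = {
    "GPT 5.1 Pro": 0.384,
    "Claude Sonnet 4.5": 0.250,
}


def implied_passes(rate: float, total: int = TOTAL_WORKFLOWS) -> int:
    """Return the nearest whole number of workflows consistent with a pass rate."""
    return round(rate * total)


for system, rate in reported_pass_rates.items():
    passed = implied_passes(rate)
    failed = TOTAL_WORKFLOWS - passed
    print(f"{system}: about {passed} of {TOTAL_WORKFLOWS} passed, {failed} did not")
```

Under that assumption, the stronger system completed roughly 66 workflows and left more than 100 unresolved.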
Case Study: Complex F&A Workflow
One workflow involved cross-checking department headcounts against detailed roster sheets, identifying discrepancies, and updating totals across sheets with varying schemas. The task exposed AI agents' limitations in handling irregular table layouts and cross-sheet references and in performing meticulous validation, which led to systematic failures.
The challenge escalated due to multimodal dependencies, requiring agents to parse information from PDFs and images, integrate it with spreadsheet data, and ensure consistency across numerous interconnected files.
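The core of the headcount cross-check described in this case study can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the benchmark's task or grading logic: the department names, roster rows, and helper functions are hypothetical, and the data is already in clean in-memory form, whereas the real workflow requires extracting it from irregular spreadsheets, PDFs, and images.

```python
from collections import Counter

# Hypothetical data for illustration only; not taken from FINCH.
summary_headcounts = {"Trading": 3, "Risk": 2, "Accounting": 4}

roster_rows = [
    {"name": "A. Lee", "dept": "Trading"},
    {"name": "B. Cruz", "dept": "trading "},  # inconsistent case and spacing
    {"name": "C. Park", "dept": "Risk"},
    {"name": "D. Shah", "dept": "Risk"},
    {"name": "E. Kim", "dept": "Accounting"},
    {"name": "F. Obi", "dept": "Accounting"},
    {"name": "G. Roy", "dept": "Accounting"},
    {"name": "H. Diaz", "dept": "Accounting"},
]


def normalize(dept: str) -> str:
    """Canonicalize a department label so messy variants compare equal."""
    return dept.strip().title()


def find_discrepancies(summary: dict, roster: list) -> dict:
    """Compare reported headcounts with counts derived from the roster."""
    reported = {normalize(dept): count for dept, count in summary.items()}
    actual = Counter(normalize(row["dept"]) for row in roster)
    departments = sorted(set(reported) | set(actual))
    return {
        dept: {"reported": reported.get(dept, 0), "roster": actual.get(dept, 0)}
        for dept in departments
        if reported.get(dept, 0) != actual.get(dept, 0)
    }


discrepancies = find_discrepancies(summary_headcounts, roster_rows)
print(discrepancies)
```

Here the summary reports three people in Trading while the roster lists two, so only that department is flagged. The comparison itself is simple; the difficulty the case study points to lies in reliably getting from messy, multi-file source material to inputs this clean.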
Your AI Implementation Roadmap
A clear path to integrating AI into your enterprise, designed for measurable impact and seamless adoption.
Phase 1: Discovery & Strategy
Comprehensive analysis of existing F&A workflows, identifying key automation opportunities and defining strategic AI objectives.
Phase 2: Pilot & Proof-of-Concept
Develop and deploy AI solutions for selected high-impact workflows, validating performance and demonstrating ROI with real data.
Phase 3: Scaled Integration & Training
Full-scale integration of AI across F&A departments, including team training, system customization, and establishing governance.
Phase 4: Continuous Optimization
Ongoing monitoring, performance tuning, and iterative enhancement of AI models to adapt to evolving business needs and maximize efficiency.