Research Paper Analysis
Automating High Energy Physics Data Analysis with LLM-Powered Agents
Presented by Eli Gendreau-Distler, Joshua Ho, Dongwon Kim, Luc Tomas Le Pottier, Haichen Wang, and Chengxi Yang.
This study demonstrates the use of Large Language Model (LLM) agents to automate a representative High Energy Physics (HEP) analysis, utilizing a hybrid system combining an LLM-based supervisor-coder agent with the Snakemake workflow manager. The framework enables systematic benchmarking of model capabilities, stability, and limitations in real-world scientific computing environments, revealing significant progress in automated data analysis.
Executive Impact: Key Achievements
Our innovative LLM-agent framework achieved significant milestones in automating complex HEP data analysis workflows.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
LLM Agents in Scientific Computing
Large Language Models (LLMs) and agent-based systems are increasingly being explored in scientific computing, with applications in genomics and software engineering. While High Energy Physics (HEP) data analyses are inherently structured, the use of LLMs in this domain remains at an early stage. This study introduces a framework for integrating LLM-based agents into a Snakemake-managed workflow, enabling controlled evaluation of agent-driven steps in a full collider-analysis setting.
Hybrid Architecture for HEP Analysis
The study employs a hybrid approach, combining a Snakemake workflow manager with a supervisor-coder LLM agent. Snakemake orchestrates the sequential execution of five analysis steps: ROOT file inspection, Ntuple conversion, Preprocessing, signal-background (S-B) separation, and Categorization. This design balances the deterministic structure of the analysis with the flexibility HEP analyses require, allowing the agent to autonomously generate, execute, and iteratively correct analysis code based on user instructions.
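For concreteness, below is a minimal Python sketch of the generate-execute-correct loop described above. The helper names (`run_llm`, `execute_script`), the retry limit, and the plain sequencing loop are illustrative assumptions, not the authors' implementation; in the actual framework, Snakemake handles the step sequencing.

```python
# Minimal sketch of a supervisor-coder loop over the five analysis steps.
# All names here are illustrative placeholders, not the paper's code.
import subprocess
import tempfile

STEPS = [
    "ROOT file inspection",
    "Ntuple conversion",
    "Preprocessing",
    "Signal-background separation",
    "Categorization",
]

MAX_RETRIES = 3  # assumed retry budget per step


def run_llm(prompt: str) -> str:
    """Placeholder for a call to the coder LLM; should return Python source."""
    raise NotImplementedError("wire up your LLM client here")


def execute_script(code: str) -> tuple[bool, str]:
    """Run generated code in a subprocess and report (success, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run(["python", path], capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr


def run_step(step: str, instructions: str) -> bool:
    """Generate, execute, and iteratively correct code for one analysis step."""
    prompt = f"Write Python code for the '{step}' step.\n{instructions}"
    for _ in range(MAX_RETRIES):
        code = run_llm(prompt)
        ok, stderr = execute_script(code)
        if ok:
            return True
        # Feed the error back so the coder agent can self-correct.
        prompt += f"\nThe previous attempt failed with:\n{stderr}\nPlease fix it."
    return False


def run_analysis(user_instructions: dict[str, str]) -> None:
    # In the paper, Snakemake handles this sequencing; a plain loop stands in here.
    for step in STEPS:
        if not run_step(step, user_instructions.get(step, "")):
            raise RuntimeError(f"step '{step}' failed after {MAX_RETRIES} attempts")
```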
Performance Statistics
A baseline study with the gemini-2.5-pro model, run across 219 experiments per step, showed success rates of 58% for data preparation, 88% for S-B separation, and 74% for categorization. Data preparation was the most error-prone stage (93 failures out of 219 trials), largely because the agent lacked sufficient context to identify objects within the ROOT files. S-B separation and categorization failed less often; categorization errors were most commonly function-calling and semantic issues, reflecting that step's greater complexity.
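As a quick sanity check on these figures, the sketch below tallies a success rate from per-trial outcomes; the record construction is illustrative and simply mirrors the reported 93-of-219 failure count for data preparation.

```python
# Sanity check: recompute a per-step success rate from trial outcomes.
# The outcome list below mirrors the reported baseline (93 failures in
# 219 data-preparation trials); it is not the study's raw data.
def success_rate(outcomes: list[bool]) -> float:
    return sum(outcomes) / len(outcomes)


data_preparation = [True] * (219 - 93) + [False] * 93
print(f"data preparation: {success_rate(data_preparation):.0%}")  # -> 58%
```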
Cross-Model LLM Benchmarking
The study benchmarked a selection of state-of-the-art LLMs, including the Gemini and GPT-5 series, the Claude family, and open-weight models. Figures 2a and 2b of the paper show significant variation in reliability and error patterns across architectures. Models in the GPT-5 and Gemini families generally achieved higher completion fractions, while smaller or open-weight models such as gpt-oss-120b showed lower but non-negligible success rates. This demonstrates the feasibility and reproducibility of the agentic workflow across diverse LLM architectures.
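A cross-model benchmark of this kind reduces to repeating the full pipeline a fixed number of times per model and recording the completion fraction. The harness below is a hedged sketch: the model list and the `run_pipeline` callable are placeholders, not the study's actual configuration.

```python
# Sketch of a cross-model benchmark harness: rerun the full agentic pipeline
# n_trials times per model and record the completion fraction.
from typing import Callable

MODELS = ["gemini-2.5-pro", "gpt-5-codex", "claude-sonnet-4.5", "gpt-oss-120b"]


def benchmark(
    run_pipeline: Callable[[str], bool],  # runs all five steps, True on success
    n_trials: int = 10,
) -> dict[str, float]:
    completion_fraction = {}
    for model in MODELS:
        successes = sum(run_pipeline(model) for _ in range(n_trials))
        completion_fraction[model] = successes / n_trials
    return completion_fraction


# Usage (with a stub that pretends every trial succeeds):
if __name__ == "__main__":
    print(benchmark(lambda model: True, n_trials=2))
```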
Systematic Error Classification
Failure modes were systematically categorized into seven types, including "all data weights = 0", "intermediate file not found", "function-calling error", and "semantic error". An LLM (gpt-oss-120b) was used to perform the classification. A notable semantic error occurred in the categorization step: the agent misinterpreted user intent, initialized the boundary scan incorrectly, and generated an extra boundary, causing the trial to fail. This highlights the importance of precise prompting and validation.
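A possible shape for such LLM-assisted classification is sketched below, assuming gpt-oss-120b is served behind an OpenAI-compatible endpoint; the prompt wording, endpoint URL, and the truncated category list are assumptions, not the paper's exact setup.

```python
# Sketch of LLM-assisted failure classification via an OpenAI-compatible
# endpoint assumed to serve gpt-oss-120b locally.
from openai import OpenAI

CATEGORIES = [
    "all data weights = 0",
    "intermediate file not found",
    "function-calling error",
    "semantic error",
    # ... remaining categories from the paper's seven-type taxonomy
]

# Endpoint and key are placeholders for a self-hosted deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")


def classify_failure(log_excerpt: str) -> str:
    """Ask the classifier model to map a failure log onto one category."""
    prompt = (
        "Classify the following failed analysis-step log into exactly one of "
        f"these categories: {', '.join(CATEGORIES)}.\n\nLog:\n{log_excerpt}\n\n"
        "Answer with the category name only."
    )
    response = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()
```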
Economic Footprint of LLM Agents
The study measured the estimated monetary cost per workflow step, based on token usage and public pricing. The preprocessing stage was generally the most expensive due to intensive internal prompting and multi-step reasoning. S-B separation was consistently inexpensive, while categorization showed high variability. Higher-capacity proprietary models often required less agent work and retried more stably, but incurred higher costs under per-token pricing; open-weight models, despite their lower absolute cost, were less robust.
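The underlying cost estimate is simply token usage multiplied by the published per-million-token prices. The sketch below illustrates the arithmetic; the prices mirror the comparison table further down, while the token counts are invented for the example.

```python
# Sketch: estimating per-step monetary cost from token usage and public
# per-million-token prices. Token counts below are illustrative only.
PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "gemini-2.5-pro": (1.25, 10.00),
    "claude-sonnet-4.5": (3.00, 15.00),
}


def step_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000


# Example: a preprocessing step consuming 400k input and 50k output tokens.
print(f"${step_cost('gemini-2.5-pro', 400_000, 50_000):.2f}")  # -> $1.00
```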
Current Limitations & Future Directions
While LLMs successfully supported HEP data analysis via natural language interpretation, code generation, and basic self-correction, multi-step task planning is currently beyond the scope of this study. Future work aims to strengthen the framework through improvements in prompting, agent design, domain adaptation, and retrieval-augmented generation. The Snakemake integration provides a natural path toward rule-based agent planning, enhancing reproducibility and control.
Enterprise Process Flow
LLM Agent Performance Comparison
A comparative overview of selected LLM agents across key performance indicators for the first step of the analysis pipeline. Costs are per 1M tokens.
| Model | Step 1 Success Rate | Input Cost | Output Cost |
|---|---|---|---|
| Claude Sonnet 4.5 | 100% (10/10) | $3.00 | $15.00 |
| Gemini 2.5 Pro | 100% (10/10) | $1.25 | $10.00 |
| GPT-5 Codex | 100% (10/10) | $1.25 | $10.00 |
| GPT-OSS-120B | 72% (54/75) | $0.00 | $0.00 |
| Qwen-3 (235B) | 100% (10/10) | $0.50 | $2.00 |
Case Study: Semantic Error in Categorization
A critical analysis revealed a semantic error in the categorization stage where the Supervisor-Coder agent misinterpreted user intent. The agent failed to initialize the significance array with the null selection, leading to an extra boundary being generated. This resulted in five categorization boundaries instead of the expected four, causing the trial to be marked as unsuccessful. This incident underscores the importance of precise communication and robust validation mechanisms in complex, iterative AI-driven workflows.
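To make the failure mode concrete, the sketch below shows a greedy boundary scan in which the running best significance is seeded with the null selection (no boundaries); omitting that seed is the kind of initialization slip described here, since the first candidate is then accepted unconditionally and the scan ends with an extra boundary. The helper names (`combined_significance`, `candidate_cuts`) and the scan logic are assumptions, not the paper's categorization code.

```python
# Illustrative greedy boundary scan: keep adding category boundaries while
# the combined significance improves. Helper names are assumptions.
from typing import Callable, Sequence


def scan_boundaries(
    candidate_cuts: Sequence[float],
    combined_significance: Callable[[list[float]], float],
) -> list[float]:
    boundaries: list[float] = []
    # Seed the running best with the null selection (no boundaries at all).
    # Skipping this seed makes the first candidate "improve" on nothing,
    # which is how one extra boundary slips into the final set.
    best = combined_significance(boundaries)
    while True:
        trial_best, trial_cut = best, None
        for cut in candidate_cuts:
            if cut in boundaries:
                continue
            sig = combined_significance(sorted(boundaries + [cut]))
            if sig > trial_best:
                trial_best, trial_cut = sig, cut
        if trial_cut is None:  # no remaining cut improves the significance
            return sorted(boundaries)
        boundaries.append(trial_cut)
        best = trial_best
```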
Calculate Your Potential AI ROI
Estimate the efficiency gains and cost savings your enterprise could achieve by automating workflows with LLM-powered agents.
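As a starting point, the sketch below shows one simple way to frame the calculation; every figure is a placeholder for your own inputs, and none of the numbers come from the study.

```python
# Minimal ROI sketch for the calculator above. All inputs are placeholders.
def annual_roi(
    hours_saved_per_month: float,
    hourly_rate: float,
    monthly_llm_spend: float,
    monthly_platform_cost: float,
) -> float:
    """Return ROI as a fraction: (net annual savings) / (total annual cost)."""
    savings = hours_saved_per_month * hourly_rate * 12
    cost = (monthly_llm_spend + monthly_platform_cost) * 12
    return (savings - cost) / cost


# Example: 80 hours/month saved at $90/hour vs. $1,500/month total spend.
print(f"{annual_roi(80, 90, 1_000, 500):.0%}")  # -> 380%
```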
Your AI Implementation Roadmap
A typical journey to integrate LLM-powered agents into your enterprise workflows.
Phase 01: Discovery & Strategy
In-depth analysis of existing workflows, identification of automation opportunities, and strategic planning for LLM agent integration. Define success metrics and a phased rollout plan.
Phase 02: Pilot Development & Testing
Develop and deploy a pilot LLM-agent system on a selected workflow. Rigorous testing, validation, and iterative refinement based on performance data and user feedback.
Phase 03: Scaled Deployment & Integration
Expand LLM agent deployment across identified workflows. Seamless integration with existing enterprise systems and infrastructure, ensuring data security and compliance.
Phase 04: Performance Monitoring & Optimization
Continuous monitoring of agent performance, efficiency, and ROI. Ongoing optimization, model retraining, and adaptation to evolving business needs and data patterns.
Ready to Transform Your Enterprise with AI?
Book a personalized consultation to explore how LLM-powered agents can streamline your operations, enhance efficiency, and drive innovation.