Enterprise AI Analysis: Streaming X-ray Detector Data to Remote Facilities Using EJFAT

This analysis examines the technical advances and strategic implications of integrating PvaPy with the ESnet JLab FPGA Accelerated Transport (EJFAT) for high-performance streaming of X-ray detector data.

Siniša Veseli, Argonne National Laboratory, Lemont, IL, USA

John Hammonds, Argonne National Laboratory (ANL), Lemont, IL, USA

Steven Henke, Argonne National Laboratory, Lemont, IL, USA

Madeline Miller, Argonne National Laboratory, Lemont, IL, USA

Hannah Parraga, Argonne National Laboratory, Lemont, IL, USA

Ilya Baldin, Thomas Jefferson National Accelerator Facility, Newport News, VA, USA

Derek Howard, Energy Sciences Network (ESnet), Berkeley, CA, USA

Yatish Kumar, Energy Sciences Network (ESnet), Berkeley, CA, USA

Nicholas Schwarz, Argonne National Laboratory (ANL), Lemont, IL, USA

Abstract

Propelled by the increasing need for near real-time feedback for user experiments on its X-ray beamlines, the Advanced Photon Source continues to investigate the use of streaming workflows, several of which have been successfully deployed on its local computing infrastructure. With ever-growing data volumes and compute resource needs, the ability to analyze beamline data at remote facilities is becoming increasingly important. In this paper we investigate the possibility of using the ESnet JLab FPGA Accelerated Transport (EJFAT) project infrastructure to bring X-ray detector data directly from the instrument into an analysis application running at a remote high performance computing center. To that end, we describe the successful integration of PvaPy, a Python API for the EPICS PV Access protocol, with the EJFAT software library. We also discuss potential use cases and illustrate system performance in terms of maximum achievable frame and data rates in a test environment.

Executive Impact: Key Takeaways

This integration enables near real-time processing of detector data at remote facilities, with the potential to improve experimental efficiency and throughput.

The paper demonstrates streaming of X-ray detector data from the instrument to remote high-performance computing (HPC) facilities. By integrating PvaPy with the ESnet JLab FPGA Accelerated Transport (EJFAT), the project lays the groundwork for near real-time feedback on user experiments at the Advanced Photon Source. The framework addresses challenges such as firewall traversal and high data volumes, sustaining up to about 12.1 GB/s with 16.78 MB images (default EPICS PVA mode) in a test environment. This work is a step toward cross-facility data analysis workflows and future autonomous, AI-driven experiments.

Max throughput (0.26 MB images): about 6.1 GB/s
Max throughput (16.78 MB images): about 12.1 GB/s

Deep Analysis & Enterprise Applications

The paper opens with the growing need for near real-time feedback on user experiments at X-ray beamlines, driven by rapid advances in ML/AI and increasing data analysis demands. It highlights the Department of Energy (DOE) Integrated Research Infrastructure (IRI) program's focus on seamless, secure links between experimental facilities and high-performance computing centers such as ALCF, NERSC, and OLCF. The Advanced Photon Source (APS) is deploying beamline analysis workflows on these resources and moving from file-based to streaming workflows to provide real-time feedback. The ESnet JLab FPGA Accelerated Transport (EJFAT) project is key to streaming data directly from instruments to remote HPC centers, overcoming earlier firewall issues. The work also integrates PvaPy, a Python API for EPICS PV Access, with EJFAT.

This section details the integration of PvaPy, a Python binding for EPICS PV Access, with EJFAT. PvaPy provides a framework for high-performance data streaming from EPICS-controlled X-ray detectors to compute resources. Recent PvaPy enhancements address firewall issues by allowing TCP-based channel searches and by supporting flexible input and output modes. EJFAT's goal is to stream data from scientific instruments to HPC facilities over UDP, using a load balancer to distribute the traffic. The integration incorporates the E2SAR library (the core of EJFAT) directly into PvaPy's streaming framework, so that EPICS PVA structures are serialized to bytes for transmission and deserialized on receipt. This approach adds serialization overhead, but it is transparent to the user and supports distributed processing.
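To make the serialization step concrete, here is a minimal sketch of turning a detector frame into a byte string and back using only the Python standard library. The `Frame` class, the `serialize`/`deserialize` helpers, and the 28-byte header layout are hypothetical illustrations; they are not the actual PvaPy or E2SAR serialization format, which operates on full EPICS PVA structures.

```python
import struct
from dataclasses import dataclass

# Hypothetical fixed-size header in network byte order ("!" disables padding):
# magic (4s), version (H), frame id (Q), rows (I), cols (I),
# bytes per pixel (H), payload length (I) -> 28 bytes in total.
HEADER = struct.Struct("!4sHQIIHI")
MAGIC = b"FRM1"
VERSION = 1


@dataclass
class Frame:
    """Hypothetical stand-in for a detector frame carried in a PVA structure."""
    frame_id: int
    rows: int
    cols: int
    bytes_per_pixel: int
    pixels: bytes  # raw row-major pixel buffer


def serialize(frame: Frame) -> bytes:
    """Pack the header and pixel buffer into one contiguous byte string."""
    expected = frame.rows * frame.cols * frame.bytes_per_pixel
    if len(frame.pixels) != expected:
        raise ValueError(f"pixel buffer has {len(frame.pixels)} bytes, expected {expected}")
    header = HEADER.pack(MAGIC, VERSION, frame.frame_id, frame.rows, frame.cols,
                         frame.bytes_per_pixel, len(frame.pixels))
    # Concatenation copies the pixel buffer; such copies are one source of overhead.
    return header + frame.pixels


def deserialize(data: bytes) -> Frame:
    """Rebuild a Frame from bytes produced by serialize()."""
    if len(data) < HEADER.size:
        raise ValueError("buffer shorter than header")
    magic, version, frame_id, rows, cols, bpp, length = HEADER.unpack_from(data, 0)
    if magic != MAGIC or version != VERSION:
        raise ValueError("unrecognized frame header")
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return Frame(frame_id, rows, cols, bpp, bytes(payload))


if __name__ == "__main__":
    frame = Frame(frame_id=7, rows=2, cols=3, bytes_per_pixel=2, pixels=bytes(range(12)))
    wire = serialize(frame)
    print(len(wire))                   # 40: 28-byte header + 12 bytes of pixels
    print(deserialize(wire) == frame)  # True
```

Keeping image dimensions and pixel size in the header lets the receiver rebuild the frame without any side channel, which matters when the sender and the analysis application run at different facilities.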

Performance tests were conducted on a 64-bit Linux system with 96 logical cores (dual Intel(R) Xeon(R) Gold 6342 CPUs with hyper-threading enabled) running at 2.8 GHz and with 2 TB of RAM. The tests used PvaPy 5.5.0 with a local implementation of the EJFAT data receiver and publisher [25], based on E2SAR library version 0.2.1 for Python 3.11. Serialization and deserialization overhead was measured; serialization proved slightly more costly than deserialization. For smaller 0.26 MB images, the maximum sustained frame rate was around 24,000 Hz, or about 6.1 GB/s in default mode. For larger 16.78 MB images, the highest sustained rate was 720 Hz (12.1 GB/s). EJFAT throughput was comparable to other EPICS modes. Load balancer connection tests also demonstrated system functionality for different image sizes, although only single-consumer tests were possible because of firewall configurations. For larger images the system handled UDP packet rates above 400,000 packets per second (pps), a load that calls for careful network tuning.
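The reported rates are linked by simple arithmetic: data rate equals frame size times frame rate, and UDP packet rate equals packets per frame times frame rate. The sketch below checks the 720 Hz and 240 Hz figures reported for 16.78 MB images against their data rates. Two inputs are assumptions rather than values from the paper: that a "16.78 MB" frame is exactly 2^24 bytes, and that each UDP datagram carries about 9,000 bytes of payload (jumbo frames). The packet-rate line only shows how a rate above 400,000 pps can arise; it is not the paper's measurement.

```python
import math

# Assumption: "16.78 MB" is 2**24 bytes (16,777,216), which rounds to the quoted size.
FRAME_BYTES = 2**24


def data_rate_gb_per_s(frame_bytes: int, frame_rate_hz: float) -> float:
    """Sustained data rate in decimal gigabytes per second."""
    return frame_bytes * frame_rate_hz / 1e9


def packet_rate_pps(frame_bytes: int, frame_rate_hz: float, payload_bytes: int) -> float:
    """UDP packets per second when each frame is split into payload-sized datagrams."""
    packets_per_frame = math.ceil(frame_bytes / payload_bytes)
    return packets_per_frame * frame_rate_hz


if __name__ == "__main__":
    print(f"{data_rate_gb_per_s(FRAME_BYTES, 720):.2f} GB/s")  # 12.08 GB/s (default mode)
    print(f"{data_rate_gb_per_s(FRAME_BYTES, 240):.2f} GB/s")  # 4.03 GB/s (EJFAT mode)
    # Assumed ~9,000-byte payload per datagram (jumbo frames); not taken from the paper.
    print(packet_rate_pps(FRAME_BYTES, 240, 9_000))            # 447600
```

With standard 1,500-byte Ethernet frames the same data rate would need several times as many packets per second, which is one reason network tuning (MTU, socket buffers, interrupt handling) becomes important at these rates.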

The PvaPy/EJFAT framework hides the complexity of setting up streaming workflows, making it easier for users to integrate data from EPICS-controlled detectors into applications running on compute nodes. This integration is crucial for near real-time data processing and autonomous, self-driving experiments at APS beamlines. Future work will focus on overall system reliability, limitations, and potential failure modes, especially concerning EJFAT load balancers. The project shows promise for ubiquitous streaming workflows across multiple institutions.

Data Flow with EJFAT & PvaPy

X-ray detector data source → PvaPy streaming framework (edge) → EJFAT load balancer (WAN) → remote HPC analysis (compute) → processed data viewers

Throughput Comparison: EPICS PVA vs. EJFAT

Feature | EPICS PVA (default mode) | EJFAT mode
Frame rate (16.78 MB images, 8 consumers) | 720 Hz | 240 Hz
Data rate (16.78 MB images, 8 consumers) | 12.08 GB/s | 4.03 GB/s
Protocol used | TCP for data, UDP for PVA channel discovery | UDP (EJFAT)
Firewall handling | Requires TCP-based search or SSH tunnels | WAN-optimized; load balancer manages UDP traffic
Complexity for user | Handled by PvaPy framework | Handled by PvaPy framework (E2SAR integration)

Case Study: Real-time Tomographic Reconstruction at APS

Context: The Advanced Photon Source (APS) 2-BM beamline for dynamically driven experiments.

Challenge: Optimizing environmental conditions (e.g., cooling, pressure) during experiments where the X-ray beam affects the sample, requiring immediate feedback.

Solution: Implementing streaming tomographic reconstruction software that enables 3D zooming into regions of interest as data arrives.

Impact: Allows scientists to adjust experiment parameters in near real-time, significantly enhancing experimental efficiency and data quality. This forms a crucial step towards autonomous experiments.

Your Path to Real-time Data Streaming

Our proven methodology ensures a smooth transition to advanced data workflows and AI integration.

Phase 1: Discovery & Assessment

Initial evaluation of existing detector infrastructure, network capabilities, and computational needs. Define key streaming requirements and identify integration points with PvaPy and EJFAT.

Phase 2: Pilot & Proof-of-Concept

Implement a small-scale streaming pipeline for a selected X-ray detector, validating PvaPy/EJFAT integration, data throughput, and latency. Address initial firewall and network tuning challenges.

Phase 3: Scalable Deployment

Expand the streaming solution across multiple beamlines and integrate with remote HPC facilities. Optimize load balancing, error handling, and data distribution for production-level workloads.

Phase 4: Advanced AI/ML Integration

Develop and deploy real-time analysis applications leveraging AI/ML models on the streamed data, enabling autonomous feedback loops and accelerating scientific discovery.

