Enterprise AI Analysis
Streaming X-ray Detector Data to Remote Facilities Using EJFAT
This analysis examines the integration of PvaPy with ESnet JLab FPGA Accelerated Transport (EJFAT) for high-performance X-ray data streaming, covering both the technical advances and their strategic implications.
Abstract
Propelled by the increasing need for near real-time feedback for user experiments on its X-ray beamlines, the Advanced Photon Source continues to investigate the use of streaming workflows, with several of those being successfully deployed on its local computing infrastructure. With ever-growing data volumes and compute resource needs, the ability to analyze beamline data at remote facilities is becoming increasingly important. In this paper we investigate the possibility of using ESnet JLab FPGA Accelerated Transport (EJFAT) project infrastructure to bring X-ray detector data directly from the instrument into an analysis application running at a remote high performance computing center. To that end, we describe the successful integration of PvaPy, a Python API for the EPICS PV Access protocol, with the EJFAT software library. We also discuss potential use cases and illustrate system performance in terms of maximum achievable frame and data rates in a test environment.
Executive Impact: Key Takeaways
This integration facilitates near real-time data processing for advanced scientific discovery, offering significant improvements in experimental efficiency and throughput.
This paper demonstrates streaming of X-ray detector data to remote high-performance computing (HPC) facilities. By integrating PvaPy with ESnet JLab FPGA Accelerated Transport (EJFAT), the project enables near real-time feedback for user experiments at the Advanced Photon Source. The framework addresses challenges such as firewall traversal and high data volumes, sustaining multi-GB/s data rates in a test environment. The work is a step toward cross-facility data analysis workflows and lays the groundwork for future autonomous, AI-driven experiments.
Deep Analysis & Enterprise Applications
The paper introduces the increasing need for near real-time feedback for user experiments on X-ray beamlines, driven by rapid advances in ML/AI and growing data analysis needs. It highlights the Department of Energy's (DOE) Integrated Research Infrastructure (IRI) program's focus on enabling seamless and secure links between experimental facilities and high-performance computing centers like ALCF, NERSC, and OLCF. The Advanced Photon Source (APS) is deploying beamline analysis workflows on these resources, moving from file-based to streaming workflows for real-time feedback. The ESnet JLab FPGA Accelerated Transport (EJFAT) project is key to streaming data directly from instruments to remote HPCs, overcoming previous firewall issues. The work also integrates PvaPy, a Python API for EPICS PV Access, with EJFAT.
This section details the integration of PvaPy, a Python binding for EPICS PV Access, with EJFAT. PvaPy provides a framework for high-performance data streaming from EPICS-controlled X-ray detectors to compute resources. Recent PvaPy enhancements address firewall issues by allowing TCP-based channel searches and flexible input/output modes. EJFAT's goal is to stream data from scientific instruments to HPC facilities using a load balancer over UDP. The integration involves directly including the E2SAR library (EJFAT's core) into PvaPy's streaming framework, enabling serialization/deserialization of EPICS PVA structures to/from bytes. This approach, while having serialization overhead, is transparent to the user and supports distributed processing.
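The serialization step described above can be illustrated with a minimal round trip of one image frame to bytes and back. This is only a sketch under stated assumptions: the header layout, `serialize_frame`, and `deserialize_frame` are hypothetical names invented for illustration, and the format is not the PvaPy or E2SAR wire format.

```python
import struct

# Hypothetical fixed-size header (NOT the PvaPy/E2SAR format):
# frame id (uint64), rows (uint32), cols (uint32), bytes per pixel (uint8),
# packed in network byte order without padding.
HEADER = struct.Struct("!QIIB")


def serialize_frame(frame_id, rows, cols, bytes_per_pixel, pixels):
    """Pack frame metadata and raw pixel bytes into a single bytes object."""
    expected = rows * cols * bytes_per_pixel
    if len(pixels) != expected:
        raise ValueError(f"expected {expected} payload bytes, got {len(pixels)}")
    return HEADER.pack(frame_id, rows, cols, bytes_per_pixel) + bytes(pixels)


def deserialize_frame(buffer):
    """Recover the metadata and pixel bytes produced by serialize_frame."""
    frame_id, rows, cols, bytes_per_pixel = HEADER.unpack_from(buffer, 0)
    payload = bytes(buffer[HEADER.size:])
    if len(payload) != rows * cols * bytes_per_pixel:
        raise ValueError("payload length does not match header")
    return {
        "frame_id": frame_id,
        "rows": rows,
        "cols": cols,
        "bytes_per_pixel": bytes_per_pixel,
        "pixels": payload,
    }


if __name__ == "__main__":
    raw = bytes(range(12))                      # a tiny 3 x 4, 8-bit test image
    wire = serialize_frame(7, 3, 4, 1, raw)     # bytes ready for a transport layer
    frame = deserialize_frame(wire)             # receiver side
    print(frame["frame_id"], frame["rows"], frame["cols"], frame["pixels"] == raw)
```

The extra copy made when concatenating the header and payload is one plausible source of the kind of serialization overhead mentioned above; a production implementation would likely avoid it.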
Performance tests were conducted on a 64-bit Linux system with 96 logical cores (dual Intel(R) Xeon(R) Gold 6342 CPUs with hyper-threading enabled) running at 2.8 GHz and with 2 TB of RAM. The tests used PvaPy 5.5.0 with a local implementation of the EJFAT data receiver and publisher [25], based on E2SAR library version 0.2.1 for Python 3.11. Measurements of the conversion overhead showed that serialization is slightly more costly than deserialization. For smaller 0.26 MB images, the maximum sustained frame rate was around 24,000 Hz, or about 6.1 GB/s, in default mode. For larger 16.78 MB images, the highest sustained rate was 720 Hz (12.1 GB/s). EJFAT throughput was comparable to other EPICS modes. Load balancer connection tests also demonstrated system functionality for different image sizes, though only single-consumer tests were possible due to firewall configurations. The system handled high UDP packet rates (over 400,000 pps) for larger images, which underlines the need for network tuning.
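The relationship between image size, frame rate, data rate, and UDP packet rate quoted above can be checked with simple arithmetic. The sketch below uses the 16.78 MB image size and the 720 Hz figure from the text, plus the 240 Hz EJFAT figure from the comparison table. The 9000-byte payload per packet is an assumption (a typical jumbo-frame size); the source does not state the packet size used, and real payloads would be somewhat smaller once protocol headers are subtracted.

```python
def data_rate_gb_s(frame_bytes, frame_rate_hz):
    """Data rate in GB/s (1 GB = 1e9 bytes) for a given frame size and rate."""
    return frame_bytes * frame_rate_hz / 1e9


def packets_per_second(frame_bytes, frame_rate_hz, payload_bytes):
    """UDP packets per second if each frame is split into fixed-size payloads."""
    packets_per_frame = -(-frame_bytes // payload_bytes)  # ceiling division
    return packets_per_frame * frame_rate_hz


FRAME_BYTES = 16_780_000       # 16.78 MB image, as quoted in the text
ASSUMED_PAYLOAD_BYTES = 9000   # assumption: jumbo-frame payload, not from the source

print(round(data_rate_gb_s(FRAME_BYTES, 720), 2))   # 12.08
print(round(data_rate_gb_s(FRAME_BYTES, 240), 2))   # 4.03
print(packets_per_second(FRAME_BYTES, 240, ASSUMED_PAYLOAD_BYTES))  # 447600
```

Under this assumed payload size, 240 Hz of 16.78 MB frames already corresponds to more than 400,000 packets per second, which is consistent in magnitude with the packet rates noted above and illustrates why socket buffer and NIC tuning matter at these rates.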
The PvaPy/EJFAT framework hides the complexity of setting up streaming workflows, making it easier for users to integrate data from EPICS-controlled detectors into applications running on compute nodes. This integration is crucial for near real-time data processing and autonomous, self-driving experiments at APS beamlines. Future work will focus on overall system reliability, limitations, and potential failure modes, especially concerning EJFAT load balancers. The project shows promise for ubiquitous streaming workflows across multiple institutions.
Data Flow with EJFAT & PvaPy
| Feature | EPICS PVA (Default Mode) | EJFAT Mode |
|---|---|---|
| Frame Rate (16.78 MB images, 8 consumers) | 720 Hz | 240 Hz |
| Data Rate (16.78 MB images, 8 consumers) | 12.08 GB/s | 4.03 GB/s |
| Protocol Used | TCP/UDP (PVA discovery) | UDP (EJFAT) |
| Firewall Handling | | |
| Complexity for User | | |
Case Study: Real-time Tomographic Reconstruction at APS
Context: The Advanced Photon Source (APS) 2-BM beamline for dynamically driven experiments.
Challenge: Optimizing environmental conditions (e.g., cooling, pressure) during experiments where the X-ray beam affects the sample, requiring immediate feedback.
Solution: Implementing streaming tomographic reconstruction software that enables 3D zooming into regions of interest as data arrives.
Impact: Allows scientists to adjust experiment parameters in near real-time, significantly enhancing experimental efficiency and data quality. This forms a crucial step towards autonomous experiments.
Your Path to Real-time Data Streaming
Our proven methodology ensures a smooth transition to advanced data workflows and AI integration.
Phase 1: Discovery & Assessment
Initial evaluation of existing detector infrastructure, network capabilities, and computational needs. Define key streaming requirements and identify integration points with PvaPy and EJFAT.
Phase 2: Pilot & Proof-of-Concept
Implement a small-scale streaming pipeline for a selected X-ray detector, validating PvaPy/EJFAT integration, data throughput, and latency. Address initial firewall and network tuning challenges.
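As one way to approach the validation step in this phase, a pilot could log one record per received frame and summarize throughput, loss, and latency afterwards. The sketch below is illustrative only: `FrameRecord` and `summarize` are hypothetical names, the records are assumed to come from the pilot's own instrumentation, and the latency figure is only meaningful if the publisher and consumer clocks are synchronized.

```python
from dataclasses import dataclass


@dataclass
class FrameRecord:
    frame_id: int        # sequence number assigned by the publisher
    sent_at: float       # seconds, publisher clock
    received_at: float   # seconds, consumer clock (assumed synchronized)
    n_bytes: int         # size of the received frame


def summarize(records):
    """Compute sustained frame rate, data rate, lost frames, and mean latency."""
    if len(records) < 2:
        raise ValueError("need at least two records")
    ordered = sorted(records, key=lambda r: r.received_at)
    duration = ordered[-1].received_at - ordered[0].received_at
    if duration <= 0:
        raise ValueError("records must span a positive time interval")
    ids = {r.frame_id for r in records}
    expected = max(ids) - min(ids) + 1  # frames the publisher sent in this id range
    return {
        # The first frame only marks the start of the measurement interval.
        "frame_rate_hz": (len(ordered) - 1) / duration,
        "data_rate_mb_s": sum(r.n_bytes for r in ordered[1:]) / duration / 1e6,
        "lost_frames": expected - len(ids),
        "mean_latency_s": sum(r.received_at - r.sent_at for r in records) / len(records),
    }


if __name__ == "__main__":
    # Synthetic example: frame 3 never arrives.
    times = {0: 0.0, 1: 0.25, 2: 0.5, 4: 1.0}
    log = [FrameRecord(i, t - 0.125, t, 1_000_000) for i, t in times.items()]
    print(summarize(log))
```

Counting gaps in the sequence numbers is a simple way to spot dropped frames on a UDP-based path, where loss is not reported by the transport itself.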
Phase 3: Scalable Deployment
Expand the streaming solution across multiple beamlines and integrate with remote HPC facilities. Optimize load balancing, error handling, and data distribution for production-level workloads.
Phase 4: Advanced AI/ML Integration
Develop and deploy real-time analysis applications leveraging AI/ML models on the streamed data, enabling autonomous feedback loops and accelerating scientific discovery.