Sep 23 – 27, 2024
ESRF Auditorium
Europe/Paris timezone

Dranspose: A constrained map-reduce live analysis pipeline for fast experimental feedback

Sep 25, 2024, 4:25 PM
15m
Hybrid event (ESRF Auditorium)

EPN Campus ESRF - ILL 71 Av. des Martyrs, 38000 Grenoble
Talk, Data Reduction

Speaker

Dr Felix Engelmann (Max IV, Lund University)

Description

During beamtimes, critical decisions on how to proceed with an experiment must be made constantly. It is therefore important to provide feedback from the best possible data analysis, mostly in the form of visualizations, with the lowest possible latency. For low data rates, writing and monitoring a file works well. However, processing tens of gigabytes per second through a filesystem is difficult, so streaming solutions are preferable. Apart from high throughput, the users consuming the feedback must be able to tweak the analysis quickly to adapt to changing conditions. Generic stream processing frameworks such as Apache Spark and application-specific tools such as OnDA (JAC 46, 2016) already exist; our goal is to combine the major relevant features of both.

We propose Dranspose (http://dranspo.se), a data analysis pipeline for high-throughput data acquisition. It is a horizontally scalable, distributed system deployable to a Kubernetes cluster, relying on Redis for coordination and ZeroMQ for data streaming. As a programming paradigm, we propose a novel constrained map-reduce. In a classical map-reduce, every event is processed by an independent worker, which forwards its result to a reducer. Experimental data often has temporal dependencies, such as evolution over time, which standard map-reduce can only analyze in the reducer. Our constrained map-reduce enables stateful workers combined with load balancing: sending consecutive frames to the same worker allows fast calculation of temporal differences (see Figure). Additionally, it supports event formation from multiple sources, giving a complete view of the data for a specific trigger. This is especially useful for normalizing one detector by another. The reducer writes the analyzed data to an HSDS service, which provides standardized, simultaneous access for visualization tools such as silx, h5pyd, or h5web. Dranspose takes care of the orchestration and distribution of data while allowing users to easily adapt and update the map and reduce Python functions. For ease of development, we provide record and replay actions so that the analysis can be developed on a local machine before the experiment.
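The constrained map-reduce idea can be illustrated with a minimal, single-process Python sketch. It is not Dranspose's API: the names `Event`, `form_events`, `assign_worker`, `Worker`, `reduce_results`, and `run`, as well as the block-based routing rule, are hypothetical and chosen only to show the concept. Two streams are joined on a trigger number (event formation), each camera frame is normalized by a monitor reading $I_0$, blocks of consecutive triggers are routed to the same worker so it can keep the previous frame as state, and a reducer collects the per-trigger results in order. Redis, ZeroMQ, and Kubernetes are left out entirely.

```python
# Hypothetical single-process illustration of a constrained map-reduce;
# this is NOT the Dranspose API. No Redis, ZeroMQ, or Kubernetes involved.
from dataclasses import dataclass


@dataclass
class Event:
    """All data belonging to one trigger, merged from several streams."""
    trigger: int
    frame: list[float]  # flattened detector frame
    i0: float           # incident-intensity monitor reading


def form_events(camera, monitor):
    """Event formation: join two streams (dict trigger -> data) on the trigger.

    Triggers missing from either stream are skipped.
    """
    for trigger in sorted(camera.keys() & monitor.keys()):
        yield Event(trigger, camera[trigger], monitor[trigger])


def assign_worker(trigger, n_workers, block):
    """Constrained routing: each block of `block` consecutive triggers goes to
    one worker; blocks are distributed round-robin for load balancing."""
    return (trigger // block) % n_workers


class Worker:
    """Stateful map step: normalize by I0, then diff against the previous frame."""

    def __init__(self):
        self.previous = None          # last normalized frame seen by this worker
        self.previous_trigger = None  # trigger number of that frame

    def process(self, event):
        norm = [v / event.i0 for v in event.frame]
        diff = None
        # Only diff against the stored frame if it is the direct predecessor;
        # the first frame of each block has no partner on this worker.
        if self.previous is not None and event.trigger == self.previous_trigger + 1:
            diff = sum(abs(a - b) for a, b in zip(norm, self.previous))
        self.previous, self.previous_trigger = norm, event.trigger
        return event.trigger, diff


def reduce_results(results):
    """Reduce step: gather (trigger, result) pairs into a trigger-ordered dict."""
    return dict(sorted(results))


def run(camera, monitor, n_workers=2, block=4):
    """Simulate the pipeline sequentially: form events, route, map, reduce."""
    workers = [Worker() for _ in range(n_workers)]
    results = []
    for event in form_events(camera, monitor):
        worker = workers[assign_worker(event.trigger, n_workers, block)]
        results.append(worker.process(event))
    return reduce_results(results)
```

With `n_workers=2` and `block=4`, triggers 0 to 3 go to worker 0, 4 to 7 to worker 1, and 8 to 11 back to worker 0. In this sketch the first frame of each block has no predecessor on its worker, so its difference is `None`. Larger blocks preserve more temporal context per worker, while smaller blocks spread the load more evenly; covering block boundaries (for example with overlapping assignments) would be one way to soften that trade-off.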

Applications of Dranspose include merging rotation stage encoder positions with a CMOS camera stream at 3 GB/s for live tomographic reconstruction, azimuthal integration with normalization to $I_0$, live XRF concentration mapping, and crystallography spot finding. As scientific users gain confidence in its results, Dranspose can also be used for data reduction, and its feedback may be consumed directly by the control system to decide on the progression of a scan in a closed loop.

Primary author

Dr Felix Engelmann (Max IV, Lund University)

Co-author

Dr Paul Bell (Max IV, Lund University)