Sep 23 – 27, 2024
ESRF Auditorium
Europe/Paris timezone

Remote Collaboration via Distributed HDF File Access

Sep 24, 2024, 6:00 PM
2h
ESRF Entrance Hall


Poster: Data Analysis Posters

Speaker

Justin Wozniak (Argonne National Laboratory)

Description

Cross-institutional data sharing remains a challenging problem for the large datasets collected at the Advanced Photon Source (APS). Sector 6 at the APS routinely collects single-crystal x-ray diffraction data at a rate of several terabytes per day, which are streamed to local file stores for automated data reduction. Such large data volumes make it difficult to collaborate on data analysis with remote partners without the inefficiency of transferring full datasets. Remote collaborators need to perform visualization, lightweight analysis, and small data modifications on these datasets, tasks that do not require the full dataset. We therefore aim to build a system that enables remote data slicing and selection. The HDF interface makes lightweight data access possible, at least at the interface level. We are prototyping this approach using the HDF Highly Scalable Data Service (HSDS), a distributed service that can be hosted on an institutional cluster or a commercial cloud and accessed by a remote client. The proposed solution integrates with existing analysis routines and visualization tools such as NeXpy. In this presentation, we will describe the use cases for remote collaboration in more detail, outline the architecture, and present preliminary measurements of the costs of the approach from both a performance and a financial perspective.
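The benefit of remote slicing can be estimated with a short back-of-the-envelope sketch. It assumes the service returns only the selected elements to the client and ignores compression and HTTP overhead, so the data a remote user must move is the size of the hyperslab selection rather than the whole file. The dataset shape, detector size, and data type below are hypothetical and are not taken from Sector 6. The helpers `selection_bytes` and `full_bytes` are illustrative only, not part of HSDS or its client libraries.

```python
from math import prod


def selection_bytes(shape, selection, itemsize):
    """Return the number of bytes in a hyperslab selection of an N-D dataset.

    shape:     dataset dimensions, e.g. (frames, rows, cols)
    selection: one Python slice per dimension (basic slicing, as in h5py/NumPy)
    itemsize:  bytes per element (4 for float32)
    """
    if len(selection) != len(shape):
        raise ValueError("need exactly one slice per dimension")
    counts = []
    for dim, sl in zip(shape, selection):
        # slice.indices() clips start/stop/step to the length of this dimension
        start, stop, step = sl.indices(dim)
        counts.append(len(range(start, stop, step)))
    return prod(counts) * itemsize


def full_bytes(shape, itemsize):
    """Bytes needed to transfer the entire dataset."""
    return prod(shape) * itemsize


if __name__ == "__main__":
    # Hypothetical scan: 3600 frames from a 2048 x 2048 detector, stored as float32.
    shape, itemsize = (3600, 2048, 2048), 4

    full = full_bytes(shape, itemsize)
    # A remote user viewing a single frame
    one_frame = selection_bytes(shape, (slice(100, 101), slice(None), slice(None)), itemsize)
    # A 64 x 64 region of interest followed through every frame
    roi = selection_bytes(shape, (slice(None), slice(1000, 1064), slice(1000, 1064)), itemsize)

    print(f"full dataset:          {full / 1e9:.1f} GB")
    print(f"single frame:          {one_frame / 1e6:.1f} MB ({full / one_frame:.0f}x less)")
    print(f"64x64 ROI, all frames: {roi / 1e6:.1f} MB ({full / roi:.0f}x less)")
```

With an HSDS deployment, the h5py-compatible client h5pyd is intended to let a remote user apply the same slice syntax (for example `dset[100, :, :]`) against a server endpoint. The sketch only estimates how much data such a request would carry, not its latency or cost.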


Primary authors

Justin Wozniak (Argonne National Laboratory), Raymond Osborn (Argonne National Laboratory)
