Sep 23 – 27, 2024
ESRF Auditorium
Europe/Paris timezone

Latest news from the SciCat ecosystem: ElasticSearch integration and Job sub-system

Sep 26, 2024, 11:10 AM
15m
Hybrid event (ESRF Auditorium)

EPN Campus ESRF - ILL 71 Av. des Martyrs, 38000 Grenoble
Talk FAIR data management

Speaker

Massimiliano Novelli (European Spallation Source)

Description

SciCat is an open-source data catalog providing data management, annotation, and publishing features for scientific facilities (https://scicatproject.github.io/). It enables tracking of data provenance, annotation with metadata, and publication of datasets with a unique DOI. SciCat is built on a flexible microservice architecture, allowing easy configuration for diverse use cases. Its adoption by multiple research facilities has helped the community expand development and improve both functionality and usability.

Development of SciCat is currently focused on version 4, which migrated the codebase to a more modern technology stack based on TypeScript and the Node.js framework NestJS. The latest release improves search functionality through ElasticSearch integration, and the next one will improve interoperability with other services through a highly configurable job sub-system.

ElasticSearch integration is part of the effort to increase data FAIRness. It follows from the community's review of a growing number of use cases, which made it evident that the search capabilities needed to be improved and expanded with a better free-text search option, similar to a search engine. This led to the choice of ElasticSearch, a widely adopted search engine.
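As an illustration of what a search-engine-like free-text query can look like, here is a minimal sketch that builds an ElasticSearch `multi_match` request body. The function name, the default field names (`datasetName`, `description`), and the result size are assumptions made for this example; they are not taken from SciCat's actual index mapping or code.

```typescript
// Hypothetical field names: a real deployment defines its own index mapping.
// The "^2" suffix boosts matches in that field relative to the others.
const DEFAULT_FIELDS: string[] = ["datasetName^2", "description"];

interface FreeTextQuery {
  query: {
    multi_match: {
      query: string;
      fields: string[];
      fuzziness: string;
    };
  };
  size: number;
}

// Builds a request body for the ElasticSearch search endpoint.
// "multi_match" searches the same text across several fields, and
// fuzziness "AUTO" tolerates small typos in the search terms.
function buildFreeTextQuery(
  text: string,
  fields: string[] = DEFAULT_FIELDS,
  size: number = 10,
): FreeTextQuery {
  return {
    query: {
      multi_match: {
        query: text.trim(),
        fields,
        fuzziness: "AUTO",
      },
    },
    size,
  };
}

// Example: the resulting object can be serialized and sent as the search body.
const exampleBody: string = JSON.stringify(buildFreeTextQuery("neutron diffraction"));
```

The sketch only constructs the request body; sending it to a cluster and mapping the hits back to datasets is left out.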

The growing number of adopters and their varied IT infrastructure have exposed a need for flexible integration with additional services, such as archiving systems or custom APIs. We have therefore designed a configurable job sub-system dedicated to managing and dispatching configurable commands to third-party services through a variety of protocols and systems, including calls to REST APIs and messages posted to RabbitMQ queues or Kafka topics. This allows each facility to configure SciCat to interact with the services, already deployed or planned, that best suit its needs.
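The idea of configuration-driven dispatch can be sketched as follows. This is an illustrative outline only: the type names, the `kind` discriminator, the example URL and topic, and the injected `Transports` interface are all hypothetical and do not reflect SciCat's actual configuration schema or implementation. Transports are synchronous here for brevity; real REST, RabbitMQ, or Kafka clients would be asynchronous.

```typescript
// All names below are hypothetical and only illustrate the concept.
type JobAction =
  | { kind: "rest"; url: string; method: "POST" | "PUT" }
  | { kind: "rabbitmq"; queue: string }
  | { kind: "kafka"; topic: string };

// Maps a job type (for example "archive") to the actions run for such a job.
type JobConfig = Record<string, JobAction[]>;

// Transports are injected so the dispatcher does not depend on a client library.
interface Transports {
  rest(url: string, method: string, payload: unknown): void;
  rabbitmq(queue: string, payload: unknown): void;
  kafka(topic: string, payload: unknown): void;
}

// Runs every action configured for the given job type, in order,
// and returns the number of actions dispatched.
function dispatchJob(
  jobType: string,
  payload: unknown,
  config: JobConfig,
  transports: Transports,
): number {
  const actions = config[jobType];
  if (actions === undefined) {
    throw new Error(`No actions configured for job type "${jobType}"`);
  }
  for (const action of actions) {
    switch (action.kind) {
      case "rest":
        transports.rest(action.url, action.method, payload);
        break;
      case "rabbitmq":
        transports.rabbitmq(action.queue, payload);
        break;
      case "kafka":
        transports.kafka(action.topic, payload);
        break;
    }
  }
  return actions.length;
}

// Example configuration: an "archive" job calls a REST endpoint,
// then publishes to a Kafka topic (both targets are made up).
const exampleConfig: JobConfig = {
  archive: [
    { kind: "rest", url: "https://archive.example.org/jobs", method: "POST" },
    { kind: "kafka", topic: "archive-requests" },
  ],
};
```

Keeping the mapping from job type to actions in configuration means a facility could add or swap a target service without changing the dispatcher itself.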

We will present how SciCat integrates with ElasticSearch, give an overview of the technical challenges, and show a few examples. We will then provide an overview of the configurable job sub-system, presenting the idea behind it and the use cases that drove its design (such as the OpenEM project from the Swiss electron microscopy facilities). We will highlight the implementation details and the current status of this effort. We will conclude with a brief overview of future development.
