Sep 23 – 27, 2024
ESRF Auditorium
Europe/Paris timezone

Ontology management for the SciCat catalog using LinkML

Sep 24, 2024, 6:15 PM
45m
Hybrid event (ESRF Auditorium)

EPN Campus ESRF-ILL, 71 Av. des Martyrs, 38000 Grenoble
Poster: Metadata Posters

Speakers

Dylan McReynolds (Lawrence Berkeley National Lab), Linus Pithan (DESY)

Description

The SciCat[1] metadata catalog is in use at several scientific user facilities. SciCat stores metadata about datasets (both raw and derived), proposals, and instruments. Each dataset is assigned a unique identifier when it is ingested into SciCat. Datasets can be searched and browsed in a web portal. Authorization rules allow fine-grained control over which staff and users can access each dataset. Datasets can also be made publicly accessible through SciCat, which integrates with DOI systems.
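To make the access model concrete, here is a minimal sketch of a group-based read check. The fields are loosely modeled on SciCat's dataset properties (`pid`, `type`, `ownerGroup`, `accessGroups`, `isPublished`, `scientificMetadata`), renamed to Python style. The rule in `can_read` is a hypothetical simplification, not SciCat's actual authorization logic, and all identifiers and group names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Loosely modeled on SciCat's dataset fields (pid, type, ownerGroup,
    accessGroups, isPublished, scientificMetadata), simplified here."""
    pid: str
    type: str  # "raw" or "derived"
    owner_group: str
    access_groups: list[str] = field(default_factory=list)
    is_published: bool = False
    scientific_metadata: dict = field(default_factory=dict)


def can_read(dataset: Dataset, user_groups: set[str]) -> bool:
    """Hypothetical read rule: published datasets are readable by anyone;
    otherwise the user needs the owner group or one of the access groups."""
    if dataset.is_published:
        return True
    allowed = {dataset.owner_group, *dataset.access_groups}
    return not allowed.isdisjoint(user_groups)


# Usage with invented identifiers and group names.
ds = Dataset(pid="example/0001", type="raw", owner_group="proposal-42",
             access_groups=["beamline-staff"])
public_ds = Dataset(pid="example/0002", type="derived",
                    owner_group="proposal-42", is_published=True)

print(can_read(ds, {"beamline-staff"}))  # True
print(can_read(ds, {"other-group"}))     # False
print(can_read(public_ds, set()))        # True
```

Keeping access decisions as a pure function of dataset fields and user groups makes the rule easy to test in isolation; SciCat's real rules are richer than this.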

Each dataset ingested into SciCat includes a "Scientific Metadata" section. SciCat's Scientific Metadata is completely unopinionated: any fields and values extracted from a dataset can be added to the catalog. This flexibility has enabled adoption by a wide variety of facilities, including X-ray sources, neutron sources, and academic groups. However, it comes at the expense of standardization, documentation, and machine readability.
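The cost of this flexibility shows up as soon as metadata is queried across facilities. In the minimal sketch below, two hypothetical datasets record the same photon energy under different keys and units. Both would be accepted as free-form metadata, but a naive key-based search finds only one. The `{"value": ..., "unit": ...}` layout and all key names are invented for illustration.

```python
# Two hypothetical datasets from different facilities. A free-form
# scientific metadata section accepts both layouts.
dataset_a = {"pid": "example/a",
             "scientificMetadata": {"energy": {"value": 12.0, "unit": "keV"}}}
dataset_b = {"pid": "example/b",
             "scientificMetadata": {"beam_energy_eV": 12000}}


def with_field(datasets, key):
    """Naive search: datasets whose scientific metadata has a top-level `key`."""
    return [d for d in datasets if key in d["scientificMetadata"]]


hits = with_field([dataset_a, dataset_b], "energy")
# Only dataset_a matches, although both record the same photon energy
# (under different names and units).
print([d["pid"] for d in hits])  # ['example/a']
```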

The LinkML[2] project provides a set of tools for defining and publishing schemas and for integrating them into a workflow. It supports the definition, maintenance, and interlinking of domain-specific ontologies, and can express these in a variety of standard languages such as JSON Schema, JSON-LD, RDF, and OWL.
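To illustrate the idea of one schema feeding several output formats, here is a minimal sketch that turns a LinkML-style class into a JSON Schema object. The schema is a Python dict mirroring what a LinkML YAML file might look like once loaded. Keys such as `classes`, `attributes`, `range`, `required` and `description` come from LinkML's schema language, while the class and attribute names are hypothetical. `to_json_schema` is a hand-written, heavily simplified stand-in and not LinkML's own generator (LinkML ships one, e.g. the `gen-json-schema` command). It only handles scalar attributes of a single class.

```python
import json

# Hypothetical LinkML-style schema, as the dict a YAML loader would return.
# A real LinkML schema would also declare prefixes and import linkml:types.
SCHEMA = {
    "id": "https://example.org/scicat-demo",
    "name": "scicat_demo",
    "classes": {
        "BeamlineSettings": {
            "description": "Settings recorded for one scan (invented example).",
            "attributes": {
                "energy": {"range": "float", "required": True,
                           "description": "Photon energy in keV."},
                "sample_name": {"range": "string", "required": True},
                "exposure_count": {"range": "integer"},
            },
        },
    },
}

# Subset of LinkML built-in types mapped to JSON Schema types.
TYPE_MAP = {"string": "string", "float": "number",
            "integer": "integer", "boolean": "boolean"}


def to_json_schema(schema, class_name):
    """Simplified stand-in for a LinkML-to-JSON-Schema generator:
    scalar attributes of one class only; a missing range is treated as string."""
    cls = schema["classes"][class_name]
    properties, required = {}, []
    for name, slot in cls["attributes"].items():
        prop = {"type": TYPE_MAP[slot.get("range", "string")]}
        if "description" in slot:
            prop["description"] = slot["description"]
        properties[name] = prop
        if slot.get("required"):
            required.append(name)
    result = {"title": class_name, "type": "object",
              "properties": properties, "required": required,
              "additionalProperties": False}
    if "description" in cls:
        result["description"] = cls["description"]
    return result


print(json.dumps(to_json_schema(SCHEMA, "BeamlineSettings"), indent=2))
```

Setting `additionalProperties` to false is a design choice in this sketch: it rejects undeclared fields, the opposite of SciCat's free-form default for scientific metadata.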

Defining schemas for scientific metadata outside a catalog such as SciCat makes it possible to use higher-level abstractions than, for example, plain JSON Schema definitions allow. Dedicated data-modeling frameworks such as LinkML can, for instance, specify hierarchical dependencies and ontological mappings.
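Both abstractions can be sketched in a few lines. Below, `is_a` (LinkML's inheritance keyword) builds a class hierarchy, and `slot_uri` (LinkML's way to bind a slot to an external term) attaches ontology mappings. The class names are hypothetical, and the `ex:` URIs are placeholders rather than real ontology terms. `induced_attributes` is a simplified inheritance resolver written for this example. LinkML's own tooling handles more cases, such as mixins and `slot_usage`.

```python
# Hypothetical LinkML-style classes: `is_a` expresses inheritance and
# `slot_uri` maps an attribute to a (placeholder) ontology term.
CLASSES = {
    "Measurement": {
        "attributes": {
            "start_time": {"range": "datetime", "slot_uri": "ex:startTime"},
        },
    },
    "ScatteringMeasurement": {
        "is_a": "Measurement",
        "attributes": {
            "energy": {"range": "float", "slot_uri": "ex:photonEnergy"},
        },
    },
    "SAXSMeasurement": {
        "is_a": "ScatteringMeasurement",
        "attributes": {
            "detector_distance": {"range": "float"},
        },
    },
}


def induced_attributes(classes, name):
    """Collect attributes of `name` and all of its `is_a` ancestors;
    attributes defined on a subclass override inherited ones."""
    chain = []
    while name is not None:
        chain.append(name)
        name = classes[name].get("is_a")
    attrs = {}
    for cls in reversed(chain):  # root first, so subclasses override
        attrs.update(classes[cls]["attributes"])
    return attrs


def term_mappings(attrs):
    """Map attribute names to their ontology URIs, where declared."""
    return {k: v["slot_uri"] for k, v in attrs.items() if "slot_uri" in v}


saxs = induced_attributes(CLASSES, "SAXSMeasurement")
print(sorted(saxs))         # ['detector_distance', 'energy', 'start_time']
print(term_mappings(saxs))  # {'start_time': 'ex:startTime', 'energy': 'ex:photonEnergy'}
```

A SAXS record thus inherits `start_time` and `energy` without redeclaring them. Each mapped field also carries a URI that tools outside SciCat can interpret.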

DESY and the Advanced Light Source (ALS) have begun working with LinkML, investigating its use as part of the SciCat workflow. The aim is a comprehensive solution that improves the management and validation of scientific metadata within the SciCat framework. The primary objectives are: preparing a robust data model for experiment metadata; establishing a validation layer that ensures the accuracy and integrity of ingested data; generating detailed documentation for metadata classes and attributes; building schema-based GUIs for inserting datasets into SciCat; and creating a flexible spreadsheet for efficient management of metadata lists.
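One way to picture the validation layer is as a check that runs on a metadata record before it is ingested. The sketch below uses only the standard library and validates against a hypothetical, LinkML-like attribute table. In practice, one would more likely rely on LinkML's validation tooling or on a JSON Schema generated from the LinkML model. Rejecting undeclared fields is a design choice here and could be relaxed to a warning.

```python
# Hypothetical schema fragment: attribute name -> LinkML-like slot settings.
ATTRIBUTES = {
    "energy": {"range": "float", "required": True},
    "sample_name": {"range": "string", "required": True},
    "exposure_count": {"range": "integer"},
}

# Python types accepted for each (scalar) range.
PY_TYPES = {"float": (int, float), "integer": (int,),
            "string": (str,), "boolean": (bool,)}


def validate(metadata, attributes):
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    for name, slot in attributes.items():
        rng = slot.get("range", "string")
        if name not in metadata:
            if slot.get("required"):
                errors.append(f"missing required field '{name}'")
            continue
        value = metadata[name]
        # bool is a subclass of int, so only accept it for boolean ranges.
        bad_bool = isinstance(value, bool) and rng != "boolean"
        if bad_bool or not isinstance(value, PY_TYPES[rng]):
            errors.append(f"'{name}' should be {rng}, got {type(value).__name__}")
    # Reject fields the schema does not declare (sorted for stable output).
    for name in sorted(metadata.keys() - attributes.keys()):
        errors.append(f"unknown field '{name}'")
    return errors


record = {"energy": "12 keV", "exposure_count": 3, "temperature": 295}
for problem in validate(record, ATTRIBUTES):
    print(problem)
# 'energy' should be float, got str
# missing required field 'sample_name'
# unknown field 'temperature'
```

Returning a list of messages instead of raising on the first error lets an ingestion GUI or spreadsheet import report every problem in one pass.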

[1] https://scicatproject.github.io/
[2] https://linkml.io/

Primary authors

Dylan McReynolds (Lawrence Berkeley National Lab), Linus Pithan (DESY)

Co-authors

Dr Anjali Aggarwal (DESY), Dr Paul Millar (DESY), Dr Runbo Jiang (LBNL), Dr Tim Wetzel (DESY)
