Speaker
Saulius Grazulis
(VU Institute of Biotechnology, Life Science Center)
Description
For more than 20 years, the Crystallography Open Database (COD) has collected published crystal structure data and made them available on the Web under the CC0 license in an organised, machine-readable and searchable form. The COD collection currently holds over 500 thousand records and is used for crystal analysis, material identification, DFT calculations, machine learning, teaching and much more. To be usable for such applications, the COD data must satisfy certain quality criteria. All data deposited to the COD undergo three levels of checks: syntax checks, semantic validation against the IUCr dictionaries, and COD-specific crystallographic checks based on the IUCr journal publication rules.
Over the years, software tools have been developed for these tasks; they are routinely used in the COD pipelines but can also be used in other applications. The software is developed to high standards: it undergoes systematic testing and code review, and is versioned according to SemVer principles. Recently, we have been exploring possibilities to apply formal validation methods to the COD and other crystallographic software, using durable, time-proven systems such as Ada/SPARK, Perl, SQL and C.
The use of Open Source software and the support of the community, European and Lithuanian funders and Vilnius University have allowed us to sustain the COD for more than two decades. The COD is now becoming an essential part of the Open Science FAIR data infrastructure, as attested by the Vilnius University Open Science policy roadmap and by the inclusion of the COD in numerous open or database catalogues. The goal is to collect and make openly searchable all crystallographic data that have been published in reliable sources. This will open paths to crystal structure and property prediction and to a better understanding of how matter is organised in its crystalline form.
Primary author
Saulius Grazulis
(VU Institute of Biotechnology, Life Science Center)