Showing 40 changed files with 280 additions and 450 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,20 +1,46 @@ | ||
# About this Cookbook | ||
|
||
The IUPAC FAIR Chemistry Cookbook is intended to be an open, collaborative, community-focused resource on working | ||
with FAIR data in chemistry. This community resource aims to provide a range of practical and re-usable training | ||
materials that demonstrate how to manage digital data files and content. Our goal is to get more practical tools | ||
& tips in the hands of practicing chemists and others working with digital chemical data, to lower barriers and | ||
smooth the adoption of best practices for sharing and reusing FAIR chemical data. The content primarily consists | ||
of actionable recipes for a range of tasks to prepare and deposit FAIR machine-enabled chemical data, identify | ||
and extract chemically relevant metadata, and compile and validate chemical data files using online tools. | ||
This resource was initially formulated as an output of the [IUPAC WorldFAIR Chemistry project](https://iupac.org/project/2022-028-1-024/), for the | ||
[WorldFAIR initiative](https://worldfair-project.eu/) (see below). The IUPAC FAIR Chemistry Cookbook is intended to support the broader | ||
community in understanding how to work with machine-readable chemical data and implement the FAIR data principles. | ||
The site is designed to be a living community resource through the addition of new content as strategies evolve | ||
and the sharing and reuse of FAIR chemical data continues to increase. Feedback and contributions are welcome. | ||
|
||
FAIR data are findable, accessible, interoperable, and reusable for machine processing {cite:p}`Wilkinson2016`. | ||
FAIR chemical data need to be machine-readable, and this can be an unfamiliar scenario for many researchers | ||
and other stakeholders involved with publishing and managing experimental data. This cookbook aims to support | ||
best practices for sharing and reusing chemical data aligned with the technical criteria for FAIR | ||
machine-readable data. Practical, interactive tutorials based on common workflows and readily accessible | ||
online tools for working with digital content augment broader guidance. | ||
## Project contributors | ||
|
||
The IUPAC FAIR Chemistry Cookbook is designed to be an evolving resource for the chemistry community. It is | ||
supported by the International Union of Pure and Applied Chemistry (IUPAC) as part of the WorldFAIR | ||
Initiative (see About this project). | ||
- Stuart Chalk (Project Lead), University of North Florida | ||
- Ann-Christin Andres, Johannes Gutenberg University Mainz | ||
- Simon Coles, University of Southampton | ||
- Jordi Cuadros, IQS Universitat Ramon Llull | ||
- Sonja Herres-Pawlis, RWTH Aachen University | ||
- John Jolliffe, Johannes Gutenberg University Mainz | ||
- Sunghwan Kim, National Center for Biotechnology Information, National Institutes of Health | ||
- Nicola Knight, University of Southampton | ||
- Ken Kroenlein, Citrine Informatics | ||
- Ye Li, Massachusetts Institute of Technology | ||
- Leah McEwen, Cornell University | ||
- Samuel Munday, University of Southampton | ||
- Fatima Mustafa, Texas A&M San Antonio | ||
- Vincent F. Scalfani, University of Alabama | ||
|
||
## Cookbook development | ||
|
||
This Cookbook is created online with Jupyter Book. Content is generated locally and managed through a GitHub repository. | ||
This infrastructure enables the following requirements at the content creation level: | ||
- Open and FAIR development and deployment | ||
- Support for a diverse set of user personas | ||
- Agile development and adaptation to user needs | ||
- Community engagement for long-term stability | ||
- Documentation at user, contributor and administrator levels | ||
|
||
## WorldFAIR Chemistry | ||
|
||
The Committee on Data of the International Science Council ([CODATA](https://codata.org/)) and the Research Data | ||
Alliance ([RDA](https://rd-alliance.org/)) launched the [WorldFAIR Initiative](https://worldfair-project.eu/) in | ||
2022 to advance implementation of the [FAIR data principles](https://force11.org/info/the-fair-data-principles/) | ||
within and across research domains. The International Union of Pure and Applied Chemistry ([IUPAC](https://iupac.org/)), | ||
known as the world authority on chemical nomenclature, terminology, and standardized methods of measurement, hosts | ||
the WorldFAIR Chemistry project in a concerted effort to support broader data sharing of chemical data through | ||
collaboration with related disciplines and data science communities. The goal of | ||
[WorldFAIR Chemistry](https://iupac.org/project/2022-012-1-024) is to support the use of chemical data standards in | ||
research workflows to enable downstream data reuse through practical direction and resources. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,10 +1,16 @@ | ||
# Contributions to the Cookbook | ||
# Contribute to the Cookbook! | ||
|
||
We would like to thank all the contributors who have made the IUPAC FAIR Chemistry Cookbook possible. | ||
The Cookbook is an open, collaborative, community-focused resource based on a broadly accessible online dynamic | ||
platform in the form of a Jupyter Book that supports the development and publication of executable content. | ||
The content covers a range of activities for working with FAIR chemical data likely to be encountered by researchers, | ||
data scientists and many other stakeholders engaged in publishing and re-using chemical data. If you regularly | ||
work with digital chemical data and have useful approaches that could be demonstrated through a Jupyter Notebook, | ||
please consider contributing. Best practices for using standards and tools are emphasized and instructions for | ||
how to contribute materials are provided. | ||
|
||
- Stuart Chalk | ||
- Jordi Cuadros | ||
- Sunghwan Kim | ||
- Nicola Knight | ||
- Sam Munday | ||
- Vin Scalfani | ||
More information is available on how to contribute to the Cookbook in the [documentation wiki](https://github.com/IUPAC/WFChemCookbook/wiki): | ||
- [What is the IUPAC FAIR Chemistry Cookbook?](https://github.com/IUPAC/WFChemCookbook/wiki/What-is-the-IUPAC-FAIR-Chemistry-Cookbook%3F) | ||
- [How was the Cookbook developed?](https://github.com/IUPAC/WFChemCookbook/wiki/How-was-the-Cookbook-developed%3F) | ||
- [How to create content for the Cookbook](https://github.com/IUPAC/WFChemCookbook/wiki/How-to-create-content-for-the-Cookbook) | ||
- [How to submit a contribution to the Cookbook](https://github.com/IUPAC/WFChemCookbook/wiki/How-to-submit-a-contribution-to-the-Cookbook) | ||
- [Benefits of contributing to the Cookbook](https://github.com/IUPAC/WFChemCookbook/wiki/Benefits-of-contributing-to-the-Cookbook) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,64 +1,6 @@ | ||
# The Joy of Cooking (FAIR data) | ||
|
||
This cookbook provides a range of example protocols developed by active community members. These recipes target | ||
different tasks across a range of possible use cases for working with machine-readable chemical data (i.e., FAIR data). | ||
The aim is to present all materials with relevant chemistry examples, point to external content that is of high quality | ||
where available, reference IUPAC and community digital standards where appropriate, and engage the chemistry community | ||
in order to broaden the understanding of FAIR in chemistry. | ||
# The Joy of Cooking | ||
|
||
Working with data in terms of FAIR and in a digital environment means working with machine-readable data, which calls for | ||
different activities, different steps to handle the data. This section provides a brief background on machine-readable | ||
data and the FAIR data principles in the context of chemistry, what you can do with machine-readable chemical data and | ||
the importance of preparing data to be FAIR and discoverable in domain repositories. | ||
|
||
In data science, a recipe describes a series of steps applied to a data set to prepare it for data analysis in a | ||
systematic way. Recipes can describe all the steps taken in a project from data ingestion to transformation to | ||
analysis to automate processes and share work with others. With recipes, you can prepare your dataset in a systematic | ||
and repeatable way. Recipes can cover many aspects of data preparation including normalization and joining multiple | ||
datasets. The recipes in this cookbook demonstrate actions… | ||
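The idea of a recipe as an ordered, repeatable series of preparation steps can be sketched in a few lines of Python. This is only an illustration and not code from the Cookbook: the function names (`normalize_names`, `join_on_name`, `run_recipe`) and the two small tables are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Step = Callable[[List[Record]], List[Record]]


def normalize_names(records: List[Record]) -> List[Record]:
    """Trim whitespace and lowercase the compound name used as the join key."""
    return [{**r, "name": r["name"].strip().lower()} for r in records]


def join_on_name(extra: List[Record]) -> Step:
    """Build a step that merges fields from `extra` into records with the same name."""
    lookup = {r["name"]: r for r in normalize_names(extra)}

    def step(records: List[Record]) -> List[Record]:
        return [{**r, **lookup.get(r["name"], {})} for r in records]

    return step


def run_recipe(records: List[Record], steps: List[Step]) -> List[Record]:
    """Apply each step in order, so the same preparation can be repeated exactly."""
    for step in steps:
        records = step(records)
    return records


# Hypothetical example data: two small tables describing the same compound.
measurements = [{"name": " Ethanol ", "boiling_point_K": "351.5"}]
identifiers = [{"name": "ethanol", "smiles": "CCO"}]

prepared = run_recipe(measurements, [normalize_names, join_on_name(identifiers)])
print(prepared)
```

Because the recipe is just a list of steps, it can be shared with others and re-run unchanged on a new dataset.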
|
||
[Pointing to other sources: ELIXIR FAIR cookbook, NFDI4Chem KnowledgeBase, other resources] | ||
|
||
## Themes | ||
- FAIR describes attributes of machine-readable data that enable them to be reusable | ||
- Structure and consistency are important but there is no one rigid best way | ||
- Standards are designed to encapsulate multiple attributes into convenient motifs and methods | ||
- [Application of motifs/workflows enables FAIR attributes of data/metadata] | ||
- Common motifs in machine-readable chemical data (also in Culinary School) | ||
- Chemical identification and structure representation | ||
- Standard file formats for different data types | ||
- Chemical metadata description | ||
- What can you do with machine-readable chemical data? | ||
- Enhance discovery | ||
- Compile data | ||
- Programmatically query for data | ||
- Making chemical Data FAIR (more practicals in later sections) | ||
- Data files | ||
- Data processing | ||
- Data description | ||
- Data sharing | ||
- Absolute “minimum” (meaning that would enable data to be discoverable and a reuser can then try to do something with | ||
it, even if not as efficiently as desirable) | ||
|
||
- Basic concepts – (also include concepts in glossary for specific linking) | ||
- [setting the stage, the vernacular, this is what is happening here and you will come across that] | ||
- machine-readable | ||
- programmatic access/reuse | ||
- Data exchange | ||
- Data export/import formats (e.g., JSON) | ||
- Metadata | ||
- Languages (e.g. python, R, markdown?) | ||
- Platforms | ||
- Workspaces | ||
- Workflows | ||
- Provenance | ||
- PIDs | ||
- How does FAIR work? – (and other sections as appropriate) | ||
- Chemistry particulars – (and other sections as appropriate) | ||
- Data types | ||
- Motifs (identifiers, representations, schema, formats, ontologies) | ||
- Standards | ||
- Organizing principles of data resources | ||
- There should also be something in here about FAIR is a scale and that anything you can do to improve the FAIRness | ||
of your data is a good thing (with comments on the benefits of doing even the lowest level improvements). SJC 12/5/23 | ||
- RIPE as a sequence of considerations… | ||
different activities, different steps to handle the data. This section will provide a brief background on | ||
machine-readable data and the FAIR data principles in the context of chemistry, what you can do with machine-readable | ||
chemical data and the importance of preparing data to be FAIR and discoverable in domain repositories. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
# Sources of FAIR Chemical Data | ||
|
||
Recipes in this section review accessible online sources of reliable FAIR chemical data, including research data | ||
repositories and other aggregated sources. Materials include brief descriptions of content and available documentation, | ||
and provide tutorials and demos of API protocols for searching and retrieving various types of data. | ||
repositories and other aggregated sources. Materials include brief descriptions of content and available | ||
documentation, and provide tutorials and demos of API protocols for searching and retrieving various types of data. |
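As a rough idea of what such an API demo might look like, the sketch below builds a request URL for PubChem's PUG REST service and parses a response. PubChem is used here only as an assumed example of an aggregated source; the helper names are hypothetical, and the sample payload is an illustrative stand-in shaped like a property response rather than data fetched live.

```python
import json
from typing import Dict, List
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name: str, properties: List[str]) -> str:
    """Build a PUG REST URL asking for the given properties of a named compound."""
    return f"{PUG_REST}/compound/name/{quote(name)}/property/{','.join(properties)}/JSON"


def parse_properties(payload: str) -> List[Dict[str, object]]:
    """Extract the list of property records from a property-table JSON response."""
    return json.loads(payload)["PropertyTable"]["Properties"]


url = property_url("ethanol", ["MolecularFormula", "InChIKey"])
print(url)

# Illustrative payload (not fetched live); a real call would download `url` instead.
sample = '{"PropertyTable": {"Properties": [{"CID": 702, "MolecularFormula": "C2H6O"}]}}'
for record in parse_properties(sample):
    print(record["CID"], record["MolecularFormula"])
```

Keeping URL construction and response parsing in separate functions makes it possible to test the parsing offline before sending any requests.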
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,69 +1,7 @@ | ||
# FAIR Techniques for Chemical Data | ||
# FAIR Techniques | ||
|
||
Making your science FAIRer: This section details how you can make your work FAIRer by upgrading how you can do common | ||
activities in a FAIR enabled way. | ||
|
||
This material will be informed by the WorldFAIR Chemistry D3.1 project related to Reporting Guidance for FAIR chemical | ||
data and other community resources, including the NFDI4Chem Knowledgebase and the ELIXIR FAIR Cookbook. In addition, | ||
efforts to inform best practice, such as the IUPAC FAIR Spec Project, will be highlighted. | ||
|
||
Test Kitchen / Checking your chemical data/metadata: This section will review protocols for confirming the completeness | ||
and consistency of chemical data and metadata files, for example the checkCIF service for Crystallographic Information | ||
Files (CIF). This material will also be informed by the WorldFAIR Chemistry D3.3 project related to Protocol Services | ||
for standardized programmatic access to chemical data, and other community resources. | ||
|
||
Ingredients / Data standards and formats for Chemistry: This section will provide descriptions of standard notation | ||
and file formats available for sharing and reusing chemical data that are referred to in recipes throughout this book. | ||
- Chemical structures | ||
- Chemical properties | ||
- Chemical terminology | ||
- Other useful formats | ||
|
||
This category introduces basic techniques on how to work with machine-readable data with particular emphasis on | ||
chemical data nuances and ways chemical data can be made more FAIR, when it is initially shared and for reusing | ||
data that are not fully FAIR. Techniques should be relatively easy to implement into common workflow(s) and give | ||
tangible results/improvements. | ||
|
||
- Overview of good FAIR practices | ||
- You’ve got to manage your data [files, structures, description, etc.] | ||
- You’ve got to get the data shared and licensed and citable | ||
- Identifying things of import (people, instrument, samples) | ||
- Critical stages of data processing (raw, processed, derived) | ||
- ***Example of how InChI supports F-A-I-R | ||
- Overview of working with FAIR chemical data | ||
- Queries | ||
- Matching on chemical identifiers/representations | ||
- APIs (what is an API and then link to some of the recipes that demo these for tools and resources) | ||
- Chemical data standards! | ||
- Safety/watch-outs | ||
- Syntax: character sets, units | ||
- Semantics: valence models, units, temperature scale, date format | ||
- Normalization | ||
- Validation | ||
- Clean up | ||
- Examples of unFAIR data | ||
- (condensed chemical formula is not fully interoperable/identifiable) | ||
- Data values without reference (or other provenance, conditions) | ||
- Using chemical data standards | ||
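The point in the outline above that a condensed formula is not fully identifiable can be illustrated with a small sketch. Ethanol and dimethyl ether share the formula C2H6O but have different standard InChI strings. The record layout and the `index_by` helper are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Two well-known isomers that share the molecular formula C2H6O.
compounds = [
    {"name": "ethanol", "formula": "C2H6O",
     "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},
    {"name": "dimethyl ether", "formula": "C2H6O",
     "inchi": "InChI=1S/C2H6O/c1-3-2/h1-2H3"},
]


def index_by(records: List[Dict[str, str]], key: str) -> Dict[str, List[str]]:
    """Group compound names by the value of one field (e.g. formula or InChI)."""
    index = defaultdict(list)
    for record in records:
        index[record[key]].append(record["name"])
    return dict(index)


by_formula = index_by(compounds, "formula")
by_inchi = index_by(compounds, "inchi")

# A formula lookup returns both isomers; each InChI maps to exactly one compound.
print(by_formula["C2H6O"])
print(all(len(names) == 1 for names in by_inchi.values()))
```

Matching on a structure-based identifier rather than a formula avoids merging data that belong to different compounds.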
|
||
- General resources on FAIR for chemistry | ||
- FAIR Data Principles | NFDI4Chem Knowledge Base | ||
- Elixir FAIR Cookbook | ||
- Elixir DRM… | ||
- General topics about how to improve working with chemical data, for example… | ||
- Basic data management | ||
- Assigning unique identifiers (especially for chemicals) | ||
- File naming conventions | ||
|
||
|
||
Given that most users of the cookbook will not be cheminformatics/data science experts, there needs to be some content | ||
that provides background material to users. Generally this would mean content about chemistry information and data | ||
needed by a computer science/data science background AND computer science information needed by a chemistry professional | ||
or student. Some of this material will be available externally and linked in pages, but other content might be best | ||
discussed in the context of computer science or chemistry to communicate how they relate. | ||
- Basic data manipulation stuff | ||
- APIs, spreadsheets, languages | ||
- What happens when you have unFAIR data | ||
- Basic chemistry issues? And how do you manage these? | ||
- Chemical description | ||
- Chemistry data standards | ||
The cookbook is meant to provide practical approaches to different data tasks to inspire others to improve their own | ||
data practices. This section will introduce basic techniques on how to work with machine-readable data with particular | ||
emphasis on chemical data nuances and ways chemical data can be made more FAIR, when it is initially shared and for | ||
reusing data that are not fully FAIR. Techniques should be relatively easy to implement into common workflow(s) and | ||
give tangible results/improvements. |
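One technique of this kind, small enough to drop into an existing workflow, is normalizing temperatures and dates before sharing data. The sketch below is a minimal illustration under assumed conventions (kelvin for temperature, ISO 8601 for dates); the function names and the unit codes `K`, `C` and `F` are hypothetical choices, not a Cookbook standard.

```python
from datetime import datetime

# Conversions to kelvin for the unit codes assumed in this sketch.
TO_KELVIN = {
    "K": lambda v: v,
    "C": lambda v: v + 273.15,
    "F": lambda v: (v - 32.0) * 5.0 / 9.0 + 273.15,
}


def normalize_temperature(value: float, unit: str) -> float:
    """Convert a temperature to kelvin, rejecting units that are not recognized."""
    if unit not in TO_KELVIN:
        raise ValueError(f"unknown temperature unit: {unit!r}")
    return round(TO_KELVIN[unit](value), 2)


def normalize_date(text: str, fmt: str) -> str:
    """Convert a date written in a known local format to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text, fmt).date().isoformat()


print(normalize_temperature(25.0, "C"))

# The same text means different days depending on the (unstated) local convention.
print(normalize_date("05/12/2023", "%d/%m/%Y"))
print(normalize_date("05/12/2023", "%m/%d/%Y"))
```

The date example shows why the original format has to be recorded or known: without it, the value cannot be normalized unambiguously.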
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
# Tools for working with FAIR Chemical Data | ||
|
||
Recipes in this section highlight online cheminformatics tools and web services that data researchers in the chemical | ||
sciences should know about. Material includes brief explanations, tutorials and demos of what can be done, and indicates | ||
sciences should know about. Material includes brief explanations, tutorials and demos of what can be done, and indicates | ||
scenarios where the tool might be used to manipulate machine-readable chemical data - both by humans and machines. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,17 @@ | ||
# How to use the Cookbook | ||
|
||
This cookbook provides a range of protocols developed by active community members. These recipes target different | ||
tasks across a range of possible use cases for working with machine-readable chemical data (i.e., FAIR data). | ||
The aim is to present all materials with relevant chemistry examples, point to external content that is of high | ||
quality where available, reference IUPAC and community digital standards where appropriate, and engage the chemistry | ||
community in order to broaden the understanding of FAIR in chemistry. | ||
|
||
The cookbook presents a collection of annotated code snippets and workflows for specific tasks in manipulating | ||
machine-readable chemical data and metadata. | ||
|
||
- Many of the recipes on this site take advantage of Jupyter Notebooks to run Python code in the browser for an | ||
interactive (and educational) feel for the user. | ||
- Information on how, what and when a recipe might be useful is available in the collapsible 'header' below the | ||
title of the recipe. | ||
- The header also includes bullets for skills and learning objectives | ||
- Ideas to further characterize the applicability of recipes are welcome (see feedback)! |