Merge pull request #408 from NFDI4Chem/402-update-data-publishing-articles-according-to-feedback

402 update data publishing articles according to feedback
jliermann authored Nov 4, 2024
2 parents 3e911ac + cf0cbb3 commit b2db5b6
Showing 3 changed files with 12 additions and 8 deletions.
@@ -5,6 +5,8 @@ slug: "/publishing_standards_authors"

### Use ORCID iD to identify authors and ROR to identify institutions

{/* image "order of operations" needed (flowchart). May need to highlight various types of workflows, e.g., at which stage of the manuscript publication process data should be published. */}

:::tip Standard
_Authors should provide their ORCID iD to identify the authors/creators and contributors, and their ROR identifier to identify the institution to which they are affiliated._
:::
@@ -49,7 +51,7 @@ Research data repositories offer the option to add a related identifier to link
_Researchers should link their datasets to be published to their corresponding articles using the relation type `IsSupplementTo`._
:::

According to the [DataCite Metadata Schema](https://datacite-metadata-schema.readthedocs.io/en/4.5/appendices/appendix-1/relationType/), [`IsCitedBy`](https://datacite-metadata-schema.readthedocs.io/en/4.5/appendices/appendix-1/relationType/#iscitedby) and [`IsSupplementTo`](https://datacite-metadata-schema.readthedocs.io/en/4.5/appendices/appendix-1/relationType/#issupplementto) are both recommended for discovery. For published articles, [Crossref's documentation on relationships](https://www.crossref.org/documentation/schema-library/markup-guide-metadata-segments/relationships/) recommends that `isSupplmenetTo` should be used to link datasets generated as part of research results. Please note that some repositories automatically detect whether the linked object is an article or some other dataset published in another repository, and therefore don't require authors to specify a relation(ship) type.
According to the [DataCite Metadata Schema](https://datacite-metadata-schema.readthedocs.io/en/4.5/appendices/appendix-1/relationType/), [`IsCitedBy`](https://datacite-metadata-schema.readthedocs.io/en/4.5/appendices/appendix-1/relationType/#iscitedby) and [`IsSupplementTo`](https://datacite-metadata-schema.readthedocs.io/en/4.5/appendices/appendix-1/relationType/#issupplementto) are both recommended for discovery. For published articles, [Crossref's documentation on relationships](https://www.crossref.org/documentation/schema-library/markup-guide-metadata-segments/relationships/) recommends that `isSupplementTo` should be used to link datasets generated as part of research results. Please note that some repositories automatically detect whether the linked object is an article or some other dataset published in another repository, and therefore don't require authors to specify a relation(ship) type.
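
As a minimal sketch of what such a link might look like on the dataset side, the snippet below builds a `relatedIdentifiers` entry with the relation type `IsSupplementTo`. The property names follow the RelatedIdentifier property of the DataCite Metadata Schema in its JSON form; the helper name `supplement_link` and both DOIs are hypothetical placeholders, and repositories may expose this field differently in their own metadata editors.

```python
import json


def supplement_link(article_doi: str) -> dict:
    """Return a DataCite-style related identifier pointing at an article."""
    return {
        "relatedIdentifier": article_doi,
        "relatedIdentifierType": "DOI",
        "relationType": "IsSupplementTo",
    }


# Hypothetical dataset record; only the fields relevant to the link are shown.
dataset_metadata = {
    "doi": "10.1234/example-dataset",
    "relatedIdentifiers": [supplement_link("10.1234/example-article")],
}

print(json.dumps(dataset_metadata, indent=2))
```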

### Usage of Collection DOIs

@@ -25,15 +25,17 @@ To assist authors in selecting **well-established and community-specific [reposi
*Journals should add a data availability statement to published articles and collect the necessary information through their submission systems.*
:::

[Templates](/docs/data_availability_statement/#templates-for-data-availability-statements) for [**data availability statements**](/docs/data_availability_statement) or a similarly termed section should also be added to the manuscript submission system. Once a template has been selected by the submitter, the data availability statement should be editable to allow authors to add additional information, such as what data are included in the dataset, similar to what is currently often mentioned in the section on supporting information PDF files. The submission system should then require the submitter to provide the necessary information, such as the DOI (specified as [DOI name](https://www.doi.org/doi-handbook/HTML/doi-name-syntax2.html) e.g. `10.1000/182` or as a URL i.e. including a resolver e.g. `https://doi.org/10.1000/182` ), repository name, third party name and contact information, or reasons for restricted access and information on how to access a dataset, depending on the template used.
[Templates](/docs/data_availability_statement/#templates-for-data-availability-statements) for [**data availability statements**](/docs/data_availability_statement) or a similarly termed section should also be added to the manuscript submission system. Once a template has been selected by the submitter, the data availability statement should be editable to allow authors to add additional information, such as what data are included in the dataset, similar to what is currently often mentioned in the section on supporting information PDF files. The submission system should then require the submitter to provide the necessary information, such as the DOI (specified as [DOI name](https://www.doi.org/doi-handbook/HTML/doi-name-syntax2.html) e.g. `10.1000/182` or as a URL i.e. including a resolver e.g. `https://doi.org/10.1000/182` ), repository name, third party name and contact information (in case of third-party data ownership), or reasons for restricted access and information on how to access a dataset, depending on the template used.
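
Because submitters may enter a DOI either as a DOI name or as a URL, a submission system could normalize both forms before storing them. The following is a rough sketch under stated assumptions: the function name `normalize_doi` is hypothetical, the pattern is a common heuristic rather than the full DOI name syntax, and the URL is built without percent-encoding special characters.

```python
import re

# Prefixes that are commonly put in front of a DOI name (assumed list).
_RESOLVER_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)
# Heuristic shape of a DOI name: "10.", a registrant code, "/", and a suffix.
_DOI_NAME = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(value: str) -> tuple[str, str]:
    """Return (DOI name, resolver URL) for a DOI given in either form."""
    name = _RESOLVER_PREFIX.sub("", value.strip())
    if not _DOI_NAME.match(name):
        raise ValueError(f"not a DOI name or DOI URL: {value!r}")
    return name, f"https://doi.org/{name}"


print(normalize_doi("10.1000/182"))
print(normalize_doi("https://doi.org/10.1000/182"))
```

Both calls yield the same pair, so downstream steps such as validation or metadata enrichment only need to handle one representation.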

### Link datasets to articles in Crossref DOI metadata

:::tip Standard
*Journals should use the information available in data availability statements to enhance Crossref DOI metadata by linking articles to datasets.*
:::

With the DOI and repository name in hand, journals should enrich Crossref DOI metadata of articles published following the [FAIR](/docs/fair/) principles (e.g [F2](/docs/fair/#f2-data-are-described-with-rich-metadata-defined-by-r1-below), [I3](/docs/fair/#i3-metadata-include-qualified-references-to-other-metadata)). This establishes a structured **link** between the DOI of the article and the DOI of the dataset and ensures humans and machines alike can interpret the relationship between the published objects. For Crossref metadata, a `related_item` should be added to mention the name of the repository (equal to `publisher` in the corresponding dataset DataCite DOI metadata).
With the DOI and repository name in hand, journals should enrich Crossref DOI metadata of articles published following the [FAIR](/docs/fair/) principles (e.g. [F2](/docs/fair/#f2-data-are-described-with-rich-metadata-defined-by-r1-below), [I3](/docs/fair/#i3-metadata-include-qualified-references-to-other-metadata)). This establishes a structured **link** between the DOI of the article and the DOI of the dataset and ensures humans and machines alike can interpret the relationship between the published objects. Without a relation defined in the Crossref metadata, only humans can interpret this connection through the data availability statement, which effectively results in a missing link between the reported results in the article and the underlying research data. For Crossref metadata, a `related_item` should be added to mention the name of the repository (equal to `publisher` in the corresponding dataset DataCite DOI metadata).

<!--image clarifying why this is important-->

In agreement with [Crossref's documentation on **linking datasets** to published items](https://www.crossref.org/documentation/schema-library/markup-guide-metadata-segments/relationships/), the relationship type `isSupplementedBy` should be used.
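
To illustrate, the sketch below assembles a relations fragment for an article deposit using Python's standard library. The element and attribute names (`program`, `related_item`, `description`, `inter_work_relation`, `relationship-type`, `identifier-type`) and the namespace are taken from Crossref's relationships markup guide as understood here and should be checked against the current schema. The helper name `dataset_relation`, the DOI, the repository name, and the choice to put the repository name into `description` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Crossref relations schema (verify against the current docs).
REL_NS = "http://www.crossref.org/relations.xsd"
ET.register_namespace("rel", REL_NS)


def dataset_relation(dataset_doi: str, repository_name: str) -> ET.Element:
    """Build a relations fragment linking an article to its dataset."""
    program = ET.Element(f"{{{REL_NS}}}program")
    item = ET.SubElement(program, f"{{{REL_NS}}}related_item")
    # Assumption: the repository name is carried in the free-text description.
    description = ET.SubElement(item, f"{{{REL_NS}}}description")
    description.text = f"Dataset published in {repository_name}"
    relation = ET.SubElement(
        item,
        f"{{{REL_NS}}}inter_work_relation",
        {"relationship-type": "isSupplementedBy", "identifier-type": "doi"},
    )
    relation.text = dataset_doi
    return program


fragment = dataset_relation("10.1234/example-dataset", "Example Repository")
print(ET.tostring(fragment, encoding="unicode"))
```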

@@ -71,13 +73,13 @@ A disadvantage of pre-submission is that researchers cannot link the dataset to

Some repositories have an *under review* status alongside the *draft* and *published* statuses. A dataset *under review* is not editable and not yet published, i.e. it does not have a DOI registered. Therefore, the DOI cannot be validated. Nonetheless, the dataset has an internally reserved DOI and is accessible via a URL to provide access to editors and reviewers. This allows research data to be included in the review process. The URL to access the dataset should be requested by the submission system so that it can be forwarded to editors and reviewers.

### Encourage authors to publish datasets *under review* prior the articles gets published
### Encourage authors to publish datasets *under review* prior to article publication

:::tip Standard
*Journal author guidelines should require that datasets with status under review be published prior to the publication of the associated article.*
:::

To assist in automated workflows, such as linking the datasets to the published article through their respective PIDs, **datasets *under review* should be published before the article gets published**. Once a manuscript has been accepted, the authors should be informed to publish their dataset *under review*. This ensures that the data has a registered DOI when the article gets published. Consequently, journals can run quality control checks on the provided DOI such as validation. This process must be explicitly communicated with authors through the author guidelines, yet, can also be included within other communication upon acceptance. Contemporaneous, the DOI for the article should be provided so that authors can include this information in their dataset's metadata prior to the publication of the dataset. Finally, the article is published, and its DOI is registered.
To assist in automated workflows, such as linking the datasets to the published article through their respective PIDs, **datasets *under review* should be published before the article is published**. Once a manuscript has been accepted, the authors should be asked to publish their dataset *under review*. This ensures that the data has a registered DOI when the article is published. Consequently, journals can run quality control checks on the provided DOI, such as validation. Publishers must explicitly communicate this process to authors through the author guidelines. In addition, other communication upon acceptance may also include this information. At the same time, the DOI for the article should be provided so that authors can include this information in their dataset's metadata prior to the publication of the dataset. Finally, the article is published, and its DOI is registered.

### Scholix.org

@@ -9,7 +9,7 @@ slug: "/publishing_standards_infrastructure"
*Research data repositories should include the metadata in datasets downloaded by researchers and exchanged with other resources.*
:::

While researchers upload their data, metadata is attached. For generic, multidisciplinary repositories, additional metadata is provided by researchers via a metadata editor. For field-specific repositories, the metadata is extracted from the analytical data files and provided by researchers along their lab workflows, as this is the case for Chemotion ELN. Once data is retrieved from a repository, this metadata should not be lost but should be included in the downloaded package. The minimum to include is the descriptive DataCite metadata.
Generic and technical metadata is attached to the dataset during the upload process. For generic, multidisciplinary repositories, researchers provide additional metadata via a metadata editor. For field-specific repositories, the metadata is extracted from the analytical data files and provided by researchers along their lab workflows, as this is the case for Chemotion ELN. Once data is retrieved from a repository, this metadata should not be lost but should be included in the downloaded package. The minimum to include is the descriptive DataCite metadata.

[BagIt](https://www.rfc-editor.org/rfc/rfc8493.html), a set of hierarchical file system conventions, is one solution to enable reliable file transfer and to include metadata in downloaded datasets, as is already the case for RADAR.
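
The following sketch shows roughly how a download package could be laid out as a bag: a `bagit.txt` declaration, a `data/` payload directory, a SHA-256 payload manifest, and the descriptive metadata as an additional tag file. The function name `write_bag`, the `metadata/datacite.json` location, and the example content are hypothetical; this is not a description of how RADAR or any other repository builds its packages, and it assumes a flat payload without subdirectories.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_bag(bag_dir: Path, payload: dict, datacite: dict) -> None:
    """Write payload files plus metadata into a minimal BagIt-style layout."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)

    # Bag declaration as defined by RFC 8493.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload files and one manifest line ("checksum  path") per file.
    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )

    # Assumed location for the descriptive metadata (a tag file outside data/).
    metadata_dir = bag_dir / "metadata"
    metadata_dir.mkdir()
    (metadata_dir / "datacite.json").write_text(
        json.dumps(datacite, indent=2), encoding="utf-8"
    )


with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp) / "example-bag"
    write_bag(
        bag,
        {"spectrum.csv": b"ppm,intensity\n7.26,1.0\n"},
        {"titles": [{"title": "Example dataset"}]},
    )
    print(sorted(p.relative_to(bag).as_posix() for p in bag.rglob("*") if p.is_file()))
```

Keeping the metadata outside `data/` means it travels with the payload without being mistaken for research data, while the manifest lets the recipient verify that the transfer was complete.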

@@ -19,7 +19,7 @@ While researchers upload their data, metadata is attached. For generic, multidis
*Research data repositories should include structured, domain-specific metadata in datasets downloaded by researchers and exchanged with other resources.*
:::

Beside of metadata following generic schemes such as DataCite's metadata scheme, domain-specific metadata should be part of each dataset. This metadata should also be provided in datasets downloaded by researchers for reuse or exchanged with other resources.
Besides metadata following generic schemes such as [DataCite's metadata scheme](https://schema.datacite.org/), domain-specific metadata should be part of each dataset. This metadata should also be provided in datasets downloaded by researchers for reuse or exchanged with other resources.

One solution to this is to include Schema.org metadata making use of [RO-Crate](https://www.researchobject.org/) or by even [combining RO-Crate and BagIt](https://www.researchobject.org/ro-crate/specification/1.1/appendix/implementation-notes.html#adding-ro-crate-to-bagit). While BagIt focusses on reliable transfer, RO-Crate is about rich metadata.
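
As a minimal sketch, the snippet below generates the content of an `ro-crate-metadata.json` file with the metadata descriptor and a root dataset entity, following the RO-Crate 1.1 structure as understood here. The helper name `ro_crate_metadata` and all example values are hypothetical, and domain-specific properties would be added as further entities in `@graph`.

```python
import json


def ro_crate_metadata(name: str, description: str, date_published: str,
                      license_url: str, doi_url: str) -> dict:
    """Return a minimal RO-Crate 1.1 metadata document as a dict."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: identifies this file and the root dataset.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset described with Schema.org properties.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": license_url},
                "identifier": doi_url,
            },
        ],
    }


crate = ro_crate_metadata(
    name="Example dataset",
    description="Analytical data supporting an example article.",
    date_published="2024-11-04",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    doi_url="https://doi.org/10.1234/example-dataset",
)
print(json.dumps(crate, indent=2))
```

When combined with BagIt, this file would sit inside the bag's payload next to the data files, so the rich metadata is covered by the same checksums as the data.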

@@ -29,7 +29,7 @@ One solution to this is to include Schema.org metadata making use of [RO-Crate](
*Research data repositories should provide a Collection DOI to wrap research data objects that are relevant to a single article that is to be published.*
:::

Field-specific research data repositories may provide DOIs to reference individual chemical reactions, molecules, and their analytical data. Generic, multidisciplinary research data repositories provide DOIs for whole published datasets, while more than one published dataset may be relevant to study results published via an article. In other words, many DOIs may be relevant to a published article, whereas **a data availability statement may provide some DOIs but not many DOIs**. To facilitate the process of manuscript submission and article publication, each repository should allow authors to generate a **Collection DOI** that wraps relevant data that should be referenced in the data availability statement.
Field-specific research data repositories may provide DOIs to reference individual chemical reactions, molecules, and their analytical data. Generic, multidisciplinary research data repositories provide DOIs for whole published datasets, while more than one published dataset may be relevant to study results published via an article. In other words, many DOIs may be relevant to a published article, whereas **a data availability statement may provide some DOIs but not many DOIs**. To facilitate the process of manuscript submission and article publication, each repository should allow authors to generate a [**Collection DOI**](https://chemotion.net/docs/repo/doi) that wraps relevant data that should be referenced in the data availability statement.

### Embargo period and metadata accessibility
