Adds a dataset-version schema example for the datacite-based "national gallery dataset" #99

Merged 2 commits on Mar 13, 2024

Conversation

jsheunis

With this example data, a datacite metadata record is converted to be compliant with the dataset-version schema. The dataset described by the datacite record is known as the 'External Environmental Data' dataset from the British National Gallery. See https://api.test.datacite.org/dois/10.82433/9184-DY35?publisher=true&affiliation=true.

Mappings that were not straightforward:

  • The datacite record had no intuitive choice for the name property of a DatasetVersionObject
  • keywords were taken from record["data"]["attributes"]["subjects"]
  • the record has a DOI with value 10.82433/9184-dy35; however, https://doi.org/10.82433/9184-dy35 does not resolve. It still seems to be the only viable identifier in the record though (e.g. record["data"]["attributes"]["identifiers"] = [])
  • modified was taken from the date part of record["data"]["attributes"]["updated"] which has value "2024-01-02T20:15:20.000Z"
  • landing_page was taken from the most relevant source in the record["data"]["attributes"]["relatedIdentifiers"] list
  • The record has a couple of relations that aren't exact matches for available ones in the citation ontology:
    • used cito:supports for IsSupplementTo
    • used cito:isSupportedBy for IsSupplementedBy
  • Had a challenge figuring out what metatype would be ascribed to an entity that has a resource type of InteractiveResource in the datacite record and is identified by a URL. Organizations/Persons and Publications were easy, but in this case the relation is essentially to an online page. This part of the data example (the entity and the relation) is commented out.
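
The mappings listed above could be sketched roughly as follows. This is only an illustration, not the conversion code used for this PR: the function name `map_datacite_attributes`, the `RELATION_TO_CITO` table, and the shape of the returned dict (including the `relations` key) are hypothetical. It assumes each entry in `subjects` carries a `subject` key and each entry in `relatedIdentifiers` carries `relationType` and `relatedIdentifier` keys.

```python
from datetime import date

# Hypothetical lookup table for the two relations that had no exact
# match in the citation ontology (choices taken from the list above).
RELATION_TO_CITO = {
    "IsSupplementTo": "cito:supports",
    "IsSupplementedBy": "cito:isSupportedBy",
}


def map_datacite_attributes(record: dict) -> dict:
    """Pull a few dataset-version fields out of a datacite record (sketch)."""
    attrs = record["data"]["attributes"]

    # keywords come from the "subjects" list
    keywords = [s["subject"] for s in attrs.get("subjects", []) if "subject" in s]

    # modified is the date part of the "updated" timestamp
    modified = date.fromisoformat(attrs["updated"].split("T")[0])

    # keep only relations that have a chosen cito counterpart
    relations = [
        (RELATION_TO_CITO[r["relationType"]], r["relatedIdentifier"])
        for r in attrs.get("relatedIdentifiers", [])
        if r.get("relationType") in RELATION_TO_CITO
    ]

    return {
        "doi": attrs["doi"],
        # emitted as a string, not a date object
        "modified": modified.isoformat(),
        "keywords": keywords,
        "relations": relations,
    }
```

Called with a record whose `updated` value is "2024-01-02T20:15:20.000Z", the sketch yields "2024-01-02" for `modified`. Choosing `name` and `landing_page` still needs a manual decision, so both are left out here.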

Some issues were encountered when trying to validate the example locally. However, when I ran validation on the penguin dataset locally, I received the same errors. These errors do not show up on the CI, so perhaps the issue is related to my run environment...

linkml-validate --target-class DatasetVersionObject -s src/linkml/schemas/dataset-version.yaml src/examples/dataset-version/DatasetVersionObject-penguins.yaml
[ERROR] [src/examples/dataset-version/DatasetVersionObject-penguins.yaml/0] datetime.date(2020, 7, 16) is not of type 'string' in /modified
Traceback (most recent call last):
  File "/Users/jsheunis/opt/miniconda3/envs/dlconcepts/lib/python3.11/site-packages/referencing/_core.py", line 266, in pointer
    contents = contents[segment]  # type: ignore[reportUnknownArgumentType]
               ~~~~~~~~^^^^^^^^^
KeyError: 'Agent'

@jsheunis

Oh, I guess the issues are there because I didn't apply the patches locally...

@mih mih left a comment

Thank you!

@mih mih merged commit 54afd2f into psychoinformatics-de:main Mar 13, 2024
3 checks passed
@mih mih mentioned this pull request Mar 13, 2024