Dataset Migration API: Cannot migrate a dataset using example json-ld file, complains about Contact E-mail being null though it is specified. (testSemanticMetadataAPIs) #8533

Closed
kcondon opened this issue Mar 24, 2022 · 1 comment · Fixed by #8557 or #8592
kcondon commented Mar 24, 2022

Update:
The cause was a change to the citation.tsv file in which the Contact field's title was changed to Point of Contact. If I manually edit the example file to use the new title, it works. So the fix would be to update the example file accordingly:
Change:
"citation:Contact": {
"datasetContact:Name": "Admin, Dataverse",
"datasetContact:Affiliation": "GDCC",
"datasetContact:E-mail": "[email protected]"
},

to be:
"citation:Point of Contact": {
"datasetContact:Name": "Admin, Dataverse",
"datasetContact:Affiliation": "GDCC",
"datasetContact:E-mail": "[email protected]"
},
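The manual edit described above can also be scripted. This is only a sketch: it assumes the contact block sits at the top level of the JSON-LD document, and `rename_contact_key` and `patch_file` are hypothetical helper names, not part of Dataverse.

```python
import json

OLD_KEY = "citation:Contact"
NEW_KEY = "citation:Point of Contact"


def rename_contact_key(doc: dict) -> dict:
    """Return a copy of doc with the contact key renamed, keeping key order.

    Assumption: the contact block is a top-level key of the document.
    """
    return {NEW_KEY if key == OLD_KEY else key: value for key, value in doc.items()}


def patch_file(src_path: str, dst_path: str) -> None:
    """Read a JSON-LD file, rename the contact key, and write the result."""
    with open(src_path, encoding="utf-8") as src:
        doc = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(rename_contact_key(doc), dst, indent=2, ensure_ascii=False)
```

For example, `patch_file("dataset-migrate.jsonld", "dataset-migrate-fixed.jsonld")` would write a patched copy next to the original (the output filename is arbitrary).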

Error on command line, no server log error:

curl -H X-Dataverse-key:9aef05f2-eca0-412e-be4b-4c56697700cc -X POST "http://localhost:8080/api/dataverses/root/datasets/:startmigration" --upload-file dataset-migrate.jsonld
{"status":"ERROR","message":"Validation Failed: Contact E-mail is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ])."}

Also tried specifying file type, same results:

curl -H 'Content-Type:application/ld+json' -H X-Dataverse-key:XXX-YYY-ZZZ -X POST "http://localhost:8080/api/dataverses/root/datasets/:startmigration" --upload-file dataset-migrate.jsonld
{"status":"ERROR","message":"Validation Failed: Contact E-mail is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ])."}

Took file from doc:
https://guides.dataverse.org/en/5.10/_downloads/e6c9ad6742e37c64eabaea749a13fa27/dataset-migrate.jsonld

qqmyers commented Mar 24, 2022

For fields without a termURI in the metadata block definition, the current code assumes it can assign terms using a block-level namespace and the title. When #8454 changed the titles for citation block fields, it exposed the limits of this heuristic: titles are often more readable than names, but the title is being used as part of the identifier in the semantic APIs (which are, thankfully, still described as 'experimental'), so changing a title changes the identifier, which is the problem here. I see four options:
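To make the heuristic concrete, here is a rough sketch of the behavior described above. It is not the actual Dataverse code: `assign_term_uri` is a hypothetical function, and it uses plain concatenation without modeling any URL encoding the real code might apply.

```python
from typing import Optional

# Block-level namespace for the citation block, as used in the "@context".
CITATION_NAMESPACE = "https://dataverse.org/schema/citation/"


def assign_term_uri(block_namespace: str, title: str,
                    explicit_term_uri: Optional[str] = None) -> str:
    """Use the field's termURI if it has one, else namespace + title."""
    if explicit_term_uri is not None:
        return explicit_term_uri
    # The title is a display string, yet here it becomes part of the identifier.
    return block_namespace + title


before = assign_term_uri(CITATION_NAMESPACE, "Contact")
after = assign_term_uri(CITATION_NAMESPACE, "Point of Contact")
# Renaming the title silently changes the semantic identifier.
print(before != after)
```

A field with an explicit termURI is unaffected by a title change, which is why only the fields without one are at issue.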

Two change the semantic URLs assigned to the terms and thus are 'breaking changes' for the semantic APIs/machine users of the OAI-ORE files.

  • Make a one-time change to use the name (i.e. the examples here would be "citation:datasetContact") - slightly less readable
  • Make a one-time change to update the context and rather than a block level namespace ("citation" as defined in the "@context"), have individual entries mapping the title to the URL using the name, e.g. "Point of Contact":"https://dataverse.org/schema/citation/datasetContact" - keeps more readability but makes the @context bigger

For anyone really using semantic tools, these options are ~the same ("https://dataverse.org/schema/citation/Contact" changes to "https://dataverse.org/schema/citation/datasetContact"). For anyone who's creating json and then slapping the @context on as a static addition, the latter could be easier (perhaps not since the "citation:" prefix has to be dropped on the field entries along with changing the @context).
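A small sketch of why the two breaking options look the same to semantic tools: both forms expand to the same URI. `expand_key` is a hypothetical, deliberately minimal stand-in for JSON-LD term expansion (it only handles exact term matches and `prefix:suffix` compact IRIs), not a real JSON-LD processor.

```python
# Option 1: keep the block-level namespace, use the field name in the key.
OPTION_1_CONTEXT = {"citation": "https://dataverse.org/schema/citation/"}
OPTION_1_KEY = "citation:datasetContact"

# Option 2: one context entry per field, mapping the title to a name-based URL.
OPTION_2_CONTEXT = {
    "Point of Contact": "https://dataverse.org/schema/citation/datasetContact",
}
OPTION_2_KEY = "Point of Contact"


def expand_key(key: str, context: dict) -> str:
    """Expand a JSON key to a full URI using a flat @context mapping."""
    if key in context:
        # The key is itself a term defined in the context.
        return context[key]
    prefix, sep, suffix = key.partition(":")
    if sep and prefix in context:
        # Compact IRI: replace the prefix with its namespace.
        return context[prefix] + suffix
    return key


uri_1 = expand_key(OPTION_1_KEY, OPTION_1_CONTEXT)
uri_2 = expand_key(OPTION_2_KEY, OPTION_2_CONTEXT)
print(uri_1 == uri_2)
```

The difference only shows up for producers of the json: under option 2 the "citation:" prefix disappears from the field keys and the @context grows by one entry per field.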

Two others would preserve the existing semantic identifiers:

  • Only make display changes in the block .properties files, i.e. revert the #8454 ("8127 citation field improvements") changes to the citation.tsv block and make those changes in citation.properties instead/only. This means the title in the block functions as part of the semantic identifier (as it does in practice now) rather than only having a display role, so the field with name 'datasetContact' would have title "Contact" in the tsv and "Point of Contact" in the citation.properties file for display.
  • Add a termURI entry, matching the current semantic URI, for each datasetFieldType in a block that doesn't have one, e.g. for the dataset contact it would be "https://dataverse.org/schema/citation/Contact". This keeps the semantic URIs as is, so existing code/examples work. It would require a code change to add an entry for each field to the context rather than a block-level namespace, e.g. "Point of Contact":"https://dataverse.org/schema/citation/Contact", which keeps the json aligned with the new titles. Alternatively, we could retain the existing format, a "citation": "https://dataverse.org/schema/citation/" namespace entry, but then the field key would also stay "citation:Contact" - fine except that this isn't the current name or title for the field.
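The last option could look roughly like the sketch below, under stated assumptions: `FieldType`, `pin_term_uri`, and `build_context` are hypothetical names for illustration, and the pre-rename title is passed in by hand rather than read from an older citation.tsv.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Optional

CITATION_NAMESPACE = "https://dataverse.org/schema/citation/"


@dataclass(frozen=True)
class FieldType:
    """Simplified stand-in for a datasetFieldType row in a metadata block."""
    name: str
    title: str
    term_uri: Optional[str] = None


def pin_term_uri(field: FieldType, legacy_title: str) -> FieldType:
    """Give a field without a termURI the URI it effectively had before the rename."""
    if field.term_uri is not None:
        return field
    return replace(field, term_uri=CITATION_NAMESPACE + legacy_title)


def build_context(fields: Iterable[FieldType]) -> dict:
    """Per-field @context entries: current title -> pinned termURI."""
    return {f.title: f.term_uri for f in fields if f.term_uri is not None}


contact = pin_term_uri(FieldType("datasetContact", "Point of Contact"),
                       legacy_title="Contact")
context = build_context([contact])
print(context)
```

With the URI pinned in the block definition, later title changes only affect the keys in the @context, not the semantic identifiers.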

I'm not sure what makes the most sense, but given that the OAI-ORE files go in Bags and that DANS (heads-up @janvanmansum) and others(?) are using the json-ld migrate APIs, we should probably decide on and implement a fix before releasing #8454, as that alone would break things in json-ld. (Mea culpa - I should not have decided to use, or agreed to using, a non-identifier field in creating semantic URIs for internally defined terms in blocks. They have been stable for 4+ years, and there was hope that more would get mapped to external vocabs before too long, but still...)

pdurbin added this to the 5.11 milestone Mar 28, 2022
pdurbin changed the title from "Dataset Migration API: Cannot migrate a dataset using example json-ld file, complains about Contact E-mail being null though it is specified." to "Dataset Migration API: Cannot migrate a dataset using example json-ld file, complains about Contact E-mail being null though it is specified. (testSemanticMetadataAPIs)" Mar 29, 2022
qqmyers added a commit to GlobalDataverseCommunityConsortium/dataverse that referenced this issue Mar 29, 2022
See IQSS#8533. With this commit, tests and any uses of the semantic APIs/ORE
export should be the same as in v5.10. Based on Tech Hour discussion,
I'll go ahead and implement a fix/change for IQSS#8533 in which I include
reverting this commit.
pdurbin added a commit to ErykKul/dataverse that referenced this issue May 18, 2022