bklog: Deliverable - As a system integrator, I would appreciate a JSON Schema for validating my dataset JSON before uploading via API #26
Comments
The almighty @pdurbin has found a very closely related issue: IQSS/dataverse#3060
While writing the HERMES concept paper, we once more stumbled over this. I learned that Zenodo is offering such a schema at https://zenodo.org/schemas/deposits/records/legacyrecord.json. A recent talk with @atrisovic also revealed it would be very nice to create a crosswalk CodeMeta <-> Dataverse JSON: IQSS/dataverse#7844. Tagging this as @hermes-hmc related.
Thanks @poikilotherm - this has come up as we've discussed integration with some Harvard library systems as well. PRs welcome if you have the availability to work on this.
This is a central part of a future release, as mapping between different schemas and data types seems to be of high relevance for the future. You can find out more about my thoughts and work on this in pyDataverse here: gdcc/pyDataverse#102. To move on, it would totally make sense to connect these activities, to create a common understanding of data structures, validation processes and mappings.
Quoting @4tikhonov at https://groups.google.com/g/dataverse-community/c/TqXmICwr0io/m/qEZDvPwuAAAJ: "There is a Dataverse schema in .nt format, you can easily get it in the knowledge graph with rdflib and serialise as json-ld: […]"
There are at least 2 subtasks:
I just had a lovely chat with @JR-1991 while he was still in Boston about this (and other things). For easyDataverse, he would welcome having:
The general schema might need to be versioned, and so be downloadable via the instance, too. NOTE: It might help to add the latest version to https://www.schemastore.org to get autocompletion when writing JSON manually.
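As a small illustration of the SchemaStore idea: a hand-written dataset JSON could carry a `$schema` key pointing at a published, versioned schema, which is what editors use to offer autocompletion. Both the URL and the field layout below are hypothetical, since no such schema is published yet:

```python
import json

# Hypothetical dataset JSON referencing a versioned schema served by the
# instance (no such endpoint exists yet -- this is what the issue proposes).
dataset = {
    "$schema": "https://demo.dataverse.org/api/schemas/dataset-v1.json",
    "datasetVersion": {
        "metadataBlocks": {},
    },
}

print(json.dumps(dataset, indent=2))
```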
FWIW: The current JSON has a lot of schema-ish info in it already - would it make sense to remove that and make it flatter, like the JSON-LD format? More work up front perhaps (and a breaking change), but a simpler schema and more readability.
Yes, absolutely! We talked about that, too, but I forgot. We think like you - meta information about the doc's structure should be in a schema, not in a dataset's JSON. We could make it somewhat non-breaking by accepting it as deprecated input for a while in the schema, but ignoring it in the parser that translates to DTO/POJO.
This could potentially sync the processing of the JSON and JSON-LD - e.g. if the JSON-LD can be made to look just like the JSON with an added @context, which I think could be possible.
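A toy illustration of that idea: the same flat JSON becomes JSON-LD simply by prepending an `@context` that maps each key to an IRI. The term IRIs here are made-up placeholders, not the real Dataverse vocabulary:

```python
import json

# Plain, flat dataset JSON.
flat = {"title": "My Dataset", "author": "Jane Doe"}

# The same document as JSON-LD: identical keys, plus an @context mapping
# each key to an IRI (placeholder IRIs, not the actual vocabulary).
jsonld = {
    "@context": {
        "title": "https://example.org/terms/title",
        "author": "https://example.org/terms/author",
    },
    **flat,
}

print(json.dumps(jsonld, indent=2))
```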
I don't know if it helps, but you may also consider CEDAR as an example. Their metadata schema (https://more.metadatacenter.org/tools-training/outreach/cedar-template-model) is described as a JSON Schema, which defines the structure of a JSON-LD document, which should be the actual filled metadata.
@beepsoft you're actually using CEDAR with Dataverse, right? I forget, did you give us a sample we can play with? 😄
@pdurbin Yes I am. :-) As part of this we now have, or will soon have, the following tools:
We use these to keep DV metadata blocks and CEDAR templates in sync. I'm not sure if I gave you an example CEDAR template, but here is one, for example: at the end of the page you can find "Advanced View", where you can copy the JSON Schema. For more examples, you can easily register at https://cedar.metadatacenter.org/ with a GitHub or ORCID account, and then take a look at the public templates. This is my usual test template: CEDAR-NCBI Human Tissue. Once you go to the editor, at the bottom you can always see the JSON Schema associated with the template.
Sizing:
😉
A comment: Yes, we have one. It's old and outdated and should be replaced; that can be done in the same go. I'm 65% sure that library is also not capable of creating a schema - it just does validation of JSON against a given schema. So we might need to write out the schema using some JSON-P.
@beepsoft thanks! Interesting!
Sizing:
Steps:
Reference:
Next Steps:
@poikilotherm I've set this up as a backlog deliverable. Note - as a backlog deliverable:
Once we don't need this anymore, we just move it in the backlog to "clear of the backlog". When you create the three issues:
At that point they will be queued for an upcoming sprint. Thank you so much for stepping up during the meeting to create the issues!
Sizing:
Stub issues for next steps, will be fleshed out by Oliver:
Grooming:
Next Steps:
Prio:
For anyone watching this issue, yesterday we merged this PR: |
The latest PR to watch: |
As a "bklog: Deliverable", this is decomposed into smaller issues.
This is related to AUSSDA/pyDataverse#48 and to dvcli as a CLI tool for Dataverse. (Tagging @skasberger here.) On the Dataverse side of life, this is related to the almighty IQSS/dataverse#6030 and loosely coupled to IQSS/dataverse#4451 (which might make the creation of the schema easier).
When creating a new dataset via the web UI, you are provided with a nice interface and validation before the dataset is created. What is required, what is available as a field, etc. is nicely integrated into the UI, both for users and curators.
However, this is not the case when uploading new datasets via the API. Before you send a JSON representation, there is no way to validate the dataset in terms of metadata schemas, required fields, etc.
It would be nice to provide an API endpoint to retrieve a JSON Schema for a given Dataverse that precisely describes the constraints and requirements: what your dataset JSON has to look like, and what other fields are available.
This is useful not only for pre-creation validation, but also for automatic generation of command-line options (think autocompletion, ncurses interfaces, ...) or client-side forms.
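A minimal sketch of the requested client-side workflow, assuming a hypothetical `datasetSchema` endpoint and the third-party `jsonschema` package; neither the URL path nor the schema layout is existing Dataverse API, they only illustrate the proposed pre-upload validation step:

```python
import json
import urllib.request

import jsonschema  # third-party: pip install jsonschema


def fetch_schema(base_url: str, collection: str) -> dict:
    """Download the dataset JSON Schema for a collection.

    The endpoint path is hypothetical -- it is what this issue proposes,
    not an existing Dataverse API.
    """
    url = f"{base_url}/api/dataverses/{collection}/datasetSchema"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def validate_dataset(dataset: dict, schema: dict) -> list:
    """Return a list of validation error messages (empty means valid)."""
    validator = jsonschema.Draft7Validator(schema)
    return [error.message for error in validator.iter_errors(dataset)]


# Example run with an inline stand-in schema, so no network is needed here.
DEMO_SCHEMA = {
    "type": "object",
    "required": ["datasetVersion"],
    "properties": {"datasetVersion": {"type": "object"}},
}

errors = validate_dataset({"datasetVersion": {}}, DEMO_SCHEMA)
print("valid" if not errors else errors)
```

In a client such as pyDataverse, the schema would be fetched once per collection and cached, and validation would run before every upload attempt.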