Connect metadata field names and blocks to (de facto) standard ontologies #2357
Comments
👍 I agree this is very relevant, and we had planned to do this with @posixeleni, who created useful spreadsheets to map Dataverse metadata to several standards. I'm assigning this issue to her. In particular we need metadata export support for: We'll also export metadata in the native JSON format.
Once a given field such as "title" is flagged as being part of various standards (DataCite, DDI, etc.), it would be nice to be able to see which standards it's part of with an API call. Perhaps we could add it to this existing (but undocumented) API endpoint that I added back when I wanted more information about a given field for search/indexing purposes:
Also, I wanted to make sure people reading this issue know about the nice metadata reference at http://guides.dataverse.org/en/latest/user/appendix.html Thanks for creating this ticket, @bencomp. I plan to send it around to at least a few people who attended the API breakout meeting at the community meeting where we discussed some of this.
I'm glad you appreciate this input. I'll happily discuss with you and @posixeleni to make sure we understand each other's goals and requirements. @mcrosas let me stress that RDF is a model. You need an ontology (or ontologies) to take the terms from in which you express properties of datasets (i.e. metadata). I like RDF a lot, even though I know it has limitations. @pdurbin as above, the value of JSON comes with documentation of inputs and outputs. If you want inline context, try JSON-LD :) When I was at IQSS, @scolapasta explained the
@bencomp yes, I understand that RDF is a model, not the same as metadata standards such as Dublin Core Terms or the DataCite schema. It is still important to support it. Thanks for all the thorough descriptions in this issue; they will be very useful as we work on this.
I think this is a good idea, but this issue hasn't attracted very many comments since it was opened two years ago. Closing. Please open a fresh one if anyone out there is still interested in this and I'll try to remember to link it back to this issue to see the conversation we had here.
Reviewing issues and thought I'd add a note. Metadata blocks now support specifying the URI for terms (and/or a default base URI for the block), which is then used in the creation of the OAI-ORE metadata export file, which is in turn added to the Bags created for archiving. https://github.com/GlobalDataverseCommunityConsortium/dataverse/tree/IQSS/6497-semantic_api also has an updated metadata API that would allow for submitting metadata in JSON-LD format (i.e. the same format as in the OAI-ORE export).
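To give a rough idea of what metadata with inline context looks like, here is a minimal JSON-LD sketch in Python. It is not the actual OAI-ORE export or the payload expected by the semantic API branch; the `to_json_ld` helper, the example.org URI and the choice of Dublin Core Terms properties are assumptions made for illustration only.

```python
import json


def to_json_ld(dataset_uri, title, issued):
    """Build a small JSON-LD document for one dataset (hypothetical helper).

    The @context maps the short keys to Dublin Core Terms property URIs,
    so a consumer can resolve what "title" and "issued" mean.
    """
    return {
        "@context": {
            "dcterms": "http://purl.org/dc/terms/",
            "title": "dcterms:title",
            "issued": "dcterms:issued",
        },
        "@id": dataset_uri,
        "title": title,
        "issued": issued,
    }


# Placeholder values, not taken from a real dataset.
document = to_json_ld("https://example.org/dataset/1", "Example dataset", "2015-06-30")
print(json.dumps(document, indent=2))
```

The point of the `@context` is that the same plain JSON keys stay readable for humans while each one resolves to a public property URI.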
For interoperability with the world outside Dataverse, each metadata field should have a public definition outside the code or database. While humans have a general understanding of the meaning of 'title', 'publication date' and 'description', for instance, machines need a pointer to a machine-actionable definition in order to understand the relationship between a thing and the values of the metadata fields.
This principle powers the Semantic Web and Linked Data and automated agents working on it. DANS would like to make the DataverseNL dataset metadata available as Linked Open Data in the future, but that is not the main use case. This touches on exporting metadata via OAI-PMH (#813) or as BibTeX (#1013) and other formats (#2116), embedding it in web pages as Schema.org (#2243) or in meta tags (#1393) and registering metadata for persistent identifiers (e.g. #24). It could help with API development (#899 links various ontologies already, #1430 could definitely benefit).
Ontologies for describing scholarly works have existed for a long time. The Dublin Core Terms are very general but widely used. DDI is well known, but a bit more geared towards specific types of scholarly research. Datasets can be described in the DataCite metadata schema, DCAT or other ontologies. For metadata blocks specific to certain dataverses (e.g. #2310) I'm sure there either is an ontology available or one could be created with little effort in the same way a metadata block is created. For general metadata, ontologies definitely exist. For specific metadata, you don't want to come up with fields (descriptors, properties) that only have meaning within Dataverse. (Let me throw in #27 for a link to general/specific metadata.)
The de facto way of publishing ontologies on the web in a machine-actionable format is using RDF Schema and/or OWL. Each property in an ontology gets a URI for identification and using that URI, its meaning and domain (and other aspects) can be described. These URIs and the domain should be used by Dataverse. The domain is important, because some fields don't describe the dataset itself, but related things like creator, publications that cite the dataset or specific files.
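As a minimal sketch of the idea, assuming each field carries a property URI and a domain, the Python below turns field values into subject/property/value triples. The field names, the `FieldDefinition` class, the `to_triples` helper and the example.org subjects are hypothetical, and the mapping of fields to Dublin Core Terms and FOAF properties is illustrative, not Dataverse's actual mapping. The "domain" here is simply the kind of thing the field describes, which is not necessarily the `rdfs:domain` declared by the ontology.

```python
from dataclasses import dataclass

DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
DCAT = "http://www.w3.org/ns/dcat#"


@dataclass(frozen=True)
class FieldDefinition:
    name: str    # field name as used inside the application (hypothetical)
    uri: str     # public property URI that defines the field
    domain: str  # class URI of the thing the field describes


# Illustrative mapping only; the choice of properties is an assumption.
FIELDS = {
    "title": FieldDefinition("title", DCTERMS + "title", DCAT + "Dataset"),
    "publicationDate": FieldDefinition("publicationDate", DCTERMS + "issued", DCAT + "Dataset"),
    "authorName": FieldDefinition("authorName", FOAF + "name", FOAF + "Agent"),
}


def to_triples(subjects, values):
    """Return (subject, property, value) triples.

    subjects maps a domain class URI to the URI of the thing of that class;
    values maps field names to literal values.
    """
    triples = []
    for name, value in values.items():
        field = FIELDS[name]
        # The domain decides which thing the statement is about.
        triples.append((subjects[field.domain], field.uri, value))
    return triples


subjects = {
    DCAT + "Dataset": "https://example.org/dataset/1",
    FOAF + "Agent": "https://example.org/person/1",
}
triples = to_triples(subjects, {"title": "Example dataset", "authorName": "Jane Doe"})
```

Note how `authorName` ends up as a statement about the person rather than the dataset, which is exactly why the domain matters.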
Although I'm not a fan of the way the TSV format is used to specify metadata blocks (which are essentially ontologies) and controlled vocabularies, you wouldn't need to get away from it to include fields' URIs and domain.
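To illustrate, here is a hedged sketch of reading such extra columns with Python's standard `csv` module. The TSV below is a simplified stand-in, not the real metadata block layout; the column names `termURI` and `domain` and the `load_field_uris` helper are assumptions made up for this example.

```python
import csv
import io

# Simplified, hypothetical TSV: the real metadata block files have more
# columns and sections. Only the two added columns matter here.
SAMPLE_TSV = (
    "name\ttitle\ttermURI\tdomain\n"
    "title\tTitle\thttp://purl.org/dc/terms/title\tdataset\n"
    "authorName\tAuthor Name\thttp://xmlns.com/foaf/0.1/name\tcreator\n"
)


def load_field_uris(tsv_text):
    """Map each field name to its (property URI, domain) pair."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["name"]: (row["termURI"], row["domain"]) for row in reader}


field_uris = load_field_uris(SAMPLE_TSV)
```

Existing columns stay untouched, so files without the extra columns would only need a fallback for the missing keys.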
The use of ontologies (a.k.a. metadata schemas) for blocks and fields is complementary to using (de facto) standard controlled vocabularies for values, which I mentioned before (#947, #434).
(This is a follow-up from #2243 (comment) and discussions with @scolapasta and @pdurbin at IQSS in June 2015)