Tool for converting metadata standards to SOSO #238
Comments
Thanks, @clnsmth. I think your idea is useful and would help adoption. Off the top of my head, I know of a few tools set up for metadata crosswalking and conversion that may be of interest or inspiration.
Interested to see where this goes. |
Thanks @mbjones. I'll review these resources, think more about design, and circle back with some thoughts. |
@mbjones Thanks again for pointers to these tools. I found them helpful in understanding how to design a crosswalk. While I couldn't see an opportunity to integrate with one of these projects, I might be missing something obvious and am happy to be convinced otherwise. Below is a design that draws inspiration from codemeta and codemetar. Guidance on modifying this design is welcome and much appreciated. I've never done something like this, and can use all kinds of help. Thanks in advance.
Goals
Design
A Python package containing a semantic mapping, conversion specification, and implementation for each supported metadata standard, which are accessed through a common workflow framework. A schematic of the proposed design:
Questions
Other thoughts? |
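To make the proposed design a little more concrete, here is a minimal sketch of how per-standard converters could sit behind one common entry point. Everything in it is hypothetical and only for illustration: the `Converter` base class, the `REGISTRY` dictionary, the `register` decorator, the `EMLConverter` class, and the two-field mapping are assumptions, not the project's actual API.

```python
from abc import ABC, abstractmethod


class Converter(ABC):
    """Hypothetical base class: one subclass per supported metadata standard."""

    standard = ""

    @abstractmethod
    def convert(self, metadata: dict) -> dict:
        """Return a schema.org Dataset record built from `metadata`."""


# Hypothetical registry mapping a standard's name to its converter class.
REGISTRY = {}


def register(cls):
    """Class decorator that adds a converter to the registry."""
    REGISTRY[cls.standard] = cls
    return cls


@register
class EMLConverter(Converter):
    standard = "EML"
    # Illustrative semantic mapping: source field -> schema.org property.
    mapping = {"title": "name", "abstract": "description"}

    def convert(self, metadata):
        record = {"@context": {"@vocab": "https://schema.org/"}, "@type": "Dataset"}
        for source_field, soso_property in self.mapping.items():
            if source_field in metadata:
                record[soso_property] = metadata[source_field]
        return record


def convert(standard, metadata):
    """Common workflow entry point: look up the converter and run it."""
    try:
        converter = REGISTRY[standard]()
    except KeyError:
        raise ValueError(f"Unsupported standard: {standard}") from None
    return converter.convert(metadata)


print(convert("EML", {"title": "Lake temperature", "abstract": "Hourly data."}))
```

In a layout like this, supporting a new standard would mean adding one registered subclass with its own mapping, while callers keep using the same `convert` function.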
I need to create SPARQL scripts, and perhaps an RDFLib Python script to run them along with any other supporting logic, to convert spatial DCAT to schema.org. This is part of the ANZGeoDCAT Profile (formal definition: https://linked.data.gov.au/def/anzgeodcat). So this will be a simple enough tool, BUT that same profile will also maintain a mapping from (ANZGeo)DCAT to ISO19115-1/-3 and tooling for such conversions (RDF to XML), so we will have chained mappings from SOSO to ISO19115-1/-3 via (ANZGeo)DCAT. We are really aiming at CKAN delivering ANZGeoDCAT and then being able to convert to SOSO and/or ISO19115. |
Hi @nicholascar. This sounds great. If I'm understanding, you’re mapping to SOSO from two profiles/standards:
Do you have any interest in developing and maintaining this work within a community supported tool like the one being pitched above? If not, where could others find your work? https://github.com/Kurrawong/anzgeodcat? |
@clnsmth Yes, I will have to maintain this with a community comprising multiple Australian and New Zealand government agencies, at the very least. We would love wider involvement! That's right, the ANZGeoDCAT work lives at https://github.com/Kurrawong/anzgeodcat
I've done a first-pass DCAT variant to SOSO converter, though it's not for ANZGeoDCAT but for the Australian Indigenous Data Network's profile of DCAT. All that profile requires is the use of qualified attribution roles rather than direct roles (e.g.
Here is the IDN CP DCAT profile resource listing: https://w3id.org/idn/def/cp
Here is the IDN CP's specification document: https://w3id.org/idn/def/cp/spec
But probably more interesting is the schema.org mapping: https://w3id.org/idn/def/cp/sdo
Alongside the conceptual mapping, I've made an RDF mapping: https://w3id.org/idn/def/cp/sdo.ttl
And now I've made a conversion Python script: https://w3id.org/idn/def/cp/sdo.py
Here is a before and after IDN CP / schema.org result:
You'll surely notice the 'before' is really just DCAT with a couple of small additions that aren't really indigenous per se. We have 5 more months of active development on that profile and the mappings, so we have some time yet to improve the conceptual, RDF & scripted mapping. |
We discussed this topic in yesterday's meeting to gain perspective on which of two implementation pathways to pursue (thanks @ashepherd, @nein09, @datadavev, @pbuttigieg, and Bill Manley (sorry Bill, I couldn't find your GitHub handle)). A summary:
Option 1 (Direct Transform)
A direct transformation of metadata dialects from their typical format to SOSO via a programming language (e.g. EML.xml => Python list => SOSO.jsonld).
Option 2 (JSON-LD Framing)
Taking a dialect in JSON-LD, applying a crosswalk to get the equivalent SOSO properties, and then structuring the result with a JSON-LD Frame (e.g. EML.jsonld => crosswalk => Frame.jsonld => SOSO.jsonld). This is one of @mbjones's original recommendations.
Did I miss anything? I'm going to push forward with Option 2, using EML as a test case, and report back at next month's meeting. P.S. I'm revisiting @nicholascar's ANZGeoDCAT work in light of all this (see above), and it kind of looks like a nice blend of the two options. I'm going to take a closer look (thanks @nicholascar!). |
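For readers weighing the two options, Option 1 (Direct Transform) can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the EML fragment is a hypothetical, heavily simplified record (real EML is namespaced and far richer), the function name `eml_to_soso` is made up here, and only three fields are mapped.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EML fragment (assumption: no namespaces).
EML_XML = """<eml>
  <dataset>
    <title>Lake temperature profiles</title>
    <abstract><para>Hourly water temperature.</para></abstract>
    <keywordSet>
      <keyword>limnology</keyword>
      <keyword>temperature</keyword>
    </keywordSet>
  </dataset>
</eml>"""


def eml_to_soso(xml_text):
    """Read a few EML fields and build a schema.org Dataset as a dict."""
    dataset = ET.fromstring(xml_text).find("dataset")
    return {
        "@context": {"@vocab": "https://schema.org/"},
        "@type": "Dataset",
        "name": dataset.findtext("title"),
        "description": dataset.findtext("abstract/para"),
        "keywords": [keyword.text for keyword in dataset.iter("keyword")],
    }


# Serialize the intermediate Python structure as JSON-LD text.
soso_jsonld = json.dumps(eml_to_soso(EML_XML), indent=2)
print(soso_jsonld)
```

The mapping logic here lives entirely in code, which is what distinguishes this pathway from a crosswalk applied to a JSON-LD document and shaped by a frame.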
Happy to see you there! A topic of major interest is being discussed here: standards mapping! In the French biodiversity e-infrastructure, PNDB, we use EML as a pivot format from which we create several other metadata and data standards through mappings. So far, we have started using first versions of mappings for ISO19115, INSPIRE Europe, and DCAT, and we are testing the use of the EML annotation field on attributes to help create data standards such as Darwin Core from raw EML-based data packages. We are working on these mappings because we have to harvest all biodiversity information systems in France, convert their metadata to EML, enrich it, validate it, and then offer the enriched metadata in our catalog. Where possible, we also want to return the enriched metadata to the original information systems, transformed into their own standard, to help raise the FAIRness of biodiversity data in all systems. We are testing and validating the entire workflow this year. Moreover, we are a partner in an eight-year French project called "GAIA DATA", which focuses on creating a common distributed national infrastructure for biodiversity, climate, and the Earth system. In this project, we are using a pivot metadata standard derived from geoDCAT and based on O&M, and we need to create mappings between each infrastructure's data silos and this standard, which for biodiversity means EML. We are discussing the method and, for now, have been considering the same two options. To start, the first option, "simple tabular files", seems better, at least to help everyone begin the process; we will then improve the process using complex cases as examples. So, it seems to me that 1/ there are possibilities to capitalize on existing or ongoing initiatives to get started (for example, maybe the work I have done on EML-DCAT can help, in combination with the work mentioned by @nicholascar on a DCAT to SOSO mapping).
2/ there are possibilities to pool upcoming efforts to create such mappings. Maybe an umbrella (a "hat") for this could be the GO FAIR "BiodiFAIRse" implementation network, which I am coordinating with Anne-Sophie Archambeau from the France GBIF node, together with the new roadmap we are proposing, which links notably to the GEO BON work on EBV and "Bon in a box", itself linked to @jmlord's comment on the EML to JSON-LD issue. Please don't hesitate to comment. I will try to centralize all the cited information there, but this week I am off without a computer, so it's not easy ;) |
We discussed implementation options at the meeting last week (2023-04-27). A summary is provided here. While pursuing "Option 2 (JSON-LD Framing)" listed above, it became apparent that transformation using only JSON-LD algorithms would lead to information loss. Specifically, the JSON-LD flatten, expand, and compaction process doesn't handle nested structures common to metadata dialects, and JSON-LD framing doesn't allow for the construction of new data encountered when combining a dialect's properties into SOSO. After some discussion, a third option emerged.
Option 3 (Mapping and Python Transform)
Map a metadata dialect to SOSO and use it to drive a Python-based transformer. The benefits of this approach include:
A first draft of the crosswalk for EML to SOSO is complete and entering a process of review (see soso-eml.sssom.yml and soso-eml.sssom.tsv). Next steps are to design a generalized transformer based on the SSSOM input and implement with Python. I'll report progress at the next SOSO meeting. |
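As a rough illustration of Option 3, the sketch below lets a small mapping table drive the transformation instead of hard-coding each field. It is a sketch under assumptions, not the planned transformer: the table borrows the SSSOM column names `subject_id`, `predicate_id`, and `object_id`, but its rows, the use of element paths as `object_id` values, the sample EML fragment, and the function names `load_mapping` and `transform` are all hypothetical, and the real soso-eml.sssom.tsv may be structured differently.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping table using SSSOM-style column names.
# Assumption: object_id holds a simple element path into the EML record.
MAPPING_TSV = (
    "subject_id\tpredicate_id\tobject_id\n"
    "schema:name\tskos:exactMatch\tdataset/title\n"
    "schema:description\tskos:closeMatch\tdataset/abstract/para\n"
    "schema:datePublished\tskos:exactMatch\tdataset/pubDate\n"
)

# Hypothetical, simplified EML fragment (no namespaces, no pubDate).
EML_XML = (
    "<eml><dataset>"
    "<title>Lake temperature profiles</title>"
    "<abstract><para>Hourly water temperature.</para></abstract>"
    "</dataset></eml>"
)


def load_mapping(tsv_text):
    """Parse the tab-separated mapping into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def transform(xml_text, mapping):
    """Build a schema.org Dataset by walking the mapping rows."""
    root = ET.fromstring(xml_text)
    record = {"@context": {"@vocab": "https://schema.org/"}, "@type": "Dataset"}
    for row in mapping:
        value = root.findtext(row["object_id"])
        if value is None:
            continue  # the source record has no content for this row
        soso_property = row["subject_id"].split(":", 1)[1]
        record[soso_property] = value.strip()
    return record


print(transform(EML_XML, load_mapping(MAPPING_TSV)))
```

Because the rows carry the correspondence, revising the crosswalk during review would change the table rather than the code, although properties that need values combined from several source elements would still require dedicated logic.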
I have set up a GitHub repository for prototyping this idea and will now begin testing it out on EML. |
Hi Folks. The EML to SOSO conversion functionality is now implemented and ready for use. You can find user documentation on how to run it here. The package architecture is designed to support conversion of other metadata standards in the future. You can learn more about this in the project design document. If the overall implementation looks reasonable, perhaps it should move to https://github.com/ESIPFed for broader community development and maintenance? Comments and suggestions are appreciated. Thanks! |
Amazing work @clnsmth! I am (or in fact, @PaulineSGN is ;)) about to start working on EML -> JSON-LD conversion to try things out. I was planning to use the emld R package, but maybe it would be worthwhile to also, or instead, use this EML -> SOSO converter? The converted results and/or the conversion method might then be of interest to us if we want to generate DCAT. Looking forward to taking a deeper look!! |
While discussing strategies to help data repositories in the adoption of SOSO conventions at yesterday’s meeting, the idea was raised to develop a tool for converting metadata standards to the SOSO representation. One implementation of this could be a crosswalk and record builder within a Python package.
Is this worthwhile? Has this already been done? What would be a “good” design for such a tool? Other thoughts?