review and provide recommendation on future of WCMP #11
Comments
@tomkralidis Maybe you need to add to the excellent outline above something around measures of success (some of which could be applied to the earlier WCMPs). Or at least describe what success would look like. HTH |
I agree with @chris-little. I would say that we need to clarify objectives and benefits for the user. |
I think it will be valuable to create an inventory of existing discovery portals including the standards and formats supported. |
Most recent documentation on https://community.wmo.int/wis/wis2-implementation |
As part of this work, we must also keep in mind the W3C Spatial Data on the Web Best Practices. @wmo-im/tt-wismd please review as part of our WCMP 2.0 efforts. Thanks again. |
And do not forget the underlying W3C Data on the Web Best Practices. |
I originally included this, but then removed it given lineage. Nevertheless good to articulate explicitly, thanks @chris-little! |
.. and don't forget that generic best practices need to be tailored for our needs. |
IMHO our needs are rooted in lowering the barrier to our data. We need to satisfy discovery of WIS resources for both power users and mass market. |
2021-06-04 TT-WISMD meeting: @wmo-im/tt-wismd please review the W3C DCAT standard as a possible candidate for WCMP 2.0, for discussion at our next meeting. |
@wmo-im/tt-wismd please also review the OGC API - Records core metadata record model. It is based on DCAT and encoded as GeoJSON, which provides robust/broad interoperability. |
Thanks for the interesting information about OGC API - Records, Tom. |
@josusky thanks for testing. The OGC API - Records schemas are currently in development. You are correct that the validation issues are rooted in the schemas' external references. Having said this (if you want to dig deeper), the following script resolves those references during validation:

```python
import json
import os
import sys

from jsonschema import RefResolver, validate
import yaml


def validate_json(instance: dict, schema: dict, schema_dir: str) -> bool:
    # resolve relative $ref entries against the directory containing the schema
    resolver = RefResolver(base_uri=f'file://{schema_dir}/', referrer=schema)
    validate(instance, schema, resolver=resolver)
    return True


if __name__ == '__main__':
    if len(sys.argv) < 3:
        print(f'Usage: {sys.argv[0]} <instance> <schema>')
        sys.exit(1)

    schema_dir = os.path.dirname(os.path.abspath(sys.argv[2]))

    with open(sys.argv[1]) as fh1, open(sys.argv[2]) as fh2:
        instance = json.load(fh1)
        schema = yaml.load(fh2, Loader=yaml.SafeLoader)

    try:
        validate_json(instance, schema, schema_dir)
    except Exception as err:
        msg = f'ERROR: {err}'
        print(msg)
```

and then, assuming you cloned https://github.com/opengeospatial/ogcapi-records:

```bash
python foo.py core/examples/json/record.json core/openapi/schemas/recordGeoJSON.yaml
```
|
Thanks, @tomkralidis. PS: I did try your code, with the tools I have, and it ended with "ERROR: Expecting value: line 1 column 1 (char 0)". |
@josusky once we have a 1.0 schema, then we will have published a single, all-in-one YAML (see the OGC API - Features example: http://schemas.opengis.net/ogcapi/features/part1/1.0/openapi/ogcapi-features-1.yaml), which we can use without worrying about external references. Having said this, validating would need an extra step to tie to a given shared component in the schema (vs. a single file per building block) to ensure proper validation. |
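A minimal sketch of that extra step, assuming a future all-in-one YAML of the kind linked above and an OpenAPI-style layout where the record schema is published as a named component (the component name `recordGeoJSON` here is an assumption for illustration):

```python
import json
import sys

import yaml
from jsonschema import RefResolver, validate

with open(sys.argv[1]) as fh1, open(sys.argv[2]) as fh2:
    instance = json.load(fh1)
    bundle = yaml.safe_load(fh2)  # the single, all-in-one OpenAPI document

# pick the shared component to validate against; internal
# "#/components/schemas/..." references resolve against the bundle itself
schema = bundle['components']['schemas']['recordGeoJSON']
resolver = RefResolver(base_uri='', referrer=bundle)
validate(instance, schema, resolver=resolver)  # raises ValidationError on failure
```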
As said in our last meeting, we tried to play around with OGC API - Records (OARec). Below I first added our first try, followed by the original XML we wanted to translate. Thanks to my lovely colleague Antje Schremmer for her help and contribution. Here are some points we found while working:
And other questions regarding the further WMO context (coming from discussions with other colleagues):
JSON: OGC
XML
|
Excellent job @jsieland and Antje! Thanks for the valuable feedback.
This was raised in the OGC API - Records working group in opengeospatial/ogcapi-records#138, and a resulting schema update proposal in opengeospatial/ogcapi-records#144, which basically means adding the following to the root of the JSON:

```json
"conformsTo": [
  "http://www.opengis.net/spec/ogcapi-records-1/1.0/req/record-core",
  "http://www.wmo.int/spec/wmo-core-metadata-profile-1/1.0/req/discovery-metadata-record"
]
```
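As a side note, a minimal sketch of how a harvester or client might use that array, assuming the WMO conformance URI proposed above (the helper name and record source are illustrative only):

```python
import json

# conformance URI taken from the proposal above
WMO_PROFILE = 'http://www.wmo.int/spec/wmo-core-metadata-profile-1/1.0/req/discovery-metadata-record'


def is_wcmp2_record(record: dict) -> bool:
    """Return True if the record declares the WMO discovery metadata profile."""
    return WMO_PROFILE in record.get('conformsTo', [])


with open('record.json') as fh:  # hypothetical harvested record
    print(is_wcmp2_record(json.load(fh)))
```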
Let's start with using alternate representations. For example, from your English JSON:

```json
{
  "id": 123,
  "geometry": ...,
  "links": [
    {
      "rel": "alternate",
      "type": "application/json",
      "title": "This document in German",
      "href": "https://example.org/foo.de.json",
      "hreflang": "de"
    }
  ]
}
```
Let's try adding the following to the record:

```json
"wmo:maintenanceFrequency": "continual",
"wmo:originator": "foo",
"wmo:priority": "foo"
```
In the context of WIS 2.0, the harvesting workflow would support OGC API - Records, and our resulting "profile/extension" of the metadata model. Is this what you mean?
The goal is that WCMP 2.0 will have a JSON schema (which is based on the OGC API - Records record JSON schema), and will enforce cardinality accordingly.
We will need migration tools to go from WCMP 1.0 -> 2.0; is this what you mean? Any overall feedback on experiences working with the metadata in JSON format (compared to XML)? Is this easier from a user or programmer experience? How hard was it to make the above translation? Any feedback on this front is valuable. Thanks again Julia and Antje! |
Looks good, especially the possibility to add more than just one.
I like this approach because it allows adding/removing local versions.
Looks good, I will try to add this to the example! Not sure if I'm able to do this before our next meeting...
Both questions go in the same direction somehow... I hope I can explain this so it makes more sense: We use OAI-PMH to make our own metadata available to other parts of the Federal Government (like the Spatial Data Infrastructure Germany, SDI Germany), which has to be in ISO-XML and/or INSPIRE. So it would be nice to have a translator which can convert XML to JSON and vice versa (see the sketch below); otherwise we might have to find our own solution for that.
Just a disclaimer: We did that all by hand. So no programming involved, just a simple editor ;) |
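A rough sketch of what such a translator could look like, going from a WCMP 1.0 (ISO 19139) XML record towards a WCMP 2.0-style JSON record. The element paths are standard ISO 19139, but the output layout is only an assumption for illustration, not the agreed WCMP 2.0 model:

```python
import json
import sys
import xml.etree.ElementTree as ET

NS = {
    'gmd': 'http://www.isotc211.org/2005/gmd',
    'gco': 'http://www.isotc211.org/2005/gco'
}


def wcmp1_to_wcmp2(xml_path: str) -> dict:
    """Extract a few core fields from an ISO 19139 record into a JSON record."""
    tree = ET.parse(xml_path)

    identifier = tree.findtext(
        'gmd:fileIdentifier/gco:CharacterString', namespaces=NS)
    title = tree.findtext(
        'gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/'
        'gmd:CI_Citation/gmd:title/gco:CharacterString', namespaces=NS)
    abstract = tree.findtext(
        'gmd:identificationInfo/gmd:MD_DataIdentification/gmd:abstract/'
        'gco:CharacterString', namespaces=NS)

    return {
        'id': identifier,
        'type': 'Feature',
        'properties': {
            'title': title,
            'description': abstract
        }
    }


if __name__ == '__main__':
    print(json.dumps(wcmp1_to_wcmp2(sys.argv[1]), indent=2))
```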
Hi all, thanks to @jsieland for the great analysis (with a lot of very relevant points) and for kick-starting things. At EUMETSAT we have also started to take a deeper look at the OGC API - Records metadata standard, and below are our comments. We definitely think that it is already a great improvement, but we would like to fix additional things while we are working on a new standard. In particular, we would like to avoid, if possible, the work of translating our internal OGC API - Records records into WMO OGC API - Records records. With ISO we had to downgrade our internal version to export WCMP records by stripping and re-formatting some information; this increases the maintenance work and creates different versions of the information in different places.

First we would like to explain why we are making metadata records. It looks trivial, but it is important to remind ourselves about it. In the metadata business, EUMETSAT is trying to focus on one aspect: providing enough information about a dataset to allow the user to access the data by selecting the best access method, or to access information to make the best use of the dataset. This can be technical information about the dataset, or about associated datasets, that helps users do their job. To best guide the user, we have a web catalogue (https://navigator.eumetsat.int) which provides discovery services on the EUMETSAT datasets, entirely based on the information defined in the ISO metadata. A lot of additional information has been added to allow us to build the best possible discovery experience. In some cases we had to massage the ISO standard a bit, and it does not allow us to do everything we would like to. A metadata record should be self-descriptive and self-contained, allowing EUMETSAT, but also any discovery provider, to present our datasets in the best possible way. So, to sum up, it is all about our users and the creation of a searchable/browsable catalogue and API information to give access to the datasets. We have the following necessary categories of information in the metadata to describe EUMETSAT products:
Below is a first set of comments/questions and improvements, applicable to OARec, that we would like to see/discuss with the team.
WIS 2.0 should be designed in isolation from the GTS. The GTS now represents only the tip of the iceberg in the ocean of available data (model data, satellite mission data, climate and reanalysis data) that is accessed daily. The GTS is still very useful and serviceable, but for the benefit of WIS 2.0 it should be considered an external service to the WIS 2.0 Catalogue, i.e. the metadata should not contain any specific GTS information (keywords like "global exchange") outside of the "GTS access part" in the association property or in a given extension. With that architecture, the GTS data could still be very efficiently retrieved and accessed via a dedicated data access service advertised in the metadata. Opinions, thoughts?
In the WIS catalogues, some datasets describe almost individual records (in situ observations) while others describe 30 years of data. Some of the individual-record datasets turn the search experience into a bad one because, for a given search, the results are polluted by very similar records repeated n times. We would propose to define one granularity (the collection level) to avoid recreating the same problems in the future WIS 2.0 catalogues. Opinions, thoughts?
There is a lot of specific information related to satellite datasets that is only of interest to our community, and we would like to take advantage of JSON and its extensibility principles. We might want to have a section in the OGC API - Records record where it is possible to describe the instrument information, for instance, or product specifics when necessary. How is extensibility foreseen? Can we imagine having a "satellite" part that is provided without having to reformat the produced metadata records to strip it out, to avoid extra work for managing the interface towards WMO? We believe this should also apply to other communities. In that case the record should be extended, and it would be good to define where and how it can be extended (can extra properties be added anywhere, should they be grouped in a top-level property, etc.?). Opinions, thoughts? Then, going through each of the defined categories above, here are some comments relative to our OGC API - Records analysis:
The metadata is used to build discovery services (indexed information) but also to display the information to our users in the best possible way. A big limitation of the current standards has been the lack of ability to structure the textual information, mainly in the abstract, to best present it to users, for instance using paragraphs and editorial techniques (bold, underline, headers) as well as links. Could we imagine having an optional edited abstract containing Markdown? It would be in addition to the existing abstract and would greatly help in the representation of the information. Opinions, thoughts?
It is also extremely important to have some images to present the dataset to the user in a graphical way. This can be optional. Information regarding the image resolution (width/height) would be preferable for best display. Portrait or landscape orientation should be recommended in a guide to make best use of the available space. Opinions, thoughts?
Like @jsieland and DWD, EUMETSAT provides access to a lot of 3rd-party datasets, so a publisher (the one providing access to the data) and an originator (the one creating the data and responsible for the data quality) are needed.
An additional contactPoint referencing a link with all the information to contact the first line of support for the dataset would be really welcome. This should be what is presented to the user for asking questions about the dataset. It already exists in OGC API - Records.
This can be provided using the links part of OGC API - Records, using the "rel" property to define the type of associated resource. We like it as it is really open, but we recommend defining a set of existing relationship types. We've seen some but could not find a definitive source for them. Where can it be found? Opinions, thoughts?
The current search experience in our catalogue is based on keyword search and facets in a classical user interface, with facets and keywords on the left and search results on the right, e.g. https://navigator.eumetsat.int/search?query=SAF. However, recent user consultations/surveys have demonstrated the need for a hierarchical, browsable presentation of the information, with intermediate levels if possible, to explain the complexity of our datasets and our field to beginners and newcomers: starting from very high-level themes like Ocean, Weather, Climate and guiding the user to the datasets and/or services (for instance, here is an intermediate level about Ocean: https://www.eumetsat.int/what-we-monitor/ocean). Can the themes be used to create the thematic hierarchy in the catalogue? We are missing the information on how they are related. Opinions, thoughts?
Providing licensing information is a very complex topic, with a lot of diversity and differences between the different license schemes. The responsibility of explaining, and ensuring, that the user complies with the conditions necessary for accessing the data should be left to the data access service. We recommend simply providing a link to the data access license information (e.g. https://www.eumetsat.int/eumetsat-data-licensing). It seems that this is what has been done so far in the examples, with the license field and the rights field. The task force should recommend what is expected in the second one, as most providers will want to retain their copyright and, depending on the dataset, prevent re-distribution or not. Opinions, thoughts?
This part is really essential, as it gives the user a lot of information regarding which service provides access to the found dataset and how. At EUMETSAT, our catalogue uses the ISO metadata record to define the list of formats provided by each service, which is a really important part of the service. It seems that OGC API - Records doesn't provide that kind of linkage. Is that true, and if so, is it possible to extend the access links to tie the formats to the services? See for instance https://navigator.eumetsat.int/product/EO:EUM:DAT:MSG:HRSEVIRI for how our product navigator represents, in the Access part, the services and additional information about the formats available for each service. It is also really key to provide some information about the service itself (a link to build a service preview, the number of files produced, links to example datasets not requiring registration). Is it possible to define a minimum set of mandatory properties and optional ones, plus the ones that you can freely add without becoming non-compliant when validated? Opinions, thoughts?
More and more datasets, like the re-analyses, climate records and now even real-time ongoing datasets, have citation information attached to them to allow publications (mainly scientific) to reference them. We provide that kind of information using an extension of the ISO standard, and we think it is really important for the new WMO metadata standard to embed it. Here is an example of a dataset with the citation information (https://navigator.eumetsat.int/product/EO:EUM:DAT:0080); look at the citation part, like DOI, authors, publisher, references. Is it possible to add that information in the OGC API - Records record? Opinions, thoughts? Below is an example of a EUMETSAT dataset record with some additional, more technical questions (they are also inserted, in a non-conformant JSON way, using # comments):
For instance urn:x-wmo:md:int.eumetsat::EO:EUM:DAT:MSG:HRSEVIRI can be replaced by urn:x-wmo:md:int.eumetsat:EO:EUM:DAT:MSG:HRSEVIRI
JSON: OGC
|
Thanks for the extensive comments @gaubert. Notes from our discussion today (feel free to update as desired):

```json
{
  "formatted": {
    "abstract": "`foo`, **bar**",
    "markup_language": "markdown"
  }
}
```

Access links should be extended to be able to express supported formats:

```json
{
  "rel": "self",
  "title": "This document as JSON",
  "href": "https://example.org/api",
  "wmo:formats": [
    "application/json",
    "application/xml",
    "text/plain"
  ]
}
```

```json
"externalId": [
  {
    "scheme": "wmo-wis",
    "value": "urn:x-wmo:md:int.wmo.wis::https://geo.woudc.org/def/data/ozone/total-column-ozone/totalozone"
  },
  {
    "scheme": "doi",
    "value": "doi:10.14287/10000004"
  }
]
```
|
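As an illustration of how the proposed "wmo:formats" extension above could be consumed, here is a minimal sketch (the helper name and record content are hypothetical):

```python
from typing import Optional


def find_link_by_format(record: dict, media_type: str) -> Optional[str]:
    """Return the href of the first link advertising the requested media type."""
    for link in record.get('links', []):
        if media_type in link.get('wmo:formats', []) or link.get('type') == media_type:
            return link['href']
    return None


record = {
    'links': [{
        'rel': 'self',
        'title': 'This document as JSON',
        'href': 'https://example.org/api',
        'wmo:formats': ['application/json', 'application/xml', 'text/plain']
    }]
}

print(find_link_by_format(record, 'application/xml'))  # https://example.org/api
```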
Thanks @gaubert, this is a very comprehensive and valuable analysis. And it reminded me of some additional points I forgot:
I found it here: https://github.com/opengeospatial/ogcapi-records/tree/master/core/openapi/schemas
I agree. |
Anything in The benefit here is:
From the schema definition:
Note that this is also derived from schema.org. We should consider any interoperability issues around having something specific for such a key primitive in our domain, as well as issues around "now" representing "not quite now" data (from a day ago, say). We should definitely include temporal resolution (issued here with the OARec SWG). For the moving window of data use case, should we consider this for only the data access perspective or beyond? An example use case is an organization that has been producing hourly observations since 2009-07-11, with a rolling window of 90 days. From the discovery metadata perspective, I would still see the temporal extent as the full period since 2009-07-11. In this view, it would be valuable to express temporal resolution, with the retention expressed on the data access link, for example:

```json
{
  "rel": "download",
  "type": "application/json",
  "title": "the last 90 days of data",
  "href": "https://example.org/api",
  "retention": "P90D"
}
```

Thoughts? |
We also need to consider coordination with the WIGOS metadata model. In particular, we need to coordinate on the following types of information.
|
I'd like to have a summary of the standards we are evaluating.
|
This is the example I mentioned in our last meeting:
This example cannot be realized with DCAT (and thus OARec). The W3C issue mentioned in #11 has another example for rolling time windows. |
@gaubert thanks for your comprehensive answer. I am going to comment on this only
I absolutely agree that we need to design without GTS and plan to retire it. However, we need to realise that GTS will be retired very slowly over many years or decades. The transition plan is not ready yet. That said, I think that we are going to expose all GTS data through WIS2 pub/sub protocols, and therefore I can imagine that the new metadata will simply be linked to a WIS2-style source. It will be new-style metadata with new-style pub/sub protocols. I think that very little of the current catalogue will remain as is. |
Thanks @efucile. Here's the current thinking around GTS links via pub/sub: MetPX/wmo_mesh#16 (comment), which would make its way into WCMP2 links. |
@efucile @tomkralidis Thanks for the answer. Ok so if I understand correctly, the GTS will be seen as one of the services providing access to some data (the GTS Observations) and the specific GTS information will only be in the access part of the metadata (like another service). That's good and simple to integrate. |
@tomkralidis I think we can close this |
Related: #10 (comment). OGC API - Records has been used as the baseline for WCMP2 and the Global Discovery Catalogue (GDC). |
Summary and Purpose
WCMP Status
WIS
Current landscape
Proposal
@wmo-im/tt-wismd to assess and evaluate options for the future of WCMP against established criteria of requirements, e.g.:
Criteria
Reason
As TT-WISMD, we need to put forth next steps in realizing discovery in alignment with WIS 2.0 and current state.
cc @6a6d74 @joergklausen