Collections Discussion #140
(EDIT: 6/4/20) This issue is essentially agreed to. See: #140 (comment) Actions here are still
Both API-Coverages and API-Environmental Data are in an awkward position pending progression of this discussion. Here's my attempt at summarizing to get us moving. I think there are two issues at play here, let's call them data-resource and items. The data-resource issue comes down to:
See #17, #36, #39, #45, #47, #74, #86, #99, #105, #106, #111, #116, #120, #122, #128, #130 The items issue comes down to:
See #45, #80, #82, #87, #83, #107, #110, #128 (and likely others)
The status of addressing these issues for now: Collections and items have been moved to their own specification part so core can move forward: http://docs.opengeospatial.org/DRAFTS/20-024.pdf In the current (5-21-20) collections spec, the
The Narrative: Some are advocating for a flexible definition of collection to allow spatial data resources to be the resources that make up the Others are advocating for strict typing of the
A path forward: In follow-up comments here, please take care to focus on the issues and seek to find ways to communicate unique characteristics of proposals with clarity. Please make fully fledged proposals and try to name them such that we can discuss with clarity.
References
OGC-API specifications of interest:
There is some discussion of the API-Features approach to collections in the core requirements class.
Coverages discusses their approach in collection access
EDR discusses their approach in environmental resources
Records discusses their approach in collection access
Processes does not use the collection endpoint
There may be other relevant specs (styles?) to consider, but I think these are the core that we need to consider at this juncture. |
To give the group something to get started, I want to propose that we
|
As recently discussed in #11 and #111, I believe the most controversial aspect of the Collections problem is its generic name, which sows confusion, while what we are trying to define here is something better described as OGC API - Common (Geospatial data). As such, and in line with what both @dblodgett-usgs and @jeffharrison are suggesting, let's entertain the idea that we can drop the literal 'collections' from being fixed, and that a compliant client must instead rely on finding the list of geospatial data resources by following "rel" : "data" from the landing page. As far as compatibility with OGC API - Features is concerned, this would either require that any server offering Features (which may be served along with other APIs) stick to "/collections", or that the Features standard is revised with a breaking change where clients can no longer rely on "rel" : "data" pointing to "/collections". Let's assume we are okay with this for now and continue. Now "OGC API - Common - Part 2: Geospatial data" could say:
The Tiles API could also be tied here by having a "tiles" link relation within each element of that array. I also foresee the need for additional optional conformance classes to be able to arbitrarily retrieve data using bounding boxes and resolution, without the client having to know anything about the type of geospatial data. |
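Editorial sketch of the link-following idea discussed above: a client that does not assume the literal "/collections" path and instead looks up the "rel" : "data" link on the landing page. The landing page content, the example.org URLs, and the helper name `find_link` are assumptions made for illustration only; they do not come from any OGC draft.

```python
from typing import Optional


def find_link(document: dict, rel: str) -> Optional[str]:
    """Return the href of the first link with the given relation, or None."""
    for link in document.get("links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None


# Hypothetical landing page: the data resource deliberately does NOT live
# at "/collections", so a client hard-coding that path would miss it.
landing_page = {
    "title": "Example API",
    "links": [
        {"rel": "self", "href": "https://example.org/api/"},
        {"rel": "conformance", "href": "https://example.org/api/conformance"},
        {"rel": "data", "href": "https://example.org/api/geodata"},
    ],
}

data_url = find_link(landing_page, "data")
print(data_url)
```

A client written this way keeps working whether the server publishes its data resources at "/collections" or elsewhere, which is the trade-off against fixed paths being debated in this thread.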
Maybe some ideas from https://www.w3.org/TR/ldp/#ldpc can be used? |
@akuckartz Please note the "A Path Forward" section above. Can you please flesh out what you think is of value from the Linked Data Platform Containers list? |
@dblodgett-usgs That same comment ends with "There may be other relevant specs (styles?) to consider, ..." LDP is such a spec - and even a standard.
I will try, but cannot guarantee that I will find enough time. |
I see -- that closing comment was meant to close out the list of OGC API Specifications that have work in progress related to the data-resource and items issues. If you think there are elements of the LDP specification that are useful in bringing closure, please present them here, but that comment was not intended to be an open-ended ask for additional concepts from outside the current baseline. |
@jerstlouis -- why carry Your response does not address the two aspects of the issue as posed in my top-level summary and seems to be mixing things up. Re: the "data-resource" issue, which seems to be the one you are addressing, you are clearly interested in Your:
Indicates to me that it is a bit of a hack and is probably not a solution that we want to pursue. Specifically, what is wrong with:
Where API-Common would specify the common aspects of a spatial-resource but not the path semantics -- only that a given spatial-resource view over a data-resource should have a literal path for its API. |
I don't think there's anything wrong with it. There should be room in OGC API for straightforward geospatial resources too. Best Regards, |
I see two general approaches for moving forward if the existing Collections resource does not work for the SWGs that specify other spatial data resources. Let me start with the proposal from @dblodgett-usgs, which I would characterise as follows:
That is, For spatial data resources with other access characteristics, other resource types with other tokens should be defined / used in the respective specifications. For example, Tiles, Coverages, EDR. Note that I don't see how Common Part 2 could define "a reusable set of conformance classes for One aspect that would need more thought is the link relation types. In Features, the "data" link relation type references the Collections resource at
The second option works, if/since we use fixed tokens like "collections" or "tiles" for the spatial data resource types. However, for clients navigating the API by following links, the first option should be easier to use. The second approach that I see is to also move away from fixed paths like For simplicity, let's assume that we still restrict an API that shares spatial data to a single dataset. If this changes, the solution would become more complex. Without fixed paths we need to rely on other mechanisms so that humans and software can understand an API, both from the API definition and from navigating the resources:
A potential risk with all that is that the flexibility on the server side comes also at a cost for client developers. Developing a generic client that works out-of-the-box would require good knowledge about all these concepts. (NB: there is also more work for document editors / OGC as more and more IANA registrations would be the likely result.) One of the key drivers behind the WFS 3.0 / OGC API Features activity, and I hope also behind the OGC API idea in general, was/is to reduce the learning curve and the complexity for developers compared to many of the standards from the OWS/XML stack. Yes, we also want to improve the overall architecture in the OGC baseline in this process, but we should avoid approaches that add complexity/flexibility that is not needed by the majority of the deployed APIs. If we go down a path with a very flexible resource structure, there should be agreement that OGC API standards (e.g., Features) can remove flexibility for "their" resource types. In Features we ended up with the current structure after implementation feedback and intensive discussions (see, e.g., issues 90, 64 and others) and that approach has proven to work well for Features. |
Thanks for this @cportele. I am admittedly out on a limb with the The reason I'm leaning toward an approach where each API access pattern gets its own literal path is largely what you point to as a key driver for OGC API.
Your notes are really important to what I'm seeing as a path forward:
In this world view, an OGC API can be cataloged in an OGC API Records and referenced as a dataset in its own right. I've used ISO19139 (services metadata) to integrate dataset services into processing workflows very successfully and see this as putting the complexity in the right place but keeping it "in band". Where are others at on this? @cmheazel @joanma747 What would you suggest as a path forward? I am pushing here because of how much work is bound up in EDR and Coverages pending this discussion. Coverages and EDR folks: @Schpidi @pebau @chris-little @m-burgoyne where do you stand on this? |
@cportele @dblodgett-usgs @cmheazel @joanma747 What I was hoping to see in OGC API - Common Part 2: Geospatial data is the following...
/collections/{collectionID} in both OGC API - Features and the current draft of OGC API - Coverage satisfies this for the most part.
Currently, the "rel" : "items" of OGC API - Features, also used in the current draft of OGC API - Coverage, linking respectively to /collections/{collectionID}/items and /collections/{collectionID}/coverage/all also satisfies this. Relations could be changed as needed, additional properties for the links could be added, but this is the functionality I hope ends up in this Common approach to Geospatial data.
Similarly, rather than BBOX+resolution, one might use the Tiles API instead in a consistent manner, for retrieving the data either as vector and/or raster. And one might use the Maps API to render that data in a consistent way, and one might refer to this data layer the same way as an input to a Process. These are the use cases I care the most about, and so far the newer proposals seem to move away from this and I see it as a major setback in terms of having a common approach to geospatial data. If we can resolve this, then we could discuss how one might represent a hierarchical structure both within a single dataset, and as a way to organize multiple datasets, and whether that capability could be one and the same, or implemented in a similar manner, but that is largely a separate issue. My rationale for wishing to have this functionality is also entirely based on reducing the learning curve and the complexity for developers. By implementing these simple capabilities once, clients automatically handle the generic aspect of working with any type of geospatial data, and can gradually implement additional support for the special handling or capabilities specific to a particular data type or retrieval mechanism. As a practical example of the value of this, based on the current draft Coverage specifications, the only thing currently missing in our Features & Tiles API client from supporting Coverages is parsing CoverageJSON, because the current generic common geospatial data approach already allows it to follow the links all the way to /collections/{collectionID}/coverage/all which returns the data as CoverageJSON. Without writing any special code for Coverage, it could already see the titled coverages and their geospatial and temporal extents. |
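Editorial sketch of the generic-client argument made in this comment: a client that lists every collection's title, extent, and "items" links without knowing whether the data is features or a coverage. The response shape follows the collections document of OGC API - Features as I understand it, and the two sample collections, their IDs, and the relative hrefs are invented for illustration.

```python
def summarize_collections(collections_doc: dict) -> list:
    """Extract type-agnostic information from a collections response."""
    summaries = []
    for coll in collections_doc.get("collections", []):
        extent = coll.get("extent", {})
        bboxes = extent.get("spatial", {}).get("bbox", [])
        intervals = extent.get("temporal", {}).get("interval", [])
        # Follow "rel": "items" regardless of what kind of data it leads to.
        data_links = [
            link.get("href")
            for link in coll.get("links", [])
            if link.get("rel") == "items"
        ]
        summaries.append({
            "id": coll.get("id"),
            "title": coll.get("title", coll.get("id")),
            "bbox": bboxes[0] if bboxes else None,
            "interval": intervals[0] if intervals else None,
            "data_links": data_links,
        })
    return summaries


# Hypothetical response with one feature collection and one coverage.
collections_doc = {
    "collections": [
        {
            "id": "roads",
            "title": "Road network",
            "extent": {"spatial": {"bbox": [[-10.0, 35.0, 30.0, 60.0]]}},
            "links": [{"rel": "items", "href": "/collections/roads/items"}],
        },
        {
            "id": "temperature",
            "title": "Surface temperature",
            "extent": {
                "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
                "temporal": {"interval": [["2020-05-01T00:00:00Z", None]]},
            },
            "links": [
                {"rel": "items", "href": "/collections/temperature/coverage/all"}
            ],
        },
    ]
}

for summary in summarize_collections(collections_doc):
    print(summary["title"], summary["bbox"], summary["data_links"])
```

Only the step that parses what sits behind each data link (GeoJSON versus CoverageJSON, for example) needs type-specific code; everything above is shared.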
@jerstlouis - You have lost me now. I thought you wanted to get rid of |
@cportele I never wanted to get rid of /collections, but because I thought the name collection in the path was the source of all this controversy, I suggested in a previous post that if we could figure out a way for the published Features standard to relax that 'collections' literal in /collections, and understand /collections to be wherever the landing page "rel" : "data" points to, then it might be easier to move forward. However, you seem to indicate that Features would like to remain restrictive in this regard, which would at least imply that the literal 'collections' must remain if a dataset contains at least one Features data layer. In my last post, you could substitute /roses for /collections, with the exception that currently OGC API - Features, Coverage and Common (Collections) draft all prescribe /collections at the moment. The dataset / hierarchy discussion is separate and I was trying to avoid it until the most fundamental aspects are settled (i.e. points 1 & 2 which currently work with current draft specs). In an ideal world, I would combine the datasets/collections landing page, collections, and 'collection resource' to a single schema. Then such a resource could have links to data representations/views at the current level, links to sub-datasets and/or links to sub-collections, and you would have an indicator saying whether a particular resource constitutes a dataset per the DCAT definition. A service could have a higher up service landing page with service info, but not representing any specific datasets, linking to "data" (the root hierarchy for datasets and collections) and "processes". There could be links to api and conformance at whichever level(s) it makes sense. That root data resource being "/collections" would have been the easiest way to be compatible with Features as it is currently specified. |
@jerstlouis - I don't see Features moving away from I also don't think it is the name; if the resource definition (contents, sub-resources, parameters, etc) would work for other data items, the name shouldn't be a real issue. Also note that in the current drafts we already have dataset distributions that are not under To move forward on this issue, I think we need broader input, e.g. from those mentioned by @dblodgett-usgs. |
@cportele /tiles at the root of the dataset works for tiles containing all data layers, but we also have /tiles inside each {collectionID} to retrieve tiled layers individually. Also a service may serve both the raw data tiles, or may want rendered map tiles, which should be distinguished. I agree that the name shouldn't be a real issue, but I believe for some it is the main issue (e.g. see #11 (comment) , and contrast that with Jeff's previous comments.). |
Uhh, what I said was -> OGC shouldn't mandate the use of the term 'collections' as the identifier for all geospatial resources. But at this point in the OGC API development process it's reasonable for OGC to say the identifier of a {geospatialResource} could be "/collections/{collectionId}" or a coverage or another geospatialResource. Best Regards, |
@jeffharrison is it the term 'collection' that you have an issue with, or the idea of a common approach to geospatial data consistent across different APIs (common way to get from a landing page to your data layers, which has e.g. a spatiotemporal extent / volume, and links to resources to access that data, e.g. features items, coverage, bounding volume hierarchy tileset for 3D data)? In that comment I linked on issue 11 you seemed to welcome that proposal without the term 'collection'. |
Thanks @cportele. We really do need input from others here. So far, most of the discussion between @jerstlouis and others has been talking past one another without some shared use cases and assumptions to root the discussion in. I attempted to provide some focus in my opening comment: #140 (comment) and we need to focus this and iterate toward consensus rather than continue to air old arguments. |
@dblodgett-usgs @cportele @jeffharrison @jerstlouis @joanma747 I think that the OGC API Common Part 1 can do this, as can Part 2 Collections, and probably Records. Grouping of several resources quite tightly is desirable (e.g. all the Météo-France forecasts for today at a certain resolution, both upper air and surface), as are more loosely coupled groups (e.g. all forecasts and observation datasets for NW Europe, at differing resolutions, from Latvia to Portugal, issued on 13 October 1987). There are some use cases for compatibility with OGC API - Features collections/collectionId/items. "Layers" do not make sense to EDR, as a single datastore resource may have 10 million "layers", each of which could be MBs or even a GB in size. I am not sure that this gives you a clear direction. |
I'm doing my best to remain neutral but also push people on the issues and try to focus this discussion. I want to bring some comments from opengeospatial/ogcapi-coverages#65 over here. Thus far, the discussion is focusing heavily on the nature of the @jerstlouis offers a helpful set of benefits for treating the
excerpting @jerstlouis:
I find this very helpful for the following reason:
OK, so running with this a bit, what is a I think @pvretano offers some good words over in opengeospatial/ogcapi-coverages#65 (comment).
@pvretano, your attempt at self deprecation isn't working on me. I know you are way ahead of us. ;) I find this idea of "a collection of measurements (samples)" to be the profound bit. In API Features, API Coverages, and API Environmental Data, we are all circling around this notion of accessing a digital representation of the world, potentially bounded to some spatial domain. In Features, the representation is entities we have identified and want to share for whatever reason. In Coverages, the representation is a tessellation that, in an ideal world, approaches the continuum it is sampling. EDR accepts (cynically?) that people don't really care about features and coverages, and just want to ask what the dataset's estimate of the value of the real world is for a location, point, area, trajectory, etc. So is that what a collection is? A spatially bounded collection of samples of a real-world phenomenon that (depending on the nature of the samples) can be accessed via a variety of APIs? One other interesting comment before I call out some others and look for a way forward. @tomkralidis says:
I want to call attention to: "Or maybe even an OGC API - Records record model?" because, if we are going to go down this road, we must define the relationship (it can be flexible) between collections and the datasets that are going to show up in API Records. Elsewhere in @tomkralidis' comment, he points out that "this would also help servers provide "on board" catalogues of the data they serve pretty easily". The question that might get people thinking is: "Is there a crosswalk between collection metadata and DCAT?!?" Now -- let's assume we go with How do we fix the issue that you have to parse a bunch of garbage you might not care about and find the stuff you do care about / have client code to deal with? @jyutzler described it over here: #47 (comment) Some have suggested a "collectionType" enum, which got quick pushback and a counter-suggestion of an "accessTypes" array, but I don't think that goes quite far enough. There is a strong desire to minimize the diversity of functionality that exists at a given API path. Is there a middle way here? Can we define common collection info that sets us up to allow diversity without introducing undue complexity when implementing general client code? How do we bridge the gap between the advanced geospatial perspective where we have these abstract hierarchical datasets made up of collections with varied access patterns and a non-geospatial web developer who just needs to get their client or server code to work and be conformant? I want to suggest that the path forward
If we can define the initial building blocks in the APIs that are in motion (including Common), get our shared definitions right, and define this architecture in common, I think we can move forward. But we must stop talking past each other, seek to understand each other's requirements, and find common ground. At this point, I'm curious where @jyutzler and @cmheazel are at on the issues. |
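Editorial sketch of the "accessTypes" array idea mentioned in this comment: a client skips the collections it has no code for and keeps only those offering an access pattern it supports. The "accessTypes" property and its values ("features", "tiles", "coverage", "edr") are hypothetical here; the thread only names the idea and no draft defines its content.

```python
def usable_collections(collections_doc: dict, supported: set) -> list:
    """Return (collection id, shared access types) for usable collections."""
    usable = []
    for coll in collections_doc.get("collections", []):
        # "accessTypes" is an assumed property name taken from the discussion.
        offered = set(coll.get("accessTypes", []))
        common = sorted(offered & supported)
        if common:
            usable.append((coll["id"], common))
    return usable


# Hypothetical collections document mixing several access patterns.
collections_doc = {
    "collections": [
        {"id": "roads", "accessTypes": ["features", "tiles"]},
        {"id": "temperature", "accessTypes": ["coverage", "edr"]},
        {"id": "elevation", "accessTypes": ["coverage", "tiles"]},
    ]
}

# A client that only implements Features and Tiles.
print(usable_collections(collections_doc, {"features", "tiles"}))
```

With an array rather than a single "collectionType" value, the same collection can be reached through more than one access pattern, which is the diversity this comment is trying to allow without burdening simple clients.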
Hi Dave, many thanks for having done the painful work of collecting all these insights into the collection conundrum! What I'm seeing is two worlds colliding:
Trying to force data from the 2nd world into the simple clean concepts stemming from the first doesn't seem to be working, which is the reason we have SensorThings (STA) and, to my understanding, the background of EDR.
Taking this a step further, I see many cases where the provision of the spatial (1) vs. data (2) aspects are performed by different organizations or institutions, thus firming up the requirements on being able to link data on a spatial object (or area to also support EDR) from one source with spatial information from a different source. Sorry, no solutions, just the concern that by ignoring the dichotomies engendered by the 2 worlds described above, we will continue to come up short of real world requirements. My 2 cents :) Kathi |
@KathiSchleidt @dblodgett-usgs In an attempt to bridge these two worlds, I would like to clarify what I meant by this "leaf (most granular) data entity" concept. Leaf / most-granular might have been an overstatement, as e.g. you could split a FeatureCollection into individual features, polygons, points. Similarly you could split a coverage into its individual grid cells or samples. So what I was picturing as the "leaf data layer" in the case of sensor data is not the individual sensor or its measurements, but a collection of multiple sensors, along with their geospatial and temporal aspects. Potentially, a single SensorThings API could be the source of one or more such data layers, or multiple SensorThings APIs could be sourced to provide one or more integrated "leaf data layer(s)" (e.g. based on the thematic context). Each of these data layers could then additionally be offered as either or both Feature Collections and Coverages, to facilitate the use of this information in GIS tools without built-in support for SensorThings API. When one SensorThings API maps directly to one such data layer, or when describing the SensorThings API itself, the spatio-temporal extent for it would be the overall extent of the temporal and geospatial coordinates for all measurements provided by that API. |
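Editorial sketch of the last point in this comment: deriving one overall spatio-temporal extent from all measurements in a sensor data layer. The flat observation records (keys "lon", "lat", "time") are an assumed simplification, not the SensorThings data model, and the output only mimics the extent layout used by collections in OGC API - Features.

```python
from datetime import datetime


def overall_extent(observations: list) -> dict:
    """Compute the bounding box and time interval covering all observations."""
    if not observations:
        raise ValueError("at least one observation is required")
    lons = [obs["lon"] for obs in observations]
    lats = [obs["lat"] for obs in observations]
    times = [datetime.fromisoformat(obs["time"]) for obs in observations]
    # Note: a plain min/max on longitude ignores antimeridian crossings.
    return {
        "spatial": {"bbox": [[min(lons), min(lats), max(lons), max(lats)]]},
        "temporal": {
            "interval": [[min(times).isoformat(), max(times).isoformat()]]
        },
    }


# Hypothetical measurements from three sensors.
observations = [
    {"sensor": "a", "lon": 2.35, "lat": 48.85, "time": "2020-06-04T10:00:00+00:00"},
    {"sensor": "b", "lon": -9.14, "lat": 38.72, "time": "2020-06-04T09:30:00+00:00"},
    {"sensor": "c", "lon": 24.11, "lat": 56.95, "time": "2020-06-04T11:15:00+00:00"},
]

print(overall_extent(observations))
```

Moving sensors need no special handling in this sketch, since each observation carries its own position and time.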
@jerstlouis thanks for this clarification!
I'd much appreciate a simple sketch of how to bring this fairly simple STA world into OAF |
@KathiSchleidt By class type, am I correct in understanding that you were referring to SensorThings conformance classes? In other OGC API specifications, such as Features and Coverage, conformance classes describe different capabilities of the API, which apply to the multiple available collections. Moving sensors -- each set of observations is taken at a certain time, and the difference with non-moving sensors is that the geospatial coordinates change along with the time. A collection of sensor measurements/observations still has an overall spatio-temporal extent. I don't think Observable properties would be a collection on their own. If one creates a coverage out of information coming from SensorThings API, then again the sensor position becomes the coordinates of the coverage sample, the measurement/observation is the value (sample) at that position, while time is an additional dimension of the coverage, and separate types of measurements can either be represented on separate planes (extra dimension?) or by splitting it into separate coverages. So the idea of how to regroup this sensor information would be to have the possibility to present this dataset of observations/measurements, which could potentially be retrieved using one or more SensorThings APIs, as one or more feature collections, and/or as one or more coverages. I don't really believe that these worlds are that far apart, because people have been building GIS vector and raster datasets from measurements and observations for a long time. The only difference with SensorThings API and the IoT is that a lot more information is available and it is real-time. But I don't think this prevents the representation of the information as classic Feature collections and/or Coverages. However, it presents some additional challenges due to that greater quantity and flow of information, and I think space partitioning mechanisms and dynamic distributed processing are key tools to solve those challenges. |
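Editorial sketch of the mapping described in this comment: sensor position becomes the geometry, and the measurement and its time become properties of a GeoJSON feature. The input record layout and the property names ("sensor", "observedProperty", "result") are assumptions for illustration; a real SensorThings response is structured differently and would need its own adapter.

```python
import json


def observations_to_feature_collection(observations: list) -> dict:
    """Present flat observation records as a GeoJSON FeatureCollection."""
    features = []
    for obs in observations:
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered longitude, then latitude.
            "geometry": {"type": "Point", "coordinates": [obs["lon"], obs["lat"]]},
            "properties": {
                "sensor": obs["sensor"],
                "time": obs["time"],
                "observedProperty": obs["property"],
                "result": obs["value"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical observations, including a moving sensor ("buoy-7").
observations = [
    {"sensor": "buoy-7", "lon": -9.20, "lat": 38.60, "time": "2020-06-04T09:00:00Z",
     "property": "sea_temperature", "value": 17.4},
    {"sensor": "buoy-7", "lon": -9.25, "lat": 38.58, "time": "2020-06-04T10:00:00Z",
     "property": "sea_temperature", "value": 17.6},
]

print(json.dumps(observations_to_feature_collection(observations), indent=2))
```

Splitting the records by observed property before calling the function would give one feature collection per measurement type, which parallels the "separate coverages" option mentioned in this comment.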
It would be good to hear @liangsteve's and @sarasaeedi 's perspective on the above :) |
Resolved through PR 149 |
Motion: The SWG moves that this issue has been resolved by Pull Request 149 and can be closed. |
Definition used for Collection in overview (section 7.1) - replace with definition 'A geospatial resource that may be available as one or more sub-resource distributions that conform to one or more OGC API standards.' |
@cmheazel PR #149 that closed this issue had actually added that definition: https://github.com/opengeospatial/ogcapi-common/pull/149/files I cannot find it anywhere in the latest draft however: http://docs.opengeospatial.org/DRAFTS/20-024.html It was actually changes to That addition was also unambiguously clear about an OGC API collection being a collection of data:
Therefore I would suggest that the SWG considers updating the definition to something that includes the term data like:
|
Updated definition of collection in section 7.1 to 'A geospatial resource that may be available as one or more sub-resource distributions that conform to one or more OGC API standards.' I'm reluctant to restrict this definition any more than is absolutely necessary. |
This issue attempts to pull the various /collections discussions into a single issue.