[GEO] Support for Spatial Coordinate Reference Systems #48953

Open
1 of 3 tasks
nknize opened this issue Nov 11, 2019 · 6 comments
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >feature Meta Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@nknize
Contributor

nknize commented Nov 11, 2019

Target License

This feature will be licensed Gold.

Overview

GeoJSON (and, indirectly, TopoJSON) optionally include a Coordinate Reference System to pass projection information along with a document. The CRS supports Standard EPSG Codes or custom projection information by way of explicit URN definitions:

"crs": {
  "type": "name",
  "properties": {
    "name": "urn:ogc:def:crs:OGC:1.3:CRS84"
  }
}

Or a link (URI or aux file):

"crs": {
  "type": "link", 
  "properties": {
    "href": "http://example.com/crs/42",
    "type": "proj4"
  }
}

"crs": {
  "type": "link",
  "properties": {
    "href": "data.crs",
    "type": "ogcwkt"
  }
}

This feature adds support for the crs object parameter to the geo_shape field mapper enabling users to select or define the projection for indexed geo_shapes.
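As a rough illustration of the two crs shapes shown above ("name" and "link"), here is a minimal Python sketch that classifies a crs object. The function name describe_crs and its return format are hypothetical and are not part of any Elasticsearch API. Falling back to the CRS84 URN when no crs is present is an assumption based on the legacy GeoJSON default of WGS84 lon/lat.

```python
from typing import Optional, Tuple

# Assumption: documents without a crs member are treated as WGS84 lon/lat.
DEFAULT_CRS_NAME = "urn:ogc:def:crs:OGC:1.3:CRS84"


def describe_crs(crs: Optional[dict]) -> Tuple[str, str]:
    """Return (kind, value) for a GeoJSON-style crs object (hypothetical helper)."""
    if crs is None:
        return ("default", DEFAULT_CRS_NAME)
    props = crs.get("properties") or {}
    if crs.get("type") == "name":
        # Named CRS: an OGC URN or EPSG-style identifier.
        return ("name", props["name"])
    if crs.get("type") == "link":
        # Linked CRS: href plus a format hint such as "proj4" or "ogcwkt".
        return ("link:" + props.get("type", "unknown"), props["href"])
    raise ValueError("unsupported crs type: %r" % crs.get("type"))


named = {"type": "name", "properties": {"name": "urn:ogc:def:crs:OGC:1.3:CRS84"}}
linked = {"type": "link", "properties": {"href": "data.crs", "type": "ogcwkt"}}
print(describe_crs(named))
print(describe_crs(linked))
```

A real mapper would also have to resolve the link target and parse the proj4 or WKT definition, which this sketch does not attempt.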

Scope

  1. Any OGC Standard or user defined projection should be supported.
  2. This feature will initially apply to the geo_shape field type but should also probably support geo_point types as well.
  3. Multi-projection hashing (e.g., Equirectangular / Equidistant / Equal-Area) will be worked under a separate issue.

Design

  1. The crs field will simply define the meaning of the values in the coordinates/points field (e.g., [-10300392.4831016, 3782546.51099759] in Mercator maps to [-92.53, 32.32] in lon/lat).
  2. Like the orientation field, the crs field can be specified in the field mapping and overridden on each document insert.
  3. The document will store coordinates/points in the provided projection; it will not rewrite those points to lon/lat.
  4. A reprojection processor will be worked in a separate issue.
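To make the coordinate pair in Design item 1 concrete, here is a small Python sketch of the forward Mercator projection. It assumes the "mercator" in the example means the ellipsoidal Mercator on the WGS84 ellipsoid (as used by EPSG:3395); the issue does not name the exact CRS, and the function name is hypothetical. Under that assumption the result should land within about a metre of the pair quoted above. The spherical Web Mercator formula (EPSG:3857) would give a noticeably different northing.

```python
import math

# WGS84 ellipsoid parameters.
WGS84_A = 6378137.0                 # semi-major axis in metres
WGS84_F = 1 / 298.257223563         # flattening
WGS84_E = math.sqrt(WGS84_F * (2 - WGS84_F))  # first eccentricity


def lonlat_to_mercator(lon_deg: float, lat_deg: float) -> tuple:
    """Forward ellipsoidal Mercator: lon/lat in degrees to easting/northing in metres."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    es = WGS84_E * math.sin(phi)
    x = WGS84_A * lam
    # Conformal-latitude correction distinguishes this from the spherical formula.
    y = WGS84_A * math.log(
        math.tan(math.pi / 4 + phi / 2) * ((1 - es) / (1 + es)) ** (WGS84_E / 2)
    )
    return x, y


x, y = lonlat_to_mercator(-92.53, 32.32)
print(round(x, 2), round(y, 2))
```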

Questions / Considerations

  1. Should a geo query/filter accept an optional crs field so that a user can request coordinates returned in their preferred projection? This will introduce overhead on the query, especially for geodesic to geocentric conversions where convergence is required. Default behavior will use the projection specified in the field mapper.
  2. There are 4,362 different projections.

Criticisms

  1. Technically the crs field was removed in GeoJSON v1.4. For web-based visualizations this is not a bad thing, as browser applications shouldn't attempt to support over four thousand projections. While the standards police may criticize the use of an "obsolete" field, a storage system designed to handle GIS applications and operations requires projection information.

Tasks

@nknize nknize added >feature :Analytics/Geo Indexing, search aggregations of geo points and shapes v8.0.0 7x labels Nov 11, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Geo)

@thomasneirynck
Contributor

thomasneirynck commented Jun 23, 2020

Will this affect current clients who are expecting geo_shape fields to have lat/lon EPSG:4326 coordinates? In other words, when this merges, will Elasticsearch clients still be able to assume that searches on geo_shape fields will return coordinates in lat/lon? Or will clients now have to check the field mapping explicitly? Will this require additional parameters in the query DSL? imho this has the potential to break general-purpose ES clients (e.g. Kibana, possibly gdal, ...) without corresponding changes on the client end.

@nknize
Contributor Author

nknize commented Jun 23, 2020

The expected behavior is that search results will be returned in the same projection as they were indexed. Search time reprojection is a separate feature that will be further explored (as it is extremely costly, and likely not scalable, to reproject every result). To achieve what you're describing it is suggested that clients use the reproject ingest processor to index in a separate lat/lon field, then use that field for searches in lat/lon projection.
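For a sense of what a reprojection step into a separate lon/lat field would have to do, here is a minimal Python sketch. It is not the API of the reproject ingest processor mentioned above, which is tracked separately; the helper names and the field names are hypothetical. It assumes the source data is in ellipsoidal Mercator on WGS84 and handles Point geometries only. The inverse has no closed form for latitude, so it uses a short fixed-point iteration, which hints at why per-result reprojection at search time is costly.

```python
import math

WGS84_A = 6378137.0
WGS84_F = 1 / 298.257223563
WGS84_E = math.sqrt(WGS84_F * (2 - WGS84_F))


def mercator_to_lonlat(x: float, y: float, tol: float = 1e-12, max_iter: int = 20) -> tuple:
    """Inverse ellipsoidal Mercator: metres to lon/lat degrees (iterative)."""
    lon = math.degrees(x / WGS84_A)
    t = math.exp(-y / WGS84_A)
    phi = math.pi / 2 - 2 * math.atan(t)  # spherical first guess
    for _ in range(max_iter):
        es = WGS84_E * math.sin(phi)
        nxt = math.pi / 2 - 2 * math.atan(t * ((1 - es) / (1 + es)) ** (WGS84_E / 2))
        converged = abs(nxt - phi) < tol
        phi = nxt
        if converged:
            break
    return lon, math.degrees(phi)


def add_lonlat_copy(doc: dict, source_field: str, target_field: str) -> dict:
    """Return a copy of doc with a lon/lat Point stored under target_field."""
    geom = doc[source_field]
    if geom.get("type") != "Point":
        raise ValueError("this sketch only handles Point geometries")
    x, y = geom["coordinates"]
    lon, lat = mercator_to_lonlat(x, y)
    out = dict(doc)
    out[target_field] = {"type": "Point", "coordinates": [lon, lat]}
    return out


doc = {"shape": {"type": "Point", "coordinates": [-10300392.4831016, 3782546.51099759]}}
print(add_lonlat_copy(doc, "shape", "shape_lonlat")["shape_lonlat"])
```

The original field is left untouched, so searches can target either the projected field or the lon/lat copy.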

@thomasneirynck
Contributor

thomasneirynck commented Jun 23, 2020

The expected behavior is that search results will be returned in the same projection as they were indexed.

We should coordinate then that before this merges on the ES-side, the necessary changes can be made on the Kibana-side so they are released in the same minor. Otherwise, Kibana will display indices that cannot be visualized. This would be a poor user experience.

Search time reprojection is a separate feature that will be further explored

So that is different from what we discussed earlier, where I thought the recommendation was that clients could query for the documents in lon/lat and get the results in lon/lat. In effect, this would mean that elastic/kibana#67476 is not an option, at least initially?

@thomasneirynck
Contributor

thomasneirynck commented Jul 13, 2020

Some notes from a client-perspective about possible impact:


Kibana (and the vast majority of web apps using common mapping toolkits like mapbox, leaflet, google maps, ...) expects coordinates to be latitudes/longitudes in the WGS84 datum.

  • Will a client be able to determine whether the reference system returns coordinates in lat/lon WGS84? For example, EPSG:4326 and EPSG:32662 would both be usable for Kibana.
    • (presumably, we could hardcode a list of codes in Kibana. This approach falls apart when users use custom Proj4 definitions and not a known EPSG code).

Will these new reference systems support all aggregation with the current semantics?

  • geo_tile: This does not return coordinates, but has some implications in how intersection between cell and point/shape is computed (see below)
  • geo_centroid: the centroid is returned as Lon/lat WGS84

Because of geo_tile and geo_centroid, even though no raw documents are retrieved, Kibana can show usable visualizations (e.g. clusters, heatmaps).

The semantics of what it means for a point/shape to be indexed is currently somewhat fuzzy in ES.

  • geo_point/geo_shape field is lat/lon wgs84 (EPSG:4326). This is conceptually how most users think about it.
  • the bounding-box queries/filters use cartesian logic. This implies that the reference system essentially uses a plate carrée projection (EPSG:32662)
  • (geo_point only): the geo_distance aggregation uses a spherical approximation of the globe (Haversine). That does not quite correspond to the WGS84-datum ellipsoid, nor is it cartesian distance.

These are a fair compromise, and imho strike a nice balance:

  • The majority of data, especially the kind indexed into Elasticsearch, (GPS coordinates, RFID locations, "data-in-the-wild"), is generated or stored in lat/lon WGS84. ES causes no friction indexing this as-is.
  • The majority of clients will rely on bounding-box queries to only query for data in the view. In cylindrical projections (like webmercator), the semantics (from a client perspective) are not that relevant, as long as the bounds param captures what is currently in the view. (And changing from a straight-line interpretation in plate carrée to a great-circle interpretation of lines will only expand the capture area, not degrade the end-user experience.)
  • for other topological operations (e.g. intersects with arbitrary shapes), the mismatch may be higher, but really only relevant at small (zoomed out) scales.
  • Distance is computed with Haversine. Only at small (zoomed out) scales, this is really problematic. For 95% of use-cases, this is not problematic.
  • Data-in-the-wild is very often stored with some implied precision, e.g. GeoJSON with great-circle lines will generally have some discretization of the lines applied in the data itself. This minimizes discrepancies between topological operations (e.g. intersects) and display as well.
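For reference, the Haversine distance mentioned above can be sketched in a few lines of Python. This is a generic textbook formulation, not Elasticsearch's implementation; the mean Earth radius used here is an assumption, and the function name is hypothetical. Because it treats the globe as a sphere, it differs slightly from a true WGS84 ellipsoidal distance, and the gap becomes more visible over long distances.

```python
import math

# Assumption: mean Earth radius in metres; other radius choices give slightly different results.
EARTH_MEAN_RADIUS_M = 6371008.8


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float,
                radius: float = EARTH_MEAN_RADIUS_M) -> float:
    """Great-circle distance in metres between two lon/lat points on a sphere."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against rounding pushing sqrt(h) marginally above 1.
    return 2 * radius * math.asin(min(1.0, math.sqrt(h)))


# A quarter of the way around the equator equals a quarter of the sphere's circumference.
print(haversine_m(0.0, 0.0, 90.0, 0.0))
```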

Some specific questions:

  • Will geo_distance work differently for geodetic and projected systems?
  • Will binary topology (e.g. intersects/contains) behave differently for geodetic and projected systems?
  • Will geodetic system always imply great-circle lines (or also rhumb-lines)?
  • Will all aggregations, in particular geo_tile and geo_centroid, remain supported with the current semantics?
  • Will geo_distance be added to the current geo_shape (without projection defined) and what approach will it use?

@timrobertson100

This feature will be licensed Gold

Now that gold is discontinued, are there any thoughts on whether projection support is going to be open for all, please? We may be interested in contributing but would like to know the target license.


6 participants