Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generalize encoding/decoding of coordinates to helper classes #46154

Merged
merged 4 commits into from
Aug 30, 2019

Conversation

talevy
Copy link
Contributor

@talevy talevy commented Aug 29, 2019

It is important that the GeometryTreeReader/Writers support different encodings more naturally. As of now, there is a need for GeoEncodingUtil encoding/decoding and XYEncodingUtil encoding/decoding.

This PR achieves three goals

  • abstract the encoding of double-valued x/y coordinates to the int-values that are used when serializing
  • make the GeometryTree structures re-usable between shape and geo_shape types
  • have the tree structures take in the raw coordinate values so that more accurate centroid calculations can be done at index-time for the geo-centroid aggregation

This PR leaves one open question / leak in the model

  • the Extent that is queried and returned by these tree structures is still in the encoded form. The reason this is left out of the PR is because Extent is an object that is repetitively used in recursive checks of bounds, and re-encoding the coordinates every time can be costly.
  • The idea is that the aggregations that rely on intersects queries will convert the BoundingBox structures to Extent objects. This would be done in the future by aggregations like geo*grid.

This PR also leaves out an implementation of ShapeCoordinateEncoder that would be used by the shape field-type. This can be added in the future.

@talevy talevy added the :Analytics/Geo Indexing, search aggregations of geo points and shapes label Aug 29, 2019
@talevy talevy requested a review from imotov August 29, 2019 20:08
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo

@talevy
Copy link
Contributor Author

talevy commented Aug 30, 2019

run elasticsearch-ci/2

@talevy
Copy link
Contributor Author

talevy commented Aug 30, 2019

run elasticsearch-ci/2

I believe the failure here is related to #46195.

Copy link
Contributor

@imotov imotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left one comment about one possible improvement. We can discuss it and implement it in another PR though.

BoundingBox(Extent extent) {
this.top = GeoEncodingUtils.decodeLatitude(extent.top);
this.bottom = GeoEncodingUtils.decodeLatitude(extent.bottom);
BoundingBox(Extent extent, CoordinateEncoder coordinateEncoder) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would simplify things if we "taught" extent how to decode itself, by passing it as a constrictor parameter into Extent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is a good point! I will explore this when it is being used more in intersection queries in CellIdSource

@talevy talevy merged commit 04224e9 into elastic:geoshape-doc-values Aug 30, 2019
@talevy talevy deleted the gdv-generalize-encoding branch August 30, 2019 21:22
talevy added a commit that referenced this pull request Sep 20, 2019
It is important that the GeometryTreeReader/Writers support different encodings more naturally. As of now, there is a need for GeoEncodingUtil encoding/decoding and XYEncodingUtil encoding/decoding.

This PR achieves three goals
- abstract the encoding of double-valued x/y coordinates to the int-values that are used when serializing
- make the GeometryTree structures re-usable between shape and geo_shape types
- have the tree structures take in the raw coordinate values so that more accurate centroid calculations can be done at index-time for the geo-centroid aggregation

This PR leaves one open question / leak in the model

- the Extent that is queried and returned by these tree structures is still in the encoded form. The reason this is left out of the PR is because Extent is an object that is repetitively used in recursive checks of bounds, and re-encoding the coordinates every time can be costly.
- The idea is that the aggregations that rely on intersects queries will convert the BoundingBox structures to Extent objects. This would be done in the future by aggregations like geo*grid.

This PR also leaves out an implementation of ShapeCoordinateEncoder that would be used by the shape field-type. This can be added in the future.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants