Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-8632: New XYShape Field and Queries for indexing and searching general cartesian geometries #726

Closed
wants to merge 13 commits into from

Conversation

nknize
Copy link
Member

@nknize nknize commented Jun 17, 2019

The LatLonShape field and LatLonShape query classes added the ability to index and search geospatial geometries in the WGS-84 latitude, longitude coordinate reference system. The foundation for this capability is provided by the Tessellator that converts an array of vertices describing a Point Line or Polygon into a stream of 3 vertex triangles that are encoded as a seven dimension point and indexed using the BKD POINT structure. A nice property of the Tessellator is that lat, lon restrictions are artificial and really only bound by the API.

This commit builds on top of / abstracts the Tessellator LatLonShape and LatLonShapeQuery classes to provide the ability to index & search general cartesian (non WGS84 lat,lon restricted) geometry. It does so by introducing two new base classes: ShapeField and ShapeQuery that provide the indexing and search foundation for LatLonShape and the LatLonShape derived query classes (LatLonShapeBoundingBoxQuery, LatLonShapeLineQuery, LatLonShapePolygonQuery) and introducing a new XYShape factory class along with XYShape derived query classes (XYShapeBoundingBoxQuery, XYShapeLineQuery, XYShapePolygonQuery). The heart of the cartesian indexing is achieved through XYShapeEncodingUtils that converts the double precision vertices into an integer encoded seven dimension point (similar to LatLonShape).

The test framework is also further abstracted and extended to provide a full test suite for the new XYShape capability that works the same way as the LatLonShape test suite (but applied to non GIS geometries).

@dsmiley
Copy link
Contributor

dsmiley commented Jun 17, 2019

This all sounds great; I like the use of the "XY" as a naming prefix to help clarify the coordinate system so we don't confuse it with LatLon.

In your mind... with the increase spatial code in core, are you still resolute in your opinion of Lucene-core having all this spatial code? I mean... wow... it's not like Lucene core doesn't have a rich module system already. Yeah I know this is kinda settled already but this just gnaws at me.

@nknize
Copy link
Member Author

nknize commented Jun 19, 2019

With extensions like this (and possibly more to come) it certainly adds a lot to core. I'm not at all bound to the idea that LatLonShape must go in core. The spatial module is largely empty so I'm certainly open to relocating both LatLonShape and XYShape to the spatial module while keeping it dependency free and having only LatLonPoint in core.

@dsmiley
Copy link
Contributor

dsmiley commented Jun 19, 2019

Yeah, lets do that then (another issue). When the spatial stuff "graduates", as some say, out of sandbox. This stuff here can stay in sandbox since it's where LatLonShape is now.

@iverase
Copy link
Contributor

iverase commented Jun 25, 2019

Thanks Nick!

I am having problems with Polygon and XYPolygon and Line and XYLine and so on. Are we keeping Polygon for backwards compatibility? It looks like it should be called LatLonPolygon for consistency.

@nknize
Copy link
Member Author

nknize commented Jun 25, 2019

@iverase +1 for renaming Polygon and Line to LatLonPolygon and LatLonLine . It's cleaner that way.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me. It would be nice if the new XY points, lines and polygons could take floats from an API perspective since this is the accuracy they provide, even if we keep storing doubles internally in order to be able to share code with LatLon shapes?

@jpountz
Copy link
Contributor

jpountz commented Jun 26, 2019

The renames make sense to me but maybe they should go into a separate change and only go to master since Polygon is a quite a high level user-facing API?

@nknize
Copy link
Member Author

nknize commented Jun 26, 2019

I went ahead and updated XYPolygon, XYLine, XYPoint API to take float instead of double. Internally we still use doubles for the tessellation but the user must pass floats to the Geometry so it's consistent with the encoding.

I'm also in favor of handling the rename to LatLonPolygon and LatLonLine in a separate PR.

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how ShapeField and ShapeQuery work, they glue everything together nicely.

I don't think we need a XYPolygon2D and I am wondering if we need a XYRectangle2D at all. We can use Rectangle2D for XY as well. WDYT?

@iverase
Copy link
Contributor

iverase commented Jul 2, 2019

I am wondering if we can test the tessellator with XYPolygons as well. What I am thinking is to modify the current test and build the polygons as XYPolygons as well and run the tessellation through them.

In a similar fashion we might want to test the encoding/decoding for XYShapes.

@iverase
Copy link
Contributor

iverase commented Jul 2, 2019

Note to us: Currently XY shapes are under the geo package. I don't think that is correct and maybe we should add a different package (xy package?). I don't think we need it in this iteration but just writing it here so it is considered later on.

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need to change the name of the methods in Tessellator.Triangle as it still have methods referring to Lat and Lon but is generically used by LatLonShape and XYShape

@nknize
Copy link
Member Author

nknize commented Jul 8, 2019

Cleaned up the API a bit:

  • refactors Tessellator.Triangle getLat / getLon methods to getY / getX, respectively
  • Changed XYShape.createIndexableFields and XYShape.newBoxQuery to accept floats instead of doubles

Refactored TestLatLonShapeEncoding to derive from a new BaseShapeEncodingTestCase class along with a new TestXYShapeEncoding class to equally test the XYShapeEncoding logic.

I think this PR is just about ready to merge and continue iterating in sandbox?

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

We can then iterate in the sandbox, thanks!

@nknize
Copy link
Member Author

nknize commented Jul 8, 2019

Closing: Merged in commit 0c09481

@nknize nknize closed this Jul 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants