-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build triangle tree from geo_shape indexed fields #50012
Build triangle tree from geo_shape indexed fields #50012
Conversation
Pinging @elastic/es-analytics-geo (:Analytics/Geo) |
Ups, I think the extent logic is wrong. In this case the extent is built using the bounding box of the triangles and it seems the result is not the expected one. I am starting having doubts if this logic really design for points can work for shapes. |
Ok I think I fix the extent logic. Whenever you have a shape that crosses the dateline, we flag that the extent cannot we wrapped around the dateline by setting the negRight and posLeft values to 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me in general. I left a couple of structural suggestions and some questions.
if (fieldType().hasDocValues()) { | ||
geometryIndexer.indexDocValueField(context, shape); | ||
// doc values are generated from the indexed fields. | ||
ShapeField.DecodedTriangle[] triangles = new ShapeField.DecodedTriangle[fields.size()]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would push this inside GeometryIndexer's indexShape() it will encapsulate all decomposition/tessellation/convertion to lucene logic. It should allow us to write some unit tests for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we support index = false
in geo_shape? It seems that is ignored and the shape will always be indexed regardless of that setting. My suggestion would be to change the indexShape method to something like:
List<IndexableField> indexShape(ParseContext context, Processed shape, boolean index, boolean docValues);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And again is a bit more difficult. because we need to support arrays of shapes and we can only have one geo shape doc value per document.
@@ -32,12 +45,13 @@ | |||
private double sumY; | |||
private int count; | |||
|
|||
public CentroidCalculator() { | |||
public CentroidCalculator(Geometry geometry) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class feels a bit brittle since it is used for calculation and then passed as the final result. I wonder if we could pass just X, Y after we are done with calculations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we cannot do that because we need to support arrays of shapes and therefore the calculation of the centroid needs the calculator. This is something it should change in #49887 so I think we should not worry too much. Depending in how we decide to calculate the centroid we would need to change this class accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that if we are to change how the calculation is structured, it is best to do that in the dedicated calculator issue Ignacio mentioned
@@ -95,13 +104,14 @@ public GeoPoint geoPoint() { | |||
|
|||
@Override | |||
public BoundingBox boundingBox() { | |||
return new BoundingBox(geoPoint); | |||
boundingBox.reset(geoPoint); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment why this doesn't happen in the constructor?
public final double posLeft; | ||
public final double posRight; | ||
|
||
public BoundingBox(Extent extent, CoordinateEncoder coordinateEncoder) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how much this really buys us performance-wise in modern JVM? Did you run some benchmarks by any chance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some performance testing in a throw-away-branch (independent of this) where I limited the creation of Extents and such. I didn't see a significant performance difference, just a difference in object allocations which may help reduce garbage collection time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all short lived objects. So, my gut feel it shouldn't be bad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true. it was less significant for when there was one BoundingBox per document/field-value.
The more important place where there is significant churn is in the geo-grid tiler that creates a new Rectangle for every tile queried. There is where I did not see a performance difference, just an object allocation difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throughput of the query should not be much different but latency can suffer if we create millions of this objects as gc will play a role. This code runs on a tight loop and you can have several of this loops running in parallel so creating objects per document is a bad idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good. I do agree that the centroid-calculator feels awkward, but it is going to be required to change a little when #49887 is worked on
The current version of the triangle tree copies quite a lot of Lucene code for generating the triangles. This PR uses the already generated Indexed fields and use the method ShapeField#decodeTriangle to generate those triangles reducing the complexity of the TriangleTreeWriter.
In addition this Pr adds: