-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: ST_EXTENT_AGG optimize envelope extraction from doc-values for cartesian_shape #118802
ESQL: ST_EXTENT_AGG optimize envelope extraction from doc-values for cartesian_shape #118802
Conversation
2369a33
to
cd5b519
Compare
cd5b519
to
a519779
Compare
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @GalLalouche, I've created a changelog YAML for you. |
…icsearch into optimization/st_extent
e61f7a8
to
47966cf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are still a few things to fix.
server/src/main/java/org/elasticsearch/index/mapper/AbstractShapeGeometryFieldMapper.java
Show resolved
Hide resolved
return columnReader ? DOC_VALUES : NONE; | ||
} | ||
|
||
public boolean isColumnReader() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appear to be unused. I think we can remove this boolean from the enum entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Later I did see some code that could have used this, but at that point I also wondered if it would be better to just stick with the original boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must be a left-over. Removed.
|
||
private void testBoundsBlockLoaderAux( | ||
CoordinateEncoder encoder, | ||
Supplier<Geometry> generator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameter not used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Still unused.
Function<String, ShapeIndexer> indexerFactory, | ||
Function<Geometry, Optional<Rectangle>> visitor | ||
) throws IOException { | ||
var geometries = IntStream.range(0, 20).mapToObj(i -> ShapeTestUtils.randomGeometryWithoutCircle(0, false)).toList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should call generator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still does not call generator.
iw.addDocument(doc); | ||
} | ||
} | ||
var indices = IntStream.range(0, geometries.size() / 2).map(x -> x * 2).toArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for this only looking at every second geometry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verifying that the block loader extracts data from the correct docs.
...in/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java
Outdated
Show resolved
Hide resolved
* \_AggregateExec[[],[SPATIALSTEXTENT(location{f}#48,true[BOOLEAN]) AS extent],INITIAL,[ | ||
* minNegX{r}#73, minPosX{r}#74, maxNegX{rb#75, maxPosX{r}#76, maxY{r}#77, minY{r}#78],21] | ||
* \_FieldExtractExec[location{f}#48][location{f}#48] | ||
* \_EsQueryExec[airports], indexMode[standard], query[{"exists":{"field":"location","boost":1.0}}][ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can the EsQueryExec refer to the airports
index when that was not in the query? This comment seems incorrect. Perhaps copied from another test? Either delete the comment, or generate the correct one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
...in/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java
Outdated
Show resolved
Hide resolved
@@ -6912,12 +7066,23 @@ private EsQueryExec assertChildIsGeoPointExtract(UnaryExec parent, boolean useDo | |||
} | |||
|
|||
private EsQueryExec assertChildIsExtractedAsDocValues(UnaryExec parent, boolean useDocValues, DataType dataType) { | |||
// TODO(gal) why is this OK To vacuously true? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this question mean? The line after it asserts that the child node is a FieldExtractExec, and that assertion is reflected in the method name above too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If attributesToExtract
is empty, this test would pass, which is probably not what we want if useDocValues
is true. I'll add a check that it isn't empty.
...in/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java
Outdated
Show resolved
Hide resolved
…icsearch into optimization/st_extent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were no tests that cover the case where shapes were ingested without doc-values. If these existed, they would have failed since the current optimization triggers even if there are no doc-values. There are a few things to do to cover this scenario:
- Update PhysicalPlanOptimizerTests to have the airportCityBoundary index with and without doc-values and make assertions that the plan is only optimized to load extents from doc values if they are there.
- Add csv-spec tests that have cartesian_shapes with and without doc-values, and assert that the queries run with the same results in both.
- Add SpatialPushDownCartesianShapeIT tests that test the same results with and without doc-values and filter pushdown
For this PR it is sufficient to have either the csv-spec, or the SpatialPushDownCartesianShapeIT tests asserting the same results. Having both is even better.
f823bc5
to
d62e37d
Compare
NONE; | ||
|
||
public static FieldExtractPreference forColumnReader(boolean columnReader) { | ||
return columnReader ? DOC_VALUES : NONE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not really make sense to me. Both DOC_VALUES and EXTRACT_SPATIAL_BOUNDS will require a column reader. I suspect the situation is that the places this is used actually never test the EXTRACT_SPATIAL_BOUNDS case, so this does not matter. But once we add tests for that, it will.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is for backward compatibility. I'll just inline it and add a comment so it's clearer.
…icsearch into optimization/st_extent
And made a few updates to comments and javadocs
…icsearch into optimization/st_extent
…le extractions When there are multiple FieldExtractExec nodes, it is incorrect to allow the modification of the foundAttributes, but should take a copy, and modify the copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the latest tests and the grouping bug fixed, I think this looks good to go!
💚 Backport successful
|
…Cartesian_shape (elastic#118802) When we cartesian shapes to Lucene, we encode additional information in the binary, including the extent, so we can read the extent directly from the binary encoding instead of (re-)computing it per shape, and replace the (possibly very complicated) shape with a rectangle. At the moment, this is only done for Cartesian shapes, since for Geo shapes, we need to take dateline wrapping into account, which means it can't be directly encoded as a rectangle. We will deal with geo shapes in a future PR.
…Cartesian_shape (elastic#118802) When we cartesian shapes to Lucene, we encode additional information in the binary, including the extent, so we can read the extent directly from the binary encoding instead of (re-)computing it per shape, and replace the (possibly very complicated) shape with a rectangle. At the moment, this is only done for Cartesian shapes, since for Geo shapes, we need to take dateline wrapping into account, which means it can't be directly encoded as a rectangle. We will deal with geo shapes in a future PR.
…s for Cartesian_shape (#118802) (#119187) * ESQL: ST_EXTENT_AGG optimize envelope extraction from doc-values for Cartesian_shape (#118802) When we cartesian shapes to Lucene, we encode additional information in the binary, including the extent, so we can read the extent directly from the binary encoding instead of (re-)computing it per shape, and replace the (possibly very complicated) shape with a rectangle. At the moment, this is only done for Cartesian shapes, since for Geo shapes, we need to take dateline wrapping into account, which means it can't be directly encoded as a rectangle. We will deal with geo shapes in a future PR. * Use oldstyle switch
…Cartesian_shape (elastic#118802) When we cartesian shapes to Lucene, we encode additional information in the binary, including the extent, so we can read the extent directly from the binary encoding instead of (re-)computing it per shape, and replace the (possibly very complicated) shape with a rectangle. At the moment, this is only done for Cartesian shapes, since for Geo shapes, we need to take dateline wrapping into account, which means it can't be directly encoded as a rectangle. We will deal with geo shapes in a future PR.
This PR optimizes
ST_EXTENT_AGG
when performed on Cartesian shapes.When we cartesian shapes to Lucene, we encode additional information in the binary, including the extent, so we can read the extent directly from the binary encoding instead of (re-)computing it per shape, and replace the (possibly very complicated) shape with a rectangle. At the moment, this is only done for Cartesian shapes, since for Geo shapes, we need to take dateline wrapping into account, which means it can't be directly encoded as a rectangle. We will deal with geo shapes in a future PR.