-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply the date histogram rewrite optimization to range aggregation #13865
Conversation
Signed-off-by: bowenlan-amzn <[email protected]>
Signed-off-by: bowenlan-amzn <[email protected]>
Signed-off-by: bowenlan-amzn <[email protected]>
Signed-off-by: bowenlan-amzn <[email protected]>
Signed-off-by: bowenlan-amzn <[email protected]>
Signed-off-by: bowenlan-amzn <[email protected]>
❌ Gradle check result for c5d2175: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for ed79e02: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: bowenlan-amzn <[email protected]>
support all numeric types minus one on the upper range Signed-off-by: bowenlan-amzn <[email protected]>
6d41421
to
8ec5f58
Compare
8ec5f58
to
67c281c
Compare
❌ Gradle check result for 8ec5f58: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 67c281c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: bowenlan-amzn <[email protected]>
❌ Gradle check result for c10c775: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 783b14a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 783b14a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@github-actions commented on Jun 18, 2024, 11:17 PM PDT:
|
…13865) * Refactor the ranges representation Signed-off-by: bowenlan-amzn <[email protected]> * Refactor try fast filter Signed-off-by: bowenlan-amzn <[email protected]> * Main work finished; left the handling of different numeric data types Signed-off-by: bowenlan-amzn <[email protected]> * buildRanges accepts field type Signed-off-by: bowenlan-amzn <[email protected]> * first working draft probably Signed-off-by: bowenlan-amzn <[email protected]> * add change log Signed-off-by: bowenlan-amzn <[email protected]> * accommodate geo distance agg Signed-off-by: bowenlan-amzn <[email protected]> * Fix test support all numeric types minus one on the upper range Signed-off-by: bowenlan-amzn <[email protected]> * [Refactor] range is lower inclusive, right exclusive Signed-off-by: bowenlan-amzn <[email protected]> * adding test Signed-off-by: bowenlan-amzn <[email protected]> * Adding test and refactor Signed-off-by: bowenlan-amzn <[email protected]> * refactor Signed-off-by: bowenlan-amzn <[email protected]> * add test Signed-off-by: bowenlan-amzn <[email protected]> * add test and update the compare logic in tree traversal Signed-off-by: bowenlan-amzn <[email protected]> * fix test, add random test Signed-off-by: bowenlan-amzn <[email protected]> * refactor to address comments Signed-off-by: bowenlan-amzn <[email protected]> * small potential performance update Signed-off-by: bowenlan-amzn <[email protected]> * fix precommit Signed-off-by: bowenlan-amzn <[email protected]> * refactor Signed-off-by: bowenlan-amzn <[email protected]> * refactor Signed-off-by: bowenlan-amzn <[email protected]> * set refresh_interval to -1 Signed-off-by: bowenlan-amzn <[email protected]> * address comment Signed-off-by: bowenlan-amzn <[email protected]> * address comment Signed-off-by: bowenlan-amzn <[email protected]> * address comment Signed-off-by: bowenlan-amzn <[email protected]> * Fix test To understand fully about the double and bigdecimal usage in scaled float field will take more time. Signed-off-by: bowenlan-amzn <[email protected]> --------- Signed-off-by: bowenlan-amzn <[email protected]> (cherry picked from commit 57fb50b) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…13865) (#14463) * Refactor the ranges representation * Refactor try fast filter * Main work finished; left the handling of different numeric data types * buildRanges accepts field type * first working draft probably * add change log * accommodate geo distance agg * Fix test support all numeric types minus one on the upper range * [Refactor] range is lower inclusive, right exclusive * adding test * Adding test and refactor * refactor * add test * add test and update the compare logic in tree traversal * fix test, add random test * refactor to address comments * small potential performance update * fix precommit * refactor * refactor * set refresh_interval to -1 * address comment * address comment * address comment * Fix test To understand fully about the double and bigdecimal usage in scaled float field will take more time. --------- (cherry picked from commit 57fb50b) Signed-off-by: bowenlan-amzn <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@bowenlan-amzn Maybe a test failure here related to this change: https://build.ci.opensearch.org/job/gradle-check/41411/consoleText
That will reproduce for me, though not every time |
…pensearch-project#13865) * Refactor the ranges representation Signed-off-by: bowenlan-amzn <[email protected]> * Refactor try fast filter Signed-off-by: bowenlan-amzn <[email protected]> * Main work finished; left the handling of different numeric data types Signed-off-by: bowenlan-amzn <[email protected]> * buildRanges accepts field type Signed-off-by: bowenlan-amzn <[email protected]> * first working draft probably Signed-off-by: bowenlan-amzn <[email protected]> * add change log Signed-off-by: bowenlan-amzn <[email protected]> * accommodate geo distance agg Signed-off-by: bowenlan-amzn <[email protected]> * Fix test support all numeric types minus one on the upper range Signed-off-by: bowenlan-amzn <[email protected]> * [Refactor] range is lower inclusive, right exclusive Signed-off-by: bowenlan-amzn <[email protected]> * adding test Signed-off-by: bowenlan-amzn <[email protected]> * Adding test and refactor Signed-off-by: bowenlan-amzn <[email protected]> * refactor Signed-off-by: bowenlan-amzn <[email protected]> * add test Signed-off-by: bowenlan-amzn <[email protected]> * add test and update the compare logic in tree traversal Signed-off-by: bowenlan-amzn <[email protected]> * fix test, add random test Signed-off-by: bowenlan-amzn <[email protected]> * refactor to address comments Signed-off-by: bowenlan-amzn <[email protected]> * small potential performance update Signed-off-by: bowenlan-amzn <[email protected]> * fix precommit Signed-off-by: bowenlan-amzn <[email protected]> * refactor Signed-off-by: bowenlan-amzn <[email protected]> * refactor Signed-off-by: bowenlan-amzn <[email protected]> * set refresh_interval to -1 Signed-off-by: bowenlan-amzn <[email protected]> * address comment Signed-off-by: bowenlan-amzn <[email protected]> * address comment Signed-off-by: bowenlan-amzn <[email protected]> * address comment Signed-off-by: bowenlan-amzn <[email protected]> * Fix test To understand fully about the double and bigdecimal usage in scaled float field will take more time. Signed-off-by: bowenlan-amzn <[email protected]> --------- Signed-off-by: bowenlan-amzn <[email protected]>
…pensearch-project#13865) (opensearch-project#14463) * Refactor the ranges representation * Refactor try fast filter * Main work finished; left the handling of different numeric data types * buildRanges accepts field type * first working draft probably * add change log * accommodate geo distance agg * Fix test support all numeric types minus one on the upper range * [Refactor] range is lower inclusive, right exclusive * adding test * Adding test and refactor * refactor * add test * add test and update the compare logic in tree traversal * fix test, add random test * refactor to address comments * small potential performance update * fix precommit * refactor * refactor * set refresh_interval to -1 * address comment * address comment * address comment * Fix test To understand fully about the double and bigdecimal usage in scaled float field will take more time. --------- (cherry picked from commit 57fb50b) Signed-off-by: bowenlan-amzn <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: kkewwei <[email protected]>
…pensearch-project#13865) * Refactor the ranges representation Signed-off-by: bowenlan-amzn <[email protected]> * Refactor try fast filter Signed-off-by: bowenlan-amzn <[email protected]> * Main work finished; left the handling of different numeric data types Signed-off-by: bowenlan-amzn <[email protected]> * buildRanges accepts field type Signed-off-by: bowenlan-amzn <[email protected]> * first working draft probably Signed-off-by: bowenlan-amzn <[email protected]> * add change log Signed-off-by: bowenlan-amzn <[email protected]> * accommodate geo distance agg Signed-off-by: bowenlan-amzn <[email protected]> * Fix test support all numeric types minus one on the upper range Signed-off-by: bowenlan-amzn <[email protected]> * [Refactor] range is lower inclusive, right exclusive Signed-off-by: bowenlan-amzn <[email protected]> * adding test Signed-off-by: bowenlan-amzn <[email protected]> * Adding test and refactor Signed-off-by: bowenlan-amzn <[email protected]> * refactor Signed-off-by: bowenlan-amzn <[email protected]> * add test Signed-off-by: bowenlan-amzn <[email protected]> * add test and update the compare logic in tree traversal Signed-off-by: bowenlan-amzn <[email protected]> * fix test, add random test Signed-off-by: bowenlan-amzn <[email protected]> * refactor to address comments Signed-off-by: bowenlan-amzn <[email protected]> * small potential performance update Signed-off-by: bowenlan-amzn <[email protected]> * fix precommit Signed-off-by: bowenlan-amzn <[email protected]> * refactor Signed-off-by: bowenlan-amzn <[email protected]> * refactor Signed-off-by: bowenlan-amzn <[email protected]> * set refresh_interval to -1 Signed-off-by: bowenlan-amzn <[email protected]> * address comment Signed-off-by: bowenlan-amzn <[email protected]> * address comment Signed-off-by: bowenlan-amzn <[email protected]> * address comment Signed-off-by: bowenlan-amzn <[email protected]> * Fix test To understand fully about the double and bigdecimal usage in scaled float field will take more time. Signed-off-by: bowenlan-amzn <[email protected]> --------- Signed-off-by: bowenlan-amzn <[email protected]>
Description
Background
Previously we optimized date histogram aggregation by utilizing the index structure of date field — BKD tree/Points to provide the documents count of date histogram buckets results. We first build date ranges from the date histogram query input like daily interval histogram. Then we "query" the index of date for how many documents are in these date ranges.
Idea
Date is actually saved as numeric data of long data type. This leads to the idea that we can extend the optimization to the RangeAggregator which also perform ranges aggregation on numeric data.
Implementation
The core optimization algorithm remains the same. (Details #13317)
Note the algorithm has a hidden pre-condition: all the ranges are not overlapping (because date interval cannot produce overlapping date ranges)
Another difference is that range aggregation buildRange method won't be built from segment level match all path. The ranges are provided by user directly, not like date histogram needs to check the boundaries of shard or segment or accommodate top level range query. (Details #12073)
Changes
long[]
Possible Follow Ups
Benchmarks
However, considering this is already ~10ms. We can look into the reason later as a follow up. For the other workloads, big5, nyc, http, they are all even faster.
big5
noaa
Related Issues
Resolves #13531
Check List
New functionality has been documented.API changes companion pull request created.Public documentation issue/PR createdBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.