-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make TDigestState configurable #96787
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
More work needed for TDigestPercentile*Tests and the TDigestTest (and the rest of the tests) in the tdigest lib to pass.
Remove wrong asserts from tests and MergingDigest.
Remove redundant serializing interfaces from the library.
These tests don't address compatibility issues in mixed cluster tests as the latter contain a mix of older and newer nodes, so the output depends on which node is picked as a data node since the forked TDigest library is not backwards compatible (produces slightly different results).
The latest library version is 50% slower and less accurate, as verified by ComparisonTests.
This helps with mixed cluster tests, where some of the tests where blocked.
Update quantile calculations between centroids and min/max values to match v.3.2.
The SortingDigest tracks all samples in an ArrayList that gets sorted for quantile calculations. This approach provides perfectly accurate results and is the most efficient implementation for up to millions of samples, at the cost of bloated memory footprint. The HybridDigest uses a SortingDigest for small sample populations, then switches to a MergingDigest. This approach combines to the best performance and results for small sample counts with very good performance and acceptable accuracy for effectively unbounded sample counts.
These will be submitted in a follow-up PR for enabling MergingDigest.
Delete dead and commented out code, make the remaining tests run reasonably fast. Remove unused annotations, esp. SuppressWarnings.
This reverts commit 2336b11.
Add SortingDigest as a simple structure for percentile calculations that tracks all data points in a sorted array. This is a fast and perfectly accurate solution that leads to bloated memory allocation. Add HybridDigest that uses SortingDigest for small sample counts, then switches to MergingDigest. This approach delivers extreme performance and accuracy for small populations while scaling indefinitely and maintaining acceptable performance and accuracy with constant memory allocation (15kB by default). Provide knobs to switch back to AVLTreeDigest, either per query or through ClusterSettings.
# Conflicts: # benchmarks/src/main/java/org/elasticsearch/benchmark/tdigest/TDigestBench.java # libs/tdigest/src/main/java/org/elasticsearch/tdigest/AVLGroupTree.java # libs/tdigest/src/main/java/org/elasticsearch/tdigest/AVLTreeDigest.java # libs/tdigest/src/main/java/org/elasticsearch/tdigest/AbstractTDigest.java # libs/tdigest/src/main/java/org/elasticsearch/tdigest/Dist.java # libs/tdigest/src/main/java/org/elasticsearch/tdigest/MergingDigest.java # libs/tdigest/src/main/java/org/elasticsearch/tdigest/TDigest.java # libs/tdigest/src/test/java/org/elasticsearch/tdigest/ComparisonTests.java # libs/tdigest/src/test/java/org/elasticsearch/tdigest/MedianTests.java # libs/tdigest/src/test/java/org/elasticsearch/tdigest/MergingDigestTests.java # libs/tdigest/src/test/java/org/elasticsearch/tdigest/ScaleFunctionTests.java # libs/tdigest/src/test/java/org/elasticsearch/tdigest/TDigestTests.java # modules/aggregations/build.gradle # modules/aggregations/src/yamlRestTest/resources/rest-api-spec/test/aggregations/percentile_ranks_tdigest_metric.yml # modules/aggregations/src/yamlRestTest/resources/rest-api-spec/test/aggregations/percentiles_tdigest_metric.yml # server/src/main/java/org/elasticsearch/search/aggregations/metrics/EmptyTDigestState.java # server/src/main/java/org/elasticsearch/search/aggregations/metrics/TDigestState.java # x-pack/plugin/analytics/src/test/java/org/elasticsearch/xpack/analytics/AnalyticsTestsUtils.java # x-pack/plugin/build.gradle # x-pack/plugin/sql/qa/server/src/main/resources/unsigned-long.csv-spec # x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/ml/evaluate_data_frame.yml # x-pack/qa/xpack-prefix-rest-compat/build.gradle
…903-merging # Conflicts: # server/src/test/java/org/elasticsearch/search/aggregations/metrics/InternalTDigestPercentilesRanksTests.java # server/src/test/java/org/elasticsearch/search/aggregations/metrics/InternalTDigestPercentilesTests.java
Update tests.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Add SortingDigest as a simple structure for percentile calculations that
tracks all data points in a sorted array. This is a fast and perfectly
accurate solution that leads to bloated memory allocation.
Add HybridDigest that uses SortingDigest for small sample counts, then
switches to MergingDigest. This approach delivers extreme
performance and accuracy for small populations while scaling
indefinitely and maintaining acceptable performance and accuracy with
constant memory allocation (15kB by default).
Make TDigestState a TDigest decorator instead of a subclass. Its
factories hide the details of the underlying TDigest implementation,
offering a default and an optimizer for accuracy version. Both point
to AVLTreeDigest for now; the default will be switched to HybridDigest
in a follow-up PR.
Related to #96086.