-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Post Aggregators for Tuple Sketches #13819
Add Post Aggregators for Tuple Sketches #13819
Conversation
...ry/aggregation/datasketches/tuple/ArrayOfDoublesSketchToEncodedStringPostAggregatorTest.java
Fixed
Show resolved
Hide resolved
.../aggregation/datasketches/tuple/ArrayOfDoublesSketchToMetricsEstimatePostAggregatorTest.java
Fixed
Show resolved
Hide resolved
@techdocsmith Hello, if you could help me tag a reviewer for these new features. |
Thanks for the contribution @anshu-makkar . I'm not sure who should do this review--I focus mostly on markdown :D . @abhishekagarwal87 do you have a recommendation? |
@anshu-makkar - Thank you for your contribution. can you please address build failures? You also need to update docs - https://github.com/apache/druid/blob/master/docs/development/extensions-core/datasketches-tuple.md#post-aggregators |
@abhishekagarwal87 : Yes I will work on adding the documentation, and address the build failures. Any other comments if you have I can address those. |
...e/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchConstantPostAggregator.java
Show resolved
Hide resolved
...e/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchConstantPostAggregator.java
Outdated
Show resolved
Hide resolved
...e/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchConstantPostAggregator.java
Outdated
Show resolved
Hide resolved
@anshu-makkar - are you going to update the code later? You have resolved the comments. |
Hi @abhishekagarwal87 I have modified the code and addressed all the comments. |
…hu-makkar/druid into feature/add-post-aggs-tuple-sketches
Co-authored-by: Victoria Lim <[email protected]>
@abhishekagarwal87 : Request to you review if there are more changes required. |
...e/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchConstantPostAggregator.java
Outdated
Show resolved
Hide resolved
|
||
### Estimated metrics values for each column of ArrayOfDoublesSketch | ||
|
||
This post aggregator returns a list of estimated values(sum) from a given ArrayOfDoublesSketch. The result is _N_ double values, where _N_ is the number of double values kept in the sketch per key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This post aggregator returns a list of estimated values(sum) from a given ArrayOfDoublesSketch. The result is _N_ double values, where _N_ is the number of double values kept in the sketch per key. | |
This post aggregator returns a list of estimated values(sum of distincts) from a given ArrayOfDoublesSketch. The result is _N_ double values, where _N_ is the number of double values kept in the sketch per key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs an overall rephrase, basically with each distinct value, there is a array
of doubles
, with this post-aggregator we want to provide a double array with sum.
For E.g.
Key_1, {1.0, 2.0}
Key_2, {2.0, 5.0}
Output from Post-aggregator is {3.0, 7.0}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@techdocsmith @vtlim Can you help rephrase this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This explanation works. Though it does need to be called out that result is an estimated sum and not accurate sum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could This post aggregator returns a list of estimated values(sum) from a given ArrayOfDoublesSketch.
be more like This post aggregator returns a list of estimated sum for each metric value from a given ArrayOfDoublesSketch.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, just saw this. The example isn't clear to me; is the sum being added row-wise or column-wise?
For this example,
Key_1, {1.0, 3.0}
Key_2, {2.0, 5.0}
Would the result be {3.0, 8.0}
or {4.0, 7.0}
?
Commented a suggestion in https://github.com/apache/druid/pull/13819/files?file-filters%5B%5D=.md&show-viewed-files=true#r1120990146. It would be great to include this example in the docs as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Result would be {3.0, 8.0}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anshu-makkar : Thanks a lot for the changes! I have left some comments on the core changes, please let me know your thoughts on them.
...e/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchConstantPostAggregator.java
Outdated
Show resolved
Hide resolved
.../query/aggregation/datasketches/tuple/ArrayOfDoublesSketchToEncodedStringPostAggregator.java
Outdated
Show resolved
Hide resolved
.../query/aggregation/datasketches/tuple/ArrayOfDoublesSketchToEncodedStringPostAggregator.java
Outdated
Show resolved
Hide resolved
...uery/aggregation/datasketches/tuple/ArrayOfDoublesSketchToMetricsEstimatePostAggregator.java
Outdated
Show resolved
Hide resolved
...e/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchConstantPostAggregator.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the review @anshu-makkar! 👍 I have done a second pass over the changes and left some comments. Request you to please take a look at it.
...uery/aggregation/datasketches/tuple/ArrayOfDoublesSketchToMetricsEstimatePostAggregator.java
Outdated
Show resolved
Hide resolved
...n/java/org/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchModule.java
Outdated
Show resolved
Hide resolved
...e/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchConstantPostAggregator.java
Show resolved
Hide resolved
...e/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchConstantPostAggregator.java
Outdated
Show resolved
Hide resolved
...e/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchConstantPostAggregator.java
Show resolved
Hide resolved
...e/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchConstantPostAggregator.java
Show resolved
Hide resolved
|
||
### Estimated metrics values for each column of ArrayOfDoublesSketch | ||
|
||
This post aggregator returns a list of estimated values(sum) from a given ArrayOfDoublesSketch. The result is _N_ double values, where _N_ is the number of double values kept in the sketch per key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could This post aggregator returns a list of estimated values(sum) from a given ArrayOfDoublesSketch.
be more like This post aggregator returns a list of estimated sum for each metric value from a given ArrayOfDoublesSketch.
?
processing/src/main/java/org/apache/druid/query/aggregation/AggregatorUtil.java
Outdated
Show resolved
Hide resolved
...uid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchConstantPostAggregatorTest.java
Show resolved
Hide resolved
@abhishekagarwal87 @rohangarg Thanks for taking time to review this. I will work on remaining review comment to add a test case to |
Based on Druid native query support changes being added on: apache#13819
Based on Druid native query support changes being added on: apache#13819
thanks @anshu-makkar. Please let us know once you think it's ready for review. I think we are almost there. |
@abhishekagarwal87 : I have added the tests as suggested for native query testing for newly added post-aggs. |
Add description of post-aggregator with example Co-authored-by: Victoria Lim <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some minor comments. LGTM otherwise.
...g/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchAggregationTest.java
Outdated
Show resolved
Hide resolved
...g/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchAggregationTest.java
Outdated
Show resolved
Hide resolved
...n/java/org/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchModule.java
Outdated
Show resolved
Hide resolved
…oMetricsSumEstimate Co-authored-by: Abhishek Agarwal <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for docs
Request to merge, since we have another development activity on top of this to add |
Merged. Thank you for your contribution @anshu-makkar |
Includes tests which exercise the use-case outlined on: apache#13819
Includes tests which exercise the use-case outlined on: apache#13819
Includes tests which exercise the use-case outlined on: apache#13819
Includes tests which exercise the use-case outlined on: apache#13819
You can now do the following operations with TupleSketches in Post Aggregation Step Get the Sketch Output as Base64 String Provide a constant Tuple Sketch in post-aggregation step that can be used in Set Operations Get the Estimated Value(Sum) of Summary/Metrics Objects associated with Tuple Sketch
This PR is a follow-up to #13819 so that the Tuple sketch functionality can be used in SQL for both ingestion using Multi-Stage Queries (MSQ) and also for analytic queries against Tuple sketch columns.
The version originally added in apache#13819 was missing an annotation for the "value" property. Fixes apache#16539. Line endings for ArrayOfDoublesSketchConstantPostAggregator.java are changed from \r\n to \n. Adds a serde test, and improves various other datasketches post-aggregator serde tests to deserialize into PostAggregator. This verifies that the type information is set up correctly.
* Fix serde for ArrayOfDoublesSketchConstantPostAggregator. The version originally added in #13819 was missing an annotation for the "value" property. Fixes #16539. Line endings for ArrayOfDoublesSketchConstantPostAggregator.java are changed from \r\n to \n. Adds a serde test, and improves various other datasketches post-aggregator serde tests to deserialize into PostAggregator. This verifies that the type information is set up correctly. * Fix excessive imports. * Fix equals, hashCode.
Description
Added New Post Aggregators for Tuple Sketches
Release note
New: You can now do the following operations with TupleSketches in Post Aggregation Step
Key changed/added classes in this PR
Added
ArrayOfDoublesSketchToEncodedStringPostAggregator.java
ArrayOfDoublesSketchToMetricsEstimatePostAggregator.java
ArrayOfDoublesSketchConstantPostAggregator.java
ArrayOfDoublesSketchToMetricsEstimatePostAggregatorTest.java
ArrayOfDoublesSketchToEncodedStringPostAggregatorTest.java
Modified
AggregatorUtil.java
ArrayOfDoublesSketchModule.java
This PR has: