Datetime aggregation fixes. #1061
Codecov Report
@@             Coverage Diff              @@
##               main    #1061      +/-   ##
============================================
+ Coverage     98.34%   98.36%   +0.01%
- Complexity     3600     3614      +14
============================================
  Files           343      343
  Lines          8908     8973      +65
  Branches        567      574       +7
============================================
+ Hits           8761     8826      +65
  Misses          142      142
  Partials          5        5
core/src/main/java/org/opensearch/sql/expression/aggregation/AvgAggregator.java
Signed-off-by: Yury-Fridlyand <[email protected]>
4a9dfda to 8ff4298
Please see the fix in 8ff4298.
Thanks for the changes!
Minor comment: I just realized that our cast function doesn't accept ExprType, so the switch in AvgAggregator is required. Besides improving the cast function, one thing we could do now is pass a cast function to a single AvgState class. For example, in the switch statement, pass DSL::castToTimestamp, DSL::castToDouble, etc. accordingly. I think the Instant.ofEpochMilli() call in each AvgState should be part of the cast function from double to timestamp/date/time, etc.
We can do it later if it makes sense and if you want. Thanks!
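One way to picture this suggestion is a single averaging state that receives its conversion functions from the caller. The following is only a sketch under assumptions: `GenericAvgState`, `forDouble`, and `forTimestamp` are hypothetical names, plain `java.time` and `java.util.function` types stand in for the plugin's `ExprValue` and `DSL` cast functions, and `null` stands in for the plugin's null value.

```java
import java.time.Instant;
import java.util.function.DoubleFunction;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: one averaging state, parameterized by conversion functions.
final class GenericAvgState<T> {
  private final ToDoubleFunction<T> toDouble;  // value -> number to accumulate
  private final DoubleFunction<T> fromDouble;  // averaged number -> result type
  private double total = 0;
  private int count = 0;

  GenericAvgState(ToDoubleFunction<T> toDouble, DoubleFunction<T> fromDouble) {
    this.toDouble = toDouble;
    this.fromDouble = fromDouble;
  }

  GenericAvgState<T> iterate(T value) {
    total += toDouble.applyAsDouble(value);
    count++;
    return this;
  }

  // Returns null when nothing was aggregated.
  T result() {
    return count == 0 ? null : fromDouble.apply(total / count);
  }

  // Plain numeric average: both conversions are the identity.
  static GenericAvgState<Double> forDouble() {
    return new GenericAvgState<Double>(d -> d, d -> d);
  }

  // Timestamp average: accumulate epoch milliseconds, convert back at the end.
  static GenericAvgState<Instant> forTimestamp() {
    return new GenericAvgState<Instant>(
        i -> (double) i.toEpochMilli(),
        d -> Instant.ofEpochMilli(Math.round(d)));
  }
}
```

With this shape, the type switch would only choose which pair of functions to pass, and the `Instant.ofEpochMilli()` call would live in the conversion function instead of in each state class.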
Just to confirm - do you want to add new cast |
Never mind. Just some thought about if it makes sense to add this logic to |
* Add aggregator fixes and some useless unit tests.
* Add `avg` aggregation on datetime types.
* Rework in-memory `AVG`. Fix parsing value returned from the OpenSearch node.
Signed-off-by: Yury-Fridlyand <[email protected]>
9d38566 to 3743581
Rebased, please re-review.
...va/org/opensearch/sql/opensearch/storage/script/aggregation/ExpressionAggregationScript.java
Co-authored-by: MaxKsyunz <[email protected]> Signed-off-by: Yury-Fridlyand <[email protected]>
Signed-off-by: Yury-Fridlyand <[email protected]>
64c14fc to af13d83
  }

  @Override
  protected AvgState iterate(ExprValue value) {
Is it possible to reuse DSL.adddate()?
Yes, but DateAvgState will be less clear and readable:
  protected static class DateAvgState extends AvgState {
    public DateAvgState() {
      this.count = new ExprIntegerValue(0);
      this.total = new ExprDateValue(LocalDate.EPOCH);
    }

    @Override
    public ExprValue result() {
      if (0 == count.integerValue()) {
        return ExprNullValue.of();
      }
      return DSL.adddate(DSL.literal(new ExprDateValue(LocalDate.EPOCH)),
          DSL.literal(DSL.divide(DSL.literal(DAYS.between(LocalDate.EPOCH, total.dateValue())),
              DSL.literal(count)).valueOf().longValue())).valueOf();
    }

    @Override
    protected AvgState iterate(ExprValue value) {
      total = DSL.adddate(DSL.literal(total),
          DSL.literal(DAYS.between(LocalDate.EPOCH, value.dateValue())))
          .valueOf();
      return super.iterate(value);
    }
  }
core/src/main/java/org/opensearch/sql/expression/aggregation/AvgAggregator.java
* Revert recent changes in `OpenSearchExprValueFactory`.
* Update `BucketAggregationBuilder` to specify how to interpret datetime values.
Signed-off-by: Yury-Fridlyand <[email protected]>
@Yury-Fridlyand it makes sense to switch base branch to |
@@ -64,6 +70,10 @@ private CompositeValuesSourceBuilder<?> buildCompositeValuesSourceBuilder(
          .missingBucket(true)
          .missingOrder(missingOrder)
          .order(sortOrder);
      // Time types values are converted to LONG in ExpressionAggregationScript::execute
      if (List.of(TIMESTAMP, TIME, DATE, DATETIME).contains(expr.getDelegated().type())) {
Does this impact other data types? For example, cast('1' as integer).
No, because I check whether the type is one of TIMESTAMP, TIME, DATE, DATETIME.
Thanks for the fix!
Merge conflicts resolved.
* Update aggregation to support datetime types. Signed-off-by: Yury-Fridlyand <[email protected]> Signed-off-by: Yury-Fridlyand <[email protected]> Co-authored-by: MaxKsyunz <[email protected]> (cherry picked from commit 5220a98)
* Update aggregation to support datetime types. Signed-off-by: Yury-Fridlyand <[email protected]> Signed-off-by: Yury-Fridlyand <[email protected]> Co-authored-by: MaxKsyunz <[email protected]> (cherry picked from commit 5220a98) Co-authored-by: Yury-Fridlyand <[email protected]>
Signed-off-by: Yury-Fridlyand [email protected]
Description
Make listed aggregations work with datetime types.
Please, see team review and discussion in Bit-Quill#144.
Fixes
min
max
avg
Not included
var_samp
var_pop
stddev_samp
stddev_pop
sum
[1]
Implementation details:
Precision
Aggregation is done with millisecond precision, which follows the OpenSearch approach. See code snippets: one, two.
We convert datetimes to milliseconds and back on our own (see below), so we could scale up to nanoseconds. Such a change requires additional work, though: a few tests fail.
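To illustrate what millisecond precision means in practice, here is a small sketch using only `java.time`. The class and method names are hypothetical and not part of the plugin; the point is simply that any sub-millisecond part of a value is lost once it passes through epoch milliseconds.

```java
import java.time.Instant;

// Hypothetical demo class, not plugin code.
final class MillisPrecisionDemo {
  // Converts to epoch milliseconds and back, dropping anything finer than a millisecond.
  static Instant roundTripThroughMillis(Instant value) {
    return Instant.ofEpochMilli(value.toEpochMilli());
  }

  public static void main(String[] args) {
    Instant original = Instant.parse("2022-11-15T10:15:30.123456789Z");
    Instant restored = roundTripThroughMillis(original);
    System.out.println(original); // 2022-11-15T10:15:30.123456789Z
    System.out.println(restored); // 2022-11-15T10:15:30.123Z
  }
}
```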
ExpressionAggregationScript::execute
A callback function called by the OpenSearch node to extract a value while processing an aggregation script.
This was added for backward compatibility:
https://github.com/Bit-Quill/opensearch-project-sql/blob/272bf67c9009727a7ce24958b58f0357f4e1dc4e/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/ExpressionAggregationScript.java#L53-L55
This extracts the value. Fortunately, toEpochMilli() returns negative values for pre-Epoch timestamps, so we can use it to group and compare any values.
https://github.com/Bit-Quill/opensearch-project-sql/blob/272bf67c9009727a7ce24958b58f0357f4e1dc4e/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/ExpressionAggregationScript.java#L56-L67
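The ordering property can be checked with plain `java.time`. This is a minimal sketch with hypothetical names (`PreEpochOrderingDemo`, `toMillis`, `sortedMillis`), not code from the plugin: sorting the millisecond values gives the same order as sorting the timestamps themselves, including those before the Epoch.

```java
import java.time.Instant;
import java.util.Arrays;

// Hypothetical demo class, not plugin code.
final class PreEpochOrderingDemo {
  static long toMillis(String isoInstant) {
    return Instant.parse(isoInstant).toEpochMilli();
  }

  // Sorting the long values orders the timestamps chronologically.
  static long[] sortedMillis(String... isoInstants) {
    return Arrays.stream(isoInstants)
        .mapToLong(PreEpochOrderingDemo::toMillis)
        .sorted()
        .toArray();
  }

  public static void main(String[] args) {
    System.out.println(toMillis("1969-12-31T23:59:59Z")); // -1000
    System.out.println(toMillis("1970-01-01T00:00:00Z")); // 0
    System.out.println(toMillis("1970-01-01T00:00:01Z")); // 1000
  }
}
```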
OpenSearch accepts dates as Joda library types and converts them to milliseconds since the Epoch. We have Java datetime types, so I do the conversion there.
https://github.com/opensearch-project/OpenSearch/blob/140e8d3e6c91519edc47be07b4cd053fdfac1769/server/src/main/java/org/opensearch/search/aggregations/support/values/ScriptDoubleValues.java#L123
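A conversion of this kind might look roughly like the sketch below. It is not the code from the PR: the class and method names are hypothetical, UTC is assumed for values without a zone, and a time-of-day value is assumed to map to milliseconds since midnight.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneOffset;

// Hypothetical helper, not plugin code. Assumes UTC for zone-less values.
final class JavaTimeToMillis {
  // A date is taken as midnight UTC of that day.
  static long dateToMillis(LocalDate date) {
    return date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
  }

  static long dateTimeToMillis(LocalDateTime dateTime) {
    return dateTime.toInstant(ZoneOffset.UTC).toEpochMilli();
  }

  // Assumption: a time of day becomes milliseconds since midnight.
  static long timeToMillis(LocalTime time) {
    return time.toNanoOfDay() / 1_000_000L;
  }
}
```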
BucketAggregationBuilder::buildCompositeValuesSourceBuilder
To process datetime types properly, we have to provide a hint to the engine.
sql/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/dsl/BucketAggregationBuilder.java
Lines 73 to 76 in 3a902ef
ExprValueUtils
These two methods are used by the avg in-memory aggregation for datetime types.
https://github.com/Bit-Quill/opensearch-project-sql/blob/272bf67c9009727a7ce24958b58f0357f4e1dc4e/core/src/main/java/org/opensearch/sql/data/model/ExprValueUtils.java#L190
https://github.com/Bit-Quill/opensearch-project-sql/blob/272bf67c9009727a7ce24958b58f0357f4e1dc4e/core/src/main/java/org/opensearch/sql/data/model/ExprValueUtils.java#L213
AvgAggregator
AvgState was renamed to DoubleAvgState.
DateTimeAvgState does the same logic, but for datetime types.
https://github.com/Bit-Quill/opensearch-project-sql/blob/272bf67c9009727a7ce24958b58f0357f4e1dc4e/core/src/main/java/org/opensearch/sql/expression/aggregation/AvgAggregator.java#L105-L117
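The idea of averaging datetime values can be sketched outside the plugin as follows. This is an illustration under assumptions, not the linked implementation: `TimestampAvgSketch` is a hypothetical name, `Instant` stands in for the plugin's datetime value types, and an empty `Optional` stands in for the null result returned when no rows were aggregated.

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical sketch, not plugin code.
final class TimestampAvgSketch {
  private long count = 0;
  private double totalMillis = 0; // running sum of epoch milliseconds

  TimestampAvgSketch iterate(Instant value) {
    count++;
    totalMillis += value.toEpochMilli();
    return this;
  }

  // Averages the accumulated milliseconds and converts back to a timestamp.
  Optional<Instant> result() {
    if (count == 0) {
      return Optional.empty();
    }
    return Optional.of(Instant.ofEpochMilli(Math.round(totalMillis / count)));
  }
}
```

Because pre-Epoch values contribute negative milliseconds, the same arithmetic covers timestamps on both sides of the Epoch.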
AvgState
AggregatorFunction
New signatures were added.
https://github.com/Bit-Quill/opensearch-project-sql/blob/272bf67c9009727a7ce24958b58f0357f4e1dc4e/core/src/main/java/org/opensearch/sql/expression/aggregation/AggregatorFunction.java#L69-L76
Actually, the plugin was already able to push down aggregation on datetime types, but this wasn't accepted.
[1] It works on MySQL, but I don't see any reason to implement this for datetime types.
Test queries
In-memory aggregation
https://github.com/Bit-Quill/opensearch-project-sql/blob/17e1ae98bd7e403bc94043a607ae1993d5062422/integ-test/src/test/java/org/opensearch/sql/sql/AggregationIT.java#L196-L199
(cast required to ensure that you're working with TIME).
To try other types, use
Push Down aggregation (metric)
Push Down aggregation (composite_buckets)
Issues Resolved
Fixes #645
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.