Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datetime aggregation fixes. #1061

Merged

Conversation

Yury-Fridlyand
Copy link
Collaborator

@Yury-Fridlyand Yury-Fridlyand commented Nov 9, 2022

Signed-off-by: Yury-Fridlyand [email protected]

Description

Make listed aggregations work with datetime types.
Please, see team review and discussion in Bit-Quill#144.

Fixes

  • min
  • max
  • avg

Not included

  • var_samp
  • var_pop
  • stddev_samp
  • stddev_pop
  • sum [1]

Implementation details:

Presicion

Aggregation done with milliseconds precision - this follows OpenSearch approach. See code snippets: one, two.
We convert datetimes to millis and back on our own (see below), so we can scale up to nanoseconds. Such fix requires additional changes - few tests fail.

ExpressionAggregationScript::execute

A callback function, called by OpenSearch node to extract a value during processing aggregation script.
This added for backward compatibility:
https://github.com/Bit-Quill/opensearch-project-sql/blob/272bf67c9009727a7ce24958b58f0357f4e1dc4e/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/ExpressionAggregationScript.java#L53-L55
This - to extract value. Fortunately, toEpochMilli() returns negative values for pre-Epoch timestamps, so we are able to use it for group/compare any values.
https://github.com/Bit-Quill/opensearch-project-sql/blob/272bf67c9009727a7ce24958b58f0357f4e1dc4e/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/ExpressionAggregationScript.java#L56-L67
OpenSearch accepts dates in Joda lib types and converts to milliseconds since Epoch. We have java datetime types, to I do conversion there.
https://github.com/opensearch-project/OpenSearch/blob/140e8d3e6c91519edc47be07b4cd053fdfac1769/server/src/main/java/org/opensearch/search/aggregations/support/values/ScriptDoubleValues.java#L123

BucketAggregationBuilder::buildCompositeValuesSourceBuilder

To properly process datetime types it is required to provide a hint to the engine.

// Time types values are converted to LONG in ExpressionAggregationScript::execute
if (List.of(TIMESTAMP, TIME, DATE, DATETIME).contains(expr.getDelegated().type())) {
sourceBuilder.userValuetypeHint(ValueType.LONG);
}

ExprValueUtils

These two methods used by avg in-memory aggregation for datetime types.
https://github.com/Bit-Quill/opensearch-project-sql/blob/272bf67c9009727a7ce24958b58f0357f4e1dc4e/core/src/main/java/org/opensearch/sql/data/model/ExprValueUtils.java#L190
https://github.com/Bit-Quill/opensearch-project-sql/blob/272bf67c9009727a7ce24958b58f0357f4e1dc4e/core/src/main/java/org/opensearch/sql/data/model/ExprValueUtils.java#L213

AvgAggregator

  1. Added new field to store which type is being aggregated
  2. AvgState was renamed to DoubleAvgState
  3. DateTimeAvgState does the same logic, but for datetime types.
    https://github.com/Bit-Quill/opensearch-project-sql/blob/272bf67c9009727a7ce24958b58f0357f4e1dc4e/core/src/main/java/org/opensearch/sql/expression/aggregation/AvgAggregator.java#L105-L117
  4. Added abstract class AvgState

AggregatorFunction

New signature were added.
https://github.com/Bit-Quill/opensearch-project-sql/blob/272bf67c9009727a7ce24958b58f0357f4e1dc4e/core/src/main/java/org/opensearch/sql/expression/aggregation/AggregatorFunction.java#L69-L76
Actually, plugin was able to do push down aggregation on datetime types, but this weren't accepted.

[1] It works on MySQL, but I don't see any reason to implement this for datetime types.

Test queries

In-memory aggregation

https://github.com/Bit-Quill/opensearch-project-sql/blob/17e1ae98bd7e403bc94043a607ae1993d5062422/integ-test/src/test/java/org/opensearch/sql/sql/AggregationIT.java#L196-L199

SELECT avg(CAST(time1 AS time)) OVER(PARTITION BY datetime1) from calcs;

(cast required to ensure that you're working with TIME).
To try other types, use

CAST(date0 AS date)            
datetime(CAST(time0 AS STRING))
CAST(time1 AS time)            
CAST(datetime0 AS timestamp)   

Push Down aggregation (metric)

SELECT min(CAST(date0 AS date)) from calcs;

Push Down aggregation (composite_buckets)

SELECT CAST(datetime0 AS timestamp) FROM calcs GROUP BY 1;

Issues Resolved

Fixes #645

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@codecov-commenter
Copy link

codecov-commenter commented Nov 9, 2022

Codecov Report

Merging #1061 (3365a7a) into main (ff9ac16) will increase coverage by 0.01%.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff              @@
##               main    #1061      +/-   ##
============================================
+ Coverage     98.34%   98.36%   +0.01%     
- Complexity     3600     3614      +14     
============================================
  Files           343      343              
  Lines          8908     8973      +65     
  Branches        567      574       +7     
============================================
+ Hits           8761     8826      +65     
  Misses          142      142              
  Partials          5        5              
Flag Coverage Δ
sql-engine 98.36% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../org/opensearch/sql/data/model/ExprValueUtils.java 100.00% <ø> (ø)
...sql/expression/aggregation/AggregatorFunction.java 100.00% <100.00%> (ø)
...arch/sql/expression/aggregation/AvgAggregator.java 100.00% <100.00%> (ø)
...cript/aggregation/ExpressionAggregationScript.java 100.00% <100.00%> (ø)
...ript/aggregation/dsl/AggregationBuilderHelper.java 100.00% <100.00%> (ø)
...ript/aggregation/dsl/BucketAggregationBuilder.java 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

MitchellGale
MitchellGale previously approved these changes Nov 10, 2022
@Yury-Fridlyand
Copy link
Collaborator Author

Please, see fix in 8ff4298.

dai-chen
dai-chen previously approved these changes Nov 28, 2022
Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes!

Minor comment: I just realized that our cast function doesn't accept ExprType, so the switch in AvgAggregator is required. Besides improving cast function, probably one thing we can do now is to pass a cast function to an unique AvgState class. For example, in switch statement, pass DSL::castToTimestamp, DSL::castToDouble accordingly etc. I think the Instant.ofEpochMilli() call in each AvgState should be part of cast function from double to timestmap/date/time etc?

We can do it later if it makes sense and if you want. Thanks!

@Yury-Fridlyand
Copy link
Collaborator Author

Minor comment: I just realized that our cast function doesn't accept ExprType, so the switch in AvgAggregator is required. Besides improving cast function, probably one thing we can do now is to pass a cast function to an unique AvgState class. For example, in switch statement, pass DSL::castToTimestamp, DSL::castToDouble accordingly etc. I think the Instant.ofEpochMilli() call in each AvgState should be part of cast function from double to timestmap/date/time etc?

Just to confirm - do you want to add new cast LONG/DOUBLE to all datetime types? Then to call it in AvgState::result() implementations.

@dai-chen
Copy link
Collaborator

dai-chen commented Dec 1, 2022

Minor comment: I just realized that our cast function doesn't accept ExprType, so the switch in AvgAggregator is required. Besides improving cast function, probably one thing we can do now is to pass a cast function to an unique AvgState class. For example, in switch statement, pass DSL::castToTimestamp, DSL::castToDouble accordingly etc. I think the Instant.ofEpochMilli() call in each AvgState should be part of cast function from double to timestmap/date/time etc?

Just to confirm - do you want to add new cast LONG/DOUBLE to all datetime types? Then to call it in AvgState::result() implementations.

Never mind. Just some thought about if it makes sense to add this logic to AbstractExprNumberValue.timestampValue/timeValue/dateValue/dateTimeValue() method. But in that case we still need a generic CAST function DSL.cast(field, type) anyway if we want to have a single AvgState.

* Add aggregator fixes and some useless unit tests.
* Add `avg` aggregation on datetime types.
* Rework in-memory `AVG`. Fix parsing value returned from the OpenSearch node.

Signed-off-by: Yury-Fridlyand <[email protected]>
@Yury-Fridlyand
Copy link
Collaborator Author

Rebased, please, re-review

dai-chen
dai-chen previously approved these changes Dec 9, 2022
docs/user/dql/aggregations.rst Outdated Show resolved Hide resolved
docs/user/dql/aggregations.rst Outdated Show resolved Hide resolved
docs/user/dql/aggregations.rst Outdated Show resolved Hide resolved
dai-chen
dai-chen previously approved these changes Dec 12, 2022
Yury-Fridlyand and others added 2 commits December 12, 2022 15:34
Co-authored-by: MaxKsyunz <[email protected]>
Signed-off-by: Yury-Fridlyand <[email protected]>
Signed-off-by: Yury-Fridlyand <[email protected]>
}

@Override
protected AvgState iterate(ExprValue value) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to reuse DSL.adddate()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but DateAvgState will be less clear and readable:

  protected static class DateAvgState extends AvgState {

    public DateAvgState() {
      this.count = new ExprIntegerValue(0);
      this.total = new ExprDateValue(LocalDate.EPOCH);
    }

    @Override
    public ExprValue result() {
      if (0 == count.integerValue()) {
        return ExprNullValue.of();
      }

      return DSL.adddate(DSL.literal(new ExprDateValue(LocalDate.EPOCH)), 
          DSL.literal(DSL.divide(DSL.literal(DAYS.between(LocalDate.EPOCH, total.dateValue())),
              DSL.literal(count)).valueOf().longValue())).valueOf();
    }

    @Override
    protected AvgState iterate(ExprValue value) {
      total = DSL.adddate(DSL.literal(total),
          DSL.literal(DAYS.between(LocalDate.EPOCH, value.dateValue())))
          .valueOf();

      return super.iterate(value);
    }
  }

@Yury-Fridlyand Yury-Fridlyand marked this pull request as draft December 19, 2022 21:13
* Revert recent changes in `OpenSearchExprValueFactory`.
* Update `BucketAggregationBuilder` to specify how to interpret datetime values.

Signed-off-by: Yury-Fridlyand <[email protected]>
@Yury-Fridlyand Yury-Fridlyand marked this pull request as ready for review January 9, 2023 18:49
@MaxKsyunz
Copy link
Collaborator

@Yury-Fridlyand it makes sense to switch base branch to main and then use backport to merge it to 2.x branch.

@Yury-Fridlyand Yury-Fridlyand changed the base branch from 2.x to main January 10, 2023 19:56
@@ -64,6 +70,10 @@ private CompositeValuesSourceBuilder<?> buildCompositeValuesSourceBuilder(
.missingBucket(true)
.missingOrder(missingOrder)
.order(sortOrder);
// Time types values are converted to LONG in ExpressionAggregationScript::execute
if (List.of(TIMESTAMP, TIME, DATE, DATETIME).contains(expr.getDelegated().type())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this impact other data type? for example, cast('1' as integer)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because I check whether type is one of TIMESTAMP, TIME, DATE, DATETIME

penghuo
penghuo previously approved these changes Jan 25, 2023
dai-chen
dai-chen previously approved these changes Jan 25, 2023
Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

@Yury-Fridlyand Yury-Fridlyand dismissed stale reviews from dai-chen and penghuo via 3365a7a January 25, 2023 19:11
@dai-chen dai-chen added bug Something isn't working labels Jan 25, 2023
@Yury-Fridlyand
Copy link
Collaborator Author

Merge conflicts resolved

@dai-chen dai-chen requested review from penghuo and MaxKsyunz January 27, 2023 19:15
@Yury-Fridlyand Yury-Fridlyand merged commit 5220a98 into opensearch-project:main Jan 31, 2023
@Yury-Fridlyand Yury-Fridlyand deleted the integ-datetime-aggreg branch January 31, 2023 21:39
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 31, 2023
* Update aggregation to support datetime types.

Signed-off-by: Yury-Fridlyand <[email protected]>
Signed-off-by: Yury-Fridlyand <[email protected]>
Co-authored-by: MaxKsyunz <[email protected]>
(cherry picked from commit 5220a98)
dai-chen pushed a commit that referenced this pull request Feb 1, 2023
* Update aggregation to support datetime types.

Signed-off-by: Yury-Fridlyand <[email protected]>
Signed-off-by: Yury-Fridlyand <[email protected]>
Co-authored-by: MaxKsyunz <[email protected]>
(cherry picked from commit 5220a98)

Co-authored-by: Yury-Fridlyand <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x bug Something isn't working SQL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Aggregate on date function does not work as expected
6 participants