Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/support timestamp plus minus interval #3110

Merged

Conversation

JasonLi-cn
Copy link
Contributor

@JasonLi-cn JasonLi-cn commented Aug 12, 2022

Which issue does this PR close?

Closes #3103 (comment)

Rationale for this change

Support Timestamp +/- Interval

What changes are included in this PR?

Update DateIntervalExpr‘ evaluate() function mainly:
https://github.com/apache/arrow-datafusion/blob/b80c85359bdfc6d483cb2f222a988a9125ed8670/datafusion/physical-expr/src/expressions/datetime.rs#L88-L160

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions labels Aug 12, 2022
@JasonLi-cn JasonLi-cn closed this Aug 12, 2022
@JasonLi-cn JasonLi-cn reopened this Aug 12, 2022
@codecov-commenter
Copy link

Codecov Report

Merging #3110 (51888a1) into master (9956f80) will decrease coverage by 0.01%.
The diff coverage is 74.07%.

@@            Coverage Diff             @@
##           master    #3110      +/-   ##
==========================================
- Coverage   85.95%   85.93%   -0.02%     
==========================================
  Files         290      290              
  Lines       52260    52356      +96     
==========================================
+ Hits        44919    44993      +74     
- Misses       7341     7363      +22     
Impacted Files Coverage Δ
datafusion/physical-expr/src/planner.rs 92.36% <ø> (ø)
...tafusion/physical-expr/src/expressions/datetime.rs 82.20% <73.33%> (-4.60%) ⬇️
datafusion/expr/src/binary_rule.rs 84.70% <100.00%> (+0.08%) ⬆️
datafusion/optimizer/src/eliminate_limit.rs 100.00% <0.00%> (ø)
...atafusion/physical-expr/src/aggregate/array_agg.rs 95.91% <0.00%> (+0.04%) ⬆️
datafusion/expr/src/logical_plan/plan.rs 77.60% <0.00%> (+0.17%) ⬆️
datafusion/core/src/physical_plan/metrics/value.rs 87.43% <0.00%> (+0.50%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JasonLi-cn - this is a great contribution

The only thing I would like to see are some "integration" level SQL tests of this feature -- perhaps we could extend the test / add some new tests in https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sql/timestamp.rs

cc @avantgardnerio who added the initial version of this functionality in #2797

@github-actions github-actions bot added the core Core DataFusion crate label Aug 15, 2022
Copy link
Contributor

@avantgardnerio avantgardnerio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nits. The functional change looks great, ty!

datafusion/core/tests/sql/timestamp.rs Show resolved Hide resolved
) -> Result<ColumnarValue> {
let ret = match array.data_type() {
DataType::Date32 => {
let array = array.as_any().downcast_ref::<Date32Array>().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this shouldn't panic, but since this is returning a result anyway, would it improve auditability to use the error propagation operator here instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this?

let array = array.as_any().downcast_ref::<Date32Array>().ok_or(
    DataFusionError::Execution("Cast Date32Array error".to_string()),
)?;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit more verbose, but I think it makes it easier to tell that it will never panic. @andygrove @alamb do you have thoughts on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think by this point in the plan it is ok to panic! as it indicates a bug in the code and I think having to always do the ceremony to check the type makes it less readable. Maybe we could make a function like as_date32_array that returned an error (similar to the arrow ones that panic?): https://docs.rs/arrow/20.0.0/arrow/array/index.html#functions ?

Sadly it appears we are inconsistent in the codebase with this regards: https://github.com/apache/arrow-datafusion/search?l=Rust&q=downcast 😢

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in other words, I think this PR is ok as written.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will file a ticket as this is confusing and it would be nice to get DataFusion consistent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #3152

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great @JasonLi-cn so I plan to merge it. There are some good potential follow ons (like upstreaming the date/time arithmetic to arrow as well as standardizing the handling of "what to do when downcasting and array fails") but I think they would be best done in a follow on PR.

datafusion/core/tests/sql/timestamp.rs Show resolved Hide resolved
datafusion/core/tests/sql/timestamp.rs Show resolved Hide resolved
) -> Result<ColumnarValue> {
let ret = match array.data_type() {
DataType::Date32 => {
let array = array.as_any().downcast_ref::<Date32Array>().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think by this point in the plan it is ok to panic! as it indicates a bug in the code and I think having to always do the ceremony to check the type makes it less readable. Maybe we could make a function like as_date32_array that returned an error (similar to the arrow ones that panic?): https://docs.rs/arrow/20.0.0/arrow/array/index.html#functions ?

Sadly it appears we are inconsistent in the codebase with this regards: https://github.com/apache/arrow-datafusion/search?l=Rust&q=downcast 😢

) -> Result<ColumnarValue> {
let ret = match array.data_type() {
DataType::Date32 => {
let array = array.as_any().downcast_ref::<Date32Array>().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in other words, I think this PR is ok as written.

@alamb
Copy link
Contributor

alamb commented Aug 15, 2022

I plan to merge this PR in when the checks pass

@ursabot
Copy link

ursabot commented Aug 15, 2022

Benchmark runs are scheduled for baseline = 3666305 and contender = 6509d0d. 6509d0d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@JasonLi-cn JasonLi-cn deleted the feature/support_timestamp_plus_minus_interval branch August 16, 2022 02:06
@JasonLi-cn JasonLi-cn restored the feature/support_timestamp_plus_minus_interval branch August 16, 2022 03:56
@JasonLi-cn JasonLi-cn deleted the feature/support_timestamp_plus_minus_interval branch August 16, 2022 04:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: support Timestamp +/- Interval
5 participants