-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for month & year intervals #2797
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2797 +/- ##
==========================================
+ Coverage 85.25% 85.34% +0.09%
==========================================
Files 275 276 +1
Lines 49001 49281 +280
==========================================
+ Hits 41774 42061 +287
+ Misses 7227 7220 -7
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution @avantgardnerio -- this is looking like a great start. I don't fully understand some of the calculations but I may be misunderstanding
cc @ovr
}; | ||
|
||
// Add interval | ||
let posterior = match scalar { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would b great to eventually put some/all of this logic into the arrow-rs kernels -- for example
https://docs.rs/arrow/16.0.0/arrow/compute/kernels/temporal/index.html
or perhaps https://docs.rs/arrow/16.0.0/arrow/compute/kernels/arithmetic/fn.add.html
That would also likely result in support for columnar execution support (aka adding a column of integers)
Maybe we can (I will file a ticket) start with kernels in datafusion and then port them to arrow.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10e7b3d
to
874a5ed
Compare
NaiveDate::from_ymd(target.year(), target.month(), day) | ||
} | ||
|
||
fn chrono_add_months(dt: NaiveDate, delta: i32) -> NaiveDate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think I would like to PR this to chrono when I get a chance, and we can remove it from datafusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 - I like filing a ticket / PR in the target repo and then leaving a link in the comments of DataFusion
Btw I poked around and found this code in arrow2 (from @jorgecarleitao ) that is similar: https://docs.rs/arrow2/latest/src/arrow2/temporal_conversions.rs.html#342-368
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @avantgardnerio I think this PR's code is looking good
The only thing I think it needs before merging are tests covering more of the date/time conversion logic. Once that is done I can write up a "move this to arrow" type ticket as well
NaiveDate::from_ymd(target.year(), target.month(), day) | ||
} | ||
|
||
fn chrono_add_months(dt: NaiveDate, delta: i32) -> NaiveDate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 - I like filing a ticket / PR in the target repo and then leaving a link in the comments of DataFusion
Btw I poked around and found this code in arrow2 (from @jorgecarleitao ) that is similar: https://docs.rs/arrow2/latest/src/arrow2/temporal_conversions.rs.html#342-368
Sorry this stalled out. I was really close on the subqueries, and I think I have them all working now, so I'll jump back on this. |
8348470
to
06dbb21
Compare
Added unit tests. Will file PRs to respective dependency repos as well.
|
I PRed the |
@alamb , do you know where (in
? |
Okay, I think I understand this. Ideally
So that we could add an array of
🤔 |
I think I see how to PR this into the arrow kernels. I'll try that shortly. |
What this is looking like when done inside |
I recommend https://github.com/apache/arrow-rs/blob/master/arrow/src/compute/kernels/temporal.rs |
For what it is worth apache/arrow-rs@master...spaceandtimelabs:arrow-rs:bg_date_math looks pretty good to me 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is looking quite good -- thank you @avantgardnerio . I like the idea of getting the test coverage and then expanding support by adding kernels to arrow.
datafusion/physical-expr/Cargo.toml
Outdated
@@ -44,6 +44,7 @@ arrow = { version = "17.0.0", features = ["prettyprint"] } | |||
blake2 = { version = "^0.10.2", optional = true } | |||
blake3 = { version = "1.0", optional = true } | |||
chrono = { version = "0.4", default-features = false } | |||
chronoutil = "0.2.3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am somewhat concerned about adding dependencies (and this particular crate seems to be maintained by a single person) -- however, since it is so small (and MIT licensced), we could also just inline the functions we care about.
Any thoughts @andygrove or @thinkharderdev ?
https://github.com/olliemath/chronoutil/blob/main/src/delta.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed the PR so am not sure how much of that create is used but I agree if it is a small amount of code it is better to just copy the code rather than add another dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I inlined it in my latest push. If we want to go back to including it in cargo.toml, it's an easy revert.
My $0.02 would be to leave it in cargo.toml, so that we can get updates if the author fixes something, and so that RAT checkers can see declaratively that we depend on that code, rather than trying to use heuristics of sematic code or binary diff.
Co-authored-by: Andrew Lamb <[email protected]>
dd100d0
to
8ba1efe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good to go from my perspective. Does anyone else care to review?
cc @ovr this may be interesting / relevant to your project
cc @paddyhoran
}; | ||
|
||
// Add interval | ||
let posterior = match scalar { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
🐱 🚀 |
Which issue does this PR close?
Closes #2796.
Rationale for this change
In order to pass TPC-H benchmarks, datafusion will need to be able to support month & year date intervals.
What changes are included in this PR?
A refactoring of the datetime module to:
IntervalYearMonth
in addition to the currently supportedIntervalDayTime
?
) operator.Are there any user-facing changes?
More queries should work. Nothing that worked previously should break.