-
Notifications
You must be signed in to change notification settings - Fork 809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for adding intervals to dates #2031
Conversation
3093692
to
ef88801
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @avantgardnerio. It would be good to get a review from @alamb as well.
Looks like there is a minor clippy error: https://github.com/apache/arrow-rs/runs/7256871633?check_suite_focus=true |
I had to re-install ubuntu to get clippy to work, but the next iteration should pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @avantgardnerio -- I really like where this is headed.
cc @tustvold @viirya and @HaoYang670
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
// SOFTWARE. | ||
|
||
// Copied from chronoutil crate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why copying the code, instead of using it as dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a conversation in the equivalent arrow-rs
commit: apache/datafusion#2797 (comment)
I would vote to make it a dependency, but am fine either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the contradictory feedback -- I defer to @viirya in terms of using dependency or inlining. Given how little actual code it is, I kind of like the avoidance of a new dependency but I agree this is an opinion rather than anything driven by other considerations.
The reason I personally prefer to avoid dependencies are they are really hard to remove (e.g. #1882) and there are many of these rust crates which seem to have an initial spurt of brilliance and then basically become unmaintained as the authors move on to something else. Though to be honest chrono itself kind of falls into that category too 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Chrono folks re-opened my PR, so I don't think this will be a concern for too long - it looks like they are interested in building a add_months()
function into chrono, we're just debating where and how.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @alamb. The code copied is little so not a strong concern from me. I think we can copy the code now and maybe consider to remove it by adding chrono as dependency if it shows stable maintenance in the future.
Codecov Report
@@ Coverage Diff @@
## master #2031 +/- ##
==========================================
+ Coverage 83.54% 83.61% +0.07%
==========================================
Files 222 223 +1
Lines 58178 58460 +282
==========================================
+ Hits 48604 48884 +280
- Misses 9574 9576 +2
Continue to review full report at Codecov.
|
arrow/src/datatypes/types.rs
Outdated
@@ -191,3 +194,166 @@ impl ArrowTimestampType for TimestampNanosecondType { | |||
TimeUnit::Nanosecond | |||
} | |||
} | |||
|
|||
impl IntervalYearMonthType { | |||
/// Creates a IntervalYearMonthType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, looks like new
returns the native value, not IntervalYearMonthType
type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I struggled a lot with this. It's very confusing the distinction between these. It technically returns a <Date64Type as IntervalYearMonthType>::Native
which is just an i32
. I question where to even put these methods, or if they belong on this impl, or what to call them in the first place. I think the awkwardness comes from arrow not really having a row, so heretofor there was nothing that operated on an individual value, but also thus some bad tests that did not properly parse/populate these, so I think some standard way to convert would be very helpful, hence the addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the functionality for manipulating values on to this impl
makes sense
Perhaps @viirya was referring to the rust convention that Type::new()
returns an instance of Type
-- perhaps if we renamed this method make_value
or something like that it would be less surprising for other rust developers.
We could do it as a follow on PR as well (before releasing arrow 19.x) too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, makes sense to me.
Could anyone help me with reproducing this locally, or understanding what went wrong?
Thanks! |
@avantgardnerio It's a flaky test #1931 that should have now been fixed apache/arrow#13573. I'll rerun the CI job and it should clear |
Looks great, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks great -- thank you @avantgardnerio for figuring out a pattern and starting to implement this arithmetic. ❤️
I left one more comment but I also think this PR could be merged as is.
@viirya or @tustvold would you like to review again prior to merging?
NaiveDate::from_ymd(2000, 1, 1), | ||
)]); | ||
let b = IntervalYearMonthArray::from(vec![IntervalYearMonthType::new(1, 2)]); | ||
let c = add_dyn(&a, &b).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is so great to see add_dyn
used like this ❤️
arrow/src/datatypes/types.rs
Outdated
@@ -191,3 +194,166 @@ impl ArrowTimestampType for TimestampNanosecondType { | |||
TimeUnit::Nanosecond | |||
} | |||
} | |||
|
|||
impl IntervalYearMonthType { | |||
/// Creates a IntervalYearMonthType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the functionality for manipulating values on to this impl
makes sense
Perhaps @viirya was referring to the rust convention that Type::new()
returns an instance of Type
-- perhaps if we renamed this method make_value
or something like that it would be less surprising for other rust developers.
We could do it as a follow on PR as well (before releasing arrow 19.x) too
I think that is both valid, and non-trivial. I've renamed and pushed 🙂 |
Benchmark runs are scheduled for baseline = 9d8f0c9 and contender = cb7e5b0. cb7e5b0 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
🎉 |
@avantgardnerio I wonder if you are tracking the "add support for adding interval columns to dates/timestamps" in datafusion somewhere? I ask because apache/datafusion#3110 by @JasonLi-cn is starting to add support for timestamps |
I was not, but am now, ty! |
Can drop this after rebase on commit cb7e5b0 "Add support for adding intervals to dates (apache#2031)", first released in 19.0.0
Can drop this after rebase on commit cb7e5b0 "Add support for adding intervals to dates (apache#2031)", first released in 19.0.0
Can drop this after rebase on commit cb7e5b0 "Add support for adding intervals to dates (apache#2031)", first released in 19.0.0
Re #527.
Rationale for this change
Support for adding scalar intervals to scalar dates was added in datafusion, but in order to support adding columns to columns, we need to move that logic down into an arrow kernel.
What changes are included in this PR?
add_dyn
to allow it to accept lambda functions with two differing parameter typesAre there any user-facing changes?
Adding intervals to dates should work now.