-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(rust): Arithmetics refactor and cleanup (part 1) #7537
Conversation
c854f06
to
ca2bf80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. This touches a lot of old code that indeed could use some refactoring/cleaning up.
One important question: You did not change any semantics right? Especially the owned
arithmetic is an optimization I don't want to break and only works if the ref count of Arc<ChunkedArray>
is 1.
@@ -246,7 +246,7 @@ fn pivot_impl( | |||
debug_assert_eq!(value_agg_phys.len(), row_locations.len()); | |||
|
|||
let mut cols = if value_agg_phys.dtype().is_numeric() { | |||
macro_rules! dispatch { | |||
macro_rules! pivot { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should call this pivot
as we don't pivot here. dispatch
is clearer to me that we simply have a utility macro to dispatch the inner function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. This is purely for error reporting, it will include name of the macro if it fails for whatever reason. I can change it back to dispatch
or pivot_dispatch
or any better name if you have any ideas.
Btw (1): here's another case when you check is_numeric()
. Is decimal numeric or not? :) For pivoting purposes - sure! For many other purposes, no... 🤦
Btw (2), it just occurred to me, as this is yet another point where you check if something is numeric in point 1 but use this implicit assumption in point 2, wouldn't it be cleaner to instead have
fn as_numeric(&self) -> Option<NumericChunked<'_>> {
match self.dtype() { ... }
}
enum NumericChunked<'a> {
UInt8(&'a UInt8Chunked),
...
}
Yet another case where we can potentially dodge implicit assumptions and unwraps by using type system? Then errors simply can't occur and no unwraps are needed because you can do
if let Some(ca) = agg_phys.as_numeric() {
... // do stuff, it can't fail now
}
let rhs = rhs.struct_().unwrap(); | ||
let s_fields = s.fields(); | ||
) -> PolarsResult<Series> { | ||
let lhs = lhs.struct_().unwrap_or_else(|_| unreachable!()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the benefit of this over unwrap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was a marker for myself while reading this, basically saying this is only called with struct dtypes.
This is one of my biggest gripes with polars codebase 🙂 When staring at a particular piece of code you have no idea what assumptions are currently being used here, you see a bunch of unwraps and have no idea which of them CAN potentially trigger and which cannot ever happen, which are unhandled errors and which are completely unreachable. You have to scan all usages of a particular thing trying to infer it.
I think a lot of these cases you can convert to traits/errors or just restructure the code. E.g.:
- This function should probably take
&StructChunked, &StructChunked
and then no unwraps are needed here - And upstream instead of matching
(lhs.dtype(), rhs.dtype()) = (Struct(_), Struct(_))
you could instead do(lhs.struct_(), rhs.struct_()) = (Some(_), Some(_))
, so no unwraps are needed again. No implicit assumptions either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Maybe just put an unwrap
with a comment why we can unwrap or expect("impl error")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of my biggest gripes with polars codebase
I agree we can do better here. All unwraps should be considered asserts, but it isn't always the case yet.
I don't think I've changed any semantics there. As you can see, all of the owned arithmetic logic. Here's a gist of recursive expansion of A few potential changes:
Add for &ChunkedArray<T>
Add<Self> for &ChunkedArray<T>
Add<&'a ChunkedArray<T>> for &'a ChunkedArray<T> which is basically requiring matching lifetimes on LHS and RHS of all arithmetic ops if you're dealing with references. This is overly restrictive and a better way is to explicitly defined
|
Could you point to the snippet you are referring? |
I don't think this is the case. This compiles on master. I believe the longer living lifetime can always match the shortest one. fn try_different_lifetimes<'a>(a: &'a Int32Chunked, b: &'static Int32Chunked) -> Int32Chunked {
a + b
}
fn try_different_lifetimes2<'a, 'b>(a: &'a Int32Chunked, b: &'b Int32Chunked) -> Int32Chunked {
a + b
}
fn try_different_lifetimes3<'a>(a: &'a Int32Chunked) -> Int32Chunked {
// this lifteme is always shorter than 'a as it is local
let ca = &Int32Chunked::full_null("", 1);
ca + a
} |
I think this was a remnant of earlier testing. This can be removed indeed. 👍 |
Any updates @aldanor? I think there is some great stuff in here. |
@ritchie46 apologies, was busy with work, will get back re: this PR tomorrow to address the comments and rebase |
c9742a7
to
c548a05
Compare
Ok. My fault for not taking enough effort to get this merged :/ (although it was 99% there I thi) Now that #9123 has been merged, I'm not entirely sure it's rebase-able... However, if there's still interest in pushing it through (I think it does clean up the whole array-arithmetics mess by quite a bit), I can try a rebase locally to see if it's possible at all. |
Me neither. :/ I recall cleaning up a bit on the arithmetics as well. |
@stinodego > As you requested to be pinged. |
I think it's pretty sad it wasn't merged in back in the day - because it did clean up tons of inconsistencies and weird points in the codebase (also simplifying NumDispatch etc, unifying fallible ops, unifying downcasting, converting some unwraps to proper errors etc). This was a preliminary step to improving arithmetics on decimals, so there's that. But - I don't think any of the above has been fully resolved (maybe a few things here and there), perhaps it would make sense to break this apart into more sizeable PRs and resubmit them, idk. Rebasing would be pretty tough indeed due to some refactoring that happened inbetween... (I'll double check a few things in the coming days, re: the current state of the codebase) |
I don't think there's much use in keeping this PR open any longer. I am not sure what the current state of this part of the code is, but if you feel like (some of) the improvements made here would still be applicable, you're very welcome to open a new PR! Indeed it might help to split it up in smaller parts if at all possible. |
This is a preliminary refactor before starting to integrate decimal arithmetics in.
Currently, there's the
PolarsNumericType
which (unfortunately) plays way too many roles – from question like "is it a simple numerical thing we can just copy" to assumptions like "if it's a 'numeric' type, then we can just add and subtract the underlying physical values". This logic breaks for decimals, becausei128
is a physical type, howeverPrimitiveArray<i128>
does not use native arithmetics (moreover, we'd want to reimplement those arithmetics ops for it so as to allow some flexibility in maintaining scale/precision etc). So some rearrangement is unavoidable, and I figured I should push what I currently have in progress instead of keeping it so as not to suffer from constant rebasing.A few changes:
NumOpsDispatch
/NumOpsDispatchInner
are removed completelyArrayArithmetics
has been removed (will implement respective traits for decimals manually at a later time)NativeArithmetics
bound being used now for native arithmetics ops (may need to add our own 1:1 trait when implementing decimals due to potential shadowing problems)Try<...>
traits to unify fallible opsPolarsFloatNative
, this helps simplify trait bound copy/paste$op_scalar()
methods from arrow2 for native arithmetics (wasn't the case previously)env::var("ASSIGN")
check which seemed to have been something temporary and has been removedSeriesTrait
toseries_$op()
for clarityChunkedArray::with_name()
helperpolars-error
use$crate::
now