feat(rust, python): Add Run-length Encoding functions #9826
Conversation
py-polars/polars/expr/expr.py
Outdated
""" | ||
return self._from_pyexpr(self._pyexpr.rle()) | ||
|
||
def rleid(self) -> Self: |
Following our naming convention this should be `rle_id`.
So much extra typing! But ok.
Tiny violin ;)
I wasn't aware of the actual name until someone let me know in the replies, but I believe this will close #9328.
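For context, a minimal standalone sketch of what the two new functions compute (illustrative only, not the polars implementation): `rle` collapses consecutive repeats into (length, value) pairs.

```rust
// Illustrative sketch of classic run-length encoding: each run of equal
// consecutive values becomes one (run_length, value) pair.
fn rle(values: &[i64]) -> Vec<(u32, i64)> {
    let mut out: Vec<(u32, i64)> = Vec::new();
    for &v in values {
        match out.last_mut() {
            // extend the current run if the value repeats
            Some((len, last)) if *last == v => *len += 1,
            // otherwise start a new run of length 1
            _ => out.push((1, v)),
        }
    }
    out
}

fn main() {
    assert_eq!(rle(&[1, 1, 1, 2, 2, 3]), vec![(3, 1), (2, 2), (1, 3)]);
    println!("ok");
}
```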
pub fn rle_id(s: &Series) -> PolarsResult<Series> {
    let (s1, s2) = (s.slice(0, s.len() - 1), s.slice(1, s.len()));
    // Run numbers start at zero
    Ok(std::iter::once(false)
We can speed this up by not using chains and flattens, but directly extending into a preallocated `Vec`. This will save a lot of branches and ensure we only allocate once.
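The difference can be sketched outside polars with plain std iterators (function names here are illustrative, not from the PR):

```rust
// Illustrative only: building the same Vec via a chained iterator vs a
// single preallocated Vec. The chained version's length hint happens to be
// exact here, but with `flatten` or filters it would not be, forcing the
// collect to guess the size and reallocate as it grows.
fn via_chain(n: u32) -> Vec<u32> {
    std::iter::once(0u32).chain(1..n).collect()
}

fn via_prealloc(n: u32) -> Vec<u32> {
    let mut out = Vec::with_capacity(n as usize); // one allocation up front
    out.push(0u32);
    out.extend(1..n);
    out
}

fn main() {
    assert_eq!(via_chain(4), via_prealloc(4));
    assert_eq!(via_prealloc(4), vec![0, 1, 2, 3]);
    println!("ok");
}
```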
I thought the iterators would be efficient, but I guess you have to allocate somewhere. And your version is much simpler to read.
The chain iterator is forced to branch on both iterators. This branch can block other optimizations. And the iterator's length becomes unknown, which leads to wrong allocation sizes and memcpys on realloc.
Oh, gross. I didn't know that. I guess iterators aren't as "free" as they're made out to be.
Thank you @magarick. I love the functionality and the PR looks good. I have a comment on the implementation of
s_neq
    .into_iter()
    .enumerate()
    .for_each(|(i, v)| out.push(out[i] + v.unwrap() as u32));
Can we keep a `latest_value` here? `out[i]` requires keeping a register for `i`, and we do a bound check on `out`. The bound check may be elided by the compiler, but if we write it differently we are sure there is no bound check.

A second optimization is that we can iterate over the arrays in `s_neq` by calling `downcast_iter`. Then we get `BooleanArray`, from which we can directly use `values_iter`. This saves a few branches in the iterator and the `unwrap` itself.
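A minimal standalone sketch of the `latest_value` idea, with a plain bool slice standing in for the downcast `BooleanArray` values (the function name and shape are assumptions, not the PR code):

```rust
// Sketch only: carry the last written run id instead of re-reading `out[i]`,
// so the loop needs no index variable and no bound check on `out`.
fn run_ids(s_neq: &[bool]) -> Vec<u32> {
    let mut out = Vec::with_capacity(s_neq.len() + 1);
    let mut latest_value = 0u32; // run numbers start at zero
    out.push(latest_value);      // first element always opens run 0
    for &changed in s_neq {
        latest_value += changed as u32; // bump the id when the value changed
        out.push(latest_value);
    }
    out
}

fn main() {
    // s = [1, 1, 2, 2, 3] -> neighbour-inequality flags [false, true, false, true]
    assert_eq!(run_ids(&[false, true, false, true]), vec![0, 0, 1, 1, 2]);
    println!("ok");
}
```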
OK, I tried something like that. It now dumps the inequality checks directly into the vector and does a cumsum in place.
It always feels a little awkward to iterate over each chunk but I guess that's what's really underneath so that's what has to be done.
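The two-pass shape described here can be sketched with a plain Vec (an assumed simplification of the actual chunked code):

```rust
fn main() {
    // Pass 1 (simulated): a leading 0 for the first row, then the 0/1
    // neighbour-inequality flags dumped straight into the Vec.
    let mut out: Vec<u32> = vec![0, 0, 1, 0, 1];
    // Pass 2: cumulative sum in place. The fold's accumulator carries the
    // running total and each element is overwritten with it.
    out.iter_mut().fold(0u32, |a, x| {
        *x += a;
        *x
    });
    assert_eq!(out, vec![0, 0, 1, 1, 2]);
    println!("ok");
}
```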
> It always feels a little awkward to iterate over each chunk but I guess that's what's really underneath so that's what has to be done.

Yes, then the code is closer to what we have in memory and has to go through fewer abstractions between the `get_value` from here and the `write_value` there.
.for_each(|(i, v)| out.push(out[i] + v.unwrap() as u32));
.downcast_iter()
.for_each(|a| out.extend(a.values_iter().map(|v| v as u32)));
out.iter_mut().fold(0, |a, x| {
Almost there. :)
I think we can do this in a single pass. Something like this:
// keep track of the last written run id
let mut last_value = 0u32;
s_neq
    .downcast_iter()
    .for_each(|a| {
        for v in a.values_iter() {
            // accumulate so each id builds on the previous one
            last_value += v as u32;
            out.push(last_value);
        }
    });
And I thought I was into micro-optimizations :-)
Ideally we could do everything, even the comparisons, in a single pass and only store what we need but I didn't see a clear way to do that.
> Ideally we could do everything, even the comparisons, in a single pass and only store what we need but I didn't see a clear way to do that.

Yes, that would require us to go down into the known types with some generics. We could follow up on that. The benefit of this implementation is that it has little compiler bloat.

> And I thought I was into micro-optimizations :-)

Haha, I have put a lot of time into ensuring we deliver what we advocate: a fast dataframe library. I cannot unsee the potential branches, cache misses and allocations ^^
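The fused single pass mentioned above could look roughly like this hypothetical sketch: go down to a concrete comparable type and do the neighbour comparison and id assignment in one loop (names and shape are assumptions, not the polars code):

```rust
// Hypothetical sketch: compare each value with its predecessor and assign
// run ids in a single generic pass, with no intermediate flag buffer.
fn rle_id_fused<T: PartialEq>(values: &[T]) -> Vec<u32> {
    let mut out = Vec::with_capacity(values.len());
    let mut id = 0u32;
    let mut iter = values.iter();
    if let Some(first) = iter.next() {
        out.push(id); // first element opens run 0
        let mut prev = first;
        for v in iter {
            id += (v != prev) as u32; // bump id at each run boundary
            out.push(id);
            prev = v;
        }
    }
    out
}

fn main() {
    assert_eq!(rle_id_fused(&["a", "a", "b", "b", "c"]), vec![0, 0, 1, 1, 2]);
    println!("ok");
}
```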
Great functionality to have @magarick and thanks for quickly iterating on the perf improvements. 🙌
Nice one :)
Adds `rle` and `rle_id`. Closes #9328