
feat(rust, python): Add Run-length Encoding functions #9826

Merged · 9 commits · Jul 12, 2023

Conversation

@magarick (Contributor) commented on Jul 12, 2023:

Adds rle and rle_id.
Closes #9328
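For context, a standalone conceptual sketch (not the polars implementation or API, which operates on Series): rle collapses consecutive equal values into run lengths and values, while rle_id labels every element with the index of the run it belongs to.

    // Conceptual illustration of run-length encoding on a plain slice.
    fn rle<T: PartialEq + Copy>(values: &[T]) -> Vec<(u32, T)> {
        let mut runs: Vec<(u32, T)> = Vec::new();
        for &v in values {
            match runs.last_mut() {
                // Extend the current run while the value stays the same.
                Some((len, last)) if *last == v => *len += 1,
                // Otherwise a new run starts.
                _ => runs.push((1, v)),
            }
        }
        runs
    }

    fn main() {
        let xs = [1, 1, 2, 2, 2, 3];
        // Run lengths and values: [(2, 1), (3, 2), (1, 3)].
        assert_eq!(rle(&xs), vec![(2, 1), (3, 2), (1, 3)]);
        // rle_id would label each element with its run index: [0, 0, 1, 1, 1, 2].
    }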

@github-actions bot added the enhancement, python, and rust labels on Jul 12, 2023.
"""
return self._from_pyexpr(self._pyexpr.rle())

def rleid(self) -> Self:
Member:
Following our naming convention this should be rle_id.

Contributor Author:
So much extra typing! But ok.

Member:
Tiny violin ;)

@cmdlineluser (Contributor):
I wasn't aware of the actual name until someone let me know in the replies, but I believe this will close: #9328

+1

pub fn rle_id(s: &Series) -> PolarsResult<Series> {
let (s1, s2) = (s.slice(0, s.len() - 1), s.slice(1, s.len()));
// Run numbers start at zero
Ok(std::iter::once(false)
Member:
We can speed this up by not using chains and flattens, but by extending directly into a preallocated Vec.

This saves a lot of branches and ensures we only allocate once.
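As a standalone sketch of the suggestion (generic code, not the PR's actual implementation): extending into a Vec that is sized up front avoids the per-element branching of a chained, flattened iterator and guarantees a single allocation.

    // Collecting through chain + flatten: the iterator cannot report an exact
    // length up front, so the Vec may grow (and reallocate) while collecting.
    fn chained(groups: &[Vec<bool>]) -> Vec<u32> {
        std::iter::once(false)
            .chain(groups.iter().flatten().copied())
            .map(|v| v as u32)
            .collect()
    }

    // One allocation of the final size, then plain extends per chunk.
    fn preallocated(groups: &[Vec<bool>], total_len: usize) -> Vec<u32> {
        let mut out = Vec::with_capacity(total_len + 1);
        out.push(0u32);
        for g in groups {
            out.extend(g.iter().map(|&v| v as u32));
        }
        out
    }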

Contributor Author:
I thought the iterators would be efficient, but I guess you have to allocate somewhere. And your version is much simpler to read.

Member:
The chain iterator is forced to branch on both iterators. This branch can block other optimizations. And the iterator's length becomes unknown, which leads to wrong allocation sizes and memcopies on realloc.
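A quick standalone way to see the length problem (an illustration, not code from the PR): the size hint of a flattened iterator tells the collector almost nothing, while a plain slice iterator reports its exact length.

    fn main() {
        let chunks = vec![vec![true, false], vec![true]];

        // Flattening hides the total length: the lower bound is 0 and the
        // upper bound is unknown, so `collect` cannot size the Vec up front.
        let flat = chunks.iter().flatten();
        assert_eq!(flat.size_hint(), (0, None));

        // A plain slice iterator knows its exact length.
        let exact = [true, false, true].iter();
        assert_eq!(exact.size_hint(), (3, Some(3)));
    }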

Contributor Author:
Oh, gross. I didn't know that. I guess iterators aren't as "free" as they're made out to be.

@ritchie46 (Member):
Thank you @magarick. I love the functionality and the PR looks good. I have a comment on the implementation of rle_id. That can be faster.

s_neq
.into_iter()
.enumerate()
.for_each(|(i, v)| out.push(out[i] + v.unwrap() as u32));
@ritchie46 (Member), Jul 12, 2023:
Can we keep a latest_value here? out[i] requires keeping a register for i, and we do a bound check on out. The bound check may be elided by the compiler, but if we write it differently we are sure there is no bound check.

A second optimization: we can iterate over the arrays in s_neq by calling downcast_iter. That gives us BooleanArray, from which we can directly use values_iter. This saves a few branches in the iterator and the unwrap itself.

Contributor Author:
OK, I tried something like that. It now dumps the inequality checks directly into the vector and does a cumsum in place.
It always feels a little awkward to iterate over each chunk but I guess that's what's really underneath so that's what has to be done.
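A standalone sketch of that two-pass shape, assuming a plain slice of inequality flags rather than the PR's chunked arrays: first dump the flags into the output, then turn them into run ids with an in-place cumulative sum.

    fn rle_id_two_pass(neq: &[bool]) -> Vec<u32> {
        // Pass 1: the run id of the first element is 0, then the raw
        // "value changed here" flags follow.
        let mut out = Vec::with_capacity(neq.len() + 1);
        out.push(0u32);
        out.extend(neq.iter().map(|&v| v as u32));

        // Pass 2: an in-place cumulative sum turns change flags into run ids.
        out.iter_mut().fold(0u32, |acc, x| {
            *x += acc;
            *x
        });
        out
    }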

Member:
> It always feels a little awkward to iterate over each chunk but I guess that's what's really underneath so that's what has to be done.

Yes, then the code is closer to what we have in memory and has to go through fewer abstractions between the get_value here and the write_value there.

.for_each(|(i, v)| out.push(out[i] + v.unwrap() as u32));
.downcast_iter()
.for_each(|a| out.extend(a.values_iter().map(|v| v as u32)));
out.iter_mut().fold(0, |a, x| {
@ritchie46 (Member), Jul 12, 2023:
Almost there. :)

I think we can do this in a single pass. Something like this:

 
   // keep track of the last written value
   let mut last_value = 0u32;

   s_neq.downcast_iter().for_each(|a| {
       for v in a.values_iter() {
           // a `true` flag starts a new run, so the run id increases by one
           last_value += v as u32;
           out.push(last_value);
       }
   });

Contributor Author:
And I thought I was into micro-optimizations :-)
Ideally we could do everything, even the comparisons, in a single pass and only store what we need but I didn't see a clear way to do that.

Member:
> Ideally we could do everything, even the comparisons, in a single pass and only store what we need but I didn't see a clear way to do that.

Yes, that would require us to go down into the known types with some generics. We could follow up with that. The benefit of this implementation is that it has little compiler bloat.

> And I thought I was into micro-optimizations :-)

Haha, I have put a lot of time into ensuring we deliver what we advocate: a fast dataframe library. I cannot unsee the potential branches, cache misses, and allocations ^^
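As an aside, a hypothetical sketch (not part of this PR) of what that single-pass, type-aware follow-up could look like on a plain slice; each concrete type gets its own monomorphized copy, which is the compiler-bloat trade-off mentioned above.

    // Hypothetical single-pass run-id computation over a plain slice: the
    // neighbour comparison and the id accumulation happen in one loop, so no
    // intermediate boolean buffer is materialized.
    fn rle_id_single_pass<T: PartialEq>(values: &[T]) -> Vec<u32> {
        let mut out = Vec::with_capacity(values.len());
        let mut id = 0u32;
        for (i, v) in values.iter().enumerate() {
            if i > 0 && values[i - 1] != *v {
                id += 1; // a change in value starts a new run
            }
            out.push(id);
        }
        out
    }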

@ritchie46 (Member):
Great functionality to have @magarick, and thanks for quickly iterating on the perf improvements. 🙌

@ritchie46 ritchie46 merged commit b87ff01 into pola-rs:main Jul 12, 2023
@magarick magarick deleted the rle branch July 12, 2023 20:28
@alexander-beedie (Collaborator):
Nice one :)

c-peters pushed a commit to c-peters/polars that referenced this pull request Jul 14, 2023
Labels: enhancement, python, rust
Linked issue: pl.Expr.streaks() (simplify identifying consecutive same value sequences)
4 participants