Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement rolling_product #9954

Open
stinodego opened this issue Jul 18, 2023 · 8 comments
Open

Implement rolling_product #9954

stinodego opened this issue Jul 18, 2023 · 8 comments
Labels
accepted Ready for implementation enhancement New feature or an improvement of an existing feature

Comments

@stinodego
Copy link
Contributor

Problem description

As mentioned in #8437

This functionality is currently missing. Would be nice to have, seeing as we have lots of other rolling functionality.

@stinodego stinodego added enhancement New feature or an improvement of an existing feature accepted Ready for implementation labels Jul 18, 2023
@github-project-automation github-project-automation bot moved this to Ready in Backlog Jul 18, 2023
@mcrumiller
Copy link
Contributor

mcrumiller commented Jul 18, 2023

@stinodego I can take a shot at this, since rolling_sum is very similar so I have a pretty simple reference.

One thing that I'm wondering about is the center parameter. If the window size is even, what happens here? In my little example:

pl.Series([1, 2, 3, 4, 5, 6]).rolling_sum(window_size=4, min_periods=1,center=True)
shape: (6,)
Series: '' [i64]
[
        3
        6
        10
        14
        18
        15
]

I can try to break this down. An X indicates where the computed sum is placed, and the |---| indicates the window used.

Series:  1     2     3     4     5     6
3        X-----|
6        |-----X-----|
10       |-----------X-----|
14             |-----------X-----|
18                   |-----------X-----|
15                         |-----------X

I think the rule here is a little simple: If we have an even number of bins, wait until we reach one of the middle two bins, then start to build the output array. Stop when our output array is the same size as the input array.

@magarick
Copy link
Contributor

  1. Windows: All of the rolling functions share window generation code. Changing that behavior would be separate. Do you think the way it's done now is no good? Looks like all it does is round up the index when the length is even and truncate intervals on the left and right.
  2. Product: A version of log-sum-exp that keeps track of signs and zeros might be the way to go. Otherwise it's easy to run into issues with over and underflow. And updating the window would require division otherwise, which can be slower than other operations (I think).
  3. Is there a way to consolidate work for rolling operations in one place? I've done a good bit of work on them recently but there's a long way to go. It would be nice if we could organize initiatives to work on sets of related features.

@mcrumiller
Copy link
Contributor

@magarick I have an almost-implementation ready, but yes I'm running into your second point about updating the window. The rolling sum can easily subtract values when we go past the edge, but you can't undo a multiplication by zero, so we need some form of memory, or better edge detection. To be honest, I'm not entirely comfortable what the "if we exceed the end, we have a completely new window so we recompute" part does or when it's called.

I was thinking of looking into ways to consolidate, but in this case where we're just adding rolling product, it didn't seem worth it to spend the extra energy. I can't think of other simple arithmetic rollers (rolling exponent is absurd, rolling division doesn't really make sense) so this doesn't seem like opening the door to other rolling functions that aren't significantly more complex.

@mcrumiller
Copy link
Contributor

FYI I just found where you reference #9277 (comment) so I'm reading that.

@magarick
Copy link
Contributor

Oh, for consolidate I just meant some way of tracking. It wasn't a suggestion for doing this any differently. But I can imagine simple things like rolling geometric means or application of smoothing kernels, as well as more complex, like refactoring to produce multiple quantiles at once and avoid wasting work.

Hopefully that documentation is helpful. I probably need to update it to include the new version that accounts for sorted runs. I think you can use a similar technique for the product and keep track of the most recent zero in the series. If that's in your window you know you don't need to do any work. But even if you're going between two zero-free windows, I think there's some trouble with multiplying and dividing. It might be better to keep track of the signs separately and do log-sum-exp on the absolute values.

@mcrumiller
Copy link
Contributor

Thanks @magarick. With @SeanTroyUWO's help I managed to set up debugging/breakpointing rust called from python and it's helping a lot. I see with update for rolling add (which I think you implemented?), if windows are overlapping you simply subtract the first value and then add the new ones. I now get what's going on.

Let me think about the log-sum-exp. While mathematically identcal, doing these shenanigans with floating points can muck things up in some cases.

@magarick
Copy link
Contributor

I didn't implement rolling sum, but I've touched the code.
Yes, that's the general rule for rolling. If you want it to be fast, you can't duplicate work.
With products, I think we're damned-if-we-do-damned-if-we-don't. Log-sum-exp has floating point weirdness. Products have over and underflows :-( Pick your poison I guess (or use a heuristic to guess which one is better for a given series?)

I would like to learn more about this way of debugging too!

@mcrumiller
Copy link
Contributor

Re: the debugging, I'm going to write up a guide. I'm attempting to streamline it at the moment, as right now it involves a couple steps/keyboard strokes and it would be nice to get it into one single config.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted Ready for implementation enhancement New feature or an improvement of an existing feature
Projects
Status: Ready
Development

No branches or pull requests

3 participants