Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API/BUG: center=True in expanding operations returns non-sensical results #20647

Closed
jorisvandenbossche opened this issue Apr 10, 2018 · 11 comments · Fixed by #34887
Closed

API/BUG: center=True in expanding operations returns non-sensical results #20647

jorisvandenbossche opened this issue Apr 10, 2018 · 11 comments · Fixed by #34887
Assignees
Labels
Bug good first issue Window rolling, ewma, expanding
Milestone

Comments

@jorisvandenbossche
Copy link
Member

What is center=True supposed to do in df.expanding()... operations?

Using the example from the docstring of .expanding():

In [103]: df = DataFrame({'B': [0, 1, 2, np.nan, 4]})

In [105]: df.expanding(2).sum()
Out[105]: 
     B
0  NaN
1  1.0
2  3.0
3  3.0
4  7.0

Now adding center=True:

In [106]: df.expanding(2, center=True).sum()
Out[106]: 
     B
0  3.0
1  3.0
2  7.0
3  7.0
4  6.0

Two observations: it seems it just shifted the first rows (which is not the same as centering) and I don't know where the last values are coming from (and they should certainly not decrease again in an expanding sum with only positive values).

Further, I also cannot think of what it actually should do. If we really want to center it, that would mean returning with a new index (as the real center would only increase with 0.5 and be something like [0, 0.5, 1, 1.5, 2] for those data).

So maybe we should rather remove this keyword for expanding ?

@jorisvandenbossche
Copy link
Member Author

Hmm, I see that we in the past removed the center keyword in the pd.expanding_ functions, exactly because it did not make sense (#7934). But it seems with the refactor to the expanding() method, it was introduced again

@jorisvandenbossche jorisvandenbossche added the Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff label Apr 10, 2018
@jreback
Copy link
Contributor

jreback commented Apr 10, 2018

yes center doesn’t make sense in context of expanding

@jorisvandenbossche jorisvandenbossche added this to the Next Major Release milestone Apr 11, 2018
@pulkitmaloo
Copy link
Contributor

I'd like to work on this

@gfyoung
Copy link
Member

gfyoung commented Apr 11, 2018

Go for it! No need to ask for permission unless someone else is working on it.

stillmatic added a commit to stillmatic/pandas that referenced this issue Apr 22, 2018
@dragosthealex
Copy link
Contributor

Is anyone still working on this?

@jorisvandenbossche
Copy link
Member Author

I don't think so (unless @stillmatic did work on it? If you do, best to say so on this issue), so feel free to take a look at it!

@stillmatic
Copy link

i started working on this issue but don't think my solution is correct/finished. Haven't had time to finish (there are many dependencies). If you have a good idea go ahead, otherwise I might be able to find time and get this done

@dragosthealex
Copy link
Contributor

@stillmatic in the commit you removed center parameter from rolling method as well, which I believe introduced most broken dependencies. But I believe in that one, centering makes sense.

@jorisvandenbossche I believe in this example center should do nothing if it's supposed to be consistent with functionality of rolling.
As far as I understand, when center=False, the result of the operation applied on the window is placed at the index of last element in window. e.g. df = DataFrame({'B': [0, 1, 2, 3, 4, 5]})
df.rolling(3).sum() gives [NaN, NaN, 3.0, 6.0, 9.0, 12.0]

When setting center=True, the result is placed at the index of element in center of the window (index in window / 2), effectively shifting the results back.
df.rolling(3, center=True).sum() gives [NaN, 3.0, 6.0, 9.0, 12.0, NaN]

df.rolling(2).sum() gives same result as df.rolling(2, center=True).sum(), because the center of window of len=2, is index=1 -> second element in window, which is also the last. Thus, df.expanding(2).sum()should give same result asdf.expanding(2, center=True).sum()`, so there must be a bug somewhere.
Please let me know if you think I got it wrong.

In terms of whether it's actually useful for expanding I'm not sure as I am not really familiar, but it should have the same effect as in rolling e.g.
df.expanding(3).sum() gives [NaN, NaN, 3.0, 6.0, 10.0, 15.0]
And df.expanding(3, center=True).sum() should give [NaN, 3.0, 6.0, 10.0, 15.0, NaN] right?

@sumanau7
Copy link
Contributor

take

@mroeschke mroeschke added Bug Window rolling, ewma, expanding and removed Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels May 13, 2020
@MBrouns
Copy link
Contributor

MBrouns commented Jun 20, 2020

It seems like center is indeed a weird option for an expanding window. I just ran the test suite and there are no tests breaking when I remove the center option from expanding (except for one that checks whether there's validation on the center parameter). Also, a code search on GitHub doesn't reveal any instances where center is used in conjuction with expanding.

Combined with the current output simply being wrong, I think that means we can move forward with deprecating center. I'll submit a PR.

@MBrouns
Copy link
Contributor

MBrouns commented Jun 20, 2020

take

@jreback jreback modified the milestones: Contributions Welcome, 1.1 Jul 14, 2020
jukka pushed a commit to jukka/pyActigraphy that referenced this issue Jan 7, 2025
The pd.date_range() argument "closed" was deprecated in Pandas 1.4 with "inclusive" as the preferred replacement. See pandas-dev/pandas#40245.

The expanding() argument "center" was deprecated in Pandas 1.1 as "non-sensical" with no replacement. See pandas-dev/pandas#20647. As discussed in that issue, I believe the use of center=True is a bug in the Crespo implementation, and removing it is the right thing to do.

Both of these deprecated arguments were removed in Pandas 2.0, so this change is needed to make Crespo work with recent Pandas versions.

To ensure no breakage because of this change, I'm adding a pandas>=1.4.0 dependency version constraint to ensure the presence of the "inclusive" argument to pd.date_range().
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug good first issue Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants