BUG: rolling does not accept MultiIndex name #38877

Closed
2 of 3 tasks
metazoic opened this issue Jan 1, 2021 · 3 comments
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@metazoic

metazoic commented Jan 1, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

If

import pandas

df = pandas.DataFrame()
df['date'] = pandas.date_range('2021-1-1', '2021-1-5', freq='h')
df.loc[df.date <= '2021-1-3', 'category'] = 'A'
df.loc[df.date > '2021-1-3', 'category'] = 'B'
df.set_index(['category','date'], inplace=True)
df['value'] = 1

then

df.groupby('category').rolling('D', on='date')

breaks with

ValueError: invalid on specified as date, must be a column (of DataFrame), an Index or None
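
Until on can resolve an index level, one possible workaround is to move the date level back into a column before rolling. A minimal sketch, assuming the same frame as above (rebuilt here with the pd alias and a default category of 'A', which is shorthand rather than the original construction):

```python
import pandas as pd

# Rebuild the frame from the report: hourly timestamps, two categories.
df = pd.DataFrame({"date": pd.date_range("2021-01-01", "2021-01-05", freq="h")})
df["category"] = "A"
df.loc[df["date"] > "2021-01-03", "category"] = "B"
df = df.set_index(["category", "date"])
df["value"] = 1

# Move only the 'date' level back to a column so that on='date' can find it.
# 'category' stays in the index and is still usable as a groupby key.
with_date_column = df.reset_index("date")
rolled = with_date_column.groupby("category").rolling("D", on="date").sum()
values = rolled["value"]
```

This sidesteps the error rather than fixing it: the rolling call only ever sees date as a column.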

Problem description

The documentation states that the on argument can be a MultiIndex level (and then confusingly adds "rather than the DataFrame's index").

Expected Output

rolling should accept a MultiIndex level. Interestingly,

df.groupby('category').resample('D', level='date')

works, but it would be nice to unify the interfaces: resample accepts level and on for index and column respectively, while rolling accepts on for both.

metazoic added the Bug and Needs Triage labels Jan 1, 2021
@venaturum
Contributor

I don't think the groupby is needed to illustrate the bug here

df.rolling('D', on='date')

fails with the same error
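
For the ungrouped case, a sketch of one way around the error: drop the category level so that date becomes the frame's own DatetimeIndex, in which case on is not needed at all. The names flat and rolled are only illustrative.

```python
import pandas as pd

# Same data as in the report, built compactly.
df = pd.DataFrame({"date": pd.date_range("2021-01-01", "2021-01-05", freq="h")})
df["category"] = "A"
df.loc[df["date"] > "2021-01-03", "category"] = "B"
df = df.set_index(["category", "date"])
df["value"] = 1

# With 'category' dropped, the index is a plain DatetimeIndex,
# so a time-based window works without passing on=.
flat = df.droplevel("category")
rolled = flat.rolling("D").sum()
```

Note that this rolls across both categories at once, which matches the ungrouped call but not the groupby version.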

@venaturum
Contributor

If we add

elif self.on in self.obj.index.names:
    self._on = self.obj.index.get_level_values(self.on)

to BaseWindow.__init__ on line 127 in pandas/core/window/rolling.py then the following works

>>> df.groupby('category').rolling('D', on='date')
RollingGroupby [window=D,min_periods=1,center=False,axis=0,on=date,method=single]

Continuing the method chain with something like .sum gives us:

                                       value
category category date
A        A        2021-01-01 00:00:00    1.0
                  2021-01-01 01:00:00    2.0
                  2021-01-01 02:00:00    3.0
                  2021-01-01 03:00:00    4.0
                  2021-01-01 04:00:00    5.0
...                                      ...
B        B        2021-01-04 20:00:00   24.0
                  2021-01-04 21:00:00   24.0
                  2021-01-04 22:00:00   24.0
                  2021-01-04 23:00:00   24.0
                  2021-01-05 00:00:00   24.0

[97 rows x 1 columns]

I thought the repeated category level in the index might be a bug, but the discussion in #38737 suggests it may be by design.

Happy to take this one, make the changes and write tests, if I'm on the right track. There may first need to be a decision on whether rolling should use a "level" parameter for index names (to be consistent with resample), as @metazoic suggested.
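
To make the proposed lookup order concrete, here is a rough standalone sketch. resolve_on is a hypothetical helper written for illustration, not pandas' actual code; it only mirrors the idea of falling back to an index level when on is not a column.

```python
import pandas as pd


def resolve_on(obj: pd.DataFrame, on=None) -> pd.Index:
    """Hypothetical helper: pick the values a time-based window would use."""
    if on is None:
        return obj.index
    if isinstance(on, pd.Index):
        return on
    if on in obj.columns:
        return pd.Index(obj[on])
    if on in obj.index.names:
        # The extra branch suggested in this comment: accept an index level name.
        return obj.index.get_level_values(on)
    raise ValueError(
        f"invalid on specified as {on}, must be a column (of DataFrame), "
        "an Index level or None"
    )


# Small MultiIndex frame in the same shape as the report's example.
df = pd.DataFrame(
    {"value": 1},
    index=pd.MultiIndex.from_product(
        [["A", "B"], pd.date_range("2021-01-01", periods=3, freq="h")],
        names=["category", "date"],
    ),
)
on_values = resolve_on(df, "date")
```

Columns are checked before index levels here, so a column named date would win over a level of the same name. Whether that precedence is right is part of the same design question as a separate level parameter.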

@mroeschke
Member

Thanks for the report @metazoic. We have a similar request in #34642, so I'm going to consolidate the discussion to that issue and close this one. Happy to have a pull request implementing this feature.
