BUG: rolling does not accept MultiIndex name #38877

metazoic · 2021-01-01T09:50:55Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

If

import pandas

df = pandas.DataFrame()
df['date'] = pandas.date_range('2021-1-1', '2021-1-5', freq='h')
df.loc[df.date <= '2021-1-3', 'category'] = 'A'
df.loc[df.date > '2021-1-3', 'category'] = 'B'
df.set_index(['category','date'], inplace=True)
df['value'] = 1

then

df.groupby('category').rolling('D', on='date')

breaks with

ValueError: invalid on specified as date, must be a column (of DataFrame), an Index or None

Problem description

The documentation states that the on argument can be a MultiIndex level (and then confusingly adds "rather than the DataFrame's index").

Expected Output

rolling should accept a MultiIndex level. Interestingly,

df.groupby('category').resample('D', level='date')

works, but it would be nice to unify the interfaces: resample accepts level and on for index and column respectively, while rolling accepts on for both.
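Until rolling accepts an index level, one possible workaround is to turn the index levels back into columns so that on='date' refers to a real column. The sketch below is not from the report: it rebuilds the frame above with numpy.where for brevity, and the names flat and rolled are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the report: hourly timestamps split into two categories.
dates = pd.date_range("2021-01-01", "2021-01-05", freq="h")
df = pd.DataFrame({"date": dates, "value": 1})
df["category"] = np.where(df["date"] <= pd.Timestamp("2021-01-03"), "A", "B")
df = df.set_index(["category", "date"])

# Workaround (an assumption, not the fix discussed in this issue):
# move the index levels back to columns so that `on` names a column.
flat = df.reset_index()
rolled = flat.groupby("category").rolling("1D", on="date")["value"].sum()

print(rolled.groupby(level="category").max())
```

The window restarts within each category, so the sums climb from 1 up to 24 hourly rows per one-day window in both groups.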


venaturum · 2021-01-01T10:52:16Z

I don't think the groupby is needed to illustrate the bug here:

df.rolling('D', on='date')

fails with the same error
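For the ungrouped case, a possible way around the error, assuming the category level is not needed for the calculation, is to drop that level so the frame has a plain DatetimeIndex and no on argument is required. This is only a sketch with an illustrative rebuild of the frame; note that the windows then run across the A/B boundary.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the report.
dates = pd.date_range("2021-01-01", "2021-01-05", freq="h")
df = pd.DataFrame({"date": dates, "value": 1})
df["category"] = np.where(df["date"] <= pd.Timestamp("2021-01-03"), "A", "B")
df = df.set_index(["category", "date"])

# Dropping the `category` level leaves a DatetimeIndex, which rolling
# uses by default, so `on` can be omitted entirely.
by_date = df.droplevel("category")["value"]
rolled = by_date.rolling("1D").sum()

print(rolled.tail())
```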

venaturum · 2021-01-01T12:18:23Z

If we add

elif self.on in self.obj.index.names:
    self._on = self.obj.index.get_level_values(self.on)

to BaseWindow.__init__ on line 127 in pandas/core/window/rolling.py, then the following works:

>>> df.groupby('category').rolling('D', on='date')
RollingGroupby [window=D,min_periods=1,center=False,axis=0,on=date,method=single]

Continuing the method chain with something like .sum gives us:

category category date
A        A        2021-01-01 00:00:00    1.0
                  2021-01-01 01:00:00    2.0
                  2021-01-01 02:00:00    3.0
                  2021-01-01 03:00:00    4.0
                  2021-01-01 04:00:00    5.0
...                                      ...
B        B        2021-01-04 20:00:00   24.0
                  2021-01-04 21:00:00   24.0
                  2021-01-04 22:00:00   24.0
                  2021-01-04 23:00:00   24.0
                  2021-01-05 00:00:00   24.0

[97 rows x 1 columns]

I thought the repeated category in the index might be a bug, but from the discussion in #38737 it may be by design.

Happy to take this one, make the changes, and write tests if I'm on the right track. There may first need to be a decision on whether rolling should take a "level" parameter for index names (to be consistent with resample), as @metazoic suggested.
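To make the lookup order from the elif branch earlier in this comment easier to try outside of pandas internals, here is a standalone sketch. resolve_on is a hypothetical helper, not a pandas function; the first three branches are assumptions inferred from the error message ("a column (of DataFrame), an Index or None"), and only the last branch corresponds to the proposed change.

```python
import pandas as pd


def resolve_on(obj, on=None):
    """Hypothetical helper: decide which values a rolling window runs over."""
    if on is None:
        # No `on` given: fall back to the object's own index.
        return obj.index
    if isinstance(on, pd.Index):
        # An Index passed directly is used as is.
        return on
    if isinstance(obj, pd.DataFrame) and on in obj.columns:
        # A column label: wrap the column values in an Index.
        return pd.Index(obj[on])
    if on in obj.index.names:
        # Proposed addition: a named index level.
        return obj.index.get_level_values(on)
    raise ValueError(
        f"invalid on specified as {on}, must be a column (of DataFrame), "
        "an Index, an index level name or None"
    )


# Small demo frame with a (category, date) MultiIndex.
idx = pd.MultiIndex.from_product(
    [["A"], pd.date_range("2021-01-01", periods=3, freq="h")],
    names=["category", "date"],
)
frame = pd.DataFrame({"value": 1}, index=idx)

print(resolve_on(frame, "date"))
```

Columns are checked before index level names here, so a column called date would win over a level of the same name; whether that precedence is desirable is part of the "on" versus "level" question.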

mroeschke · 2021-01-02T02:21:33Z

Thanks for the report @metazoic. We have a similar request in #34642, so I'm going to consolidate the discussion to that issue and close this one. Happy to have a pull request implementing this feature.

metazoic added the Bug and Needs Triage labels Jan 1, 2021

mroeschke closed this as completed Jan 2, 2021
