ENH: Support rolling over one level of a MultiIndex #34642

miccoli · 2020-06-08T12:23:27Z

I have searched the [pandas] tag on StackOverflow for similar questions.
I have asked my usage related question on StackOverflow.

I had a hard time understanding how df.rolling works when df is indexed by a MultiIndex

This is an example data frame:

import pandas as pd
idx = pd.MultiIndex.from_product(
    [pd.date_range("2020-01-01", "2020-1-10"), ["a", "b"]], names=["date", "obs"],
)
df = pd.DataFrame(index=idx)
df['c1'] = range(len(df))

print(df)

which outputs

                c1
date       obs    
2020-01-01 a     0
           b     1
2020-01-02 a     2
           b     3
2020-01-03 a     4
           b     5
2020-01-04 a     6
           b     7
2020-01-05 a     8
           b     9
2020-01-06 a    10
           b    11
2020-01-07 a    12
           b    13
2020-01-08 a    14
           b    15
2020-01-09 a    16
           b    17
2020-01-10 a    18
           b    19

Now I want to apply a rolling window on the date level, keeping the obs level separate.

I tried with no success obvious and simple (least surprise) commands like

df.rolling("7d", index="date") or
df.rolling("7d", on="date")

but finally the desired result is obtained by

df_r = df.groupby(by="obs", group_keys=False).rolling(
    "7d", on=df.index.levels[0]
).mean().sort_index()

print(df_r)

which gives me the correct result:

                  c1
date       obs      
2020-01-01 a     0.0
           b     1.0
2020-01-02 a     1.0
           b     2.0
2020-01-03 a     2.0
           b     3.0
2020-01-04 a     3.0
           b     4.0
2020-01-05 a     4.0
           b     5.0
2020-01-06 a     5.0
           b     6.0
2020-01-07 a     6.0
           b     7.0
2020-01-08 a     8.0
           b     9.0
2020-01-09 a    10.0
           b    11.0
2020-01-10 a    12.0
           b    13.0

It seams to me that this should be a quite common situation, so I was wondering if there is a simpler way to obtain the same results. By the way my solution is not very robust, because there are hidden assumptions on how the objects returned by groupby are indexed, which do not necessarily hold for a generic data frame.

Moreover the doc of the on parameter in rolling was almost incomprehensible to me: I'm still wondering if my usage rolling( "7d", on=df.index.levels[0]) is the intended one or not.

The text was updated successfully, but these errors were encountered:

daskol · 2020-06-08T15:29:31Z

Probably, there is a related issue #34617. By the ways, if data frame is indexed with integer index then one can use level keyword of .rolling(). This example is not consistent and just shows API features.

import pandas as pd
idx = pd.MultiIndex.from_product(
    [range(7), ["a", "b"]], names=["date", "obs"],
)
df = pd.DataFrame(index=idx)
df['c1'] = range(len(df))

print(df)

df_r = df.groupby(by="obs", group_keys=False).rolling(
    7, level=1
).mean().sort_index()

print(df_r)

miccoli · 2020-06-08T20:09:47Z

In @daskol example the level keyword seems a noop, as any other arg passed to rolling.

from pandas.testing import assert_frame_equal

assert_frame_equal(
    df.groupby(by="obs", group_keys=False).rolling(7).mean(),
    df.groupby(by="obs", group_keys=False).rolling(7, level=1).mean(),
)
assert_frame_equal(
    df.groupby(by="obs", group_keys=False).rolling(7).mean(),
    df.groupby(by="obs", group_keys=False).rolling(7, foo=8, bar='8').mean(),
)

I'm still very confused.

miccoli · 2020-09-07T07:15:21Z

A brief update. After updating pandas to v1.1.1 the above walk around fails with SEGFAULT!

>>> df.groupby(by="obs", group_keys=False).rolling("7d", on=df.index.levels[0])
RollingGroupby [window=604800000000000,min_periods=1,center=False,win_type=freq,axis=0,on=DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
               '2020-01-09', '2020-01-10'],
              dtype='datetime64[ns]', name='date', freq=None)]

but

>>> df.groupby(by="obs", group_keys=False).rolling("7d", on=df.index.levels[0]).mean()
Segmentation fault: 11

It seems to me that this issue should be labeled as BUG

pandas version info

INSTALLED VERSIONS
------------------
commit           : f2ca0a2665b2d169c97de87b8e778dbed86aea07
python           : 3.8.2.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.6.0
Version          : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.UTF-8

pandas           : 1.1.1
numpy            : 1.19.1
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.2.2
setuptools       : 49.6.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : 1.0.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.17.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : 0.8.0
fastparquet      : None
gcsfs            : None
matplotlib       : 3.3.1
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : 1.0.1
pytables         : None
pyxlsb           : None
s3fs             : 0.5.0
scipy            : 1.5.2
sqlalchemy       : None
tables           : 3.6.1
tabulate         : 0.8.7
xarray           : 0.16.0
xlrd             : 1.2.0
xlwt             : None
numba            : 0.51.1

miccoli · 2020-09-07T07:18:15Z

SEGFAULT possibly linked to #36018

drlakecoder · 2021-11-23T04:25:20Z

hi, I wonder if there were any update on this issue, in my situation, I am also unable to do rolling with non-integer windows when there is multi-index in the dataframe. With the simple example that was used in this thread, I would get different error like the following:

df.groupby(by="obs", group_keys=False).rolling("7d", on=df.index.levels[0])
Out[21]:
RollingGroupby [window=7d,min_periods=1,center=False,win_type=freq,axis=0,on=DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
'2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
'2020-01-09', '2020-01-10'],
dtype='datetime64[ns]', name='date', freq='D'),method=single]

df.groupby(by="obs", group_keys=False).rolling("7d", on=df.index.levels[0]).mean()
Traceback (most recent call last):

File "C:\Users\c_yy\AppData\Local\Temp/ipykernel_15476/3165508401.py", line 1, in
df.groupby(by="obs", group_keys=False).rolling("7d", on=df.index.levels[0]).mean()

File "C:\Users\c_yy\anaconda3\lib\site-packages\pandas\core\window\rolling.py", line 1855, in mean
return super().mean(*args, engine=engine, engine_kwargs=engine_kwargs, **kwargs)

File "C:\Users\c_yy\anaconda3\lib\site-packages\pandas\core\window\rolling.py", line 1309, in mean
return self._apply(window_func, name="mean", **kwargs)

File "C:\Users\c_yy\anaconda3\lib\site-packages\pandas\core\window\rolling.py", line 594, in _apply
result = super()._apply(

File "C:\Users\c_yy\anaconda3\lib\site-packages\pandas\core\window\rolling.py", line 545, in _apply
return self._apply_blockwise(homogeneous_func, name)

File "C:\Users\c_yy\anaconda3\lib\site-packages\pandas\core\window\rolling.py", line 441, in _apply_blockwise
new_mgr = mgr.apply(hfunc, ignore_failures=True)

File "C:\Users\c_yy\anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 325, in apply
applied = b.apply(f, **kwargs)

File "C:\Users\c_yy\anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 381, in apply
result = func(self.values, **kwargs)

File "C:\Users\c_yy\anaconda3\lib\site-packages\pandas\core\window\rolling.py", line 431, in hfunc
res_values = homogeneous_func(values)

File "C:\Users\c_yy\anaconda3\lib\site-packages\pandas\core\window\rolling.py", line 535, in homogeneous_func
result = np.apply_along_axis(calc, self.axis, values)

File "<array_function internals>", line 5, in apply_along_axis

File "C:\Users\c_yy\anaconda3\lib\site-packages\numpy\lib\shape_base.py", line 379, in apply_along_axis
res = asanyarray(func1d(inarr_view[ind0], *args, **kwargs))

File "C:\Users\c_yy\anaconda3\lib\site-packages\pandas\core\window\rolling.py", line 521, in calc
start, end = window_indexer.get_window_bounds(

File "C:\Users\c_yy\anaconda3\lib\site-packages\pandas\core\window\indexers.py", line 333, in get_window_bounds
index_array = self.index_array.take(ensure_platform_int(indices))

IndexError: index 10 is out of bounds for axis 0 with size 10

When using my own real data set (see below), I get different error:

rxx.loc['2015-12-01':'2017-12-29',:]
TotRet SpecRet ... volumedays IsTrading
date order_book_id ...
2015-12-01 000155.XSHE 0.014580 0.007206 ... 53.0 1.0
2015-12-02 000155.XSHE -0.004785 0.002132 ... 53.0 1.0
2015-12-03 000155.XSHE 0.034894 0.006537 ... 53.0 1.0
2015-12-04 000155.XSHE 0.050001 0.021692 ... 53.0 1.0
2015-12-07 000155.XSHE 0.049832 0.026999 ... 53.0 1.0
... ... ... ... ...
2017-12-25 000155.XSHE -0.049707 -0.053313 ... 5.0 1.0
2017-12-26 000155.XSHE -0.023771 -0.013894 ... 6.0 1.0
2017-12-27 000155.XSHE 0.006497 0.014989 ... 7.0 1.0
2017-12-28 000155.XSHE -0.030648 -0.026356 ... 8.0 1.0
2017-12-29 000155.XSHE 0.003335 -0.005979 ... 9.0 1.0

[104 rows x 112 columns]

rxx.groupby(level='order_book_id',group_keys=False).rolling('63d',on=rxx.index.levels[0]).mean()
***** ValueError: Length of values (567) does not match length of index (150)**

pd.show_versions()

INSTALLED VERSIONS

commit : 945c9ed
python : 3.8.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22000
machine : AMD64
processor : AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en
LOCALE : English_United States.1252

pandas : 1.3.4
numpy : 1.21.2
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.2
setuptools : 58.0.4
Cython : 0.29.24
pytest : 6.2.4
hypothesis : None
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.29.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.08.1
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.0
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : 1.4.22
tables : 3.6.1
tabulate : 0.8.9
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1

jpodivin · 2022-07-30T11:18:07Z

I have to concur. This seems like a useful thing to have. Or if there is an alternative way to handle this use case it could be described in docs.

d-otto · 2022-12-06T18:00:11Z

I've run into this issue as well.

The documentation for df.rolling() states on= should be: "a column label or Index level on which to calculate the rolling window". My expectation was that I could pass the name of a multiindex level and .rolling() would group rows by unique index level values. This all might be better handled by .groupby(), but I'd love to see more integrated multiindex handling where convenient.

pollackscience · 2022-12-08T14:41:49Z

I have a similar issue, which again involves performing rolling actions on a dataframe with duplicate datetimes. I've gotten a complicated workaround using groupby and some other hacks, but a more native support would be greatly appreciated.

Kalaweksh · 2023-01-20T19:55:34Z

The groupby transform method has this exact functionality. The docs explain it pretty well. Although I agree that the windowing API should definitely natively have this functionality.

claudia-buss · 2023-02-09T15:58:44Z

This is the solution wanted
https://www.appsloveworld.com/pandas/100/251/pandas-dataframe-rolling-window-with-multiindex

meta-ks · 2023-11-09T04:56:39Z

The doc still mentions this as funtctionality but it doesn't work:

src: Rolling doc

For a DataFrame, a column label or **Index level** on which to calculate the rolling window, rather than the DataFrame’s index.

jdkworld · 2024-01-04T16:30:50Z

The doc still mentions this as funtctionality but it doesn't work:

src: Rolling doc
For a DataFrame, a column label or **Index level** on which to calculate the rolling window, rather than the DataFrame’s index.

This is still an issue

zdwhite · 2024-05-09T18:30:10Z

The doc still mentions this as functionality but it doesn't work:

src: Rolling doc
For a DataFrame, a column label or **Index level** on which to calculate the rolling window, rather than the DataFrame’s index.

This is still an issue however there is a hacky(?) work around.

Consider the following multi indexed series

 test.info()
<class 'pandas.core.series.Series'>
MultiIndex: 5442161 entries, ('1A1', Timestamp('2024-04-07 17:00:00.431000+0000', tz='UTC')) to ('3B9', Timestamp('2024-04-> 28 16:59:58.468000+0000', tz='UTC'))
Series name: supplytemp
Non-Null Count    Dtype  
--------------    -----  
5442159 non-null  float64
dtypes: float64(1)

test.index.names
FrozenList(['site', 'TimeStamp'])

Test['rolling_avg_supply_temp'] = Test.groupby(level='site').rolling('1h',on=Test.index.get_level_values(level='TimeStamp'))['supplytemp'].mean()

While this is a tad more verbose then pandas just acknowledging the closest index this is a workable replacement. The descriptive statistics for the series line up with this method and the non indexed method.

miccoli added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Jun 8, 2020

jbrockmendel added Window rolling, ewma, expanding and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 3, 2020

mroeschke added Enhancement and removed Usage Question labels Oct 5, 2020

mroeschke changed the title ~~DOC: Is it possible to apply 'rolling' to a level in a multindex?~~ ENH: Support rolling over one level of a MultiIndex Oct 5, 2020

mroeschke mentioned this issue Jan 2, 2021

BUG: rolling does not accept MultiIndex name #38877

Closed

3 tasks

mroeschke mentioned this issue Feb 9, 2021

DOC: Remove mention that on with MultiIndex is supported with Rolling Groupby #39686

Closed

mroeschke mentioned this issue Mar 23, 2021

BUG: groupby + rolling with a non-integer window on multi-index fails- #40579

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Support rolling over one level of a MultiIndex #34642

ENH: Support rolling over one level of a MultiIndex #34642

miccoli commented Jun 8, 2020

daskol commented Jun 8, 2020

miccoli commented Jun 8, 2020

miccoli commented Sep 7, 2020

miccoli commented Sep 7, 2020

drlakecoder commented Nov 23, 2021

jpodivin commented Jul 30, 2022

d-otto commented Dec 6, 2022

pollackscience commented Dec 8, 2022

Kalaweksh commented Jan 20, 2023

claudia-buss commented Feb 9, 2023

meta-ks commented Nov 9, 2023

jdkworld commented Jan 4, 2024

zdwhite commented May 9, 2024

ENH: Support rolling over one level of a MultiIndex #34642

ENH: Support rolling over one level of a MultiIndex #34642

Comments

miccoli commented Jun 8, 2020

daskol commented Jun 8, 2020

miccoli commented Jun 8, 2020

miccoli commented Sep 7, 2020

miccoli commented Sep 7, 2020

drlakecoder commented Nov 23, 2021

INSTALLED VERSIONS

jpodivin commented Jul 30, 2022

d-otto commented Dec 6, 2022

pollackscience commented Dec 8, 2022

Kalaweksh commented Jan 20, 2023

claudia-buss commented Feb 9, 2023

meta-ks commented Nov 9, 2023

jdkworld commented Jan 4, 2024

zdwhite commented May 9, 2024