BUG: Segmentation-fault during a "custom" rolling operation #44470

erezinman · 2021-11-15T15:07:54Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

from pandas.core.window.indexers import BaseIndexer
from typing import Union, Tuple, Optional

import numpy as np
import pandas as pd


class RollingFixedOffsetVariableWindowIndexer(BaseIndexer):

    def get_window_bounds(self, num_values: int = 0, min_periods: Optional[int] = None, center: Optional[bool] = None,
                          closed: Optional[str] = None) -> Tuple[np.ndarray, np.ndarray]:
        # Every hour
        start = np.arange(0, num_values, 60 * 60)
        # Every hour + day
        end = start + 60 * 60 * 24

        return start, end


ser = pd.Series(np.arange(100000), index=pd.date_range(start='2020', freq='1s', periods=100000))
idxr = RollingFixedOffsetVariableWindowIndexer(window_size=60 * 60 * 24)
out = ser.rolling(idxr, min_periods=10).mean()

Issue Description

(Maybe related to #43267 ? Couldn't tell.)

I'm trying to create a custom BaseIndexer for a .rolling operation. The extraction of the window is quick and efficient, yet when I attempt to perform various operations on it I get the seg-fault error:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

I have created an MWE for this behavior using a fixed-indexer (that carries the same name). The original indexer was supposed to produce a rolling-operation with overlaps, so the above indexer is similar in the sense that it creates a 1-day windows 1-hour apart for a one-second index.

Expected Behavior

Hmmm... Not to have a seg-fault error?

Installed Versions

INSTALLED VERSIONS

commit : 945c9ed
python : 3.8.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-77-generic
Version : #86~18.04.1-Ubuntu SMP Fri Jun 18 01:23:22 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.4
numpy : 1.18.5
pytz : 2018.3
dateutil : 2.8.1
pip : 21.1.3
setuptools : 57.4.0
Cython : None
pytest : 6.2.2
hypothesis : None
sphinx : 3.5.3
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 0.999999999
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : None
pandas_datareader: None
bs4 : 4.6.0
bottleneck : None
fsspec : 0.8.7
fastparquet : None
gcsfs : None
matplotlib : 3.1.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : 0.8.9
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None
None

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

The text was updated successfully, but these errors were encountered:

mroeschke · 2021-11-17T04:03:23Z

A stipulation of rolling operations is that the result is the same shape as the input; therefore, the arrays in get_window_bounds needs to return start and end arrays that are also the same shape as the input.

Instead of segfaulting, pandas should return an error checking that these lengths are equal.

erezinman · 2021-11-18T12:14:05Z

@mroeschke
Hi,
first, please document this issue. Maybe in the BaseIndexer.get_window_bounds you could add this assumption to the doc-string?

Also, when a rolling operation that requires equal window-sizes, you can check that it does satisfy that, and raise an error only if needed. Maybe .sum requires it, but .apply surely doesn't.

Moreover - why is this a requirement? I mean .groupby, for example, doesn't have this requirement, so why isn't the underlying mechanism more similar?

jreback · 2021-11-18T12:17:20Z

@erezinman there is already #44497

erezinman · 2021-11-18T13:20:07Z

Yes, thank you. I see it takes care of my first statement.

mroeschke · 2021-11-18T20:57:39Z

Moreover - why is this a requirement? I mean .groupby, for example, doesn't have this requirement, so why isn't the underlying mechanism more similar?

This is to achieve API output consistency with native rolling operations which also returns a result is the same shape as the input. https://pandas.pydata.org/docs/user_guide/window.html

If this API breaking change was made, it's also unclear what the corresponding index (or column with axis=1) label should be associated with a window result that is different than the shape of the input.

erezinman · 2021-11-19T10:21:02Z

One way is to solve this is to allow returning the labels (much like in .groupby), but that might be too big a change (even though it can be made backward-compatible so it's mostly a matter of an internal change that you can test).

Another possibility is to just have the first/center item in the window (depending on the center argument) be the label. This would just result in non-unique indices which could occur only in future flows. That shouldn't be much of a change.

What do you think?

mroeschke · 2021-11-19T17:19:01Z

The second sound reasonable, but currently the code needs significant refactoring to support outputs that are not the same shape as the input.

mroeschke · 2021-11-19T17:21:26Z

As a work around, you can always iterate over the window object and use concat.

e.g.
result = pd.concat([window.mean() for window in df.rolling(MyIndexer)])

erezinman added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 15, 2021

erezinman changed the title ~~BUG: Segmentation-fault during "custom" rolling opertation~~ BUG: Segmentation-fault during "custom" rolling operation Nov 15, 2021

erezinman changed the title ~~BUG: Segmentation-fault during "custom" rolling operation~~ BUG: Segmentation-fault during a "custom" rolling operation Nov 15, 2021

CloseChoice added Segfault Non-Recoverable Error Window rolling, ewma, expanding labels Nov 16, 2021

mroeschke added Enhancement Error Reporting Incorrect or improved errors from pandas Bug and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member Enhancement labels Nov 17, 2021

mroeschke mentioned this issue Nov 17, 2021

ERR: Raise ValueError when BaseIndexer start & end bounds are unequal length #44497

Merged

4 tasks

mroeschke added this to the 1.4 milestone Nov 17, 2021

jreback closed this as completed in #44497 Nov 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Segmentation-fault during a "custom" rolling operation #44470

BUG: Segmentation-fault during a "custom" rolling operation #44470

erezinman commented Nov 15, 2021

mroeschke commented Nov 17, 2021

erezinman commented Nov 18, 2021 •

edited

Loading

jreback commented Nov 18, 2021 •

edited

Loading

erezinman commented Nov 18, 2021

mroeschke commented Nov 18, 2021

erezinman commented Nov 19, 2021

mroeschke commented Nov 19, 2021

mroeschke commented Nov 19, 2021

BUG: Segmentation-fault during a "custom" rolling operation #44470

BUG: Segmentation-fault during a "custom" rolling operation #44470

Comments

erezinman commented Nov 15, 2021

Reproducible Example

Issue Description

Expected Behavior

Installed Versions

INSTALLED VERSIONS

mroeschke commented Nov 17, 2021

erezinman commented Nov 18, 2021 • edited Loading

jreback commented Nov 18, 2021 • edited Loading

erezinman commented Nov 18, 2021

mroeschke commented Nov 18, 2021

erezinman commented Nov 19, 2021

mroeschke commented Nov 19, 2021

mroeschke commented Nov 19, 2021

erezinman commented Nov 18, 2021 •

edited

Loading

jreback commented Nov 18, 2021 •

edited

Loading