BUG: Pandas rolling window always converts to float #53214

Open · daviddavo opened this issue on May 13, 2023 · 2 comments
Labels
Dtype Conversions (Unexpected or buggy dtype conversions), Enhancement, Window (rolling, ewma, expanding)

Comments

daviddavo commented May 13, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

The following function raises an error when it is passed to the rolling window's apply:

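import numba
import pandas as pd

aux = pd.Series([1, 1, 1, 1, 2, 2, 3, 4], dtype='uint32')

# Compiled only for uint32 arrays, so a float64 window has no matching signature
@numba.jit(numba.float64(numba.uint32[:]), nopython=True)
def nunique(arr):
    return len(set(arr))

aux.rolling(2).apply(nunique, raw=True)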

This raises the following error:

TypeError: No matching definition for argument type(s) array(float64, 1d, C)

Issue Description

Passing a function that expects anything other than float input to a rolling window's apply raises an error, because the window values are converted to float64 before the function is called.

Furthermore, addition is not invertible in floating-point arithmetic [1], so subtracting a value as it leaves the window does not always undo having added it.
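The footnote's point can be made concrete with a naive add/subtract ("subtract-on-evict") sliding-window sum. The helper `sliding_sum_add_subtract` below is hypothetical and is not pandas' rolling-sum kernel; it is only a sketch showing that in float64 the subtract step cannot undo an addition that was rounded away, while the same data held as int64 stays exact.

```python
import numpy as np


def sliding_sum_add_subtract(values: np.ndarray, window: int) -> list:
    """Hypothetical helper: compute each window's sum by adding the element
    that enters and subtracting the element that leaves the window."""
    total = values[:window].sum()
    sums = [total]
    for i in range(window, len(values)):
        # Evaluated left to right: (total + entering) - leaving
        total = total + values[i] - values[i - window]
        sums.append(total)
    return sums


floats = np.array([1e16, 1.0, 1.0, 1.0], dtype=np.float64)
ints = np.array([10**16, 1, 1, 1], dtype=np.int64)

float_sums = sliding_sum_add_subtract(floats, 2)
int_sums = sliding_sum_add_subtract(ints, 2)

# The exact sums of the last two windows are both 2.
# In float64, 1e16 + 1.0 rounds back to 1e16, so subtracting 1e16 leaves 0.0.
print(float(float_sums[1]), float(float_sums[2]))
print(int(int_sums[1]), int(int_sums[2]))
```

With integer data kept in an integer dtype, this kind of drift does not occur, which is presumably why the report wants the input dtype preserved.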

Expected Behavior

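With raw=True, the ndarray passed to the function should keep the Series' original dtype (uint32 here) instead of being cast to float64.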
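To observe the current behavior directly, one can record the dtype of every window that reaches the user function. `record_dtype` and `seen_dtypes` are hypothetical names for this sketch; per the maintainer's explanation below, each window is expected to arrive as float64 rather than uint32.

```python
import numpy as np
import pandas as pd

seen_dtypes = []


def record_dtype(window: np.ndarray) -> float:
    # Store the dtype of each raw window pandas hands over, then return a dummy value
    seen_dtypes.append(window.dtype)
    return 0.0


aux = pd.Series([1, 1, 1, 1, 2, 2, 3, 4], dtype="uint32")
aux.rolling(2).apply(record_dtype, raw=True)

# Distinct dtypes seen inside the function (the report expects uint32 here)
print(set(seen_dtypes))
```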

Installed Versions

INSTALLED VERSIONS

commit : 2e218d1
python : 3.10.10.final.0
python-bits : 64
OS : Linux
OS-release : 6.1.26-1-MANJARO
Version : #1 SMP PREEMPT_DYNAMIC Wed Apr 26 22:07:35 UTC 2023
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.5.3
numpy : 1.23.5
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 22.3.1
Cython : None
pytest : 7.2.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.8.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : 1.0.9
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.3
numba : 0.56.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.0
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.8.10
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : 2023.3

Footnotes

  1. Martin Hirzel, Scott Schneider, and Kanat Tangwongsan. 2017. Sliding-Window Aggregation Algorithms: Tutorial. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems (DEBS '17). Association for Computing Machinery, New York, NY, USA, 11–14. https://doi.org/10.1145/3093742.3095107

daviddavo added the Bug and Needs Triage labels on May 13, 2023
lithomas1 (Member)

Hi @daviddavo,
Thanks for the report. I can reproduce this on main.

The issue is that our Cython code for rolling windows only accepts float64 input, so the values are cast to float64 beforehand:

values = ensure_float64(values)

Your function is called from within that Cython function, so it fails with a TypeError: by the time it runs, the input is already float64.
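Besides the TypeError, a float64 cast can also change integer data. The NumPy snippet below is an illustration only, assuming the cast behaves like a plain `astype(np.float64)`: uint32 values (as in this report) round-trip exactly, because every integer up to 2**53 is representable in float64, but 64-bit integers above 2**53 do not.

```python
import numpy as np

# 2**53 + 1 is the smallest positive integer that float64 cannot represent exactly
original = np.array([2**53 + 1], dtype=np.uint64)
as_float = original.astype(np.float64)

# Converting back shows the value was rounded to 2**53 during the cast
round_trip = as_float.astype(np.uint64)
print(int(original[0]), int(round_trip[0]))
```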

I believe #46619 has the same root cause as this one.

I will close that one and relabel this as an enhancement.
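Until the dtype is preserved, one possible workaround (an assumption, not something proposed in this thread) is to accept float64 in the window function and cast back inside it. That is lossless here because every uint32 value is exactly representable in float64. `nunique_window` is a hypothetical NumPy-only stand-in for the issue's jitted `nunique`.

```python
import numpy as np
import pandas as pd


def nunique_window(arr: np.ndarray) -> float:
    # pandas passes float64 here; cast back to the Series' original integer dtype.
    # Assumption: every value fits exactly in float64 (true for uint32).
    return float(len(np.unique(arr.astype(np.uint32))))


aux = pd.Series([1, 1, 1, 1, 2, 2, 3, 4], dtype="uint32")
result = aux.rolling(2).apply(nunique_window, raw=True)
print(result.tolist())
```

The same idea should carry over to a numba function compiled for `float64[:]` input that converts internally, at the cost of one extra cast per window.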

lithomas1 added the Enhancement, Dtype Conversions, and Window labels and removed the Bug and Needs Triage labels on May 15, 2023
lithomas1 (Member)

For reference, here is the minimal working example (MWE) for the complex-number case; note the warning that is raised:

import pandas as pd
pd.DataFrame([1j,1+2j]).rolling(2).apply(lambda x: print(x) is None)
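The warning is consistent with a plain complex-to-float cast. Assuming pandas' cast behaves like NumPy's `astype(np.float64)`, this standalone sketch shows that the imaginary parts are discarded, so the windows no longer hold the original values.

```python
import warnings

import numpy as np

values = np.array([1j, 1 + 2j])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Casting complex to float keeps only the real parts and emits ComplexWarning
    as_float = values.astype(np.float64)

print(as_float.tolist())
print([w.category.__name__ for w in caught])
```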
