BUG: Unexpected behavior math operations using multiindexes #51500
Comments
Hi, thanks for your report. Can you provide a minimal reproducible example? See https://matthewrocklin.com/minimal-bug-reports for an explanation.
Hi, of course. The code above is as minimal as I could get it, though. Oh wait, upon rechecking I could reduce the input data a little further, sorry! Here it is, a bit more reduced and without the comments:
import pandas as pd
data = pd.DataFrame([["C","B","B"],["B","A","A"],["B","A","B"]], columns=["0","1","2"])
data["0"] = data["0"].astype("category")
data["0"] = data["0"].cat.rename_categories({"C":"B", "B":"C"})
a = data.groupby(by=["0","1"])["2"].value_counts()
b = data.groupby(by=["0","1"]).size()
a.div(b)
For the expected behavior, but with a differently ordered index, use:
a.div(b.sort_index(ascending=False))
My findings so far:
Based on the above, I would guess that
Ah, it's not only the data but also the necessary steps. No need to include groupby; you could simply create the objects that groupby returns directly, without any additional methods. Edit: To be clearer, you can just write the MultiIndex down the way it's returned by the groupby op.
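A minimal sketch of what "writing the MultiIndex down" could look like. The names `a_manual` and `b_manual` are hypothetical, and the counts are hand-derived for the observed groups only, so they are an assumption and may not match what groupby returns for the categorical column (for example, unobserved category combinations are left out here).

```python
import pandas as pd

# Hand-written stand-in for the 3-level result of groupby(...)["2"].value_counts().
# Assumption: only observed combinations are listed.
a_index = pd.MultiIndex.from_tuples(
    [("B", "B", "B"), ("C", "A", "A"), ("C", "A", "B")],
    names=["0", "1", "2"],
)
a_manual = pd.Series([1, 1, 1], index=a_index)

# Hand-written stand-in for the 2-level result of groupby(...).size().
b_index = pd.MultiIndex.from_tuples(
    [("B", "B"), ("C", "A")],
    names=["0", "1"],
)
b_manual = pd.Series([1, 2], index=b_index)

print(a_manual)
print(b_manual)
```

Building the inputs this way removes groupby from the picture, which helps narrow down whether the arithmetic or the index produced by groupby is at fault.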
I actually tried! But those 'manually created' indexes behave as expected. I did have a quick look at what the difference could be between the 'manual' index and the one generated by the code snippet above. If I remember correctly the
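The thread does not say which difference was found. As a general way to compare two indexes that print the same, one can look at how each MultiIndex stores its `levels` and `codes`. The sketch below is illustrative only: `by_hand` and `reordered` are hypothetical names, and the unsorted level order is an assumption made for the demonstration, not a confirmed cause of the bug.

```python
import pandas as pd

# Index written down by hand; from_tuples stores each level in sorted order.
by_hand = pd.MultiIndex.from_tuples(
    [("B", "B"), ("C", "A")],
    names=["0", "1"],
)

# Same two entries, but level "0" is stored as ["C", "B"] (assumed for illustration).
reordered = pd.MultiIndex(
    levels=[["C", "B"], ["A", "B"]],
    codes=[[1, 0], [1, 0]],
    names=["0", "1"],
)

# Both indexes hold the same entries in the same positions...
same_entries = by_hand.equals(reordered)

# ...yet their internal level storage differs.
levels_by_hand = [list(level) for level in by_hand.levels]
levels_reordered = [list(level) for level in reordered.levels]

print(same_entries)
print(levels_by_hand, by_hand.codes)
print(levels_reordered, reordered.codes)
```

Comparing `levels` and `codes` side by side for the hand-written index and the groupby-generated one is a cheap first check when two indexes look identical but behave differently.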
Ah, this is valuable information for your initial post. This points to something in groupby not working as expected. Your example works as expected on main btw. I recall that we fixed something similar a couple of months ago (the groupby case).
Thanks for checking. (I tried to build main, but issue #51499 prevented me.)
This might need tests, not sure
take
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Performing a division/multiplication/addition/subtraction on two Series with MultiIndexes produces NaN values where a numeric result is expected.
Expected Behavior
Performing a math operation produces the expected result (see also code snippet above).
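For readers who only need per-group shares, one possible way to sidestep alignment between two different MultiIndexes is to compute the group totals on the same index as the counts. This is a sketch of an alternative, not the fix discussed in the thread. It uses plain string columns instead of the categorical column from the report (a simplification), and the names `counts`, `totals` and `shares` are hypothetical.

```python
import pandas as pd

# Same rows as the report after the category rename, as plain strings (assumption).
data = pd.DataFrame(
    [["B", "B", "B"], ["C", "A", "A"], ["C", "A", "B"]],
    columns=["0", "1", "2"],
)

# Counts per ("0", "1", "2") combination, indexed by a 3-level MultiIndex.
counts = data.groupby(["0", "1"])["2"].value_counts()

# Group totals broadcast back onto the index of `counts`,
# so the division below never aligns two different indexes.
totals = counts.groupby(level=["0", "1"]).transform("sum")

shares = counts / totals
print(shares)
```

Because `totals` shares its index with `counts`, the division is element-wise and cannot introduce NaN through index alignment.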
Installed Versions
INSTALLED VERSIONS
commit : 2e218d1
python : 3.8.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : Swedish_Sweden.1252
pandas : 1.5.3
numpy : 1.23.5
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 47.1.0
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.3
numba : 0.56.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.0
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : 2022.7