BUG: hash of Timestamp on fold=1 create a Segfault #33931
Comments
Seems specific to dateutil timezones
From the doc of #31563:
Also, I tried to figure out the reason for this bug. It seems that this initialization creates undefined behavior (value, freq, etc. become 0/
pandas/pandas/_libs/tslibs/timestamps.pyx Line 48 in 911e19b
I tried to have the support of the
Anyway, it kind of worked in the end, and the segfault was no longer present, but I got weird hash values (always the same one for different timestamps). I am not sure of the origin of the bug, and I don't really have the time to dig into this much further. Hopefully this helps, though 🤷♂️. Also, maybe @AlexKirko could have a look at that? He did a great job on #31563, and he might have more insight into what's going on here.
I commented in this commit on the change that I made: hasB4K@b7200a2. It fixes the segfault (I have not added a proper test in this commit, though), and it seems that the hash values are correct after all. The only thing is when
Like I said, I don't think I will have the time to dig further into this issue anytime soon, so I'm not planning to create a PR for now, but maybe my debugging will help. 🤷♂️
Found another way of hitting this (might be useful as a test case, perhaps?): it happens on the Fall DST boundary (not the Spring one) when comparing DatetimeIndexes with mixed timezone sources, one of which comes from dateutil.
python 3.9.4, pandas 1.2.4, dateutil 2.8.1:
# pandas_segfault.py:
import pandas as pd
import dateutil.tz
DATEUTIL_US_PAC = dateutil.tz.gettz('US/Pacific')
# df_1 uses pandas timezone string resolution:
df_1 = pd.DataFrame(
    'aaa',
    columns=['A'],
    # Spring US/Pacific DST: Works fine
    # index=pd.date_range(start='2021-03-14', end='2021-03-15', tz='US/Pacific', freq='H'),
    # Fall US/Pacific DST: SEGFAULT!!!
    index=pd.date_range(start='2020-11-01', end='2020-11-02', tz='US/Pacific', freq='H'),
)
# df_2 uses dateutil timezone objects:
df_2 = pd.DataFrame(
    'bbb',
    columns=['B'],
    # Spring US/Pacific DST: Works fine
    # index=pd.date_range(start='2021-03-14', end='2021-03-15', tz=DATEUTIL_US_PAC, freq='H'),
    # Fall US/Pacific DST: SEGFAULT!!!
    index=pd.date_range(start='2020-11-01', end='2020-11-02', tz=DATEUTIL_US_PAC, freq='H'),
)
# Here we do an operation that compares the two mixed-tz DatetimeIndexes
print(pd.concat([df_1, df_2], axis=1))
If both df's use
fwiw, pandas 1.0.5 doesn't segfault, pandas 1.1.5 does. (segfault on my DatetimeIndex mixed-tz comparison use-case, not the initial
Workaround for me is to downgrade to the 1.0.x branch.
Code Sample, a copy-pastable example
Problem description
It should return a correct hash value, and it should not Segfault.
This creates issues when using a Timestamp as a dictionary key or as a member of a set.
Expected Output
I would have expected the same behavior as datetime in Python:
So it seems that this bug is coming from pandas.
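For reference, here is a minimal standard-library sketch of the datetime behavior the report treats as the baseline. It is not the snippet from the original report. ToyPacific is a hypothetical tzinfo written only for this illustration; it hard-codes the 2020-11-01 US/Pacific fall-back and stands in for a real dateutil timezone, so that the example does not depend on pandas or dateutil.

```python
from datetime import datetime, timedelta, tzinfo


class ToyPacific(tzinfo):
    """Hypothetical tzinfo: models only the 2020-11-01 US/Pacific fall-back."""

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < datetime(2020, 11, 1, 1):
            return timedelta(hours=-7)  # still on daylight time (PDT)
        if wall >= datetime(2020, 11, 1, 2):
            return timedelta(hours=-8)  # back on standard time (PST)
        # 01:00-01:59 happens twice; fold selects the second occurrence.
        return timedelta(hours=-8 if dt.fold else -7)

    def dst(self, dt):
        return self.utcoffset(dt) + timedelta(hours=8)

    def tzname(self, dt):
        return "PDT" if self.dst(dt) else "PST"


tz = ToyPacific()
first = datetime(2020, 11, 1, 1, 30, tzinfo=tz, fold=0)
second = datetime(2020, 11, 1, 1, 30, tzinfo=tz, fold=1)

# The two instants differ in UTC offset, so fold really is honored ...
print(first.utcoffset(), second.utcoffset())

# ... yet hashing a fold=1 datetime works, and datetime deliberately
# ignores fold in hashes and in same-zone comparisons.
print(hash(first) == hash(second), first == second)

# As a consequence, fold=1 values are usable as dict keys and set members.
seen = {second: "second pass"}
print(seen[first])
```

The point of the sketch is only that hashing a fold=1 value is well defined in the standard library, which is the behavior a Timestamp with fold=1 would be expected to match instead of crashing.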
Output of pd.show_versions()