[Python] datetime shifted when using pyarrow.Table.from_pandas to load a pandas DataFrame containing datetime with timezone #20493
Comments
Miles Granger / @milesgranger:
import pandas as pd
import pyarrow
ts = pd.Timestamp("2022-10-21 22:46:17", tz="America/Los_Angeles")
df = pd.DataFrame({"TS": [ts]})
table = pyarrow.Table.from_pandas(df)
print(df)
# TS
# 0 2022-10-21 22:46:17-07:00
print(table.to_pandas())
# TS
# 0 2022-10-21 22:46:17-07:00
However, mixing timezones in one column makes the behavior more apparent: the values are coerced to the first timezone.
ts = pd.Timestamp("2022-10-21 22:46:17", tz="America/Los_Angeles")
df = pd.DataFrame({"TS": [ts, pd.Timestamp("2022-10-21 22:46:17", tz="UTC")]})
table = pyarrow.Table.from_pandas(df)
print(df)
# TS
# 0 2022-10-21 22:46:17-07:00
# 1 2022-10-21 22:46:17+00:00
print(table)
# pyarrow.Table
# TS: timestamp[us, tz=America/Los_Angeles]
# ----
# TS: [[2022-10-22 05:46:17.000000,2022-10-21 22:46:17.000000]]
print(table.to_pandas())
# TS
# 0 2022-10-21 22:46:17-07:00
# 1 2022-10-21 15:46:17-07:00
I believe
Joris Van den Bossche / @jorisvandenbossche:
Yes, in this case that is the cause of the confusion. The dates are not "wrong" after conversion to Arrow; they are just confusingly printed in UTC without any indication of this. We have ARROW-14567 to track this issue.
That's a separate issue (and something that doesn't happen that often: for example, pandas also requires a single timezone for a column if you have a datetime64 dtype). But indeed, Arrow's timestamp type requires a single timezone, and thus when encountering multiple ones, we currently coerce to the first one. I think it would be better to coerce to UTC instead (-> ARROW-5912).
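A minimal sketch of the point above, using only the standard library rather than pyarrow: the values shown by print(table) and table.to_pandas() in the earlier comment are the same instants, only rendered in different timezones. The fixed -07:00 offset is an assumption standing in for America/Los_Angeles on that date (PDT), and the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed -07:00 offset stands in for America/Los_Angeles,
# which was on PDT (UTC-7) on 2022-10-21.
pdt = timezone(timedelta(hours=-7))

# The two values from the mixed-timezone example in the thread.
local_ts = datetime(2022, 10, 21, 22, 46, 17, tzinfo=pdt)
utc_ts = datetime(2022, 10, 21, 22, 46, 17, tzinfo=timezone.utc)
originals = [local_ts, utc_ts]

# Roughly what print(table) displays: each instant rendered in UTC.
as_utc = [ts.astimezone(timezone.utc) for ts in originals]

# Roughly what table.to_pandas() displays: each instant rendered in the
# column's single timezone (the first one encountered).
as_first_tz = [ts.astimezone(pdt) for ts in originals]

for original, shown_utc, shown_local in zip(originals, as_utc, as_first_tz):
    # Aware datetimes compare by instant, so all three are equal.
    assert original == shown_utc == shown_local
    print(original.isoformat(), "|", shown_utc.isoformat(), "|", shown_local.isoformat())
```

Each printed row shows one instant three ways, so the apparent "shift" is a change of display timezone and not a change of value.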
Problem:
When using pyarrow.Table.from_pandas to load a pandas DataFrame which contains a timestamp object with timezone information, the created Table object will shift the datetime, while still keeping the timezone information. Please see my scripts.
Reproduce scripts:
Expected results:
The table should not shift the datetime when timezone information is provided.
Environment: macOS (M1), Python 3.8.13
Reporter: Adam Ling
Related issues:
Note: This issue was originally created as ARROW-18298. Please see the migration documentation for further details.