ARROW-6652: [Python] Fix ChunkedArray.to_pandas to retain timezone #5471

jorisvandenbossche
Member

Follow-up on #5462 to also apply this fix for ChunkedArray.

if tz is not None:
    tz = string_to_tzinfo(tz)
    result = (result.dt.tz_localize('utc')
              .dt.tz_convert(tz))
Member Author

I am not sure those 6 lines are worth a separate helper function (they now duplicate the code in Array.to_pandas), but if others prefer I can certainly do that.

Member

Seems fine to me, it can be deduplicated in the future if needed. @wesm or @pitrou any preference?

Member

It's small but carries domain knowledge, so I'd rather see a helper function (perhaps in the pandas_compat module).

Member Author

OK, I made a small helper function (see new push).
For now I still keep it behind the if check, so that the helper only needs to be imported from pandas_compat when it is actually needed. But that means it is not much shorter than before in lines of code. I find this clearer to read in the code, but I can also always call the helper function and move the if check inside the helper function.
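
For readers following the thread, here is a minimal sketch of the second option mentioned above, with the None check moved inside the helper. The function name `localize_utc_then_convert` is hypothetical and not taken from the PR, and the timezone name is passed straight to pandas instead of going through `string_to_tzinfo`; both are assumptions made to keep the example self-contained.

```python
import pandas as pd


def localize_utc_then_convert(series, tz):
    """Hypothetical helper: treat naive timestamps as UTC, then convert to tz.

    The None check lives inside the helper, so callers can call it
    unconditionally.
    """
    if tz is None:
        # No timezone on the Arrow type: leave the naive values untouched.
        return series
    # Values are stored as UTC instants, so localize to UTC first and then
    # convert to the target zone for display.
    return series.dt.tz_localize("utc").dt.tz_convert(tz)


# Naive timestamps standing in for the converted (UTC-based) values.
naive = pd.Series(pd.to_datetime(["2019-09-24 10:42:00", "2019-09-24 13:54:00"]))
aware = localize_utc_then_convert(naive, "Europe/Brussels")
unchanged = localize_utc_then_convert(naive, None)
```

With this shape the call site loses its own if check, at the cost of importing the helper even when the type carries no timezone, which is the trade-off raised in the comment above.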

@codecov-io

Codecov Report

Merging #5471 into master will decrease coverage by 22.24%.
The diff coverage is 100%.

@@             Coverage Diff             @@
##           master    #5471       +/-   ##
===========================================
- Coverage   88.64%   66.39%   -22.25%     
===========================================
  Files         958      505      -453     
  Lines      127522    69887    -57635     
  Branches     1498        0     -1498     
===========================================
- Hits       113039    46403    -66636     
- Misses      14118    23484     +9366     
+ Partials      365        0      -365
| Impacted Files | Coverage Δ |
|---|---|
| python/pyarrow/table.pxi | 88.7% <100%> (+0.16%) ⬆️ |
| python/pyarrow/tests/test_table.py | 99.67% <100%> (ø) ⬆️ |
| python/pyarrow/tests/test_array.py | 93.56% <100%> (+0.02%) ⬆️ |
| cpp/src/arrow/util/memory.h | 0% <0%> (-100%) ⬇️ |
| cpp/src/gandiva/date_utils.h | 0% <0%> (-100%) ⬇️ |
| cpp/src/arrow/util/memory.cc | 0% <0%> (-100%) ⬇️ |
| cpp/src/gandiva/decimal_type_util.h | 0% <0%> (-100%) ⬇️ |
| cpp/src/arrow/compute/logical_type.h | 0% <0%> (-100%) ⬇️ |
| cpp/src/parquet/hasher.h | 0% <0%> (-100%) ⬇️ |
| cpp/src/gandiva/basic_decimal_scalar.h | 0% <0%> (-100%) ⬇️ |

... and 702 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fd8f628...dbf788a.

Member

@BryanCutler BryanCutler left a comment

LGTM, thanks @jorisvandenbossche! I confirmed that with this change and #5465 I can pass the Spark integration tests locally.

@jorisvandenbossche jorisvandenbossche force-pushed the ARROW-6652-chunked-array-timezone branch from dbf788a to 89d0044 Compare September 24, 2019 10:42
Member

@wesm wesm left a comment

+1

@wesm wesm closed this in 61637dd Sep 24, 2019
@jorisvandenbossche jorisvandenbossche deleted the ARROW-6652-chunked-array-timezone branch September 24, 2019 13:54