[Python] Casting Timestamp scalar does not preserve UTC Suffix #35370

Closed
wirable23 opened this issue Apr 29, 2023 · 4 comments

Comments

@wirable23

Describe the bug, including details regarding any error messages, version, and platform.

>>> import pandas as pd
>>> import pyarrow as pa
>>> ts = pd.Timestamp('2000-12-01 00:00:00+0000', tz='UTC')
>>> scal = pa.scalar(ts, type=pa.timestamp("ns", tz="UTC"))
>>> scal.cast(pa.string())
<pyarrow.StringScalar: '2000-12-01 00:00:00.000000000'>
>>>

The scalar cast result has no UTC "Z" suffix, but the suffix is present when casting an array:

>>> dt_arr = pa.array([ts], type=pa.timestamp("ns", tz="UTC"))
>>> dt_arr.cast(pa.string())
<pyarrow.lib.StringArray object at 0x000002268498B340>
[
  "2000-12-01 00:00:00.000000000Z"
]
>>>

Casting a Timestamp array to string preserves the UTC "Z" suffix; casting a Timestamp scalar does not.

Component(s)

Python

@danepitkin
Member

Good catch! I can reproduce the bug. The scalar code path does not handle timezone info. The pyarrow scalar cast uses a different implementation (arrow/cpp/src/arrow/util/time.h::ConvertTimestampValue()) than the pyarrow array cast (arrow/compute/kernels/scalar_cast_string.cc::TemporalToStringCastFunctor). Ideally, all functionality should be migrated to the compute kernel implementation. That will be a pretty big change, so the quickest option is to patch the current code path to add support for timezones.

@danepitkin
Member

Alternatively, you can call the compute function directly:

>>> pa.compute.cast(scal, target_type=pa.string())
<pyarrow.StringScalar: '2000-12-01 00:00:00.000000000Z'>

@danepitkin
Member

A somewhat related issue: #35040

@kou changed the title from "Casting Timestamp scalar does not preserve UTC Suffix" to "[Python] Casting Timestamp scalar does not preserve UTC Suffix" on May 2, 2023
AlenkaF pushed a commit that referenced this issue May 11, 2023
)

### Rationale for this change

Scalar cast should use the compute kernel, just as Array cast does, instead of its own custom implementation.

### Are these changes tested?

Added test cases for GH-35370, GH-34901, and GH-35040

### Are there any user-facing changes?

The Scalar.cast() API is enhanced and backwards compatible. 
* Closes: #35040

Authored-by: Dane Pitkin <[email protected]>
Signed-off-by: Alenka Frim <[email protected]>
@danepitkin
Member

Fixed as part of #35395

ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
apache#35395)

rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
apache#35395)
