-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Python] pyarrow compute strptime not working with format '%Y%m%d %H%M%S%f' #41132
Comments
Thanks for reporting this @nikfio ! >>> pc.strptime('20090101 185956', format='%Y%m%d %H%M%S', unit='ms')
<pyarrow.TimestampScalar: '2009-01-01T18:59:56.000'> To work around this you can try [this (cumbersome) approach] for now: import pyarrow as pa
import pyarrow.compute as pc
ts = pa.array(["1970-01-01T00:00:59.123456789", "2000-02-29T23:23:23.999999999"], pa.string())
ts2 = pc.strptime(pc.utf8_slice_codeunits(ts, 0, 19), format="%Y-%m-%dT%H:%M:%S", unit="ns")
d = pc.utf8_slice_codeunits(ts, 20, 99).cast(pa.int64()).cast(pa.duration("ns"))
pc.add(ts2, d) |
Tag #31324 for reference. |
Hello @rok, thank you for your help. I tried your piece suggested. But executing the first operation still gives the same error :
I know and I am sorry actually, this date format ( I managed to work aroud the issue by reading the timestamp at first as pyarrow string and the passing through a datetime conversion using pandas
Didn't wrote everything clear in point (3) to be more synthesized. Thanks, |
Hey @nikfio, sorry I didn't use your raw data. This works for me on your example: import pyarrow as pa
import pyarrow.compute as pc
ts = pa.array(["20090101 185956123"], pa.string())
ts2 = pc.strptime(pc.utf8_slice_codeunits(ts, 0, 15), format="%Y%m%d %H%M%S", unit="ms")
d = pc.utf8_slice_codeunits(ts, 15, 99).cast(pa.int64()).cast(pa.duration("ms"))
pc.add(ts2, d)
Your Pandas/numpy workaround looks good. I'm not sure which approach would be better for your usecase. |
Great @rok I just tested it and it works fine as expected. sorry also my fault I didn't understand at first the usage of I think I'll go with yours in order to have a all-pyarrow operation. Thanks a lot, |
Close as a prime solution has been suggested by @rok and an alternative one has been proposed. |
Thanks for confirming it worked and closing the isssue @nikfio. |
Related Stack Overflow discussion. |
Hi guys,
I am trying to convert a timestamp column from string to timestamp datatype.
I have this format:
%Y%m%d %H%M%S%f
To read, for example, a date equal to
20090101 185956000
.Assuming that
table['timestamp'][0] = '20090101 185956000'
Running the script below
import pyarrow.compute as pc
pc.strptime(table['timestamp'][0], format='%Y%m%d %H%M%S%f', unit='ms')
throws the error:
*** pyarrow.lib.ArrowInvalid: Failed to parse string: '20090101 185956000' as a scalar of type timestamp[ms]
Meanwhile if I use strptime from datetime module:
test_date = str(table['timestamp'][0])
datetime.strptime(str(df['timestamp'][0]), '%Y%m%d %H%M%S%f')
It gives as output a correctly read datetime object:
datetime.datetime(2009, 1, 1, 18, 59, 56)
Should the pyarrow strptime work like the standard datetime strptime function?
What am I doing wrong?
Many thanks,
Nick
Component(s)
Python
The text was updated successfully, but these errors were encountered: