[Python] Failed to parse string into timestamp #25940

Closed
asfimport opened this issue Sep 3, 2020 · 5 comments

@asfimport

asfimport commented Sep 3, 2020

Hi,

Not sure if I am missing something, but I am unable to get pyarrow to parse my datetimes as timestamps; they are being inferred as strings.

My strings are arriving in CSVs with this format: '2015-01-09 00:00:00.000'

I have tried:

from pyarrow import csv

convert_opts = csv.ConvertOptions(timestamp_parsers=['%Y-%m-%d %H:%M:%S.%f'])
df = csv.read_csv('path_to_csv', convert_options=convert_opts)
print(df.schema)

This yields no change: the columns holding these timestamps still show up as strings.

Additionally, I have tried casting as well:

import pyarrow as pa
from pyarrow import csv

dfschema = pa.schema([
    ('date_column', pa.timestamp('ms'))
])
df = csv.read_csv('path_to_csv')
df.cast(target_schema=dfschema)

This way yields the error: "pyarrow.lib.ArrowInvalid: Failed to parse string: 2015-01-09 00:00:00.000"

I am using pyarrow==1.0.1 in a Linux Docker container.

I tried to send an email to the user mailing list, but it keeps returning a Mailer Daemon error.

Reporter: Gary
Watchers: Rok Mihevc / @rok

Note: This issue was originally created as ARROW-9907. Please see the migration documentation for further details.

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
See also ARROW-9561: support for parsing subsecond timestamps (fractional seconds) was only added after the 1.0 release. So the error is expected for 1.0, and it should work on master.

Now, that the reading of csv silently results in strings instead of indicating that the specified format is not supported might be something to check.

@asfimport

Neal Richardson / @nealrichardson:
If you want to test out the latest (unreleased) version that includes this timestamp support, you can pip install a wheel from https://repo.fury.io/arrow-nightlies/.

@asfimport

Joris Van den Bossche / @jorisvandenbossche:

> Now, that the reading of csv silently results in strings instead of indicating that the specified format is not supported might be something to check.

Hmm, since the format string can basically contain any random string as well (in addition to specific % fields), that might be difficult to check, actually.
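
To illustrate why such a check is only partial, here is a minimal sketch in plain Python. It scans a format string for `%` directives and reports those outside a known set. The `SUPPORTED` set and the `unsupported_directives` helper are hypothetical names made up for this example; they are not Arrow's actual list or API.

```python
import re

# Assumption: a hypothetical set of directives a parser claims to support.
# This is NOT Arrow's actual list; it only illustrates the idea.
SUPPORTED = frozenset("YmdHMS")


def unsupported_directives(fmt, supported=SUPPORTED):
    """Return the % directives in `fmt` that are not in `supported`."""
    # re.findall scans left to right without overlap, so "%%" is consumed
    # as one match (a literal percent sign) and is skipped below.
    found = re.findall(r"%(.)", fmt)
    return ["%" + c for c in found if c != "%" and c not in supported]


print(unsupported_directives("%Y-%m-%d %H:%M:%S.%f"))
```

A check like this could flag an unknown directive such as `%f`, but it cannot tell whether the literal text in the format matches the data, which is the harder part mentioned above.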

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
Short update here clarifying the current status of this issue:

  1. Parsing sub-second values (fractional second) is supported with the default ISO8601 parser:
import io
from pyarrow import csv

s = """col
2015-01-09 00:00:00.000"""

>>> csv.read_csv(io.BytesIO(s.encode()))
pyarrow.Table
col: timestamp[ns]
----
col: [[2015-01-09 00:00:00.000000000]]

>>> csv.read_csv(io.BytesIO(s.encode()), convert_options=csv.ConvertOptions(timestamp_parsers=[csv.ISO8601]))
pyarrow.Table
col: timestamp[ns]
----
col: [[2015-01-09 00:00:00.000000000]]
  2. It does not yet work when manually specifying the format (resulting type is string and not timestamp):
In [28]: csv.read_csv(io.BytesIO(s.encode()), convert_options=csv.ConvertOptions(timestamp_parsers=["%Y-%m-%d %H:%M:%S.%f"]))
Out[28]: 
pyarrow.Table
col: string
----
col: [["2015-01-09 00:00:00.000"]]

This can also be seen directly in strptime:

>>> import pyarrow.compute as pc
>>> pc.strptime("2015-01-09 00:00:00.000", format="%Y-%m-%d %H:%M:%S", unit="ns")
...
ArrowInvalid: Failed to parse string: '2015-01-09 00:00:00.000' as a scalar of type timestamp[ns]

>>> pc.strptime("2015-01-09 00:00:00.000", format="%Y-%m-%d %H:%M:%S.%f", unit="ns")
...
ArrowInvalid: Failed to parse string: '2015-01-09 00:00:00.000' as a scalar of type timestamp[ns]

For the issue of parsing fractional seconds in strptime, we also have ARROW-10430 and ARROW-15883.
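
As a possible workaround while a custom format with `%f` is not handled, the strings can be parsed with Python's own `datetime.strptime`, which does accept `%f`. The sketch below is only an illustration: `parse_timestamps` is a hypothetical helper name, and it runs in pure Python, so it will be much slower than Arrow's native parsing on large columns.

```python
from datetime import datetime

# The format from the original report, including fractional seconds.
FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_timestamps(values, fmt=FORMAT):
    """Parse strings with datetime.strptime; None (null) stays None."""
    return [None if v is None else datetime.strptime(v, fmt) for v in values]


parsed = parse_timestamps(["2015-01-09 00:00:00.000", None])
print(parsed)
```

The resulting list of `datetime` objects can then be handed back to pyarrow, for example with `pa.array(parsed, type=pa.timestamp("ms"))`.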

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
So I am going to close this as a duplicate of ARROW-15883.
