Attempting to cast timestamps to string raises an ArrowNotImplementedError #43

isichei · 2021-03-16T09:58:57Z

Our current setup (read into Arrow, then cast) triggers this error whenever Arrow sees timestamp strings in the ISO standard format in CSV or JSON input. A current workaround for CSVs is:

from io import BytesIO
import pyarrow as pa
from pyarrow import csv
from arrow_pd_parser.parse import pa_read_csv_to_pandas

csv_data = b"""
a,b
1,2020-01-01 00:00:00
2,2021-01-01 23:59:59
"""

# Note: you can also provide a partial schema and let the package infer
# a's type by setting `expect_full_schema=False`.
schema = pa.schema([("b", pa.string())])
test_file = BytesIO(csv_data)

# The following call raises an ArrowNotImplementedError because there is
# currently no implementation for casting timestamps to str.
try:
    df = pa_read_csv_to_pandas(test_file, schema=schema, expect_full_schema=False)
except pa.ArrowNotImplementedError as e:
    print(e)

# By default Arrow reads str representations of timestamps as timestamps
# if they conform to the ISO standard format. The error is then raised
# when you try to cast that timestamp to str. To get around this you can
# force pyarrow to read the data as a string when it parses the CSV (note
# that ConvertOptions is not currently available for the JSON reader).
test_file.seek(0)  # rewind: the earlier read consumed the stream
co = csv.ConvertOptions(column_types=schema)
df = pa_read_csv_to_pandas(test_file, schema=schema, expect_full_schema=False, convert_options=co)

But this seems quite clunky. It also cannot be implemented for JSON, as the JSON reader does not currently have a ConvertOptions equivalent. It is also worth remembering that we moved to the current framework (let Arrow read the data using its best guess at the types, then cast) because providing a schema to the JSON reader caused an issue (see #40). It may be worth updating to pyarrow 3.0 and seeing whether this issue still persists; if it does not, perhaps we should provide the schema on read. Failing that, it might be worth casting the data via pandas rather than Arrow.
