Our current setup of "read into Arrow, then cast" will cause this issue whenever Arrow sees (CSV/JSON) timestamp strings in the ISO standard format. A current workaround for CSVs is this:
```python
from io import BytesIO

import pyarrow as pa
from pyarrow import csv
from arrow_pd_parser.parse import pa_read_csv_to_pandas

csv_data = b"""a,b
1,2020-01-01 00:00:00
2,2021-01-01 23:59:59"""

# note: you can also provide a partial schema and get the package to infer
# a's type by also setting `expect_full_schema=False`
schema = pa.schema([("b", pa.string())])

test_file = BytesIO(csv_data)

# The following line will raise an ArrowNotImplementedError.
# This is because there is currently no implementation for casting
# timestamps to str.
df = pa_read_csv_to_pandas(test_file, schema=schema, expect_full_schema=False)

# By default Arrow will read in str representations of timestamps as
# timestamps if they conform to the ISO standard format. You then get the
# error when you try to cast that timestamp to str. To get around this you
# can force pyarrow to read the data in as a string when it parses the CSV
# (note that ConvertOptions is not currently available for the JSON reader).
co = csv.ConvertOptions(column_types=schema)
test_file.seek(0)  # rewind the buffer before re-reading
df = pa_read_csv_to_pandas(test_file, schema=schema, expect_full_schema=False, convert_options=co)
```
But this seems quite clunky. It also can't be implemented for JSON, which does not currently have a `ConvertOptions` module. Also worth remembering that we moved to the framework of "let Arrow read the data in using its best guess, then cast" because providing a schema to the JSON reader caused an issue (see #40). It may be worth updating to pyarrow 3.0 and seeing if this issue still persists; if not, perhaps we should provide the schema on read-in. Failing that, it might be worth casting the data via Pandas rather than Arrow.