Describe the bug, including details regarding any error messages, version, and platform.

I'm trying to build a tool that confirms a given Parquet file conforms to an expected schema in our metadata format. However, there appears to be a bug in how pyarrow converts dates.

I have a dummy dataset, test.csv, and a user-generated schema whose last column resolves to date64[ms]. After writing to Parquet, that column is read back by pyarrow as date32[day], and won't cast otherwise; the cast attempt raises an error.
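A minimal sketch of the roundtrip behavior (the file name and single-row data here are stand-ins; the original test.csv and schema are elided above):

```python
import datetime
import pyarrow as pa
import pyarrow.parquet as pq

# One date64 column standing in for the last column of test.csv.
table = pa.table({"d": pa.array([datetime.date(2024, 7, 25)], type=pa.date64())})
pq.write_table(table, "roundtrip.parquet")

readback = pq.read_table("roundtrip.parquet")
print(table.schema.field("d").type)     # date64[ms]
print(readback.schema.field("d").type)  # date32[day]
```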
pyarrow.parquet doesn't round-trip all types right now. The only other example I know of is dictionary types: they always come back with int32 indices, regardless of the original index type. See also: https://lists.apache.org/thread/rv29cwf4208jh73s0gyrzpw5l87pf7pb
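A sketch of that dictionary behavior, with a made-up file name and data:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# int8 dictionary indices going in...
dict_type = pa.dictionary(pa.int8(), pa.string())
table = pa.table({"s": pa.array(["a", "b", "a"], type=dict_type)})
pq.write_table(table, "dict.parquet")

# ...int32 indices coming back out.
readback = pq.read_table("dict.parquet")
print(table.schema.field("s").type)     # dictionary<values=string, indices=int8, ...>
print(readback.schema.field("s").type)  # dictionary<values=string, indices=int32, ...>
```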
The date64 type only exists for compatibility with systems that represent dates as milliseconds. That representation doesn't exist in the Parquet format. It's also not a sensible representation of a date: the logical resolution is a day, so the millisecond information isn't used.
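To illustrate (my own sketch, not from the thread): the underlying 64-bit value is milliseconds since the UNIX epoch, but it is always a whole multiple of 86,400,000 (one day), so the sub-day bits carry no information.

```python
import datetime
import pyarrow as pa

arr = pa.array([datetime.date(1970, 1, 2)], type=pa.date64())
# The stored value is exactly one day's worth of milliseconds.
print(arr.cast(pa.int64())[0])  # 86400000
```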
But it looks like we handle this for nearly every other type, including the Large* variants of string, binary, and list; different timestamp resolutions; and unsigned integers. So maybe it's worth fixing these last few types.
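Until then, one possible user-side workaround (my own sketch, assuming a pyarrow version where the date32 -> date64 cast kernel is available, and not what the reporter tried) is to cast the table back to the expected schema after reading:

```python
import pyarrow as pa

# `readback` as in the first sketch above.
expected = pa.schema([("d", pa.date64())])
restored = readback.cast(expected)
print(restored.schema.field("d").type)  # date64[ms]
```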
jorisvandenbossche changed the title from "pyarrow won't cast date32 to date64" to "[C++] date64[ms] comes back as date32[day] after roundtrip to Parquet" on Jul 25, 2024.
Component(s): Parquet, Python