This is an interesting case: pyarrow writes Arrow's nanosecond-precision timestamps with Parquet's microsecond logical type, dividing the values by 1000 accordingly. So the file ends up with:

- Parquet's logical type: microseconds
- Arrow's logical type (in the schema's metadata): nanoseconds

The bug on our end is that we ignore Parquet's logical type when deserializing, which caused us to read Parquet's microseconds as Arrow's nanoseconds without converting them.
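A minimal sketch of that write path with pyarrow (the file name and sample value are mine; `coerce_timestamps="us"` makes the nanosecond-to-microsecond coercion explicit, which pyarrow also applies when nanoseconds are not representable in the target Parquet format version):

```python
import pyarrow as pa
import pyarrow.parquet as pq

# An Arrow column with nanosecond-precision timestamps.
table = pa.table({"ts": pa.array([1_600_000_000_123_456_000], type=pa.timestamp("ns"))})

# pyarrow divides the values by 1000 and writes them with Parquet's
# microsecond logical type, while the Arrow schema stored in the file
# metadata still says nanoseconds.
pq.write_table(table, "ts.parquet", coerce_timestamps="us")

# Parquet's logical type: should report a microsecond timestamp.
print(pq.read_metadata("ts.parquet").schema)
# Arrow's logical type, restored from the schema metadata: timestamp[ns].
print(pq.read_schema("ts.parquet").field("ts").type)

# pyarrow casts the microsecond values back to nanoseconds on read,
# so its own round trip is lossless here.
print(pq.read_table("ts.parquet").column("ts"))
```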
I had to read it twice, but I think I understand. :D So there are two logical types in a Parquet file? The Parquet logical type that was written, and the Arrow logical type of the destination.
With a simple dataset: if we write and read with pyarrow, the timestamp is correct. If we read and write with arrow2, the timestamp is also correct.
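For illustration, here is a minimal sketch of the rescaling that was missing when the two logical types disagree (the names `UNIT_FACTORS` and `rescale` are mine, not arrow2's actual API):

```python
# Factors to convert each time unit to counts per second.
UNIT_FACTORS = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}

def rescale(values, parquet_unit, arrow_unit):
    """Rescale raw integer timestamps from the unit Parquet declares to the
    unit the Arrow schema expects (e.g. us -> ns multiplies by 1000)."""
    num = UNIT_FACTORS[arrow_unit]
    den = UNIT_FACTORS[parquet_unit]
    if num >= den:
        return [v * (num // den) for v in values]
    return [v // (den // num) for v in values]

# A microsecond value stored by pyarrow must be multiplied by 1000
# when the Arrow schema says nanoseconds, not reinterpreted as-is.
micros = [1_600_000_000_123_456]
assert rescale(micros, "us", "ns") == [1_600_000_000_123_456_000]
```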