Describe the bug, including details regarding any error messages, version, and platform.
When looking into #45073 I found that Arrow doesn't raise an error when reading data with invalid repetition levels into Arrow list arrays.
The encryption test files included an int64 list column whose leaf values equal i * 1,000,000,000,000, where i is the leaf-value index. The repetition level was set to 1 for even leaf indices and 0 for odd indices, so the first repetition level was 1. That is invalid, because the first value in a column must start a new record (repetition level 0). PyArrow nevertheless reads this file without raising an error and silently skips the first leaf value (0):
pyarrow.Table
int64_field: list<int64_field: int64 not null> not null
child 0, int64_field: int64 not null
----
int64_field: [[[1000000000000,2000000000000],[3000000000000,4000000000000],...,[97000000000000,98000000000000],[99000000000000]]]
I wouldn't expect an error when reading the raw values and repetition levels with the lower-level Parquet C++ API, but I think reading this data as an Arrow list should raise an error.
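To make the expected behaviour concrete, here is a minimal pure-Python sketch of list assembly from repetition levels. It is not Arrow's implementation: `assemble_lists` and its `strict` flag are hypothetical names, and it assumes a required list of required elements (maximum repetition level 1, no definition levels, so no nulls or empty lists). The lenient branch only illustrates one way a leading value could end up dropped, which happens to match the output above.

```python
from typing import List, Sequence


def assemble_lists(
    values: Sequence[int], rep_levels: Sequence[int], strict: bool = True
) -> List[List[int]]:
    """Group leaf values into lists using repetition levels (hypothetical helper).

    Repetition level 0 starts a new list; level 1 appends to the current list.
    """
    if len(values) != len(rep_levels):
        raise ValueError("values and rep_levels must have the same length")

    rows: List[List[int]] = []
    for index, (value, rep) in enumerate(zip(values, rep_levels)):
        if rep == 0:
            # A new record (list) begins here.
            rows.append([value])
        elif rep == 1:
            if not rows:
                # There is no list to continue, so the levels are invalid.
                if strict:
                    raise ValueError(
                        f"invalid repetition level 1 at leaf index {index}: "
                        "the first value must have repetition level 0"
                    )
                continue  # lenient mode: silently drop the value
            rows[-1].append(value)
        else:
            raise ValueError(f"unexpected repetition level {rep} at leaf index {index}")
    return rows


# Data shaped like the test file described above: 100 leaf values,
# repetition level 1 for even indices and 0 for odd indices.
values = [i * 1_000_000_000_000 for i in range(100)]
rep_levels = [1 if i % 2 == 0 else 0 for i in range(100)]

lenient_rows = assemble_lists(values, rep_levels, strict=False)
print(lenient_rows[:2], lenient_rows[-1])
```

With `strict=True` the same input raises a `ValueError` at leaf index 0, which is the behaviour this issue suggests for the Arrow list reader.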
Component(s)
C++, Parquet