apacheGH-32439: [Python] Fix off by one bug when chunking nested structs (apache#37376)

### Rationale for this change

See: apache#32439

### What changes are included in this PR?

During conversion from Python to Arrow, when a struct's child hits a capacity error and chunking is triggered, this can leave the Finish'd chunk in an invalid state since the struct's length does not match the length of its children.

This change simply tries to Append the children first, and only if successful will Append the struct. This is safe because the order of Append'ing between the struct and its child is not specified. It is only specified that they must be consistent with each other.

This is per: https://github.com/apache/arrow/blob/86b7a84c9317fa08222eb63f6930bbb54c2e6d0b/cpp/src/arrow/array/builder_nested.h#L507-L508

### Are these changes tested?

A unit test is added that would previously have an invalid data error.

```
>       tab = pa.Table.from_pandas(df)

pyarrow/tests/test_pandas.py:4970:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyarrow/table.pxi:3788: in pyarrow.lib.Table.from_pandas
    return cls.from_arrays(arrays, schema=schema)
pyarrow/table.pxi:3890: in pyarrow.lib.Table.from_arrays
    result.validate()
pyarrow/table.pxi:3170: in pyarrow.lib.Table.validate
    check_status(self.table.Validate())

# ...

FAILED pyarrow/tests/test_pandas.py::test_nested_chunking_valid - pyarrow.lib.ArrowInvalid: Column 0: In chunk 0: Invalid: List child array invalid: Invalid: Struct child array #0 has length smaller than expected for struct array (2 < 3)
```

NOTE: This unit test uses about 7GB of memory (max RSS) on my macbook pro. This might make CI challenging; I'm open to suggestions to limit it.

### Are there any user-facing changes?

No

* Closes: apache#32439

Lead-authored-by: Mike Lui <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
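
The ordering change described under "What changes are included in this PR?" can be illustrated with a small pure-Python toy model. This is only a sketch, not Arrow's converter or builder code: `CapacityError`, `ChildBuilder`, `StructBuilder`, and `build_chunks` are hypothetical names invented for the illustration, and the child capacity of 2 is an arbitrary assumption chosen to trigger chunking quickly.

```python
class CapacityError(Exception):
    """Raised when the toy child builder cannot hold another value."""


class ChildBuilder:
    """Hypothetical child builder with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = []

    def append(self, value):
        if len(self.values) >= self.capacity:
            raise CapacityError("child builder is full")
        self.values.append(value)


class StructBuilder:
    """Hypothetical struct builder that tracks its own length and one child."""

    def __init__(self, child_capacity):
        self.length = 0
        self.child = ChildBuilder(child_capacity)

    def append_struct_first(self, value):
        # Old order: the struct slot is recorded before the child is tried,
        # so a capacity error leaves the struct one element ahead.
        self.length += 1
        self.child.append(value)

    def append_child_first(self, value):
        # New order: the struct slot is recorded only if the child append
        # succeeded, so both lengths stay consistent on failure.
        self.child.append(value)
        self.length += 1

    def finish(self):
        # Return (struct length, child length) for the finished chunk.
        return self.length, len(self.child.values)


def build_chunks(values, child_capacity, child_first):
    """Append values, starting a new chunk whenever the child is full."""

    def append(builder, value):
        if child_first:
            builder.append_child_first(value)
        else:
            builder.append_struct_first(value)

    chunks = []
    builder = StructBuilder(child_capacity)
    for value in values:
        try:
            append(builder, value)
        except CapacityError:
            # Finish the current chunk and retry the value in a fresh builder.
            chunks.append(builder.finish())
            builder = StructBuilder(child_capacity)
            append(builder, value)
    chunks.append(builder.finish())
    return chunks


# Struct-first leaves the first chunk with struct length 3 but child length 2.
print(build_chunks(["a", "b", "c"], child_capacity=2, child_first=False))
# Child-first keeps every chunk's struct and child lengths equal.
print(build_chunks(["a", "b", "c"], child_capacity=2, child_first=True))
```

In this toy model the struct-first order produces a first chunk whose child is shorter than the struct, the same kind of mismatch that the validation error above reports as `(2 < 3)`, while the child-first order keeps the two lengths equal in every chunk.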