MapArray columns don't handle null values correctly #2484
Comments
The above reproducer test can be found in the fork:
This is a duplicate of #1699. I will attempt to fix this in the coming days; it has been on my radar for a while.
I had a look at that issue and thought it was more about MapArray nested in another Array rather than at the top level. Still, if you can get it to work as expected, that'd be great!
Yeah, the ticket just covers what was clearly incorrect when I had a brief look at it; there are likely other issues with the logic. I fully intend to replace it wholesale with the ListArrayReader logic, which is better tested and actually correct (or at the very least more plausibly correct) 😅
Describe the bug
Given a parquet file with the following nested map content:
Reading the key column values for the third record gives the wrong data. We expect
["three", "four", "five", "six", "seven"]
but get
[null, null, "three", "four", "five"].
To Reproduce
The following test will reproduce this:
Expected behavior
Reading the key column values for the third record should give
["three", "four", "five", "six", "seven"].
Additional context
Debugging the code shows that the offsets array for the MapArray's key column is
[0, 0, 0, 5, 9, 13]
when I think it should be
[0, 5, 9, 13, 16, 21].
It looks like the issue is in parquet::arrow::array_reader::map_array::MapArrayReader::consume_batch, where the repetition and definition levels are used to determine the data offsets (starting at parquet\src\arrow\array_reader\map_array.rs line 123).
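For readers unfamiliar with how levels map to offsets, here is a minimal sketch of the general idea. It is not the code in map_array.rs. It assumes a single-level, nullable map whose keys are required, so an entry exists only when the definition level equals the maximum. The names offsets_from_levels and row are hypothetical helpers, and the sample levels in the usage note are invented for illustration rather than taken from the file in this issue.

```rust
/// Hypothetical helper (not part of the parquet crate): derive list/map
/// offsets from repetition and definition levels for a single-level,
/// nullable map with required keys.
///
/// Assumptions: rep == 0 starts a new row; def == max_def means a real
/// entry is present; lower def values mean a null or empty map.
fn offsets_from_levels(rep_levels: &[i16], def_levels: &[i16], max_def: i16) -> Vec<i32> {
    let mut offsets = Vec::new();
    let mut count = 0i32; // number of entries seen so far
    for (&rep, &def) in rep_levels.iter().zip(def_levels) {
        if rep == 0 {
            // A new row begins at the current entry count.
            offsets.push(count);
        }
        if def == max_def {
            // Only fully defined levels contribute an entry.
            count += 1;
        }
    }
    // The final offset closes the last row.
    offsets.push(count);
    offsets
}

/// Hypothetical helper: the entries of row `i` are the slice of the flat
/// child array between two consecutive offsets.
fn row<'a, T>(values: &'a [T], offsets: &[i32], i: usize) -> &'a [T] {
    &values[offsets[i] as usize..offsets[i + 1] as usize]
}
```

As a usage example, take four rows {a, b}, null, {} and {c}: the repetition levels are [0, 1, 0, 0, 0], the definition levels are [2, 2, 0, 1, 2], and max_def is 2. The sketch then yields offsets [0, 2, 2, 2, 3]. Null and empty rows repeat the previous offset rather than shifting later rows, which is the property the offsets reported above appear to violate.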