Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Fixed growable of dictionaries negative keys #582

Merged
merged 4 commits into from
Nov 7, 2021

Conversation

ritchie46
Copy link
Collaborator

@ritchie46 ritchie46 commented Nov 7, 2021

Pyarrow creates dictionary keys with negative integers, that are masked out by the validity bitmap.

This PR fixes conversion from those negative integers to usize, they are mapped to 0, and that is ok, because the values are masked.

Related downstream issue: pola-rs/polars#1686

Additionally the scalar API also read masked out values, so was incorrect.

@ghuls
Copy link
Contributor

ghuls commented Nov 7, 2021

It indeed can take a while before it is fixed in pyarrow, as pyarrow 6.0.0 is released a few days ago.

@jorgecarleitao
Copy link
Owner

wow, this looks like a vulnerability to me on the c++ side. if someone accesses those keys, they may read out of bounds if they do not check bounds (which some implementations do). I think this should be addressed in the pyarrow implementation: dictionary keys and offset deltas should never be negative. I think that our implementation is correct: it is the user responsibility to ensure that the keys are positive, even if the type is a signed integer (for JVM reasons).

I will follow up on this on the Arrow side.

@ritchie46
Copy link
Collaborator Author

ritchie46 commented Nov 7, 2021

What do you think about a temporary patch? ^_^. Before a new pyarrow is shipped to all pandas code that uses this, I guess that will take some months. 🙈

btw, the fix for reading null value keys in the scalar API should still be pushed.

Copy link
Owner

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

I was wrong, this is valid in arrow and it is a bug here: we should accept negative keys if they are nulls. Null values can have any (initialized) value.

Left two minor comments, but otherwise ready to sail.

src/array/dictionary/mod.rs Show resolved Hide resolved
src/array/growable/dictionary.rs Show resolved Hide resolved
tests/it/array/growable/dictionary.rs Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Nov 7, 2021

Codecov Report

Merging #582 (0517b95) into main (f146097) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #582      +/-   ##
==========================================
+ Coverage   78.99%   79.01%   +0.01%     
==========================================
  Files         399      399              
  Lines       24752    24765      +13     
==========================================
+ Hits        19554    19567      +13     
  Misses       5198     5198              
Impacted Files Coverage Δ
src/array/dictionary/mod.rs 58.33% <100.00%> (+1.43%) ⬆️
src/array/growable/dictionary.rs 69.56% <100.00%> (ø)
tests/it/array/growable/dictionary.rs 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f146097...0517b95. Read the comment docs.

@jorgecarleitao jorgecarleitao merged commit 6489f59 into jorgecarleitao:main Nov 7, 2021
@jorgecarleitao jorgecarleitao changed the title fix growable negative keys Fixed growable of dictionaries negative keys Nov 7, 2021
@jorgecarleitao jorgecarleitao added the bug Something isn't working label Nov 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants