Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FOR encoding a sorted array doesn't preserve sortedness #400

Closed
robert3005 opened this issue Jun 21, 2024 · 1 comment · Fixed by #401
Closed

FOR encoding a sorted array doesn't preserve sortedness #400

robert3005 opened this issue Jun 21, 2024 · 1 comment · Fixed by #401

Comments

@robert3005
Copy link
Member

FOR encoding has an edge case where if you have values close to both ends of the type range you might not be able to represent encoded values in the same dtype since they will exceed max value. This can happen in particular when you have values close to MIN and MAX of the dtype. In order to solve this issue we currently perform wrapping subtraction so values too big end up being shifted to the end of the range.

Example for i5 dtype

[-10, -8, 10] -> subtract min (-10) -> [0, 2, -12]

Now the converted array is no longer sorted.

FOR encoding is done so we can shift all the signed integers into unsigned space which works with wrapping subtraction, however, it's unclear that it's the most practical choice. The cases where you'd run into the issue will not compress well further since you have values with very few leading zeros (1?) and we seem to lose some guarantees that are desired for our compression codecs. I suggest that we refuse to FOR and then Bitpack such arrays and let compressor choose a different encoding.

@robert3005
Copy link
Member Author

I think if we were to change the dtypes of FOR array from signed to unsigned then everything still works

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant