FOR encoding has an edge case: if an array has values close to both ends of the type's range (near both MIN and MAX of the dtype), the encoded values may not be representable in the same dtype, because they exceed its max value. To work around this, we currently perform wrapping subtraction, so values that are too big wrap around to the other end of the range.
Example for i5 dtype
Now the converted array is no longer sorted.
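A minimal sketch of this effect, simulating a hypothetical 5-bit signed integer (range -16..15) in Python. The input values and the helper names `wrap_signed` and `for_encode` are illustrative assumptions, not taken from the codebase.

```python
BITS = 5                                  # hypothetical i5 dtype
MAX = (1 << (BITS - 1)) - 1               # 15 (MIN is -16)
MASK = (1 << BITS) - 1                    # 0b11111

def wrap_signed(x):
    """Reduce x to a BITS-bit two's complement value (wrapping arithmetic)."""
    x &= MASK
    return x - (1 << BITS) if x > MAX else x

def for_encode(values):
    """Frame-of-reference encode: subtract the minimum, wrapping on overflow."""
    reference = min(values)
    return reference, [wrap_signed(v - reference) for v in values]

values = [-16, -1, 0, 15]                 # sorted, touches both MIN and MAX
reference, encoded = for_encode(values)
as_unsigned = [e & MASK for e in encoded] # same bits, reinterpreted as u5

print(encoded)      # [0, 15, -16, -1]  -> no longer sorted as i5
print(as_unsigned)  # [0, 15, 16, 31]   -> still sorted as u5
```

The order is preserved only when the encoded bits are reinterpreted as unsigned; read as the original signed dtype, the two largest inputs wrap to negative values.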
FOR encoding is done so that we can shift all signed integers into unsigned space, which works with wrapping subtraction. However, it's unclear that this is the most practical choice. Arrays that hit this case will not compress well any further, since their values have very few leading zeros (1?), and we seem to lose some guarantees that our compression codecs want. I suggest that we refuse to apply FOR followed by bit-packing to such arrays and let the compressor choose a different encoding.
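One way the proposed refusal could look, as a rough sketch only: check whether `max - min` fits in the signed dtype before choosing FOR. The function names, the string labels, and the assumption of a non-empty array are all hypothetical and do not reflect the actual compressor API.

```python
def fits_without_wrapping(values, bits):
    """True if max - min is representable in a signed `bits`-bit integer.

    Assumes `values` is non-empty.
    """
    signed_max = (1 << (bits - 1)) - 1
    return max(values) - min(values) <= signed_max

def choose_encoding(values, bits):
    # Hypothetical policy: only use FOR + bit-packing when no wrapping is needed;
    # otherwise hand the array back so the compressor can pick something else.
    return "for+bitpack" if fits_without_wrapping(values, bits) else "other"

print(choose_encoding([-16, -1, 0, 15], bits=5))   # other
print(choose_encoding([-16, -10, -1], bits=5))     # for+bitpack
```

With a check like this, the wrapping path would never be taken, at the cost of giving up FOR on arrays that span more than half of the type's range.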