-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++] Use signed arithmetic for frame of reference in DeltaBitPackEncoder #37939
Comments
pitrou
pushed a commit
that referenced
this issue
Oct 3, 2023
…oding DELTA_BINARY_PACKED (#37940) Closes #37939. ### What changes are included in this PR? This PR changes values used in the `DELTA_BINARY_PACKED` encoder to signed types. To gracefully handle overflow, arithmetic is still performed in the unsigned domain, but other operations such as computing the min and max deltas are done in the signed domain. Using signed types ensures the optimal number of bits is used when encoding the deltas, which was not the case before if any negative deltas were encountered (which is obviously common). ### Are these changes tested? I've included two tests that result in overflow. ### Are there any user-facing changes? No * Closes: #37939 Authored-by: seidl <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
JerAguilon
pushed a commit
to JerAguilon/arrow
that referenced
this issue
Oct 23, 2023
…en encoding DELTA_BINARY_PACKED (apache#37940) Closes apache#37939. ### What changes are included in this PR? This PR changes values used in the `DELTA_BINARY_PACKED` encoder to signed types. To gracefully handle overflow, arithmetic is still performed in the unsigned domain, but other operations such as computing the min and max deltas are done in the signed domain. Using signed types ensures the optimal number of bits is used when encoding the deltas, which was not the case before if any negative deltas were encountered (which is obviously common). ### Are these changes tested? I've included two tests that result in overflow. ### Are there any user-facing changes? No * Closes: apache#37939 Authored-by: seidl <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
loicalleyne
pushed a commit
to loicalleyne/arrow
that referenced
this issue
Nov 13, 2023
…en encoding DELTA_BINARY_PACKED (apache#37940) Closes apache#37939. ### What changes are included in this PR? This PR changes values used in the `DELTA_BINARY_PACKED` encoder to signed types. To gracefully handle overflow, arithmetic is still performed in the unsigned domain, but other operations such as computing the min and max deltas are done in the signed domain. Using signed types ensures the optimal number of bits is used when encoding the deltas, which was not the case before if any negative deltas were encountered (which is obviously common). ### Are these changes tested? I've included two tests that result in overflow. ### Are there any user-facing changes? No * Closes: apache#37939 Authored-by: seidl <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
dgreiss
pushed a commit
to dgreiss/arrow
that referenced
this issue
Feb 19, 2024
…en encoding DELTA_BINARY_PACKED (apache#37940) Closes apache#37939. ### What changes are included in this PR? This PR changes values used in the `DELTA_BINARY_PACKED` encoder to signed types. To gracefully handle overflow, arithmetic is still performed in the unsigned domain, but other operations such as computing the min and max deltas are done in the signed domain. Using signed types ensures the optimal number of bits is used when encoding the deltas, which was not the case before if any negative deltas were encountered (which is obviously common). ### Are these changes tested? I've included two tests that result in overflow. ### Are there any user-facing changes? No * Closes: apache#37939 Authored-by: seidl <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
abandy
added a commit
to abandy/arrow
that referenced
this issue
Mar 16, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the enhancement requested
The current implementation of
DeltaBitPackEncoder
uses unsigned arithmetic to handle possible overflow when calculating deltas (see here). This has unfortunate consequences when encoding small negative deltas. As an example, writing a vector with values{1, 0, -1, 0, 1, 0, -1, 0, 1}
produces the following output (starting at the delta binary header):The encoder uses a bit width of 32 for all values. If signed values are used instead, then the result is:
Here the encoder can use 2 bits per value. This can result in much smaller files, especially in cases where the logical type is less than 32 bits.
Component(s)
C++
The text was updated successfully, but these errors were encountered: