GH-37829: [Java] Avoid resizing data buffer twice when appending variable length vectors #37844
Thanks!
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 0f94eb6. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.
…g variable length vectors (apache#37844)

### Rationale for this change

This change prevents avoidable `OversizedAllocationException`s when appending a variable-length vector with many small elements to a variable-length vector with a few large elements.

When appending variable-length vectors, `VectorAppender` iteratively doubles the offset and validity buffers until they can accommodate the combined elements. In the previous implementation, each iteration would also double the data buffer's capacity. This behavior is appropriate for vectors of fixed-size types but can result in an oversized data buffer when appending many small elements to a variable-length vector with a large data buffer.

### What changes are included in this PR?

The new behavior only resizes the offset and validity buffers when resizing the target vector's buffers to ensure they can hold the total number of combined elements. The data buffer is resized based on the total required data size of the combined elements.

### Are these changes tested?

Yes. I added a unit test that results in an `OversizedAllocationException` when run against the previous version of the code.

### Are there any user-facing changes?

No.

* Closes: apache#37829

Authored-by: hrishisd <[email protected]>
Signed-off-by: David Li <[email protected]>
### Rationale for this change

This change prevents avoidable `OversizedAllocationException`s when appending a variable-length vector with many small elements to a variable-length vector with a few large elements.

When appending variable-length vectors, `VectorAppender` iteratively doubles the offset and validity buffers until they can accommodate the combined elements. In the previous implementation, each iteration would also double the data buffer's capacity. This behavior is appropriate for vectors of fixed-size types but can result in an oversized data buffer when appending many small elements to a variable-length vector with a large data buffer.

### What changes are included in this PR?

The new behavior only resizes the offset and validity buffers when resizing the target vector's buffers to ensure they can hold the total number of combined elements.
The data buffer is resized based on the total required data size of the combined elements.
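
To illustrate the difference, here is a minimal sketch that models only the capacity arithmetic; it does not use the Arrow API. The class `AppendResizeSketch`, the `Capacities` holder, and the methods `resizeOld` and `resizeNew` are hypothetical names invented for this example, and the assumption that the data buffer also grows by doubling in the new behavior is mine, not something stated in this PR.

```java
final class AppendResizeSketch {

    /** Hypothetical stand-in for a variable-length vector's buffer capacities. */
    static final class Capacities {
        final long valueCapacity; // elements the offset and validity buffers can hold
        final long dataCapacity;  // bytes the data buffer can hold

        Capacities(long valueCapacity, long dataCapacity) {
            this.valueCapacity = valueCapacity;
            this.dataCapacity = dataCapacity;
        }
    }

    /** Previous behavior as described: every doubling step also doubles the data buffer. */
    static Capacities resizeOld(Capacities c, long totalElements) {
        long values = Math.max(1L, c.valueCapacity);
        long data = Math.max(1L, c.dataCapacity);
        while (values < totalElements) {
            values *= 2;
            data *= 2; // grows even when the data buffer already has room
        }
        return new Capacities(values, data);
    }

    /** New behavior as described: element count and data size are handled independently. */
    static Capacities resizeNew(Capacities c, long totalElements, long totalDataBytes) {
        long values = Math.max(1L, c.valueCapacity);
        while (values < totalElements) {
            values *= 2; // offset and validity buffers only
        }
        long data = Math.max(1L, c.dataCapacity);
        while (data < totalDataBytes) {
            data *= 2; // assumption: the data buffer grows by doubling
        }
        return new Capacities(values, data);
    }

    public static void main(String[] args) {
        // Target: 2 large elements, 1 GiB data buffer of which 512 MiB is used.
        Capacities target = new Capacities(2, 1L << 30);
        // Append 1022 one-byte elements: 1024 elements, 512 MiB + 1022 bytes in total.
        long totalElements = 1024;
        long totalDataBytes = (1L << 29) + 1022;

        Capacities before = resizeOld(target, totalElements);
        Capacities after = resizeNew(target, totalElements, totalDataBytes);

        System.out.println("old data capacity: " + before.dataCapacity); // 549755813888 (512 GiB)
        System.out.println("new data capacity: " + after.dataCapacity);  // 1073741824 (1 GiB)
    }
}
```

In this model the old strategy asks for a 512 GiB data buffer after nine doublings even though the existing 1 GiB buffer already fits the combined data, which is the kind of request that would plausibly surface as an `OversizedAllocationException` in a real allocator.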

### Are these changes tested?

Yes. I added a unit test that results in an `OversizedAllocationException` when run against the previous version of the code.

### Are there any user-facing changes?

No.