Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Part of #5374
@XiangpengHao implemented optimized row format --> ByteView (StringView / BinaryView) encoding/decoding in #5945 / #6044
It also adds benchmarks so we can test🎉
However, as mentioned in https://github.com/apache/arrow-rs/pull/6044/files#r1676803119, the output array in #6044 contains both short and long strings, even though only the long strings are used in the view definitions (the short strings are included only so that utf8 validation can be done quickly).
This results in more memory being used for the output array than necessary.
Describe the solution you'd like
Reduce the memory required by the output array.
Describe alternatives you've considered
One idea is to use a separate utf8 validation buffer for short strings, similarly to
arrow-rs/parquet/src/arrow/array_reader/byte_view_array.rs
Lines 623 to 668 in 0002b4d
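To make the idea concrete, here is a rough, std-only sketch of the separate validation buffer approach. It is not the arrow-rs implementation: `DecodedViews`, `make_view`, and `decode_with_scratch` are hypothetical names, and the view layout is a simplified model of the 16-byte view (values of at most 12 bytes are stored inline). Short strings are copied into a temporary scratch buffer that is validated in one pass and then dropped, so only long strings end up in the output data buffer.

```rust
/// Values of at most this many bytes are stored inline in the 16-byte view.
const MAX_INLINE_LEN: usize = 12;

/// Simplified stand-in for a decoded view array (hypothetical, not an arrow-rs type).
pub struct DecodedViews {
    pub views: Vec<u128>,
    /// Holds only the long strings; short strings live inline in `views`.
    pub data: Vec<u8>,
}

/// Build a 16-byte view: length, then either the inline bytes or
/// a 4-byte prefix + buffer index + offset.
fn make_view(value: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    if value.len() <= MAX_INLINE_LEN {
        bytes[4..4 + value.len()].copy_from_slice(value);
    } else {
        bytes[4..8].copy_from_slice(&value[..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// Decode `values` as utf8 strings, validating short strings through a
/// scratch buffer that is not part of the output.
pub fn decode_with_scratch(values: &[&[u8]]) -> Result<DecodedViews, String> {
    let mut views = Vec::with_capacity(values.len());
    let mut data = Vec::new();
    // Temporary buffer: used only for validation, freed on return.
    let mut short_scratch = Vec::new();

    for v in values {
        // A value must not start with a utf8 continuation byte; otherwise two
        // individually invalid values could concatenate into valid utf8.
        if let Some(&first) = v.first() {
            if (first & 0xC0) == 0x80 {
                return Err("value starts in the middle of a code point".to_string());
            }
        }
        if v.len() <= MAX_INLINE_LEN {
            short_scratch.extend_from_slice(v);
            views.push(make_view(v, 0, 0));
        } else {
            let offset = data.len() as u32;
            data.extend_from_slice(v);
            views.push(make_view(v, 0, offset));
        }
    }

    // Validate each buffer in a single pass rather than once per string.
    std::str::from_utf8(&short_scratch).map_err(|e| e.to_string())?;
    std::str::from_utf8(&data).map_err(|e| e.to_string())?;
    Ok(DecodedViews { views, data })
}
```

The trade-off in this sketch is an extra copy of the short strings into the scratch buffer, in exchange for an output buffer that holds only bytes actually referenced by a view.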
Additional context