speedup take_byte_view kernel #6167

Closed
Tracked by #6163
a10y opened this issue Aug 1, 2024 · 1 comment · Fixed by #6168
Labels
arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of a entry in the changelog)

a10y commented Aug 1, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Related to #6163

The take kernel for StringView and BinaryView is implemented using GenericByteViewArray::new(), a safe constructor that performs full UTF-8 validation for all non-inlined strings in the buffers. This is wasted work: the kernel does not build new string data, it only copies the existing buffers, which are already known to contain well-formed UTF-8 values.

In Vortex, I'm seeing this show up in the profiles for TPC-H queries as one of the more prominent items, in many cases causing a regression of up to 50% over Utf8.


Describe the solution you'd like

The take_byte kernel for Utf8/Binary arrays constructs an ArrayData instance and does not perform UTF-8 validation, since it takes from an array that is already known to hold valid UTF-8. The take_byte_view kernel should do the same.
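To make the idea concrete, here is a minimal sketch using only the standard library. It is not arrow-rs code: `ToyStringViewArray`, `View`, `try_new`, `new_unchecked`, and `take` are hypothetical names for illustration, and the layout (a single shared buffer, no inlined strings, no nulls) is a deliberate simplification. It only shows the shape of the argument: validate once when the array is built, then let take gather views and share the buffer without validating again.

```rust
use std::sync::Arc;

/// Toy stand-in for a view: an (offset, length) pair into one shared buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct View {
    pub offset: usize,
    pub len: usize,
}

/// Hypothetical, simplified string-view array (NOT the arrow-rs type).
#[derive(Debug)]
pub struct ToyStringViewArray {
    views: Vec<View>,
    buffer: Arc<[u8]>,
}

impl ToyStringViewArray {
    /// Safe constructor: checks bounds and UTF-8 validity for every view.
    /// The cost grows with the total number of bytes referenced.
    pub fn try_new(views: Vec<View>, buffer: Arc<[u8]>) -> Result<Self, String> {
        for (i, v) in views.iter().enumerate() {
            let end = v
                .offset
                .checked_add(v.len)
                .ok_or_else(|| format!("view {i} overflows"))?;
            let bytes = buffer
                .get(v.offset..end)
                .ok_or_else(|| format!("view {i} is out of bounds"))?;
            std::str::from_utf8(bytes).map_err(|e| format!("view {i}: {e}"))?;
        }
        Ok(Self { views, buffer })
    }

    /// Unchecked constructor: performs no validation at all.
    ///
    /// # Safety
    /// Every view must be in bounds and must reference valid UTF-8 in `buffer`.
    pub unsafe fn new_unchecked(views: Vec<View>, buffer: Arc<[u8]>) -> Self {
        Self { views, buffer }
    }

    pub fn len(&self) -> usize {
        self.views.len()
    }

    pub fn value(&self, i: usize) -> &str {
        let v = self.views[i];
        let bytes = &self.buffer[v.offset..v.offset + v.len];
        // SAFETY: both constructors require every view to be valid UTF-8.
        unsafe { std::str::from_utf8_unchecked(bytes) }
    }

    /// Take: gather views by index and share the buffer, with no re-validation.
    /// Panics if an index is out of range.
    pub fn take(&self, indices: &[usize]) -> Self {
        let views: Vec<View> = indices.iter().map(|&i| self.views[i]).collect();
        // SAFETY: each view is copied from `self`, whose views were already
        // valid against this exact buffer, so the invariant carries over.
        unsafe { Self::new_unchecked(views, Arc::clone(&self.buffer)) }
    }
}
```

In this model, calling `try_new` inside `take` would rescan every referenced byte on each call, which is the overhead described above. Routing through the unchecked constructor makes `take` proportional to the number of indices instead. The safety argument rests entirely on the views being copied unchanged from an already valid array.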

Describe alternatives you've considered

Additional context

@a10y a10y added the enhancement Any new improvement worthy of a entry in the changelog label Aug 1, 2024
@alamb alamb added the arrow Changes to the arrow crate label Aug 31, 2024
alamb commented Aug 31, 2024

label_issue.py automatically added labels {'arrow'} from #6168
