Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convince the compiler to auto-vectorize the range check in parquet DictionaryBuffer #4453

Conversation

jhorstmann
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

I noticed an unexpected hotspot when profiling parquet benchmarks for dictionary encoded arrays. The range checks that verify keys are in bounds take a relatively large amount of time. There was already a comment suggesting to use simd instructions, and in my opinion the compiler should have been able to auto-vectorize the existing code, but it took some experimentation to get the compiler generate more efficient code.

Benchmark results compared to master branch on my laptop

(using rust 1.70, there was no change in generated code in earlier versions)

$ RUSTFLAGS="-Ctarget-cpu=skylake" cargo bench --features arrow,test_common,experimental --bench arrow_reader -- --baseline master StringDictionary

arrow_array_reader/StringDictionary/dictionary encoded, mandatory, no NULLs
                        time:   [15.197 µs 15.334 µs 15.520 µs]
                        change: [-33.272% -32.922% -32.535%] (p = 0.00 < 0.05)
                        Performance has improved.
arrow_array_reader/StringDictionary/dictionary encoded, optional, no NULLs
                        time:   [16.166 µs 16.213 µs 16.268 µs]
                        change: [-31.508% -31.347% -31.181%] (p = 0.00 < 0.05)
                        Performance has improved.
arrow_array_reader/StringDictionary/dictionary encoded, optional, half NULLs
                        time:   [36.581 µs 36.735 µs 36.932 µs]
                        change: [-11.060% -10.605% -10.166%] (p = 0.00 < 0.05)
                        Performance has improved.

What changes are included in this PR?

basically a one-line change

Are there any user-facing changes?

no

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 27, 2023
@tustvold tustvold merged commit c1656ff into apache:master Jun 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants