Faster parquet DictEncoder (~20%) #2123

Merged
merged 5 commits into from
Jul 29, 2022

Conversation

tustvold
Contributor

@tustvold tustvold commented Jul 21, 2022

Which issue does this PR close?

Part of #1764

Rationale for this change

The existing implementation is complex, and slower than a hashbrown-based one.

What changes are included in this PR?

Gives the encoder the same treatment as #1861, switching to ahash and hashbrown.

Are there any user-facing changes?

No
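
For readers new to the encoder, a minimal sketch of what dictionary encoding does, using only the standard library. This is not the PR's code: `dict_encode` is a hypothetical helper, and it uses std's `HashMap` rather than ahash and hashbrown.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical helper (not part of the parquet crate): dictionary-encodes
/// `values`, returning the unique values in first-seen order plus one
/// dictionary index per input value.
fn dict_encode<T: Clone + Eq + Hash>(values: &[T]) -> (Vec<T>, Vec<u32>) {
    // Maps each unique value to its position in `uniques`.
    let mut lookup: HashMap<T, u32> = HashMap::new();
    let mut uniques: Vec<T> = Vec::new();
    let mut indices: Vec<u32> = Vec::with_capacity(values.len());

    for v in values {
        let idx = match lookup.get(v).copied() {
            Some(idx) => idx,
            None => {
                // First occurrence: append to the dictionary and remember it.
                let idx = uniques.len() as u32;
                uniques.push(v.clone());
                lookup.insert(v.clone(), idx);
                idx
            }
        };
        indices.push(idx);
    }
    (uniques, indices)
}
```

Calling `dict_encode(&[10, 20, 10, 30, 20])` yields the dictionary `[10, 20, 30]` and the indices `[0, 1, 0, 2, 1]`. Note that this naive version stores every unique value twice, once as a map key and once in `uniques`.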

@tustvold
Contributor Author

tustvold commented Jul 21, 2022

Running benchmarks with just the change to ahash shows no significant performance change. This is not entirely surprising, as the current implementation uses crc32, which is very cheap to compute (although not DoS resistant).

The change to hashbrown yields a non-trivial improvement where value encoding is the major bottleneck; the gain diminishes as additional overheads from nulls, lists, etc. take effect.

write_batch primitive/4096 values primitive                                                                             
                        time:   [1.5325 ms 1.5331 ms 1.5338 ms]
                        thrpt:  [115.02 MiB/s 115.07 MiB/s 115.12 MiB/s]
                 change:
                        time:   [-20.677% -20.632% -20.590%] (p = 0.00 < 0.05)
                        thrpt:  [+25.929% +25.995% +26.068%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking write_batch primitive/4096 values primitive non-null: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.5s, enable flat sampling, or reduce sample count to 50.
write_batch primitive/4096 values primitive non-null                                                                             
                        time:   [1.4838 ms 1.4847 ms 1.4857 ms]
                        thrpt:  [116.44 MiB/s 116.52 MiB/s 116.59 MiB/s]
                 change:
                        time:   [-12.080% -12.017% -11.954%] (p = 0.00 < 0.05)
                        thrpt:  [+13.577% +13.659% +13.739%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
write_batch primitive/4096 values bool                                                                            
                        time:   [111.01 us 111.09 us 111.19 us]
                        thrpt:  [10.224 MiB/s 10.233 MiB/s 10.240 MiB/s]
                 change:
                        time:   [-0.8794% -0.6831% -0.4488%] (p = 0.00 < 0.05)
                        thrpt:  [+0.4508% +0.6878% +0.8872%]
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
write_batch primitive/4096 values bool non-null                                                                            
                        time:   [52.931 us 53.012 us 53.094 us]
                        thrpt:  [21.411 MiB/s 21.444 MiB/s 21.477 MiB/s]
                 change:
                        time:   [-2.2177% -2.1085% -1.9913%] (p = 0.00 < 0.05)
                        thrpt:  [+2.0318% +2.1539% +2.2680%]
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  5 (5.00%) high mild
  10 (10.00%) high severe
write_batch primitive/4096 values string                                                                            
                        time:   [891.20 us 891.52 us 891.88 us]
                        thrpt:  [89.239 MiB/s 89.275 MiB/s 89.306 MiB/s]
                 change:
                        time:   [-8.4838% -8.4391% -8.3955%] (p = 0.00 < 0.05)
                        thrpt:  [+9.1650% +9.2170% +9.2703%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking write_batch primitive/4096 values string non-null: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60.
write_batch primitive/4096 values string non-null                                                                             
                        time:   [1.0208 ms 1.0213 ms 1.0218 ms]
                        thrpt:  [77.889 MiB/s 77.931 MiB/s 77.970 MiB/s]
                 change:
                        time:   [+0.0730% +0.1746% +0.2545%] (p = 0.00 < 0.05)
                        thrpt:  [-0.2538% -0.1743% -0.0730%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

Benchmarking write_batch nested/4096 values primitive list: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.8s, enable flat sampling, or reduce sample count to 50.
write_batch nested/4096 values primitive list                                                                             
                        time:   [1.9798 ms 2.0064 ms 2.0368 ms]
                        thrpt:  [80.409 MiB/s 81.627 MiB/s 82.725 MiB/s]
                 change:
                        time:   [+0.9435% +1.8832% +3.0013%] (p = 0.00 < 0.05)
                        thrpt:  [-2.9139% -1.8484% -0.9347%]
                        Change within noise threshold.
Found 19 outliers among 100 measurements (19.00%)
  1 (1.00%) high mild
  18 (18.00%) high severe
write_batch nested/4096 values primitive list non-null                                                                             
                        time:   [2.4385 ms 2.4696 ms 2.5038 ms]
                        thrpt:  [76.896 MiB/s 77.959 MiB/s 78.952 MiB/s]
                 change:
                        time:   [-0.1096% +1.1302% +2.5102%] (p = 0.10 > 0.05)
                        thrpt:  [-2.4488% -1.1176% +0.1097%]
                        No change in performance detected.
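
The `time` and `thrpt` changes in the output above are two views of the same measurement: for a fixed amount of work, throughput scales with the inverse of the run time. A small sketch of that arithmetic (the function name is made up for illustration):

```rust
/// Converts a relative change in run time (in percent) into the implied
/// relative change in throughput (in percent), assuming the amount of
/// work per iteration is unchanged.
fn throughput_change_pct(time_change_pct: f64) -> f64 {
    // e.g. -20.632% time means the new time is 0.79368x the old one.
    let time_ratio = 1.0 + time_change_pct / 100.0;
    // Throughput is inversely proportional to time.
    (1.0 / time_ratio - 1.0) * 100.0
}
```

For the first benchmark, `throughput_change_pct(-20.632)` comes out at about +25.995, matching the reported throughput change. This is why a roughly 20% reduction in time shows up as a roughly 26% increase in throughput.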

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jul 21, 2022
@tustvold tustvold changed the title from "Faster parquet DictEncoder" to "Faster parquet DictEncoder (~20%)" Jul 21, 2022
@codecov-commenter

codecov-commenter commented Jul 21, 2022

Codecov Report

Attention: Patch coverage is 90.24390% with 8 lines in your changes missing coverage. Please review.

Project coverage is 82.51%. Comparing base (5e3facf) to head (a07d513).
Report is 2352 commits behind head on master.

Files with missing lines Patch % Lines
parquet/src/encodings/encoding/dict_encoder.rs 91.37% 5 Missing ⚠️
parquet/src/util/interner.rs 90.90% 2 Missing ⚠️
parquet/src/encodings/encoding/mod.rs 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2123      +/-   ##
==========================================
- Coverage   83.71%   82.51%   -1.20%     
==========================================
  Files         225      240      +15     
  Lines       59567    62234    +2667     
==========================================
+ Hits        49865    51355    +1490     
- Misses       9702    10879    +1177     


@@ -49,6 +50,7 @@ serde_json = { version = "1.0", default-features = false, features = ["std"], op
rand = { version = "0.8", default-features = false, features = ["std", "std_rng"] }
futures = { version = "0.3", default-features = false, features = ["std"], optional = true }
tokio = { version = "1.0", optional = true, default-features = false, features = ["macros", "fs", "rt", "io-util"] }
hashbrown = { version = "0.12", default-features = false }
Contributor

There is a feature, "inline-more", which is enabled by default in hashbrown and sometimes gives slightly better performance.

Contributor Author

By disabling it here, we can delegate that decision to downstream crates.


impl<T: DataType> Encoder<T> for DictEncoder<T> {
    fn put(&mut self, values: &[T::T]) -> Result<()> {
        for i in values {
Contributor

Not sure if it's a bottleneck, but it might be faster to compute the hashes for the values in one go (i.e. vectorized)?
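
A rough sketch of what such a two-pass approach might look like: hash the whole batch in one tight loop, then probe with the precomputed hashes. This is not the PR's code. `put_batched`, the `HashMap<u64, Vec<u32>>` bucket table, and the use of std's `RandomState` in place of ahash are all assumptions, chosen so the example compiles without extra crates. Whether it is actually faster would need benchmarking.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

/// Hashes a single value with the shared, randomly keyed state.
fn hash_one<T: Hash>(state: &RandomState, value: &T) -> u64 {
    let mut hasher = state.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical two-pass `put`: returns one dictionary index per input value.
/// `table` maps a value hash to the indices (into `uniques`) of the values
/// that produced that hash, so hash collisions are handled by comparison.
fn put_batched<T: Hash + Eq + Clone>(
    state: &RandomState,
    table: &mut HashMap<u64, Vec<u32>>,
    uniques: &mut Vec<T>,
    values: &[T],
) -> Vec<u32> {
    // Pass 1: compute every hash up front, with no table access in the loop.
    let hashes: Vec<u64> = values.iter().map(|v| hash_one(state, v)).collect();

    // Pass 2: probe the table using the precomputed hashes.
    let mut indices = Vec::with_capacity(values.len());
    for (v, h) in values.iter().zip(hashes) {
        let bucket = table.entry(h).or_default();
        let found = bucket.iter().copied().find(|&i| uniques[i as usize] == *v);
        let idx = match found {
            Some(i) => i,
            None => {
                // New value: store it once and record its index.
                let i = uniques.len() as u32;
                uniques.push(v.clone());
                bucket.push(i);
                i
            }
        };
        indices.push(idx);
    }
    indices
}
```

One caveat of this std-only sketch is that the map hashes the `u64` key a second time; a table that accepts a precomputed hash directly would avoid that extra work.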

Contributor

@alamb alamb left a comment

The code looks good to me, but I am concerned about the new dependencies, as I believe some people use parquet after compiling to WASM or on embedded devices.

I am curious what other maintainers think too

cc @sunchao @nevi-me @viirya @HaoYang670

@@ -30,6 +30,7 @@ edition = "2021"
rust-version = "1.57"

[dependencies]
ahash = "0.7"
Contributor

These seem to be new dependencies (if optional features are not enabled)


state: ahash::RandomState,

/// Used to provide a lookup from value to unique value
Contributor

Given the replication of this pattern (maybe now in three places?), perhaps we can factor it into its own structure, mostly for readability, as the use of HashMap to implement a HashSet takes some thought to totally grok.

Contributor Author

I did consider this, but I was unsure where to put it. It can't live in arrow, as parquet needs to compile without arrow, and short of creating a new crate I couldn't see a good home for it.

@tustvold
Contributor Author

I'm going to get this in as I need it for #1764; we have time until the next release to address any issues.

@tustvold tustvold merged commit 6ce4c4e into apache:master Jul 29, 2022
@ursabot

ursabot commented Jul 29, 2022

Benchmark runs are scheduled for baseline = 985760f and contender = 6ce4c4e. 6ce4c4e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

use std::hash::Hash;

/// Storage trait for [`Interner`]
pub trait Storage {
Contributor

👍
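
For context, a sketch of how a `Storage` trait and an `Interner` might fit together once the "HashMap as HashSet" pattern is factored out. Only the trait name and its doc comment appear in this conversation. The associated types, the method signatures, `StringStorage`, and the bucket-based dedup table below are assumptions for illustration, written against std's `HashMap` and `RandomState` rather than hashbrown and ahash.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

/// Assumed shape of the storage abstraction: it owns the unique values
/// and hands out a small copyable key for each one.
pub trait Storage {
    type Key: Copy;
    type Value: Hash + PartialEq + ?Sized;

    fn get(&self, key: Self::Key) -> &Self::Value;
    fn push(&mut self, value: &Self::Value) -> Self::Key;
}

/// Deduplicates values, storing each unique value once in `storage`.
pub struct Interner<S: Storage> {
    state: RandomState,
    /// Value hash -> keys of stored values with that hash. Only keys live
    /// here; the values themselves live in `storage`.
    dedup: HashMap<u64, Vec<S::Key>>,
    storage: S,
}

impl<S: Storage> Interner<S> {
    pub fn new(storage: S) -> Self {
        Self { state: RandomState::new(), dedup: HashMap::new(), storage }
    }

    /// Returns the key for `value`, inserting it if not already present.
    pub fn intern(&mut self, value: &S::Value) -> S::Key {
        let mut hasher = self.state.build_hasher();
        value.hash(&mut hasher);
        let hash = hasher.finish();

        let storage = &mut self.storage;
        let bucket = self.dedup.entry(hash).or_default();
        // Compare against stored values to resolve hash collisions.
        let found = bucket.iter().copied().find(|&k| storage.get(k) == value);
        match found {
            Some(key) => key,
            None => {
                let key = storage.push(value);
                bucket.push(key);
                key
            }
        }
    }

    pub fn storage(&self) -> &S {
        &self.storage
    }
}

/// Hypothetical example storage backed by a `Vec<String>`.
#[derive(Default)]
pub struct StringStorage {
    pub values: Vec<String>,
}

impl Storage for StringStorage {
    type Key = u32;
    type Value = str;

    fn get(&self, key: u32) -> &str {
        &self.values[key as usize]
    }

    fn push(&mut self, value: &str) -> u32 {
        let key = self.values.len() as u32;
        self.values.push(value.to_owned());
        key
    }
}
```

With this shape, interning `"hello"` twice through `Interner::new(StringStorage::default())` returns the same key both times and stores the string once. Keeping the values behind a trait is one way to let the same deduplication logic serve different value layouts.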

Labels
parquet Changes to the parquet crate
5 participants