fix deflate performance #18

Open
folkertdev opened this issue Feb 1, 2024 · 3 comments

Comments

@folkertdev
Collaborator

we are consistently ~10% slower than zlib-ng on deflate. The gap fluctuates across compression levels, but currently none of them are on par with zlib-ng.

There is so far no obvious reason for this slowdown, so it's likely a "death by a thousand papercuts" sort of thing.

@folkertdev
Collaborator Author

as a data point, here are the numbers at commit 8048180:

Benchmark 1 (29 runs): cargo run --release --example compress 1 ng silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           175ms ± 8.58ms     168ms …  217ms          1 ( 3%)        0%
  peak_rss           26.8MB ± 87.3KB    26.6MB … 27.0MB          0 ( 0%)        0%
  cpu_cycles          515M  ± 6.88M      505M  …  529M           0 ( 0%)        0%
  instructions        744M  ± 35.3K      744M  …  744M           1 ( 3%)        0%
  cache_references   11.3M  ± 1.18M     9.08M  … 14.1M           0 ( 0%)        0%
  cache_misses       2.25M  ± 66.4K     2.12M  … 2.40M           1 ( 3%)        0%
  branch_misses      4.13M  ± 15.1K     4.10M  … 4.17M           0 ( 0%)        0%
Benchmark 2 (27 runs): cargo run --release --example compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           191ms ± 4.14ms     187ms …  207ms          1 ( 4%)        💩+  9.2% ±  2.1%
  peak_rss           26.7MB ± 93.5KB    26.6MB … 26.9MB          0 ( 0%)          -  0.3% ±  0.2%
  cpu_cycles          580M  ± 10.7M      570M  …  619M           1 ( 4%)        💩+ 12.7% ±  0.9%
  instructions        873M  ± 31.4K      873M  …  873M           0 ( 0%)        💩+ 17.3% ±  0.0%
  cache_references   11.4M  ± 1.16M     9.31M  … 13.6M           0 ( 0%)          +  0.9% ±  5.6%
  cache_misses       2.33M  ± 74.4K     2.22M  … 2.49M           0 ( 0%)        💩+  3.6% ±  1.7%
  branch_misses      4.39M  ± 42.1K     4.34M  … 4.52M           1 ( 4%)        💩+  6.2% ±  0.4%
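To make the "~10% slower" figure concrete, the relative delta can be recomputed from the displayed means. This is a minimal sketch, assuming delta = (rs - ng) / ng * 100; the function name is hypothetical, and the benchmark tool's ± interval is not reproduced. Because the table shows rounded means, the results can differ from the reported delta in the last digit (for example +9.1% here versus the reported +9.2% for wall_time).

```rust
/// Relative change of `candidate` over `baseline`, in percent.
/// Assumption: delta = (candidate - baseline) / baseline * 100, applied to the means.
fn relative_delta_percent(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // Rounded means from the commit 8048180 tables: (metric, ng, rs).
    let metrics = [
        ("wall_time (ms)", 175.0, 191.0),
        ("cpu_cycles (M)", 515.0, 580.0),
        ("instructions (M)", 744.0, 873.0),
    ];
    for (name, ng, rs) in metrics {
        // Prints +9.1%, +12.6% and +17.3% respectively.
        println!("{name}: {:+.1}%", relative_delta_percent(ng, rs));
    }
}
```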

@bjorn3
Collaborator

bjorn3 commented Oct 14, 2024

#223 should have helped a fair bit.

@folkertdev
Collaborator Author

yes, but there is still a significant gap (note that the earlier benchmark went through cargo, which is why its numbers are higher; this one invokes the example binary directly)

Benchmark 1 (68 runs): target/release/examples/blogpost-compress 1 ng silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          73.9ms ± 1.02ms    72.7ms … 80.2ms          3 ( 4%)        0%
  peak_rss           26.6MB ± 42.5KB    26.5MB … 26.6MB          8 (12%)        0%
  cpu_cycles          264M  ± 1.79M      261M  …  270M           2 ( 3%)        0%
  instructions        460M  ±  266       460M  …  460M           1 ( 1%)        0%
  cache_references   18.9M  ±  233K     18.5M  … 19.7M           5 ( 7%)        0%
  cache_misses        507K  ± 85.4K      408K  …  949K           2 ( 3%)        0%
  branch_misses      3.32M  ± 5.96K     3.30M  … 3.33M           0 ( 0%)        0%
Benchmark 2 (64 runs): target/release/examples/blogpost-compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          78.6ms ±  990us    77.4ms … 83.3ms          5 ( 8%)        💩+  6.5% ±  0.5%
  peak_rss           26.7MB ± 65.8KB    26.6MB … 26.7MB          0 ( 0%)          +  0.3% ±  0.1%
  cpu_cycles          287M  ± 4.11M      284M  …  310M           9 (14%)        💩+  8.8% ±  0.4%
  instructions        633M  ±  276       633M  …  633M           0 ( 0%)        💩+ 37.7% ±  0.0%
  cache_references   19.9M  ±  200K     19.7M  … 20.6M           6 ( 9%)        💩+  5.5% ±  0.4%
  cache_misses        440K  ± 95.8K      327K  …  778K           1 ( 2%)        ⚡- 13.3% ±  6.1%
  branch_misses      3.08M  ± 10.8K     3.06M  … 3.11M           9 (14%)        ⚡-  7.3% ±  0.1%
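In this run zlib-rs retires about 37.7% more instructions but spends only about 8.8% more cycles. Instructions per cycle (IPC), computed from the table means, shows how both can hold at once. The sketch below is back-of-the-envelope arithmetic on rounded means, not a profile, and the `ipc` helper is hypothetical; it does not say where the extra instructions come from.

```rust
/// Instructions per cycle (IPC) from mean instruction and cycle counts.
fn ipc(instructions: f64, cycles: f64) -> f64 {
    instructions / cycles
}

fn main() {
    // Rounded means (in millions) from the blogpost-compress tables.
    let ng = ipc(460.0, 264.0); // zlib-ng, about 1.74
    let rs = ipc(633.0, 287.0); // zlib-rs, about 2.21
    println!("zlib-ng IPC: {ng:.2}");
    println!("zlib-rs IPC: {rs:.2}");

    // cycles = instructions / IPC, so the cycle ratio equals the
    // instruction ratio scaled by ng_ipc / rs_ipc (i.e. 287 / 264).
    let instr_ratio = 633.0_f64 / 460.0;
    println!("cycle ratio rs/ng: {:.3}", instr_ratio * ng / rs);
}
```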
