optimize inflate::core::init_tree
by precomputing reversed bits
#132
Somewhat inspired by #82. For small inputs, this function can take up more than 50% of the runtime. As noted in the linked issue, in absolute terms this function is not notable, accounting for at most a couple hundred microseconds, but it is still useful to optimize.
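For context, the precomputation in question can be sketched roughly like this (a hedged illustration, not the actual PR code: the table construction, the `reverse_bits` helper, and the 16-bit width are assumptions for the example):

```rust
// Sketch: reverse the low `len` bits of a Huffman code using a small
// precomputed table of byte reversals, instead of reversing bit by bit.

// REVERSED[b] holds byte `b` with its 8 bits reversed.
const fn build_reversed() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut i = 0;
    while i < 256 {
        let mut b = i as u8;
        let mut r = 0u8;
        let mut bit = 0;
        while bit < 8 {
            r = (r << 1) | (b & 1);
            b >>= 1;
            bit += 1;
        }
        table[i] = r;
        i += 1;
    }
    table
}

const REVERSED: [u8; 256] = build_reversed();

/// Reverse the low `len` (1..=16) bits of `code` with two table lookups.
fn reverse_bits(code: u16, len: u32) -> u16 {
    // Reverse the full 16 bits by swapping the two reversed bytes...
    let rev = ((REVERSED[(code & 0xFF) as usize] as u16) << 8)
        | REVERSED[(code >> 8) as usize] as u16;
    // ...then shift so only the low `len` bits remain.
    rev >> (16 - len)
}

fn main() {
    // 0b1011 reversed over 4 bits is 0b1101.
    assert_eq!(reverse_bits(0b1011, 4), 0b1101);
    assert_eq!(reverse_bits(0b100, 3), 0b001);
}
```

The trade-off discussed below is table size versus lookup count: a 256-entry byte table needs two lookups per code, while a larger table could do it in one.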
There is only one relevant benchmark in the existing suite:
A roughly 25% improvement, though only about 500ns in absolute terms. On the image linked in #82, a benchmark that includes the total time to decode the PNG, total runtime improves by about 13%; in absolute terms that is roughly 0.4ms on the server I use for benchmarking.
I'm a bit skeptical about the precomputed reversed bits table, and have left a GitHub comment below discussing this.
This PR in its current form should close #82.
Potential future work
There likely isn't much benefit to spending more time on this function. If one really wanted to squeeze more performance out of it, doubling the size of the precomputed reversed-bits lookup table should have some impact, though I haven't measured such a change. The `while rev_code < FAST_LOOKUP_SIZE` loop is extremely tight, but I haven't looked for potential savings there. The `for i in 1..16` loop looks very similar to a prefix sum, which can be optimized with SIMD, but such an optimization would likely require unsafe use of intrinsics.