-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memoization of RGBA palette when expanding palette indices into RGB8 or RGBA8 #462
Commits on Jan 29, 2024
-
Extract a separate
palette.rs
module.This commit moves `expand_paletted_into_rgb8` and `expand_paletted_into_rgba8` (and their unit tests) into a separate `transform/palette.rs` module. This prepares room for encapsulating extra complexity in this module in follow-up commits, where we will start to precompute and memoize some data when creating a `TransformFn`. This commit just moves the code around - it should have no impact on correctness or performance.
Configuration menu - View commit details
-
Copy full SHA for 4e08f2f - Browse repository at this point
Copy the full SHA 4e08f2fView commit details -
Fix constants used in palette benchmarks.
The `PLTE` chunk's size should be a multiple of 3 (since it contains RGB entries - 3 bytes per entry). Additionally, taking 10000 samples in the `bench_create_fn` benchmarks is a bit excessive after memoization.
Configuration menu - View commit details
-
Copy full SHA for 89153e6 - Browse repository at this point
Copy the full SHA 89153e6View commit details -
Change
TransformFn
to allow memoization in the futureThis commit changes the `TransformFn` type alias from `fn(...)` into `Box<dyn Fn(...)>`. This allows the `TransformFn` to have store some precomputer, memoized state that we plan to add in follow-up commits. In theory this commit may have negative performance impact, but in the grand scheme of things it disappears into the measurement noise. In particular, when there is no state, then `Box` shouldn't allocate.
Configuration menu - View commit details
-
Copy full SHA for 6b17e13 - Browse repository at this point
Copy the full SHA 6b17e13View commit details
Commits on Jan 31, 2024
-
Memoize combined PLTE+trNS lookup table.
Before this commit `expand_paletted_into_rgba8` would: * Perform 2 lookups - `palette.get(i)` and `trns.get(i)` * Check via `unwrap_or` if `i` was within the bounds of `palette`/`trns` This commit introduces `create_rgba_palette` which combines `palette` and `trns` into a fixed-size `[[u8;4]; 256]` look-up table (called `rgba_palette` in the code). After this commit `expand_paletted_into_rgba8` only needs to perform a single look-up and doesn't need to check the bounds. This helps to improve the expansion time by 60+%: - expand_paletted(exec)/trns=yes/src_bits=4/src_size=5461: [-60.208% -60.057% -59.899%] (p = 0.00 < 0.05) - expand_paletted(exec)/trns=yes/src_bits=8/src_size=5461: [-77.520% -77.407% -77.301%] (p = 0.00 < 0.05) `expand_paletted_into_rgb8` performs only a single lookup before and after this commit, but avoiding bounds checks still helps to improve the expansion time by ~12%: - expand_paletted(exec)/trns=no/src_bits=4/src_size=5461: [-12.357% -12.005% -11.664%] (p = 0.00 < 0.05) - expand_paletted(exec)/trns=no/src_bits=8/src_size=5461: [-13.135% -12.584% -12.092%] (p = 0.00 < 0.05) Understandably, this commit regresses the time of `create_transform_fn`. Future commits will reduce this regression 2-4 times: - expand_paletted(ctor)/plte=256/trns=256: [+3757.2% +3763.8% +3770.5%] (p = 0.00 < 0.05) - expand_paletted(ctor)/plte=224/trns=32: [+3807.3% +3816.2% +3824.6%] (p = 0.00 < 0.05) - expand_paletted(ctor)/plte=16/trns=1: [+1672.0% +1675.0% +1678.1%] (p = 0.00 < 0.05)
Configuration menu - View commit details
-
Copy full SHA for d24e9b3 - Browse repository at this point
Copy the full SHA d24e9b3View commit details -
Copy 4 bytes at a time when expanding palette into rgb8.
Before this commit `expand_into_rgb8` would copy 3 bytes at a time into the output. After this commit it copies 4 bytes at a time (possibly cloberring pixels that will be populated during the next iteration - this is ok). This improved the performance as follows: expand_paletted(exec)/trns=no/src_bits=8/src_size=5461 time: [-23.852% -23.593% -23.319%] (p = 0.00 < 0.05)
Configuration menu - View commit details
-
Copy full SHA for fbd33df - Browse repository at this point
Copy the full SHA fbd33dfView commit details -
Copy 4 bytes at a time in
create_rgba_palette
This improves the performance as follows: - expand_paletted(ctor)/plte=256/trns=256 [-40.581% -40.396% -40.211%] (p = 0.00 < 0.05) - expand_paletted(ctor)/plte=224/trns=32 [-24.070% -23.840% -23.592%] (p = 0.00 < 0.05) Small palettes are mostly unaffected: - expand_paletted(ctor)/plte=16/trns=1 [-0.2525% +0.0338% +0.3239%] (p = 0.81 > 0.05)
Configuration menu - View commit details
-
Copy full SHA for 57b5d54 - Browse repository at this point
Copy the full SHA 57b5d54View commit details