
Memoization of RGBA palette when expanding palette indices into RGB8 or RGBA8 #462

Merged: 6 commits, merged on Feb 2, 2024

Commits on Jan 29, 2024

  1. Extract a separate palette.rs module.

    This commit moves `expand_paletted_into_rgb8` and
    `expand_paletted_into_rgba8` (and their unit tests) into a separate
    `transform/palette.rs` module.  This makes room for encapsulating
    extra complexity in this module in follow-up commits, where we will
    start to precompute and memoize some data when creating a `TransformFn`.
    
    This commit just moves the code around - it should have no impact on
    correctness or performance.
    anforowicz committed Jan 29, 2024 (4e08f2f)
  2. Fix constants used in palette benchmarks.

    The `PLTE` chunk's size should be a multiple of 3 (since it contains RGB
    entries - 3 bytes per entry).
    
    Additionally, taking 10000 samples in the `bench_create_fn` benchmarks
    is a bit excessive after memoization.
    anforowicz committed Jan 29, 2024 (89153e6)
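A minimal sketch of the invariant behind this fix, assuming only what the PNG specification says about `PLTE`: 3 bytes (R, G, B) per entry and 1 to 256 entries. `plte_entry_count` and `make_plte` are hypothetical helpers for illustration, not functions from the crate or its benchmarks.

```rust
/// Returns the number of RGB entries in a PLTE payload, or `None` if the
/// length is not a multiple of 3 or falls outside 1..=256 entries.
fn plte_entry_count(plte: &[u8]) -> Option<usize> {
    // Each entry is exactly 3 bytes; a benchmark constant such as 256 * 3
    // (rather than an arbitrary byte count) keeps the input well-formed.
    if plte.is_empty() || plte.len() % 3 != 0 || plte.len() > 256 * 3 {
        return None;
    }
    Some(plte.len() / 3)
}

/// Hypothetical benchmark helper: builds a PLTE payload with `entries`
/// RGB triples filled with arbitrary (wrapping) byte values.
fn make_plte(entries: usize) -> Vec<u8> {
    (0..entries * 3).map(|i| i as u8).collect()
}
```

With entry-based constants, sizes like `plte=224` or `plte=16` always describe a valid chunk.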
  3. Change TransformFn to allow memoization in the future

    This commit changes the `TransformFn` type alias from `fn(...)` into
    `Box<dyn Fn(...)>`.  This allows the `TransformFn` to store some
    precomputed, memoized state that we plan to add in follow-up commits.
    
    In theory this commit may have a negative performance impact, but in the
    grand scheme of things it disappears into the measurement noise.  In
    particular, when there is no state, `Box` shouldn't allocate.
    anforowicz committed Jan 29, 2024 (6b17e13)
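To illustrate the difference, here is a minimal sketch using a simplified, hypothetical signature `(&[u8], &mut [u8])`; the crate's real `TransformFn` takes more parameters. A plain `fn` pointer cannot capture state, while a `Box<dyn Fn>` can own a precomputed table. `copy_row`, `create_stateless` and `create_with_table` are hypothetical names.

```rust
// Before: a plain function pointer; it cannot carry captured state.
type TransformFnPtr = fn(&[u8], &mut [u8]);

// After: a boxed closure that may own precomputed (memoized) data.
// Simplified signature; assumption for illustration only.
type TransformFn = Box<dyn Fn(&[u8], &mut [u8])>;

fn copy_row(input: &[u8], output: &mut [u8]) {
    output[..input.len()].copy_from_slice(input);
}

/// The old style: just hand out a function pointer.
fn create_old() -> TransformFnPtr {
    copy_row
}

/// Stateless case: `copy_row` is a zero-sized fn item, so boxing it
/// does not allocate.
fn create_stateless() -> TransformFn {
    Box::new(copy_row)
}

/// Stateful case: the closure owns a lookup table computed once, up front,
/// and reuses it on every call.
fn create_with_table(table: [u8; 256]) -> TransformFn {
    Box::new(move |input: &[u8], output: &mut [u8]| {
        for (dst, &src) in output.iter_mut().zip(input) {
            *dst = table[src as usize];
        }
    })
}
```

Both forms dispatch through an indirect call (the boxed form adds a vtable load), which is one plausible reason the difference disappears into the measurement noise.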

Commits on Jan 31, 2024

  1. Memoize combined PLTE+trNS lookup table.

    Before this commit `expand_paletted_into_rgba8` would:
    
    * Perform 2 lookups - `palette.get(i)` and `trns.get(i)`
    * Check via `unwrap_or` if `i` was within the bounds of `palette`/`trns`
    
    This commit introduces `create_rgba_palette` which combines `palette`
    and `trns` into a fixed-size `[[u8;4]; 256]` look-up table (called
    `rgba_palette` in the code).  After this commit
    `expand_paletted_into_rgba8` only needs to perform a single look-up and
    doesn't need to check the bounds.  This reduces the expansion time by
    60-77%:
    
    - expand_paletted(exec)/trns=yes/src_bits=4/src_size=5461:
      [-60.208% -60.057% -59.899%] (p = 0.00 < 0.05)
    - expand_paletted(exec)/trns=yes/src_bits=8/src_size=5461:
      [-77.520% -77.407% -77.301%] (p = 0.00 < 0.05)
    
    `expand_paletted_into_rgb8` performs only a single lookup before and
    after this commit, but avoiding bounds checks still helps to improve the
    expansion time by ~12%:
    
    - expand_paletted(exec)/trns=no/src_bits=4/src_size=5461:
      [-12.357% -12.005% -11.664%] (p = 0.00 < 0.05)
    - expand_paletted(exec)/trns=no/src_bits=8/src_size=5461:
      [-13.135% -12.584% -12.092%] (p = 0.00 < 0.05)
    
    Understandably, this commit regresses the time of `create_transform_fn`
    (future commits will reduce this regression by a factor of 2-4):
    
    - expand_paletted(ctor)/plte=256/trns=256:
      [+3757.2% +3763.8% +3770.5%] (p = 0.00 < 0.05)
    - expand_paletted(ctor)/plte=224/trns=32:
      [+3807.3% +3816.2% +3824.6%] (p = 0.00 < 0.05)
    - expand_paletted(ctor)/plte=16/trns=1:
      [+1672.0% +1675.0% +1678.1%] (p = 0.00 < 0.05)
    anforowicz committed Jan 31, 2024 (d24e9b3)
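A minimal sketch of the combined lookup table, assuming 8-bit indices and these fallbacks: palette entries without a `tRNS` value are opaque (`0xFF`), and indices past the end of `PLTE` map to opaque black. The fallback values are an assumption; the commit message does not state them. `expand_8bit_into_rgba8` is a hypothetical name, and only the 8-bit case is shown.

```rust
/// Combines PLTE (RGB triples) and tRNS (alpha values) into a fixed-size
/// 256-entry RGBA table. Missing alpha defaults to 0xFF; indices beyond
/// the palette default to opaque black (assumed policy).
fn create_rgba_palette(plte: &[u8], trns: &[u8]) -> [[u8; 4]; 256] {
    let mut rgba = [[0u8, 0, 0, 0xFF]; 256];
    // `zip` stops at 256 entries even if PLTE is longer.
    for (dst, rgb) in rgba.iter_mut().zip(plte.chunks_exact(3)) {
        dst[..3].copy_from_slice(rgb);
    }
    for (dst, &alpha) in rgba.iter_mut().zip(trns) {
        dst[3] = alpha;
    }
    rgba
}

/// Expands 8-bit palette indices into RGBA8 with one table lookup per pixel.
fn expand_8bit_into_rgba8(indices: &[u8], output: &mut [u8], rgba: &[[u8; 4]; 256]) {
    for (&i, dst) in indices.iter().zip(output.chunks_exact_mut(4)) {
        dst.copy_from_slice(&rgba[i as usize]);
    }
}
```

Because a `u8` index is always below 256, `rgba[i as usize]` can never be out of bounds, so the compiler is free to drop the bounds check that the separate `palette.get(i)` / `trns.get(i)` lookups needed.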
  2. Copy 4 bytes at a time when expanding palette into rgb8.

    Before this commit `expand_into_rgb8` would copy 3 bytes at a time into
    the output.  After this commit it copies 4 bytes at a time (possibly
    clobbering the first byte of the next pixel, which is populated during
    the next iteration anyway - this is OK).  This improved the performance
    as follows:
    
    - expand_paletted(exec)/trns=no/src_bits=8/src_size=5461:
      [-23.852% -23.593% -23.319%] (p = 0.00 < 0.05)
    anforowicz committed Jan 31, 2024 (fbd33df)
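A sketch of the 4-byte copy for 8-bit indices, reusing a 256-entry RGBA table like `rgba_palette`. How the crate handles the final pixel (where a 4-byte write would run past the end of the output) is not described in the commit message; this sketch simply falls back to a 3-byte copy there. The function name is hypothetical.

```rust
/// Expands 8-bit palette indices into RGB8 by copying a whole 4-byte RGBA
/// entry per pixel. The 4th byte lands on the first byte of the next pixel
/// and is overwritten by the next iteration. The last pixel uses a 3-byte
/// copy so nothing is written past its 3 bytes.
fn expand_8bit_into_rgb8(indices: &[u8], output: &mut [u8], rgba: &[[u8; 4]; 256]) {
    assert!(output.len() >= indices.len() * 3);
    let Some((&last, head)) = indices.split_last() else {
        return; // no pixels to expand
    };
    for (n, &i) in head.iter().enumerate() {
        let start = n * 3;
        // Writes 4 bytes: RGB of this pixel plus one byte that the next
        // pixel will overwrite.
        output[start..start + 4].copy_from_slice(&rgba[i as usize]);
    }
    let start = head.len() * 3;
    output[start..start + 3].copy_from_slice(&rgba[last as usize][..3]);
}
```

A fixed-size 4-byte copy can typically be compiled to a single 32-bit store, while a 3-byte copy may need two smaller stores; the benchmark above is the actual evidence for the speedup.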
  3. Copy 4 bytes at a time in create_rgba_palette

    This improves the performance as follows:
    
    - expand_paletted(ctor)/plte=256/trns=256
      [-40.581% -40.396% -40.211%] (p = 0.00 < 0.05)
    - expand_paletted(ctor)/plte=224/trns=32
      [-24.070% -23.840% -23.592%] (p = 0.00 < 0.05)
    
    Small palettes are mostly unaffected:
    
    - expand_paletted(ctor)/plte=16/trns=1
      [-0.2525% +0.0338% +0.3239%] (p = 0.81 > 0.05)
    anforowicz committed Jan 31, 2024 (57b5d54)
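A sketch of the same trick applied while building the table: read 4 bytes from `PLTE` per entry (the 4th byte is the red channel of the next entry) and then overwrite it with the default alpha. The last `PLTE` entry has only 3 bytes left, so it takes a 3-byte path. This is an illustration under those assumptions, not the crate's code; `create_rgba_palette_4byte` is a hypothetical name, and the opaque-black fallback for missing entries is assumed.

```rust
/// Builds a 256-entry RGBA table, copying 4 bytes per PLTE entry when
/// at least 4 bytes remain, then fixing up the alpha byte.
fn create_rgba_palette_4byte(plte: &[u8], trns: &[u8]) -> [[u8; 4]; 256] {
    let mut rgba = [[0u8, 0, 0, 0xFF]; 256];
    let entries = (plte.len() / 3).min(256);
    for (i, dst) in rgba.iter_mut().enumerate().take(entries) {
        let start = i * 3;
        if start + 4 <= plte.len() {
            // Fast path: one 4-byte copy, then replace the borrowed byte.
            dst.copy_from_slice(&plte[start..start + 4]);
            dst[3] = 0xFF;
        } else {
            // Final entry: only 3 bytes remain in PLTE.
            dst[..3].copy_from_slice(&plte[start..start + 3]);
        }
    }
    // Apply tRNS alpha values on top of the default 0xFF.
    for (dst, &alpha) in rgba.iter_mut().zip(trns) {
        dst[3] = alpha;
    }
    rgba
}
```

The 4-byte path runs once per palette entry, so its benefit grows with palette size; with a 16-entry palette the fixed cost of initializing all 256 table entries likely dominates, which is one plausible reading of the unchanged `plte=16` result.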