Remove State allocs in `RedisStateMachine` and reduce allocs in `ByteArrayCodec` #2768

shikharid · 2024-02-23T05:58:48Z

Two changes:

- In `RedisStateMachine`, use pre-allocated `State` objects.
- For `ByteArrayCodec` (or any codec that can report its exact encoded byte size), copy directly into the target `ByteBuf` instead of first creating a temporary, single-use copy.
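
The first change can be pictured with a minimal sketch of a fixed-depth stack that hands out `State` objects allocated once up front. This is an illustration under assumptions, not the code from this PR: the class name `PooledStateStack`, the `State` fields, and the `reset()` helper are all hypothetical.

```java
/** Hypothetical sketch: a fixed-depth stack that reuses pre-allocated State objects. */
final class PooledStateStack {

    /** Illustrative state holder; the field names are assumptions. */
    static final class State {
        int type = -1;
        int count = -1;

        /** Restores the defaults so a reused instance carries no stale data. */
        void reset() {
            type = -1;
            count = -1;
        }
    }

    private final State[] states;
    private int size; // number of states currently in use

    PooledStateStack(int maxDepth) {
        states = new State[maxDepth];
        for (int i = 0; i < maxDepth; i++) {
            states[i] = new State(); // allocate once, up front
        }
    }

    /** Hands out the next pre-allocated State instead of calling new State(). */
    State push() {
        if (size == states.length) {
            throw new IllegalStateException("nesting deeper than " + states.length);
        }
        State next = states[size++];
        next.reset(); // clear whatever the previous user left behind
        return next;
    }

    /** Releases the top State; the object stays in the array for reuse. */
    void pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        size--;
    }

    int size() {
        return size;
    }
}
```

Because every `push()` returns an existing instance, steady-state decoding in a design like this allocates nothing per operation. The trade-off is a hard upper bound on nesting depth and the need to reset each object before reuse.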

Benchmarks are added (results in the comments).

You have read the contribution guidelines.
You have created a feature request first to discuss your contribution intent. Please reference the feature request ticket number in the pull request.
You use the code formatters provided here and have them applied to your changes. Don’t submit any formatting related changes.
You submit test cases (unit or integration tests) that back your changes. (Already existed, added benchmark tests)

shikharid · 2024-02-23T06:09:57Z

Summary

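~25% perf improvement in RedisStateMachine (before 130.7 ns/op, after 104.3 ns/op, i.e. about 1.25x throughput), with heap allocation dropping from 96 B/op to effectively zero
~9x perf improvement in key/value encoding for ByteArrayCodec (before 310.8 ns/op, after 34.0 ns/op), with allocation per op dropping from about 333 B to 152 B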

Benchmark Setup

JMH version: 1.21
VM version: JDK 1.8.0_342, OpenJDK 64-Bit Server VM, 25.342-b07
OS: macOS Ventura 13.2.1
Arch: Apple M1 Max (32 GB, aarch64)
Warmup: 5 iterations, 10 s each
Measurement: 5 iterations, 10 s each
Timeout: 2 s per iteration
Threads: 1 thread, will synchronize iterations

RedisStateMachine Benchmark Results

Before

Benchmark                                                                  Mode  Cnt    Score    Error   Units
RedisStateMachineBenchmark.measureDecode                                   avgt    5  130.654 ± 11.039   ns/op
RedisStateMachineBenchmark.measureDecode:·gc.alloc.rate                    avgt    5  667.181 ± 55.258  MB/sec
RedisStateMachineBenchmark.measureDecode:·gc.alloc.rate.norm               avgt    5   96.000 ±  0.001    B/op
RedisStateMachineBenchmark.measureDecode:·gc.churn.PS_Eden_Space           avgt    5  667.349 ± 56.391  MB/sec
RedisStateMachineBenchmark.measureDecode:·gc.churn.PS_Eden_Space.norm      avgt    5   96.024 ±  0.745    B/op
RedisStateMachineBenchmark.measureDecode:·gc.churn.PS_Survivor_Space       avgt    5    0.106 ±  0.084  MB/sec
RedisStateMachineBenchmark.measureDecode:·gc.churn.PS_Survivor_Space.norm  avgt    5    0.015 ±  0.011    B/op
RedisStateMachineBenchmark.measureDecode:·gc.count                         avgt    5  715.000           counts
RedisStateMachineBenchmark.measureDecode:·gc.time                          avgt    5  345.000               ms

After

Benchmark                                                     Mode  Cnt    Score    Error   Units
RedisStateMachineBenchmark.measureDecode                      avgt    5  104.293 ±  0.309   ns/op
RedisStateMachineBenchmark.measureDecode:·gc.alloc.rate       avgt    5   ≈ 10⁻⁴           MB/sec
RedisStateMachineBenchmark.measureDecode:·gc.alloc.rate.norm  avgt    5   ≈ 10⁻⁵             B/op
RedisStateMachineBenchmark.measureDecode:·gc.count            avgt    5      ≈ 0           counts

ByteArrayCodec Benchmark Results

Before

Benchmark                                                                                     Mode  Cnt     Score    Error   Units
ExactVsEstimatedSizeCodecBenchmark.encodeKeyEstimatedSize                                     avgt    5   310.842 ±  1.404   ns/op
ExactVsEstimatedSizeCodecBenchmark.encodeKeyEstimatedSize:·gc.alloc.rate                      avgt    5   972.380 ±  3.934  MB/sec
ExactVsEstimatedSizeCodecBenchmark.encodeKeyEstimatedSize:·gc.alloc.rate.norm                 avgt    5   332.994 ±  0.010    B/op
ExactVsEstimatedSizeCodecBenchmark.encodeKeyEstimatedSize:·gc.churn.PS_Eden_Space             avgt    5   971.755 ± 12.683  MB/sec
ExactVsEstimatedSizeCodecBenchmark.encodeKeyEstimatedSize:·gc.churn.PS_Eden_Space.norm        avgt    5   332.779 ±  3.172    B/op
ExactVsEstimatedSizeCodecBenchmark.encodeKeyEstimatedSize:·gc.churn.PS_Survivor_Space         avgt    5     0.134 ±  0.091  MB/sec
ExactVsEstimatedSizeCodecBenchmark.encodeKeyEstimatedSize:·gc.churn.PS_Survivor_Space.norm    avgt    5     0.046 ±  0.031    B/op
ExactVsEstimatedSizeCodecBenchmark.encodeKeyEstimatedSize:·gc.count                           avgt    5   759.000           counts
ExactVsEstimatedSizeCodecBenchmark.encodeKeyEstimatedSize:·gc.time                            avgt    5   370.000               ms

After

Benchmark                                                                                     Mode  Cnt     Score    Error   Units
ExactVsEstimatedSizeCodecBenchmark.encodeKeyExactSize                                         avgt    5    34.032 ±  0.191   ns/op
ExactVsEstimatedSizeCodecBenchmark.encodeKeyExactSize:·gc.alloc.rate                          avgt    5  4054.084 ± 20.099  MB/sec
ExactVsEstimatedSizeCodecBenchmark.encodeKeyExactSize:·gc.alloc.rate.norm                     avgt    5   152.000 ±  0.001    B/op
ExactVsEstimatedSizeCodecBenchmark.encodeKeyExactSize:·gc.churn.PS_Eden_Space                 avgt    5  4047.560 ± 97.328  MB/sec
ExactVsEstimatedSizeCodecBenchmark.encodeKeyExactSize:·gc.churn.PS_Eden_Space.norm            avgt    5   151.755 ±  3.222    B/op
ExactVsEstimatedSizeCodecBenchmark.encodeKeyExactSize:·gc.churn.PS_Survivor_Space             avgt    5     0.147 ±  0.085  MB/sec
ExactVsEstimatedSizeCodecBenchmark.encodeKeyExactSize:·gc.churn.PS_Survivor_Space.norm        avgt    5     0.006 ±  0.003    B/op
ExactVsEstimatedSizeCodecBenchmark.encodeKeyExactSize:·gc.count                               avgt    5   780.000           counts
ExactVsEstimatedSizeCodecBenchmark.encodeKeyExactSize:·gc.time                                avgt    5   427.000               ms
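
The difference between the two encoding paths measured above can be sketched as follows. This is a simplified illustration, not the code from this PR: `java.nio.ByteBuffer` stands in for Netty's `ByteBuf`, and the `SizedEncoder` interface, the `BYTES` codec, and both `write…` methods are hypothetical names. The framing is the RESP bulk-string layout `$<length>\r\n<payload>\r\n`, where the length must be written before the payload.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; java.nio.ByteBuffer stands in for Netty's ByteBuf. */
final class ExactSizeEncodingSketch {

    /** Hypothetical codec contract, not Lettuce's actual interface. */
    interface SizedEncoder {
        int exactSize(byte[] value); // exact number of bytes encode() will write
        void encode(byte[] value, ByteBuffer target);
    }

    /** Byte arrays encode to themselves, so the exact size is the array length. */
    static final SizedEncoder BYTES = new SizedEncoder() {
        @Override
        public int exactSize(byte[] value) {
            return value.length;
        }

        @Override
        public void encode(byte[] value, ByteBuffer target) {
            target.put(value);
        }
    };

    /** Two-step path: encode into a scratch buffer to learn the length, then copy. */
    static void writeViaTemporary(SizedEncoder codec, byte[] value, ByteBuffer target) {
        ByteBuffer temp = ByteBuffer.allocate(value.length + 16); // single-use scratch buffer
        codec.encode(value, temp);
        temp.flip();
        writeLength(temp.remaining(), target);
        target.put(temp); // second copy of the payload
        writeCrlf(target);
    }

    /** Direct path: the length is known up front, so the payload goes straight to the target. */
    static void writeDirect(SizedEncoder codec, byte[] value, ByteBuffer target) {
        writeLength(codec.exactSize(value), target);
        codec.encode(value, target);
        writeCrlf(target);
    }

    private static void writeLength(int length, ByteBuffer target) {
        target.put((byte) '$');
        target.put(Integer.toString(length).getBytes(StandardCharsets.US_ASCII));
        writeCrlf(target);
    }

    private static void writeCrlf(ByteBuffer target) {
        target.put((byte) '\r').put((byte) '\n');
    }
}
```

Both paths produce the same bytes. The direct path skips the scratch buffer and the extra copy, which is only safe when the codec can state the encoded length before encoding.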

mp911de · 2024-02-26T08:48:41Z

Thank you for your contribution. That's merged, polished, and backported now.

shikharid added 2 commits February 23, 2024 09:41

Use pre-allocated State's in RedisStateMachine, avoiding need for Sta…

db829dc

…te object allocs redis#2610 * adds gc and thrpt profiling in RedisStateMachine benchmark * fixes a stale benchmark which caused compilation errors ClusterDistributionChannelWriterBenchmark

Directly encode key/value to ByteBuf when codec knows exact byte sizes …

0525aec

…redis#2610 * adds benchmarks to show perf gains * about 10x improvement in perf, with no added gc overhead

shikharid changed the title ~~perf: remove State allocs in RedisStateMachine and reduce allocs in ByteArrayCodec #2610~~ perf: remove State allocs in RedisStateMachine and reduce allocs in ByteArrayCodec Feb 23, 2024

shikharid mentioned this pull request Feb 23, 2024

Performance: Encoding of keys/values in CommandArgs when using a codec that implements ToByteBufEncoder #2610

Closed

mp911de changed the title ~~perf: remove State allocs in RedisStateMachine and reduce allocs in ByteArrayCodec~~ Remove State allocs in RedisStateMachine and reduce allocs in ByteArrayCodec Feb 26, 2024

mp911de changed the title ~~Remove State allocs in RedisStateMachine and reduce allocs in ByteArrayCodec~~ Remove State allocs in RedisStateMachine and reduce allocs in ByteArrayCodec Feb 26, 2024

mp911de linked an issue Feb 26, 2024 that may be closed by this pull request

Performance: Encoding of keys/values in CommandArgs when using a codec that implements ToByteBufEncoder #2610

Closed

mp911de added the `type: feature` label Feb 26, 2024

mp911de added this to the 6.3.2.RELEASE milestone Feb 26, 2024

mp911de pushed a commit that referenced this pull request Feb 26, 2024

Directly encode key/value to ByteBuf when codec knows exact byte si…

68c89ae

…zes #2610 * adds benchmarks to show perf gains * about 10x improvement in perf, with no added gc overhead Original pull request: #2768

mp911de added a commit that referenced this pull request Feb 26, 2024

Polishing #2610

6185ebd

Reduce code duplications. Add exact optimization to ASCII StringCodec. Tweak Javadoc. Original pull request: #2768

mp911de pushed a commit that referenced this pull request Feb 26, 2024

Directly encode key/value to ByteBuf when codec knows exact byte si…

45b6ad3

…zes #2610 * adds benchmarks to show perf gains * about 10x improvement in perf, with no added gc overhead Original pull request: #2768

mp911de added a commit that referenced this pull request Feb 26, 2024

Polishing #2610

0458b21

Reduce code duplications. Add exact optimization to ASCII StringCodec. Tweak Javadoc. Original pull request: #2768

mp911de closed this Feb 26, 2024
