-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(de)compression: reduce memory allocation to improve performance #521
Merged
seanmonstar
merged 1 commit into
tower-rs:main
from
magurotuna:perf-wrap-body-allocation
Sep 23, 2024
Merged
(de)compression: reduce memory allocation to improve performance #521
seanmonstar
merged 1 commit into
tower-rs:main
from
magurotuna:perf-wrap-body-allocation
Sep 23, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Currently, every time `WrapBody::poll_frame` is called, new instance of `BytesMut` is created with the default capacity, which is effectively 64 bytes. This ends up with a lot of memory allocation in certain situations, making the throughput significantly worse. To optimize memory allocation, `WrapBody` now gets `BytesMut` as its field, with initial capacity of 4096 bytes. This buffer will be reused as much as possible across multiple `poll_frame` calls, and only when its capacity becomes 0, new allocation of another 4096 bytes is performed. Fixes: tower-rs#520
seanmonstar
approved these changes
Sep 23, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phenomenal write-up, thanks!
github-merge-queue bot
referenced
this pull request
in apollographql/subgraph-template-rust-async-graphql
Sep 23, 2024
This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [tower-http](https://redirect.github.com/tower-rs/tower-http) | dependencies | patch | `0.6.0` -> `0.6.1` | --- ### Release Notes <details> <summary>tower-rs/tower-http (tower-http)</summary> ### [`v0.6.1`](https://redirect.github.com/tower-rs/tower-http/releases/tag/tower-http-0.6.1): v0.6.1 [Compare Source](https://redirect.github.com/tower-rs/tower-http/compare/tower-http-0.6.0...tower-http-0.6.1) #### Fixed - **decompression:** reuse scratch buffer to significantly reduce allocations and improve performance ([#​521]) [#​521]: https://redirect.github.com/tower-rs/tower-http/pull/521 #### New Contributors - [@​magurotuna](https://redirect.github.com/magurotuna) made their first contribution in [https://github.com/tower-rs/tower-http/pull/521](https://redirect.github.com/tower-rs/tower-http/pull/521) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/apollographql/subgraph-template-rust-async-graphql). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC44MC4wIiwidXBkYXRlZEluVmVyIjoiMzguODAuMCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOltdfQ==--> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
szinn
referenced
this pull request
in szinn/s3-cdn
Sep 24, 2024
This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [tower-http](https://redirect.github.com/tower-rs/tower-http) | dependencies | patch | `0.6.0` -> `0.6.1` | --- ### Release Notes <details> <summary>tower-rs/tower-http (tower-http)</summary> ### [`v0.6.1`](https://redirect.github.com/tower-rs/tower-http/releases/tag/tower-http-0.6.1): v0.6.1 [Compare Source](https://redirect.github.com/tower-rs/tower-http/compare/tower-http-0.6.0...tower-http-0.6.1) #### Fixed - **decompression:** reuse scratch buffer to significantly reduce allocations and improve performance ([#​521]) [#​521]: https://redirect.github.com/tower-rs/tower-http/pull/521 #### New Contributors - [@​magurotuna](https://redirect.github.com/magurotuna) made their first contribution in [https://github.com/tower-rs/tower-http/pull/521](https://redirect.github.com/tower-rs/tower-http/pull/521) </details> <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC45NC4yIiwidXBkYXRlZEluVmVyIjoiMzguOTQuMiIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsidHlwZS9wYXRjaCJdfQ==--> Co-authored-by: repo-jeeves[bot] <106431701+repo-jeeves[bot]@users.noreply.github.com>
szinn
referenced
this pull request
in szinn/rust-arch
Sep 24, 2024
This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [tower-http](https://redirect.github.com/tower-rs/tower-http) | dependencies | patch | `0.6.0` -> `0.6.1` | --- ### Release Notes <details> <summary>tower-rs/tower-http (tower-http)</summary> ### [`v0.6.1`](https://redirect.github.com/tower-rs/tower-http/releases/tag/tower-http-0.6.1): v0.6.1 [Compare Source](https://redirect.github.com/tower-rs/tower-http/compare/tower-http-0.6.0...tower-http-0.6.1) #### Fixed - **decompression:** reuse scratch buffer to significantly reduce allocations and improve performance ([#​521]) [#​521]: https://redirect.github.com/tower-rs/tower-http/pull/521 #### New Contributors - [@​magurotuna](https://redirect.github.com/magurotuna) made their first contribution in [https://github.com/tower-rs/tower-http/pull/521](https://redirect.github.com/tower-rs/tower-http/pull/521) </details> <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC45NC4yIiwidXBkYXRlZEluVmVyIjoiMzguOTQuMiIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsidHlwZS9wYXRjaCJdfQ==--> Co-authored-by: repo-jeeves[bot] <106431701+repo-jeeves[bot]@users.noreply.github.com>
its-danny
referenced
this pull request
in mist-id/mist
Sep 24, 2024
This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [tower-http](https://redirect.github.com/tower-rs/tower-http) | dependencies | patch | `0.6.0` -> `0.6.1` | --- ### Release Notes <details> <summary>tower-rs/tower-http (tower-http)</summary> ### [`v0.6.1`](https://redirect.github.com/tower-rs/tower-http/releases/tag/tower-http-0.6.1): v0.6.1 [Compare Source](https://redirect.github.com/tower-rs/tower-http/compare/tower-http-0.6.0...tower-http-0.6.1) #### Fixed - **decompression:** reuse scratch buffer to significantly reduce allocations and improve performance ([#​521]) [#​521]: https://redirect.github.com/tower-rs/tower-http/pull/521 #### New Contributors - [@​magurotuna](https://redirect.github.com/magurotuna) made their first contribution in [https://github.com/tower-rs/tower-http/pull/521](https://redirect.github.com/tower-rs/tower-http/pull/521) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/mist-id/mist). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC44MC4wIiwidXBkYXRlZEluVmVyIjoiMzguODAuMCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOltdfQ==--> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation
Currently, every time
WrapBody::poll_frame
is called, new instance ofBytesMut
is created with the default capacity, which is effectively 64 bytes. This ends up with a lot of memory allocation in certain situations, making the throughput significantly worse.#520 describes that a manual implementation of decompression logic using
async-compression
gets the performance issue withtower_http::decompression
to be resolved. To identify what is making this difference, I captured a flamegraph. Here's the result:As annotated,
tower_http::decompression
has more memory allocation which is the reason why the performance is degraded.Solution
To optimize memory allocation,
WrapBody
now getsBytesMut
as its field, with initial capacity of 4096 bytes. This buffer will be reused as much as possible across multiplepoll_frame
calls, and only when its capacity becomes 0, new allocation of another 4096 bytes is performed.The capacity of 4096 bytes is taken from the default capacity of the internal buffer used in
tokio_util::io::ReaderStream
: https://docs.rs/tokio-util/0.7.12/src/tokio_util/io/reader_stream.rs.html#8Performance Improvement
With this optimization, the flamegraph is changed as follows, where memory allocation is significantly reduced and optimized.
Also, the throughput gets 10x better when we measure it using Deno's
fetch
implementation that internally useshyper
andtower_http::decompresson
.This graph shows how long each Deno version takes to handle 2k requests. The rightmost one named
v2.0.0-rc.4-tower-http-patched
is the one with this patch applied. The leftmost one, v1.45.2, does not usetower_http::decompression
.For more information on how this result was measured, please refer to https://github.com/magurotuna/deno_fetch_decompression_throughput
Fixes: #520