I have looked for existing issues (including closed) about this
Bug Report
I've seen that in some situations the throughput of decompression gets significantly worse when using tower_http::decompression compared to manually implementing similar logic with the async-compression crate.
Version
Platform
Apple silicon macOS
(Not 100% sure, but it likely happens on Linux as well)
Description
In Deno, we switched the inner implementation of fetch (the JavaScript API) from a reqwest-based one to a hyper-util-based one (denoland/deno#24593).
The hyper-util based implementation uses tower_http::decompression to decompress the fetched data when necessary. Note that reqwest doesn't use tower_http.
After this change, we started to see throughput degrade, especially when the server serves large compressed data. Look at the following graph, which shows how long each Deno version takes to complete 2k requests, where each request fetches compressed data from the upstream server and forwards it to the end client.
v1.45.2 is the version before we switched to the hyper-based fetch implementation. Since v1.45.3, when we landed it, the throughput got 10x worse.
Then I identified that tower_http::decompression causes this issue, and found that if we implement the decompression logic by directly using the async-compression crate, the performance goes back to what it was (see denoland/deno#25800 for how the manual implementation with async-compression affects the performance). You can find how I performed the benchmark at https://github.com/magurotuna/deno_fetch_decompression_throughput
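For reference, a manual decompression pipeline along these lines can be built directly on async-compression. The sketch below is only an illustration of the approach, not the exact code used in Deno; it assumes the `tokio` and `gzip` features of async-compression plus the tokio-util io adapters, and the helper name `decompress_gzip` is hypothetical:

```rust
use async_compression::tokio::bufread::GzipDecoder;
use bytes::Bytes;
use futures_util::Stream;
use tokio_util::io::{ReaderStream, StreamReader};

/// Take a stream of gzip-compressed chunks and return a stream of
/// decompressed chunks, driving async-compression directly instead of
/// going through tower_http::decompression.
fn decompress_gzip<S>(compressed: S) -> impl Stream<Item = std::io::Result<Bytes>>
where
    S: Stream<Item = std::io::Result<Bytes>> + Unpin,
{
    // Adapt the chunk stream into an `AsyncBufRead`...
    let reader = StreamReader::new(compressed);
    // ...wrap it in the gzip decoder...
    let decoder = GzipDecoder::new(reader);
    // ...and turn the decoded reader back into a stream of `Bytes` chunks.
    ReaderStream::new(decoder)
}
```

A real implementation would pick the decoder based on the response's Content-Encoding header; this sketch hardcodes gzip for brevity.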
Currently, every time `WrapBody::poll_frame` is called, a new instance of
`BytesMut` is created with the default capacity, which is effectively
64 bytes. This results in a lot of memory allocations in certain
situations, making the throughput significantly worse.
To reduce memory allocations, `WrapBody` now holds a `BytesMut` as a
field, with an initial capacity of 4096 bytes. This buffer is reused
as much as possible across multiple `poll_frame` calls; only when
its capacity reaches 0 is another 4096-byte allocation performed.
Fixes: tower-rs#520
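The following is a minimal sketch of the buffer-reuse pattern described above, shown outside the actual `WrapBody` type; the `ChunkBuffer` struct and `next_chunk` method are hypothetical names for illustration, not tower-http's API:

```rust
use bytes::{Bytes, BytesMut};

const BUF_SIZE: usize = 4096;

/// Hypothetical holder illustrating the reuse strategy: keep one `BytesMut`
/// around and carve frozen `Bytes` chunks out of it instead of allocating a
/// fresh buffer on every call.
struct ChunkBuffer {
    buf: BytesMut,
}

impl ChunkBuffer {
    fn new() -> Self {
        Self {
            buf: BytesMut::with_capacity(BUF_SIZE),
        }
    }

    /// Copy `data` into the shared buffer and hand it out as an immutable
    /// `Bytes` chunk. A new 4096-byte block is allocated only once the
    /// previous one has been fully handed out (capacity reached 0).
    fn next_chunk(&mut self, data: &[u8]) -> Bytes {
        if self.buf.capacity() == 0 {
            self.buf = BytesMut::with_capacity(BUF_SIZE);
        }
        self.buf.extend_from_slice(data);
        // `split()` detaches the filled bytes as their own handle while the
        // remaining capacity stays in `self.buf` for the next call.
        self.buf.split().freeze()
    }
}

fn main() {
    let mut chunks = ChunkBuffer::new();
    let a = chunks.next_chunk(b"hello");
    let b = chunks.next_chunk(b"world");
    assert_eq!(&a[..], b"hello");
    assert_eq!(&b[..], b"world");
}
```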