Document reload layer performance impact #1632
There is a note in the module-level documentation for the `reload` module. I'm somewhat surprised that the reload layer makes filtering as much as 3x slower with `parking_lot` enabled.
There is probably a lot of stuff being printed from different threads. I have "measured" this overhead with wasmtime while it compiled some wasm code, and wasmtime uses a lot of threads for that. However, most of the calls use a read lock as far as I have seen, so I also don't really understand why it is that slow.
Hmm, there is currently a microbenchmark comparing the performance of a reloadable filter with a non-reloadable one. I wrote a version where events are recorded by multiple threads concurrently, but I still don't see a significant performance difference between the benchmark with reloading and the version without:
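For reference, a minimal sketch of such a multi-threaded comparison (this is not the benchmark from the repository; the thread count, event count, and `LevelFilter::TRACE` are illustrative, and the non-reloading variant would pass the filter to `.with()` directly instead of wrapping it in `reload::Layer`):

```rust
use std::time::Instant;
use tracing_subscriber::{filter::LevelFilter, prelude::*, reload};

fn main() {
    const NUM_THREADS: usize = 8;
    const EVENTS_PER_THREAD: usize = 100_000;

    // Wrap the filter in a reload layer; the handle is what an application
    // would use to swap the filter at runtime.
    let (filter, _handle) = reload::Layer::new(LevelFilter::TRACE);
    tracing_subscriber::registry().with(filter).init();

    let start = Instant::now();
    let threads: Vec<_> = (0..NUM_THREADS)
        .map(|_| {
            std::thread::spawn(|| {
                for i in 0..EVENTS_PER_THREAD {
                    // Every event passes the TRACE filter, so each call goes
                    // through the reload layer (and its lock) concurrently
                    // from all threads.
                    tracing::trace!(i, "recorded concurrently");
                }
            })
        })
        .collect();
    for t in threads {
        t.join().unwrap();
    }
    println!("elapsed: {:?}", start.elapsed());
}
```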
I would certainly expect the reload layer to add some overhead. If your application spawns an extremely large number of threads, it's possible that the lock is seeing significant contention.
Per your comment here:
I noticed you said "most of the calls are using a read lock". Does that mean that during normal operation of the program, the layer is being reloaded regularly? The intended use-case for the `reload` layer is for reloads to happen only occasionally, e.g. when the filtering configuration changes, not continuously during normal operation.
No, no. The layer is not reloaded regularly. It only happens when the user provides new logging targets, which is almost never. So, that should be fine.
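For context, a minimal sketch of that kind of occasional reload, assuming an `EnvFilter` wrapped in a `reload::Layer` (the directive strings are made up; requires the `env-filter` feature of `tracing-subscriber`):

```rust
use tracing_subscriber::{prelude::*, reload, EnvFilter};

fn main() {
    // Install a reloadable filter once at startup...
    let (filter, handle) = reload::Layer::new(EnvFilter::new("info"));
    tracing_subscriber::registry().with(filter).init();

    // ...and swap it only when the user supplies new logging targets,
    // which happens rarely.
    handle
        .reload(EnvFilter::new("info,my_crate::wasm=trace"))
        .expect("failed to reload filter");
}
```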
Could you try to change this: tracing/tracing-subscriber/benches/reload.rs, lines 75 to 80 (at 5b49db4), and change the filter there to `something=trace`? The point is to have the min level be `trace`, but we need to filter by the prefix.
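For illustration, a sketch of the kind of directive that comment is suggesting, assuming an `EnvFilter` (the actual benchmark code at that commit may construct its filter differently):

```rust
use tracing_subscriber::{prelude::*, EnvFilter};

fn main() {
    // `something=trace` raises the max-level hint to TRACE, but no event in
    // the benchmark has a target starting with `something`, so every
    // trace!/debug! call has to be rejected by the filter itself rather than
    // by the cheap max-level check.
    let filter = EnvFilter::new("something=trace");
    tracing_subscriber::registry().with(filter).init();

    // Still has to run through the filter on every call:
    tracing::trace!("never printed");
}
```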
…1636)

## Motivation

We use `tracing` as our logger in [`substrate`](https://github.com/paritytech/substrate). We've noticed that as soon as *any* `trace` log is enabled (even one which doesn't exist) the whole logging machinery starts to take a lot of time, even if nothing at all is actually printed!

In one of our quick-and-dirty reproduction benchmarks (JIT-ing a WASM program) we saw the total real runtime rise from around ~1.3s to ~7s just by adding a `trace` log filter which doesn't match anything. (Depending on the hardware and on how many threads are simultaneously logging this figure varies pretty widely, but it's always a very significant drop.)

After looking into this problem I've found that the culprits of the slowdown were `trace!` and `debug!` logs sprinkled quite liberally in some of the hotter codepaths. When there are no `trace`-level filters defined on the logger, it can basically reject those inert `trace!` and `debug!` logs purely based on the current maximum logging level (which is cheap!), but as soon as you define *any* trace filter the current maximum logging level changes, and then every `trace!` and `debug!` log has to go through the whole filtering machinery before it can be rejected. While this is cheap if you only do it once, it starts to become very expensive when you do it a lot, especially when you're running multiple threads and enable log reloading. (This is related to #1632.)

## Solution

I've added an opt-in per-thread LRU cache which caches whether the logger is actually interested in a given `target` + `level` pair for every log emitted through the `log` crate.

I've also added a benchmark very roughly replicating the situation from our code. Here's the performance *without* the cache (`cargo bench`):

```
[838.67 ns 846.51 ns 854.04 ns]
```

And here's the performance *with* the cache (`cargo bench --features interest-cache`):

```
[25.322 ns 25.556 ns 25.820 ns]
```

As you can see, the per-call cost was cut down to less than ~3% of the original.
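A rough sketch of the idea behind such a per-thread interest cache (this is not the implementation from the PR; the function names are made up, and a plain `HashMap` stands in for the bounded LRU):

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Per-thread cache of "is the logger interested in this target + level?".
// The real feature uses a bounded LRU; a HashMap keeps the sketch short.
thread_local! {
    static INTEREST_CACHE: RefCell<HashMap<(String, log::Level), bool>> =
        RefCell::new(HashMap::new());
}

fn is_interested(target: &str, level: log::Level) -> bool {
    INTEREST_CACHE.with(|cache| {
        *cache
            .borrow_mut()
            .entry((target.to_owned(), level))
            .or_insert_with(|| expensive_filter_check(target, level))
    })
}

// Stand-in for the full filtering machinery (directive matching, etc.).
fn expensive_filter_check(target: &str, _level: log::Level) -> bool {
    // e.g. walk the configured directives and compare against `target`
    target.starts_with("something")
}
```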
Hey, we have observed that using the reload layer has quite a performance impact. Without `parking_lot` (we realized later that this is disabled by default) it takes a factor of 6x more time when you have `RUST_LOG=bullshit=trace` set, i.e. we are required to check whether any target matches "bullshit" (which no target matches). With `parking_lot` enabled the performance decrease is still a factor of 3. We think that it would be nice to document this.
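As a side note on the `parking_lot` point: a sketch of how one might opt in, assuming a reasonably recent `tracing-subscriber` (the version number is illustrative):

```toml
# Cargo.toml: the `parking_lot` feature of tracing-subscriber is not enabled
# by default and has to be requested explicitly.
[dependencies]
tracing-subscriber = { version = "0.3", features = ["parking_lot"] }
```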