Document reload layer performance impact #1632
There is a note in the module-level documentation for the `reload` module. I'm somewhat surprised that the reload layer makes filtering as much as 3x slower with `parking_lot` enabled.
There is probably a lot of stuff being printed from different threads. I have "measured" this overhead with wasmtime while it compiled some wasm code, and wasmtime uses a lot of threads for that. However, most of the calls use a read lock as far as I have seen, so I also don't really understand why it is that slow.
Hmm, there is currently a microbenchmark comparing the performance of a reloadable filter with a non-reloadable one. I wrote a version where events are recorded by multiple threads concurrently, but I still don't see a significant performance difference between the benchmark with reloading and the version without:
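For reference, a minimal sketch of such a multi-threaded comparison (this is not the benchmark from the repository; the thread count, event count, and `LevelFilter::TRACE` are illustrative, and the non-reloading variant would pass the filter to `.with()` directly instead of wrapping it in `reload::Layer`):

```rust
use std::time::Instant;
use tracing_subscriber::{filter::LevelFilter, prelude::*, reload};

fn main() {
    const NUM_THREADS: usize = 8;
    const EVENTS_PER_THREAD: usize = 100_000;

    // Wrap the filter in a reload layer; the handle is what an application
    // would use to swap the filter at runtime.
    let (filter, _handle) = reload::Layer::new(LevelFilter::TRACE);
    tracing_subscriber::registry().with(filter).init();

    let start = Instant::now();
    let threads: Vec<_> = (0..NUM_THREADS)
        .map(|_| {
            std::thread::spawn(|| {
                for i in 0..EVENTS_PER_THREAD {
                    // Every event passes the TRACE filter, so each call goes
                    // through the reload layer (and its lock) concurrently
                    // from all threads.
                    tracing::trace!(i, "recorded concurrently");
                }
            })
        })
        .collect();
    for t in threads {
        t.join().unwrap();
    }
    println!("elapsed: {:?}", start.elapsed());
}
```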
I would certainly expect the reload layer to add some overhead. If your application spawns an extremely large number of threads, it's possible that the lock is seeing significant contention.
Per your comment here:
I noticed you said "most of the calls are using a read lock". Does that mean that during normal operation of the program, the layer is being reloaded regularly? The intended use-case for the `reload` layer is for reloads to happen only occasionally, e.g. when the filtering configuration changes, not continuously during normal operation.
No, no. The layer is not reloaded regularly. It only happens when the user provides new logging targets, which is almost never. So, that should be fine.
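For context, a minimal sketch of that kind of occasional reload, assuming an `EnvFilter` wrapped in a `reload::Layer` (the directive strings are made up; requires the `env-filter` feature of `tracing-subscriber`):

```rust
use tracing_subscriber::{prelude::*, reload, EnvFilter};

fn main() {
    // Install a reloadable filter once at startup...
    let (filter, handle) = reload::Layer::new(EnvFilter::new("info"));
    tracing_subscriber::registry().with(filter).init();

    // ...and swap it only when the user supplies new logging targets,
    // which happens rarely.
    handle
        .reload(EnvFilter::new("info,my_crate::wasm=trace"))
        .expect("failed to reload filter");
}
```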
Could you try to change this: tracing/tracing-subscriber/benches/reload.rs, lines 75 to 80 (at 5b49db4), and change the filter there to `something=trace`? The point is to have the min level be `trace`, but we need to filter by the prefix.
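For illustration, a sketch of the kind of directive that comment is suggesting, assuming an `EnvFilter` (the actual benchmark code at that commit may construct its filter differently):

```rust
use tracing_subscriber::{prelude::*, EnvFilter};

fn main() {
    // `something=trace` raises the max-level hint to TRACE, but no event in
    // the benchmark has a target starting with `something`, so every
    // trace!/debug! call has to be rejected by the filter itself rather than
    // by the cheap max-level check.
    let filter = EnvFilter::new("something=trace");
    tracing_subscriber::registry().with(filter).init();

    // Still has to run through the filter on every call:
    tracing::trace!("never printed");
}
```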
…1636)

## Motivation

We use `tracing` as our logger in [`substrate`](https://github.com/paritytech/substrate). We've noticed that as soon as *any* `trace` log is enabled (even one which doesn't exist) the whole logging machinery starts to take a lot of time, even if nothing at all is actually printed!

In one of our quick-and-dirty reproduction benchmarks (JIT-ing a WASM program) we saw the total real runtime rise from around ~1.3s to ~7s just by adding a `trace` log filter which doesn't match anything. (Depending on the hardware and on how many threads are simultaneously logging this figure varies pretty widely, but it's always a very significant drop.)

After looking into this problem I've found that the culprits of the slowdown were `trace!` and `debug!` logs sprinkled quite liberally in some of the hotter codepaths. When there are no `trace`-level filters defined on the logger, it can basically reject those inert `trace!` and `debug!` logs purely based on the current maximum logging level (which is cheap!), but as soon as you define *any* trace filter the current maximum logging level changes, and then every `trace!` and `debug!` log has to go through the whole filtering machinery before it can be rejected. While this is cheap if you only do it once, it starts to become very expensive when you do it a lot, especially when you're running multiple threads and enable log reloading. (This is related to #1632.)

## Solution

I've added an opt-in per-thread LRU cache which caches whether the logger is actually interested in a given `target` + `level` pair for every log emitted through the `log` crate.

I've also added a benchmark very roughly replicating the situation from our code. Here's the performance *without* the cache (`cargo bench`):

```
[838.67 ns 846.51 ns 854.04 ns]
```

And here's the performance *with* the cache (`cargo bench --features interest-cache`):

```
[25.322 ns 25.556 ns 25.820 ns]
```

As you can see, the per-call cost was cut down to less than ~3% of the original.
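A rough sketch of the idea behind such a per-thread interest cache (this is not the implementation from the PR; the function names are made up, and a plain `HashMap` stands in for the bounded LRU):

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Per-thread cache of "is the logger interested in this target + level?".
// The real feature uses a bounded LRU; a HashMap keeps the sketch short.
thread_local! {
    static INTEREST_CACHE: RefCell<HashMap<(String, log::Level), bool>> =
        RefCell::new(HashMap::new());
}

fn is_interested(target: &str, level: log::Level) -> bool {
    INTEREST_CACHE.with(|cache| {
        *cache
            .borrow_mut()
            .entry((target.to_owned(), level))
            .or_insert_with(|| expensive_filter_check(target, level))
    })
}

// Stand-in for the full filtering machinery (directive matching, etc.).
fn expensive_filter_check(target: &str, _level: log::Level) -> bool {
    // e.g. walk the configured directives and compare against `target`
    target.starts_with("something")
}
```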
Hey, we have observed that using the reload layer has quite a performance impact. Without `parking_lot` (we realized later that this is disabled by default) it takes a factor of 6x more time when you have `RUST_LOG=bullshit=trace` set, i.e. we are required to check whether any target matches "bullshit" (which no target matches). With `parking_lot` enabled the performance decrease is still a factor of 3. We think that it would be nice to document this.
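As a side note on the `parking_lot` point: a sketch of how one might opt in, assuming a reasonably recent `tracing-subscriber` (the version number is illustrative):

```toml
# Cargo.toml: the `parking_lot` feature of tracing-subscriber is not enabled
# by default and has to be requested explicitly.
[dependencies]
tracing-subscriber = { version = "0.3", features = ["parking_lot"] }
```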