Help LLVM understand that some spans are never going to do anything #1600

oli-obk · 2021-09-29T14:49:03Z

Motivation

Adding #[instrument(level = "debug")] attributes to functions in rustc
caused a performance regression (in release, where debug! is fully
optimized out) across all crates:
rust-lang/rust#89048 (comment)

While trying to debug this, I noticed that spans don't have the same
advantage that events have with respect to how LLVM sees them. Spans (or more
precisely, the enter-guard) get dropped at the end of the scope,
which throws a spanner into the LLVM optimization pipeline. I am not
entirely sure where the problem is, but I am moderately certain that the
issue is that even entering a dummy span is too much code for LLVM to
reliably (or at all) optimize out.

Solution

My hope is that in trusting the Rust compiler to generate cool code when using
drop flags, we can essentially generate a drop flag that depends on something we
know (due to events working as expected) to be optimizable.

So instead of doing

let _x = span!();
let _y = _x.enter();
// lotsa code
drop(_y)

we do

let _x;
let _y;
let mut must_drop = false;
if level_enabled!(DEBUG) {
    must_drop = true;
    _x = span!();
    _y = _x.enter();
}
// lotsa code
if must_drop {
    drop(_y)
}

I believe this will allow LLVM to properly optimize this again. Testing that
right now, but I wanted to open this PR immediately for review.
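The drop-flag idea can be tried out without `tracing` at all. The following is a minimal sketch, assuming a hypothetical `Guard` type that stands in for the span's enter-guard and a plain `bool` that stands in for `level_enabled!(DEBUG)`; neither is the real `tracing` API. The guard is declared without an initializer, so the compiler tracks whether it was ever assigned and only runs its destructor if it was.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared event log so the example can observe enter/exit order.
type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical stand-in for a span's enter-guard (not the tracing type).
struct Guard {
    log: Log,
}

impl Guard {
    fn enter(log: Log) -> Guard {
        log.borrow_mut().push("enter");
        Guard { log }
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.log.borrow_mut().push("exit");
    }
}

/// `enabled` plays the role of `level_enabled!(DEBUG)` in the description.
fn instrumented(enabled: bool) -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        // Declared but not initialized: the compiler keeps a hidden drop
        // flag recording whether `_guard` was ever assigned.
        let _guard;
        if enabled {
            _guard = Guard::enter(Rc::clone(&log));
        }
        log.borrow_mut().push("body");
        // End of scope: `_guard` is dropped only if it was initialized.
    }
    let events = log.borrow().clone();
    events
}

fn main() {
    println!("{:?}", instrumented(true));
    println!("{:?}", instrumented(false));
}
```

When the condition is a compile-time constant `false`, as it is for a statically disabled level, both the initialization and the conditional drop become dead code, which is the property this PR relies on.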

hawkw

Thanks, this is very cool!

It seems like it should also be possible to do something similar for #[instrument]ed async blocks --- we could just not add the instrument combinator at all when the span won't be enabled, avoiding the extra per-poll overhead of entering the disabled span. However, we should probably do that in a follow-up branch, rather than this one.

This change looks good to me!

tracing-attributes/src/lib.rs

hawkw

lovely, thank you!

tracing-attributes/src/lib.rs

hawkw · 2021-09-29T16:52:44Z

As an aside, I think we should be able to make the same change in the err case here as well:

tracing/tracing-attributes/src/lib.rs

Lines 540 to 553 in c945ac0

    
           } else if err {
               quote_spanned!(block.span()=>
                   let __tracing_attr_span = #span;
                   let __tracing_attr_guard = __tracing_attr_span.enter();
                   #[allow(clippy::redundant_closure_call)]
                   match (move || #block)() {
                       #[allow(clippy::unit_arg)]
                       Ok(x) => Ok(x),
                       Err(e) => {
                           tracing::error!(error = %e);
                           Err(e)
                       }
                   }
               )

and I think we'll see the same benefits?
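For illustration, here is a rough sketch of what the `err` branch could look like with the same lazy initialization applied. It is not the generated code: `Guard`, `with_err_logging`, and the string log are hypothetical stand-ins, `enabled` plays the role of `tracing::level_enabled!(#level)`, and the pushed `error=...` string stands in for `tracing::error!(error = %e)`.

```rust
use std::cell::RefCell;
use std::fmt::Display;

type Log = RefCell<Vec<String>>;

/// Hypothetical stand-in for the span's enter-guard.
struct Guard<'a>(&'a Log);

impl<'a> Guard<'a> {
    fn enter(log: &'a Log) -> Self {
        log.borrow_mut().push("enter".to_string());
        Guard(log)
    }
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.0.borrow_mut().push("exit".to_string());
    }
}

/// Runs `body`, entering the (stand-in) span only when `enabled` is true,
/// and records the error on the `Err` path either way.
fn with_err_logging<T, E: Display>(
    enabled: bool,
    log: &Log,
    body: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    // Lazily initialized, so no guard is created or dropped when disabled.
    let _guard;
    if enabled {
        _guard = Guard::enter(log);
    }
    match body() {
        Ok(x) => Ok(x),
        Err(e) => {
            // Stand-in for `tracing::error!(error = %e)`.
            log.borrow_mut().push(format!("error={e}"));
            Err(e)
        }
    }
}

fn run(enabled: bool, fail: bool) -> Vec<String> {
    let log = Log::new(Vec::new());
    let _ = with_err_logging(enabled, &log, || if fail { Err("boom") } else { Ok(1) });
    log.into_inner()
}

fn main() {
    println!("{:?}", run(true, true));
    println!("{:?}", run(false, true));
}
```

The error is still recorded when the span is skipped, since in the snippet above the `error!` event is emitted independently of the span's level.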

oli-obk · 2021-09-29T17:23:14Z

yes, once the perf is through and shows that this actually works, I will add the same change to the async and err branches

hawkw · 2021-09-29T17:55:07Z

> yes, once the perf is through and shows that this actually works, I will add the same change to the async and err branches

cool, i'll hold off on merging this PR until then!

I think the change in the async case will be significantly different from the non-async one, but it should still be possible to make a similar optimization...

oli-obk · 2021-09-30T12:57:00Z

I have finished optimizing the other code paths. Perfbot also says that this is working.

One thing I'm wondering is whether it would be possible to backport this to tracing 0.1, as there is no published 0.2 candidate yet

hawkw · 2021-09-30T17:11:38Z

> One thing I'm wondering is whether it would be possible to backport this to tracing 0.1, as there is no published 0.2 candidate yet

Yeah, AFAICT this should be pretty much trivial to backport to 0.1. I can handle that part, though --- I'll see about getting a release prepped today!

hawkw

this looks good to me, but I think we may actually be able to improve the case with futures even further. i'm going to go ahead and merge this PR, and improve things more in a follow-up.

thanks again for working on this!

hawkw · 2021-09-30T17:27:51Z

tracing-attributes/src/lib.rs

+                if tracing::level_enabled!(#level) {
+                    tracing::Instrument::instrument(
+                        fut,
+                        __tracing_attr_span
+                    )
+                    .await


in this case, since we are always creating the span, we could actually do this:

Suggested change

-                if tracing::level_enabled!(#level) {
-                    tracing::Instrument::instrument(
-                        fut,
-                        __tracing_attr_span
-                    )
-                    .await
+                if !span.is_disabled() {
+                    tracing::Instrument::instrument(
+                        fut,
+                        __tracing_attr_span
+                    )
+                    .await

which would also let us skip instrumenting the future in the case where the span's level is enabled but it was disabled by the subscriber (e.g. if its target was not enabled)

hawkw · 2021-09-30T17:30:49Z

tracing-attributes/src/lib.rs

+                if tracing::level_enabled!(#level) {
+                    tracing::Instrument::instrument(
+                        fut,
+                        __tracing_attr_span
+                    )
+                    .await
+                } else {
+                    fut.await
+                }


actually, we could probably also do the optimization of not creating the span at all if the level is disabled:

Suggested change

-                if tracing::level_enabled!(#level) {
-                    tracing::Instrument::instrument(
-                        fut,
-                        __tracing_attr_span
-                    )
-                    .await
-                } else {
-                    fut.await
-                }
+                if tracing::level_enabled!(#level) {
+                    let __tracing_attr_span = #span;
+                    if !span.is_disabled() {
+                        return tracing::Instrument::instrument(
+                            fut,
+                            __tracing_attr_span
+                        )
+                        .await;
+                    }
+                    fut.await
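The two checks in these suggestions (the static level check, then the runtime `is_disabled` check) can be modelled with the standard library alone. This is only a sketch under stated assumptions: `Span`, `Instrumented`, `block_on`, and the counters are hypothetical stand-ins written for this example, and the two `bool` parameters play the roles of `tracing::level_enabled!(#level)` and the subscriber's decision. The wrapper counts how often it is polled, which corresponds to the per-poll cost of entering a span.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `tracing::Span`.
struct Span {
    disabled: bool,
}

impl Span {
    fn is_disabled(&self) -> bool {
        self.disabled
    }
}

/// Hypothetical stand-in for the `Instrument` wrapper future.
struct Instrumented<F> {
    inner: Pin<Box<F>>,
    _span: Span,
    enters: Rc<Cell<usize>>,
}

impl<F: Future> Future for Instrumented<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let this = self.get_mut();
        // Each poll would enter the span; count it instead.
        this.enters.set(this.enters.get() + 1);
        this.inner.as_mut().poll(cx)
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, sufficient for futures that never pend.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

async fn instrumented_call(
    level_enabled: bool,
    subscriber_enabled: bool,
    spans_created: Rc<Cell<usize>>,
    enters: Rc<Cell<usize>>,
) -> u32 {
    let fut = async { 42u32 };
    if level_enabled {
        // The span is only constructed once the level check has passed.
        spans_created.set(spans_created.get() + 1);
        let span = Span { disabled: !subscriber_enabled };
        if !span.is_disabled() {
            return Instrumented { inner: Box::pin(fut), _span: span, enters }.await;
        }
    }
    // Disabled either statically or by the subscriber: no wrapper at all.
    fut.await
}

/// Returns (result, spans created, wrapper polls).
fn run(level_enabled: bool, subscriber_enabled: bool) -> (u32, usize, usize) {
    let spans_created = Rc::new(Cell::new(0));
    let enters = Rc::new(Cell::new(0));
    let out = block_on(instrumented_call(
        level_enabled,
        subscriber_enabled,
        Rc::clone(&spans_created),
        Rc::clone(&enters),
    ));
    (out, spans_created.get(), enters.get())
}

fn main() {
    println!("{:?}", run(false, true));
    println!("{:?}", run(true, false));
    println!("{:?}", run(true, true));
}
```

In this model, a statically disabled level creates no span and no wrapper, a span rejected by the subscriber is created but never entered, and only a fully enabled span pays the per-poll cost.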

## Motivation

In #1600, the `instrument` code generation was changed to avoid ever constructing a `Span` struct if the level is explicitly disabled. However, for `async` functions, `#[instrument]` will currently still create the span, but simply skips constructing an `Instrument` future if the level is disabled.

## Solution

This branch changes the `#[instrument]` code generation for async blocks to totally skip constructing the span if the level is disabled. I also simplified the code generation a bit by combining the shared code between the `err` and non-`err` cases, reducing code duplication.

Signed-off-by: Eliza Weisman <[email protected]>

@oli-obk

# 0.1.17 (October 1, 2021)

This release significantly improves performance when `#[instrument]`-generated spans are below the maximum enabled level.

### Added

- improve performance when skipping `#[instrument]`-generated spans below the max level ([#1600], [#1605])

Thanks to @oli-obk for contributing to this release!

[#1600]: #1600
[#1605]: #1605

…imulacrum Fix performance regression with #[instrument] linked tracing PR: tokio-rs/tracing#1600 regression introduced by rust-lang#89048

@BrianBurgers

# 0.1.29 (October 5th, 2021)

This release adds support for recording `Option<T> where T: Value` as typed `tracing` field values. It also includes significant performance improvements for functions annotated with the `#[instrument]` attribute when the generated span is disabled.

### Changed

- `tracing-core`: updated to v0.1.21
- `tracing-attributes`: updated to v0.1.19

### Added

- **field**: `Value` impl for `Option<T> where T: Value` ([#1585])
- **attributes**:
  - improved performance when skipping `#[instrument]`-generated spans below the max level ([#1600], [#1605], [#1614], [#1616], [#1617])

### Fixed

- **instrument**: added missing `Future` implementation for `WithSubscriber`, making the `WithDispatch` extension trait actually useable ([#1602])
- Documentation fixes and improvements ([#1595], [#1601], [#1597])

Thanks to @BrianBurgers, @mattiast, @DCjanus, @oli-obk, and @matklad for contributing to this release!

[#1585]: #1585
[#1595]: #1596
[#1597]: #1597
[#1600]: #1600
[#1601]: #1601
[#1602]: #1602
[#1605]: #1605
[#1614]: #1614
[#1616]: #1616
[#1617]: #1617

oli-obk requested review from davidbarsky, hawkw and a team as code owners September 29, 2021 14:49

oli-obk mentioned this pull request Sep 29, 2021

Fix performance regression with #[instrument] rust-lang/rust#89363

Merged

hawkw approved these changes Sep 29, 2021

hawkw approved these changes Sep 29, 2021

oli-obk added 4 commits September 30, 2021 12:53

Help LLVM understand that some spans are never going to do anything

feda6ee

Pacify clippy

94842ed

Explain the lazy variable initialization

27bd20e

Apply the static level optimization to async fns and error printing

b1b0de2

oli-obk force-pushed the master branch from 89ae992 to b1b0de2 Compare September 30, 2021 12:54

hawkw approved these changes Sep 30, 2021

hawkw merged commit e448aa3 into tokio-rs:master Sep 30, 2021

oli-obk mentioned this pull request Oct 1, 2021

Backport LLVM optimization hint #1605

Merged

hawkw mentioned this pull request Oct 1, 2021

attributes: skip async spans if level disabled #1607

Merged

hawkw mentioned this pull request Oct 1, 2021

attributes: prepare to release v0.1.17 #1611

Merged

hawkw mentioned this pull request Oct 4, 2021

tracing::instrument triggers clippy warning #1613

Closed

hawkw mentioned this pull request Oct 5, 2021

tracing: prepare to release v0.1.29 #1623

Merged

davidbarsky mentioned this pull request Sep 26, 2023

chore: backport roughly a year's worth of changes #2728

Merged
