-
Notifications
You must be signed in to change notification settings - Fork 741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Help LLVM understand that some spans are never going to do anything #1600
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is very cool!
It seems like it should also be possible to do something similar for #[instrument]
ed async blocks --- we could just not add the instrument
combinator at all when the span won't be enabled, avoiding the extra per-poll overhead of entering the disabled span. However, we should probably do that in a follow-up branch, rather than this one.
This change looks good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lovely, thank you!
As an aside, I think we should be able to make the same change in the tracing/tracing-attributes/src/lib.rs Lines 540 to 553 in c945ac0
and I think we'll see the same benefits? |
yes, once the perf is through and shows that this actually works, I will add the same change to the async and err branches |
cool, i'll hold off on merging this PR until then! I think the change in the async case will be somewhat significantly different than non-async, but it should still be possible to make a similar optimization... |
I have finished optimizing the other code paths. Perfbot also says that this is working. One thing I'm wondering is whether it would be possible to backport this to tracing 0.1, as there is no published 0.2 candidate yet |
Yeah, AFAICT this should be pretty much trivial to backport to 0.1. I can handle that part, though --- I'll see about getting a release prepped today! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good to me, but I think we may actually be able to improve the case with futures even further. i'm going to go ahead and merge this PR, and improve things more in a follow-up.
thanks again for working on this!
if tracing::level_enabled!(#level) { | ||
tracing::Instrument::instrument( | ||
fut, | ||
__tracing_attr_span | ||
) | ||
.await |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case, since we are always creating the span, we could actually do this:
if tracing::level_enabled!(#level) { | |
tracing::Instrument::instrument( | |
fut, | |
__tracing_attr_span | |
) | |
.await | |
if !span.is_disabled() { | |
tracing::Instrument::instrument( | |
fut, | |
__tracing_attr_span | |
) | |
.await |
which would also let us skip instrumenting the future in the case where the span's level is enabled but it was disabled by the subscriber (e.g. if its target was not enabled)
if tracing::level_enabled!(#level) { | ||
tracing::Instrument::instrument( | ||
fut, | ||
__tracing_attr_span | ||
) | ||
.await | ||
} else { | ||
fut.await | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, we could probably also do the optimization of not creating the span at all if the level is disabled:
if tracing::level_enabled!(#level) { | |
tracing::Instrument::instrument( | |
fut, | |
__tracing_attr_span | |
) | |
.await | |
} else { | |
fut.await | |
} | |
if tracing::level_enabled!(#level) { | |
let __tracing_attr_span = #span; | |
if !span.is_disabled() { | |
return tracing::Instrument::instrument( | |
fut, | |
__tracing_attr_span | |
) | |
.await; | |
} | |
fut.await |
…o anything (#1600) (#1605) ## Motivation Adding `#[instrument(level = "debug")]` attributes to functions in rustc caused a performance regression (in release, where `debug!` is fully optimized out) across all crates: rust-lang/rust#89048 (comment) While trying to debug this, I noticed that spans don't have the same advantage that events have wrt to how LLVM sees them. Spans (or more precisely, the enter-guard), will get dropped at the end of the scope, which throws a spanner into the LLVM optimization pipeline. I am not entirely sure where the problem is, but I am moderately certain that the issue is that even entering a dummy span is too much code for LLVM to reliably (or at all) optimize out. ## Solution My hope is that in trusting the Rust compiler to generate cool code when using drop flags, we can essentially generate a drop flag that depends on something we know (due to events working as expected) to be optimizable. So instead of doing ```rust let _x = span!(); let _y = _x.enter(); // lotsa code drop(_y) ``` we do ```rust let _x; let _y; let must_drop = false; if level_enabled!(DEBUG) { must_drop = true; _x = span!(); _y = _x.enter(); } // lotsa code if must_drop { drop(_y) } ``` I believe this will allow LLVM to properly optimize this again. Testing that right now, but I wanted to open this PR immediately for review.
## Motivation In #1600, the `instrument` code generation was changed to avoid ever constructing a `Span` struct if the level is explicitly disabled. However, for `async` functions, `#[instrument]` will currently still create the span, but simply skips constructing an `Instrument` future if the level is disabled. ## Solution This branch changes the `#[instrument]` code generation for async blocks to totally skip constructing the span if the level is disabled. I also simplfied the code generation a bit by combining the shared code between the `err` and non-`err` cases, reducing code duplication a bit. Signed-off-by: Eliza Weisman <[email protected]>
## Motivation In #1600, the `instrument` code generation was changed to avoid ever constructing a `Span` struct if the level is explicitly disabled. However, for `async` functions, `#[instrument]` will currently still create the span, but simply skips constructing an `Instrument` future if the level is disabled. ## Solution This branch changes the `#[instrument]` code generation for async blocks to totally skip constructing the span if the level is disabled. I also simplfied the code generation a bit by combining the shared code between the `err` and non-`err` cases, reducing code duplication a bit. Signed-off-by: Eliza Weisman <[email protected]>
## Motivation In #1600, the `instrument` code generation was changed to avoid ever constructing a `Span` struct if the level is explicitly disabled. However, for `async` functions, `#[instrument]` will currently still create the span, but simply skips constructing an `Instrument` future if the level is disabled. ## Solution This branch changes the `#[instrument]` code generation for async blocks to totally skip constructing the span if the level is disabled. I also simplfied the code generation a bit by combining the shared code between the `err` and non-`err` cases, reducing code duplication a bit. Signed-off-by: Eliza Weisman <[email protected]>
# 0.1.17 (October 1, 2021) This release significantly improves performance when `#[instrument]`-generated spans are below the maximum enabled level. ### Added - improve performance when skipping `#[instrument]`-generated spans below the max level ([#1600], [#1605]) Thanks to @oli-obk for contributing to this release! [#1600]: #1600 [#1605]: #1605
# 0.1.17 (October 1, 2021) This release significantly improves performance when `#[instrument]`-generated spans are below the maximum enabled level. ### Added - improve performance when skipping `#[instrument]`-generated spans below the max level ([#1600], [#1605]) Thanks to @oli-obk for contributing to this release! [#1600]: #1600 [#1605]: #1605
…imulacrum Fix performance regression with #[instrument] linked tracing PR: tokio-rs/tracing#1600 regression introduced by rust-lang#89048
# 0.1.29 (October 5th, 2021 This release adds support for recording `Option<T> where T: Value` as typed `tracing` field values. It also includes significant performance improvements for functions annotated with the `#[instrument]` attribute when the generated span is disabled. ### Changed - `tracing-core`: updated to v0.1.21 - `tracing-attributes`: updated to v0.1.19 ### Added - **field**: `Value` impl for `Option<T> where T: Value` ([#1585]) - **attributes**: - improved performance when skipping `#[instrument]`-generated spans below the max level ([#1600], [#1605], [#1614], [#1616], [#1617]) ### Fixed - **instrument**: added missing `Future` implementation for `WithSubscriber`, making the `WithDispatch` extension trait actually useable ([#1602]) - Documentation fixes and improvements ([#1595], [#1601], [#1597]) Thanks to @BrianBurgers, @mattiast, @DCjanus, @oli-obk, and @matklad for contributing to this release! [#1585]: #1585 [#1595]: #1596 [#1597]: #1597 [#1600]: #1600 [#1601]: #1601 [#1602]: #1602 [#1614]: #1614 [#1616]: #1616 [#1617]: #1617
# 0.1.29 (October 5th, 2021 This release adds support for recording `Option<T> where T: Value` as typed `tracing` field values. It also includes significant performance improvements for functions annotated with the `#[instrument]` attribute when the generated span is disabled. ### Changed - `tracing-core`: updated to v0.1.21 - `tracing-attributes`: updated to v0.1.19 ### Added - **field**: `Value` impl for `Option<T> where T: Value` ([#1585]) - **attributes**: - improved performance when skipping `#[instrument]`-generated spans below the max level ([#1600], [#1605], [#1614], [#1616], [#1617]) ### Fixed - **instrument**: added missing `Future` implementation for `WithSubscriber`, making the `WithDispatch` extension trait actually useable ([#1602]) - Documentation fixes and improvements ([#1595], [#1601], [#1597]) Thanks to @BrianBurgers, @mattiast, @DCjanus, @oli-obk, and @matklad for contributing to this release! [#1585]: #1585 [#1595]: #1596 [#1597]: #1597 [#1600]: #1600 [#1601]: #1601 [#1602]: #1602 [#1614]: #1614 [#1616]: #1616 [#1617]: #1617
# 0.1.29 (October 5th, 2021 This release adds support for recording `Option<T> where T: Value` as typed `tracing` field values. It also includes significant performance improvements for functions annotated with the `#[instrument]` attribute when the generated span is disabled. ### Changed - `tracing-core`: updated to v0.1.21 - `tracing-attributes`: updated to v0.1.19 ### Added - **field**: `Value` impl for `Option<T> where T: Value` ([#1585]) - **attributes**: - improved performance when skipping `#[instrument]`-generated spans below the max level ([#1600], [#1605], [#1614], [#1616], [#1617]) ### Fixed - **instrument**: added missing `Future` implementation for `WithSubscriber`, making the `WithDispatch` extension trait actually useable ([#1602]) - Documentation fixes and improvements ([#1595], [#1601], [#1597]) Thanks to @BrianBurgers, @mattiast, @DCjanus, @oli-obk, and @matklad for contributing to this release! [#1585]: #1585 [#1595]: #1596 [#1597]: #1597 [#1600]: #1600 [#1601]: #1601 [#1602]: #1602 [#1605]: #1605 [#1614]: #1614 [#1616]: #1616 [#1617]: #1617
# 0.1.17 (October 1, 2021) This release significantly improves performance when `#[instrument]`-generated spans are below the maximum enabled level. ### Added - improve performance when skipping `#[instrument]`-generated spans below the max level ([tokio-rs#1600], [tokio-rs#1605]) Thanks to @oli-obk for contributing to this release! [tokio-rs#1600]: tokio-rs#1600 [tokio-rs#1605]: tokio-rs#1605
# 0.1.29 (October 5th, 2021 This release adds support for recording `Option<T> where T: Value` as typed `tracing` field values. It also includes significant performance improvements for functions annotated with the `#[instrument]` attribute when the generated span is disabled. ### Changed - `tracing-core`: updated to v0.1.21 - `tracing-attributes`: updated to v0.1.19 ### Added - **field**: `Value` impl for `Option<T> where T: Value` ([tokio-rs#1585]) - **attributes**: - improved performance when skipping `#[instrument]`-generated spans below the max level ([tokio-rs#1600], [tokio-rs#1605], [tokio-rs#1614], [tokio-rs#1616], [tokio-rs#1617]) ### Fixed - **instrument**: added missing `Future` implementation for `WithSubscriber`, making the `WithDispatch` extension trait actually useable ([tokio-rs#1602]) - Documentation fixes and improvements ([tokio-rs#1595], [tokio-rs#1601], [tokio-rs#1597]) Thanks to @BrianBurgers, @mattiast, @DCjanus, @oli-obk, and @matklad for contributing to this release! [tokio-rs#1585]: tokio-rs#1585 [tokio-rs#1595]: tokio-rs#1596 [tokio-rs#1597]: tokio-rs#1597 [tokio-rs#1600]: tokio-rs#1600 [tokio-rs#1601]: tokio-rs#1601 [tokio-rs#1602]: tokio-rs#1602 [tokio-rs#1605]: tokio-rs#1605 [tokio-rs#1614]: tokio-rs#1614 [tokio-rs#1616]: tokio-rs#1616 [tokio-rs#1617]: tokio-rs#1617
Motivation
Adding
#[instrument(level = "debug")]
attributes to functions in rustccaused a performance regression (in release, where
debug!
is fullyoptimized out) across all crates:
rust-lang/rust#89048 (comment)
While trying to debug this, I noticed that spans don't have the same
advantage that events have wrt to how LLVM sees them. Spans (or more
precisely, the enter-guard), will get dropped at the end of the scope,
which throws a spanner into the LLVM optimization pipeline. I am not
entirely sure where the problem is, but I am moderately certain that the
issue is that even entering a dummy span is too much code for LLVM to
reliably (or at all) optimize out.
Solution
My hope is that in trusting the Rust compiler to generate cool code when using
drop flags, we can essentially generate a drop flag that depends on something we
know (due to events working as expected) to be optimizable.
So instead of doing
we do
I believe this will allow LLVM to properly optimize this again. Testing that
right now, but I wanted to open this PR immediately for review.