Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Help LLVM understand that some spans are never going to do anything #1600

Merged
merged 4 commits into from
Sep 30, 2021

Conversation

oli-obk
Copy link
Contributor

@oli-obk oli-obk commented Sep 29, 2021

Motivation

Adding #[instrument(level = "debug")] attributes to functions in rustc
caused a performance regression (in release, where debug! is fully
optimized out) across all crates:
rust-lang/rust#89048 (comment)

While trying to debug this, I noticed that spans don't have the same
advantage that events have wrt to how LLVM sees them. Spans (or more
precisely, the enter-guard), will get dropped at the end of the scope,
which throws a spanner into the LLVM optimization pipeline. I am not
entirely sure where the problem is, but I am moderately certain that the
issue is that even entering a dummy span is too much code for LLVM to
reliably (or at all) optimize out.

Solution

My hope is that in trusting the Rust compiler to generate cool code when using
drop flags, we can essentially generate a drop flag that depends on something we
know (due to events working as expected) to be optimizable.

So instead of doing

let _x = span!();
let _y = _x.enter();
// lotsa code
drop(_y)

we do

let _x;
let _y;
let must_drop = false;
if level_enabled!(DEBUG) {
    must_drop = true;
    _x = span!();
    _y = _x.enter();
}
// lotsa code
if must_drop {
    drop(_y)
}

I believe this will allow LLVM to properly optimize this again. Testing that
right now, but I wanted to open this PR immediately for review.

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is very cool!

It seems like it should also be possible to do something similar for #[instrument]ed async blocks --- we could just not add the instrument combinator at all when the span won't be enabled, avoiding the extra per-poll overhead of entering the disabled span. However, we should probably do that in a follow-up branch, rather than this one.

This change looks good to me!

tracing-attributes/src/lib.rs Show resolved Hide resolved
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lovely, thank you!

tracing-attributes/src/lib.rs Show resolved Hide resolved
@hawkw
Copy link
Member

hawkw commented Sep 29, 2021

As an aside, I think we should be able to make the same change in the err case here as well:

} else if err {
quote_spanned!(block.span()=>
let __tracing_attr_span = #span;
let __tracing_attr_guard = __tracing_attr_span.enter();
#[allow(clippy::redundant_closure_call)]
match (move || #block)() {
#[allow(clippy::unit_arg)]
Ok(x) => Ok(x),
Err(e) => {
tracing::error!(error = %e);
Err(e)
}
}
)

and I think we'll see the same benefits?

@oli-obk
Copy link
Contributor Author

oli-obk commented Sep 29, 2021

yes, once the perf is through and shows that this actually works, I will add the same change to the async and err branches

@hawkw
Copy link
Member

hawkw commented Sep 29, 2021

yes, once the perf is through and shows that this actually works, I will add the same change to the async and err branches

cool, i'll hold off on merging this PR until then!

I think the change in the async case will be somewhat significantly different than non-async, but it should still be possible to make a similar optimization...

@oli-obk
Copy link
Contributor Author

oli-obk commented Sep 30, 2021

I have finished optimizing the other code paths. Perfbot also says that this is working.

One thing I'm wondering is whether it would be possible to backport this to tracing 0.1, as there is no published 0.2 candidate yet

@hawkw
Copy link
Member

hawkw commented Sep 30, 2021

One thing I'm wondering is whether it would be possible to backport this to tracing 0.1, as there is no published 0.2 candidate yet

Yeah, AFAICT this should be pretty much trivial to backport to 0.1. I can handle that part, though --- I'll see about getting a release prepped today!

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me, but I think we may actually be able to improve the case with futures even further. i'm going to go ahead and merge this PR, and improve things more in a follow-up.

thanks again for working on this!

Comment on lines +531 to +536
if tracing::level_enabled!(#level) {
tracing::Instrument::instrument(
fut,
__tracing_attr_span
)
.await
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case, since we are always creating the span, we could actually do this:

Suggested change
if tracing::level_enabled!(#level) {
tracing::Instrument::instrument(
fut,
__tracing_attr_span
)
.await
if !span.is_disabled() {
tracing::Instrument::instrument(
fut,
__tracing_attr_span
)
.await

which would also let us skip instrumenting the future in the case where the span's level is enabled but it was disabled by the subscriber (e.g. if its target was not enabled)

Comment on lines +531 to +539
if tracing::level_enabled!(#level) {
tracing::Instrument::instrument(
fut,
__tracing_attr_span
)
.await
} else {
fut.await
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, we could probably also do the optimization of not creating the span at all if the level is disabled:

Suggested change
if tracing::level_enabled!(#level) {
tracing::Instrument::instrument(
fut,
__tracing_attr_span
)
.await
} else {
fut.await
}
if tracing::level_enabled!(#level) {
let __tracing_attr_span = #span;
if !span.is_disabled() {
return tracing::Instrument::instrument(
fut,
__tracing_attr_span
)
.await;
}
fut.await

@hawkw hawkw merged commit e448aa3 into tokio-rs:master Sep 30, 2021
hawkw pushed a commit that referenced this pull request Oct 1, 2021
…o anything (#1600) (#1605)

## Motivation

Adding `#[instrument(level = "debug")]` attributes to functions in rustc
caused a performance regression (in release, where `debug!` is fully
optimized out) across all crates:
rust-lang/rust#89048 (comment)

While trying to debug this, I noticed that spans don't have the same
advantage that events have wrt to how LLVM sees them. Spans (or more
precisely, the enter-guard), will get dropped at the end of the scope,
which throws a spanner into the LLVM optimization pipeline. I am not
entirely sure where the problem is, but I am moderately certain that the
issue is that even entering a dummy span is too much code for LLVM to
reliably (or at all) optimize out.

## Solution

My hope is that in trusting the Rust compiler to generate cool code when using
drop flags, we can essentially generate a drop flag that depends on something we
know (due to events working as expected) to be optimizable.

So instead of doing

```rust
let _x = span!();
let _y = _x.enter();
// lotsa code
drop(_y)
```

we do

```rust
let _x;
let _y;
let must_drop = false;
if level_enabled!(DEBUG) {
    must_drop = true;
    _x = span!();
    _y = _x.enter();
}
// lotsa code
if must_drop {
    drop(_y)
}
```

I believe this will allow LLVM to properly optimize this again. Testing that
right now, but I wanted to open this PR immediately for review.
hawkw added a commit that referenced this pull request Oct 1, 2021
## Motivation

In #1600, the `instrument` code generation was changed to avoid ever
constructing a `Span` struct if the level is explicitly disabled.
However, for `async` functions, `#[instrument]` will currently still
create the span, but simply skips constructing an `Instrument` future if
the level is disabled.

## Solution

This branch changes the `#[instrument]` code generation for async blocks
to totally skip constructing the span if the level is disabled. I also
simplfied the code generation a bit by combining the shared code between
the `err` and non-`err` cases, reducing code duplication a bit.

Signed-off-by: Eliza Weisman <[email protected]>
hawkw added a commit that referenced this pull request Oct 1, 2021
## Motivation

In #1600, the `instrument` code generation was changed to avoid ever
constructing a `Span` struct if the level is explicitly disabled.
However, for `async` functions, `#[instrument]` will currently still
create the span, but simply skips constructing an `Instrument` future if
the level is disabled.

## Solution

This branch changes the `#[instrument]` code generation for async blocks
to totally skip constructing the span if the level is disabled. I also
simplfied the code generation a bit by combining the shared code between
the `err` and non-`err` cases, reducing code duplication a bit.

Signed-off-by: Eliza Weisman <[email protected]>
hawkw added a commit that referenced this pull request Oct 1, 2021
## Motivation

In #1600, the `instrument` code generation was changed to avoid ever
constructing a `Span` struct if the level is explicitly disabled.
However, for `async` functions, `#[instrument]` will currently still
create the span, but simply skips constructing an `Instrument` future if
the level is disabled.

## Solution

This branch changes the `#[instrument]` code generation for async blocks
to totally skip constructing the span if the level is disabled. I also
simplfied the code generation a bit by combining the shared code between
the `err` and non-`err` cases, reducing code duplication a bit.

Signed-off-by: Eliza Weisman <[email protected]>
hawkw added a commit that referenced this pull request Oct 1, 2021
# 0.1.17 (October 1, 2021)

This release significantly improves performance when
`#[instrument]`-generated spans are below the maximum enabled level.

### Added

- improve performance when skipping `#[instrument]`-generated spans
  below the max level ([#1600], [#1605])

Thanks to @oli-obk for contributing to this release!

[#1600]: #1600
[#1605]: #1605
hawkw added a commit that referenced this pull request Oct 1, 2021
# 0.1.17 (October 1, 2021)

This release significantly improves performance when
`#[instrument]`-generated spans are below the maximum enabled level.

### Added

- improve performance when skipping `#[instrument]`-generated spans
  below the max level ([#1600], [#1605])

Thanks to @oli-obk for contributing to this release!

[#1600]: #1600
[#1605]: #1605
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 5, 2021
…imulacrum

Fix performance regression with #[instrument]

linked tracing PR: tokio-rs/tracing#1600

regression introduced by rust-lang#89048
hawkw added a commit that referenced this pull request Oct 5, 2021
# 0.1.29 (October 5th, 2021

This release adds support for recording `Option<T> where T: Value` as
typed `tracing` field values. It also includes significant performance
improvements for functions annotated with the `#[instrument]` attribute
when the generated span is disabled.

### Changed

- `tracing-core`: updated to v0.1.21
- `tracing-attributes`: updated to v0.1.19

### Added

- **field**: `Value` impl for `Option<T> where T: Value` ([#1585])
- **attributes**: - improved performance when skipping
  `#[instrument]`-generated spans below the max level ([#1600], [#1605],
  [#1614], [#1616], [#1617])

### Fixed

- **instrument**: added missing `Future` implementation for
  `WithSubscriber`, making the `WithDispatch` extension trait actually
  useable ([#1602])
- Documentation fixes and improvements ([#1595], [#1601], [#1597])

Thanks to @BrianBurgers, @mattiast, @DCjanus, @oli-obk, and @matklad for
contributing to this release!

[#1585]: #1585
[#1595]: #1596
[#1597]: #1597
[#1600]: #1600
[#1601]: #1601
[#1602]: #1602
[#1614]: #1614
[#1616]: #1616
[#1617]: #1617
hawkw added a commit that referenced this pull request Oct 5, 2021
# 0.1.29 (October 5th, 2021

This release adds support for recording `Option<T> where T: Value` as
typed `tracing` field values. It also includes significant performance
improvements for functions annotated with the `#[instrument]` attribute
when the generated span is disabled.

### Changed

- `tracing-core`: updated to v0.1.21
- `tracing-attributes`: updated to v0.1.19

### Added

- **field**: `Value` impl for `Option<T> where T: Value` ([#1585])
- **attributes**: - improved performance when skipping
  `#[instrument]`-generated spans below the max level ([#1600], [#1605],
  [#1614], [#1616], [#1617])

### Fixed

- **instrument**: added missing `Future` implementation for
  `WithSubscriber`, making the `WithDispatch` extension trait actually
  useable ([#1602])
- Documentation fixes and improvements ([#1595], [#1601], [#1597])

Thanks to @BrianBurgers, @mattiast, @DCjanus, @oli-obk, and @matklad for
contributing to this release!

[#1585]: #1585
[#1595]: #1596
[#1597]: #1597
[#1600]: #1600
[#1601]: #1601
[#1602]: #1602
[#1614]: #1614
[#1616]: #1616
[#1617]: #1617
hawkw added a commit that referenced this pull request Oct 6, 2021
# 0.1.29 (October 5th, 2021

This release adds support for recording `Option<T> where T: Value` as
typed `tracing` field values. It also includes significant performance
improvements for functions annotated with the `#[instrument]` attribute
when the generated span is disabled.

### Changed

- `tracing-core`: updated to v0.1.21
- `tracing-attributes`: updated to v0.1.19

### Added

- **field**: `Value` impl for `Option<T> where T: Value` ([#1585])
- **attributes**: - improved performance when skipping
  `#[instrument]`-generated spans below the max level ([#1600], [#1605],
  [#1614], [#1616], [#1617])

### Fixed

- **instrument**: added missing `Future` implementation for
  `WithSubscriber`, making the `WithDispatch` extension trait actually
  useable ([#1602])
- Documentation fixes and improvements ([#1595], [#1601], [#1597])

Thanks to @BrianBurgers, @mattiast, @DCjanus, @oli-obk, and @matklad for
contributing to this release!

[#1585]: #1585
[#1595]: #1596
[#1597]: #1597
[#1600]: #1600
[#1601]: #1601
[#1602]: #1602
[#1605]: #1605
[#1614]: #1614
[#1616]: #1616
[#1617]: #1617
kaffarell pushed a commit to kaffarell/tracing that referenced this pull request May 22, 2024
# 0.1.17 (October 1, 2021)

This release significantly improves performance when
`#[instrument]`-generated spans are below the maximum enabled level.

### Added

- improve performance when skipping `#[instrument]`-generated spans
  below the max level ([tokio-rs#1600], [tokio-rs#1605])

Thanks to @oli-obk for contributing to this release!

[tokio-rs#1600]: tokio-rs#1600
[tokio-rs#1605]: tokio-rs#1605
kaffarell pushed a commit to kaffarell/tracing that referenced this pull request May 22, 2024
# 0.1.29 (October 5th, 2021

This release adds support for recording `Option<T> where T: Value` as
typed `tracing` field values. It also includes significant performance
improvements for functions annotated with the `#[instrument]` attribute
when the generated span is disabled.

### Changed

- `tracing-core`: updated to v0.1.21
- `tracing-attributes`: updated to v0.1.19

### Added

- **field**: `Value` impl for `Option<T> where T: Value` ([tokio-rs#1585])
- **attributes**: - improved performance when skipping
  `#[instrument]`-generated spans below the max level ([tokio-rs#1600], [tokio-rs#1605],
  [tokio-rs#1614], [tokio-rs#1616], [tokio-rs#1617])

### Fixed

- **instrument**: added missing `Future` implementation for
  `WithSubscriber`, making the `WithDispatch` extension trait actually
  useable ([tokio-rs#1602])
- Documentation fixes and improvements ([tokio-rs#1595], [tokio-rs#1601], [tokio-rs#1597])

Thanks to @BrianBurgers, @mattiast, @DCjanus, @oli-obk, and @matklad for
contributing to this release!

[tokio-rs#1585]: tokio-rs#1585
[tokio-rs#1595]: tokio-rs#1596
[tokio-rs#1597]: tokio-rs#1597
[tokio-rs#1600]: tokio-rs#1600
[tokio-rs#1601]: tokio-rs#1601
[tokio-rs#1602]: tokio-rs#1602
[tokio-rs#1605]: tokio-rs#1605
[tokio-rs#1614]: tokio-rs#1614
[tokio-rs#1616]: tokio-rs#1616
[tokio-rs#1617]: tokio-rs#1617
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants