Skip to content

Commit

Permalink
tracing: ensmallerate assembly generated by macro expansions (#943)
Browse files Browse the repository at this point in the history
## Motivation

Currently, tracing's macros generate a *lot* of code at the expansion
site, most of which is not hidden behind function calls. Additionally,
several functions called in the generated code are inlined when they
probably shouldn't be. This results in a really gross disassembly for
code that calls a `tracing` macro.

Inlining code _can_ be faster, by avoiding function calls. However, so
much code can have some negative impacts. Obviously, the output binary
is much larger. Perhaps more importantly, though, this code may optimize
much less well --- for example, functions that contain tracing macros
are unlikely to be inlined, since they are quite large. 

## Solution

This branch "outlines" most of the macro code, with the exception of the
two hottest paths for skipping a disabled callsite (the global max level
and the per-callsite cache). Actually recording the span or event,
performing runtime filtering via `Subscriber::enabled`, and registering
the callsite for the first time, are behind a function call. This means
that the fast paths still don't require a stack frame, but the amount of
code they jump past is _significantly_ shorter.

Also, while working on this, I noticed that the event macro expansions
were hitting the dispatcher thread-local two separate times for enabled
events. That's not super great. I folded these together into one
`LocalKey::with` call, so that we don't access the thread-local twice.

Building `ValueSet`s to actually record a span or event is the path that
generates the most code at the callsite currently. In order to put this
behind a function call, it was necessary to capture the local variables
that comprise the `ValueSet` using a closure. While closures have
impacted build times in the past, my guess is that this may be offset a
bit by generating way less code overall. It was also necessary to add
new `dispatcher`-accessing functions that take a `FnOnce` rather than a
`FnMut`. This is because fields are _sometimes_ moved into the
`ValueSet` by value rather than by ref, when they are inside of a call
to `tracing::debug` or `display` that doesn't itself borrow the value.
Not supporting this would be a breaking change, so we had to make it
work. Capturing a `FnOnce` into the `FnMut` in an `&mut Option` seemed
inefficient, so I did this instead.

## Before & After

Here's the disassembly of the following function:
```rust
#[inline(never)]
pub fn event() {
    tracing::info!("hello world");
}
```

<details>
<summary>On master:</summary>

```assembly
event:
; pub fn event() {
               	push	r15
               	push	r14
               	push	r12
               	push	rbx
               	sub	rsp, 184
;         Relaxed => intrinsics::atomic_load_relaxed(dst),
               	lea	rax, [rip + 227515]
               	mov	rax, qword ptr [rax]
;             Self::ERROR_USIZE => Self::ERROR,
               	add	rax, -3
               	cmp	rax, 3
               	jb	395 <event+0x1b1>
               	lea	rbx, [rip + 227427]
               	mov	qword ptr [rsp + 96], rbx
;         Acquire => intrinsics::atomic_load_acq(dst),
               	mov	rax, qword ptr [rip + 227431]
;         self.state_and_queue.load(Ordering::Acquire) == COMPLETE
               	cmp	rax, 3
;         if self.is_completed() {
               	jne	36 <event+0x63>
               	mov	qword ptr [rsp], rbx
;         Relaxed => intrinsics::atomic_load_relaxed(dst),
               	mov	rax, qword ptr [rip + 227398]
;             if interest.is_always() {
               	test	rax, -3
               	je	86 <event+0xa8>
               	mov	rdi, rsp
;             crate::dispatcher::get_default(|current| current.enabled(self.meta))
               	call	-922 <tracing_core::dispatcher::get_default::h8c0486e541dd0400>
;     tracing::info!("hello world");
               	test	al, al
               	jne	84 <event+0xb2>
               	jmp	334 <event+0x1b1>
               	lea	rax, [rsp + 96]
;         let mut f = Some(f);
               	mov	qword ptr [rsp + 136], rax
               	lea	rax, [rsp + 136]
;         self.call_inner(false, &mut |_| f.take().unwrap()());
               	mov	qword ptr [rsp], rax
               	lea	rdi, [rip + 227357]
               	lea	rcx, [rip + 219174]
               	mov	rdx, rsp
               	xor	esi, esi
               	call	qword ptr [rip + 226083]
               	mov	qword ptr [rsp], rbx
;         Relaxed => intrinsics::atomic_load_relaxed(dst),
               	mov	rax, qword ptr [rip + 227312]
;             if interest.is_always() {
               	test	rax, -3
               	jne	-86 <event+0x52>
;                 0 => Interest::never(),
               	cmp	rax, 2
;     tracing::info!("hello world");
               	jne	255 <event+0x1b1>
;             &self.meta
               	mov	rbx, qword ptr [rip + 227295]
;     tracing::info!("hello world");
               	lea	r14, [rip + 2528]
               	mov	rdi, rbx
               	call	r14
               	lea	r12, [rsp + 136]
               	mov	rdi, r12
               	mov	rsi, rax
               	call	qword ptr [rip + 225926]
               	mov	rdi, rbx
               	call	r14
               	mov	r14, rax
               	mov	r15, rsp
               	mov	rdi, r15
               	mov	rsi, r12
               	call	qword ptr [rip + 225926]
;             Some(val) => val,
               	cmp	qword ptr [rsp + 8], 0
               	je	194 <event+0x1c0>
               	mov	rax, qword ptr [rsp + 32]
               	mov	qword ptr [rsp + 128], rax
               	movups	xmm0, xmmword ptr [rsp]
               	movups	xmm1, xmmword ptr [rsp + 16]
               	movaps	xmmword ptr [rsp + 112], xmm1
               	movaps	xmmword ptr [rsp + 96], xmm0
;         Arguments { pieces, fmt: None, args }
               	lea	rax, [rip + 218619]
               	mov	qword ptr [rsp], rax
               	mov	qword ptr [rsp + 8], 1
               	mov	qword ptr [rsp + 16], 0
               	lea	rax, [rip + 165886]
               	mov	qword ptr [rsp + 32], rax
               	mov	qword ptr [rsp + 40], 0
               	lea	rax, [rsp + 96]
;     tracing::info!("hello world");
               	mov	qword ptr [rsp + 72], rax
               	mov	qword ptr [rsp + 80], r15
               	lea	rax, [rip + 218570]
               	mov	qword ptr [rsp + 88], rax
               	lea	rax, [rsp + 72]
;         ValueSet {
               	mov	qword ptr [rsp + 48], rax
               	mov	qword ptr [rsp + 56], 1
               	mov	qword ptr [rsp + 64], r14
               	lea	rax, [rsp + 48]
;         Event {
               	mov	qword ptr [rsp + 136], rax
               	mov	qword ptr [rsp + 144], rbx
               	mov	qword ptr [rsp + 152], 1
               	lea	rdi, [rsp + 136]
;         crate::dispatcher::get_default(|current| {
               	call	-1777 <tracing_core::dispatcher::get_default::h32aa5a3d270b4ae2>
; }
               	add	rsp, 184
               	pop	rbx
               	pop	r12
               	pop	r14
               	pop	r15
               	ret
;             None => expect_failed(msg),
               	lea	rdi, [rip + 165696]
               	lea	rdx, [rip + 218426]
               	mov	esi, 34
               	call	qword ptr [rip + 226383]
               	ud2
```

</details>

<details>

<summary>And on this branch:</summary>

```assembly
event:
; pub fn event() {
               	sub	rsp, 24
;         Relaxed => intrinsics::atomic_load_relaxed(dst),
               	lea	rax, [rip + 219189]
               	mov	rax, qword ptr [rax]
;             Self::ERROR_USIZE => Self::ERROR,
               	add	rax, -3
               	cmp	rax, 3
               	jb	59 <event+0x53>
;         Relaxed => intrinsics::atomic_load_relaxed(dst),
               	mov	rcx, qword ptr [rip + 219105]
;                 0 => Interest::never(),
               	test	rcx, rcx
               	je	47 <event+0x53>
               	mov	al, 1
               	cmp	rcx, 1
               	je	8 <event+0x34>
               	cmp	rcx, 2
               	jne	38 <event+0x58>
               	mov	al, 2
               	lea	rcx, [rip + 219077]
               	mov	qword ptr [rsp + 16], rcx
               	mov	byte ptr [rsp + 15], al
               	lea	rdi, [rsp + 15]
               	lea	rsi, [rsp + 16]
;             tracing_core::dispatcher::get_current(|current| {
               	call	-931 <tracing_core::dispatcher::get_current::he31e3ce09e66e5fa>
; }
               	add	rsp, 24
               	ret
;                 _ => self.register(),
               	lea	rdi, [rip + 219041]
               	call	qword ptr [rip + 218187]
;             InterestKind::Never => true,
               	test	al, al
;     tracing::info!("hello world");
               	jne	-53 <event+0x34>
               	jmp	-24 <event+0x53>
               	nop	dword ptr [rax + rax]
```

</details>

So, uh...that seems better.

Signed-off-by: Eliza Weisman <[email protected]>
  • Loading branch information
hawkw authored Aug 21, 2020
1 parent 5e48505 commit 2692560
Show file tree
Hide file tree
Showing 7 changed files with 308 additions and 193 deletions.
6 changes: 6 additions & 0 deletions tracing-core/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,9 @@
# Unreleased

## Added

- Internal API changes to support optimizations in the `tracing` crate (#943)

# 0.1.14 (August 10, 2020)

### Fixed
Expand Down
2 changes: 1 addition & 1 deletion tracing-core/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ name = "tracing-core"
# - README.md
# - Update CHANGELOG.md.
# - Create "v0.1.x" git tag.
version = "0.1.14"
version = "0.1.15"
authors = ["Tokio Contributors <[email protected]>"]
license = "MIT"
readme = "README.md"
Expand Down
102 changes: 76 additions & 26 deletions tracing-core/src/dispatcher.rs
Original file line number Diff line number Diff line change
Expand Up @@ -149,7 +149,7 @@ use crate::stdlib::{

#[cfg(feature = "std")]
use crate::stdlib::{
cell::{Cell, RefCell},
cell::{Cell, RefCell, RefMut},
error,
};

Expand Down Expand Up @@ -193,6 +193,12 @@ struct State {
can_enter: Cell<bool>,
}

/// While this guard is active, additional calls to subscriber functions on
/// the default dispatcher will not be able to access the dispatch context.
/// Dropping the guard will allow the dispatch context to be re-entered.
#[cfg(feature = "std")]
struct Entered<'a>(&'a State);

/// A guard that resets the current default dispatcher to the prior
/// default dispatcher when dropped.
#[cfg(feature = "std")]
Expand Down Expand Up @@ -325,38 +331,46 @@ pub fn get_default<T, F>(mut f: F) -> T
where
F: FnMut(&Dispatch) -> T,
{
// While this guard is active, additional calls to subscriber functions on
// the default dispatcher will not be able to access the dispatch context.
// Dropping the guard will allow the dispatch context to be re-entered.
struct Entered<'a>(&'a Cell<bool>);
impl<'a> Drop for Entered<'a> {
#[inline]
fn drop(&mut self) {
self.0.set(true);
}
}

CURRENT_STATE
.try_with(|state| {
if state.can_enter.replace(false) {
let _guard = Entered(&state.can_enter);

let mut default = state.default.borrow_mut();

if default.is::<NoSubscriber>() {
if let Some(global) = get_global() {
// don't redo this call on the next check
*default = global.clone();
}
}
f(&*default)
} else {
f(&Dispatch::none())
if let Some(entered) = state.enter() {
return f(&*entered.current());
}

f(&Dispatch::none())
})
.unwrap_or_else(|_| f(&Dispatch::none()))
}

/// Executes a closure with a reference to this thread's current [dispatcher].
///
/// Note that calls to `get_default` should not be nested; if this function is
/// called while inside of another `get_default`, that closure will be provided
/// with `Dispatch::none` rather than the previously set dispatcher.
///
/// [dispatcher]: ../dispatcher/struct.Dispatch.html
#[cfg(feature = "std")]
#[doc(hidden)]
#[inline(never)]
pub fn get_current<T>(f: impl FnOnce(&Dispatch) -> T) -> Option<T> {
CURRENT_STATE
.try_with(|state| {
let entered = state.enter()?;
Some(f(&*entered.current()))
})
.ok()?
}

/// Executes a closure with a reference to the current [dispatcher].
///
/// [dispatcher]: ../dispatcher/struct.Dispatch.html
#[cfg(not(feature = "std"))]
#[doc(hidden)]
pub fn get_current<T>(f: impl FnOnce(&Dispatch) -> T) -> Option<T> {
let dispatch = get_global()?;
Some(f(&dispatch))
}

/// Executes a closure with a reference to the current [dispatcher].
///
/// [dispatcher]: ../dispatcher/struct.Dispatch.html
Expand Down Expand Up @@ -711,6 +725,42 @@ impl State {
EXISTS.store(true, Ordering::Release);
DefaultGuard(prior)
}

#[inline]
fn enter(&self) -> Option<Entered<'_>> {
if self.can_enter.replace(false) {
Some(Entered(&self))
} else {
None
}
}
}

// ===== impl Entered =====

#[cfg(feature = "std")]
impl<'a> Entered<'a> {
#[inline]
fn current(&self) -> RefMut<'a, Dispatch> {
let mut default = self.0.default.borrow_mut();

if default.is::<NoSubscriber>() {
if let Some(global) = get_global() {
// don't redo this call on the next check
*default = global.clone();
}
}

default
}
}

#[cfg(feature = "std")]
impl<'a> Drop for Entered<'a> {
#[inline]
fn drop(&mut self) {
self.0.can_enter.set(true);
}
}

// ===== impl DefaultGuard =====
Expand Down
2 changes: 1 addition & 1 deletion tracing/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ keywords = ["logging", "tracing", "metrics", "async"]
edition = "2018"

[dependencies]
tracing-core = { path = "../tracing-core", version = "0.1.14", default-features = false }
tracing-core = { path = "../tracing-core", version = "0.1.15", default-features = false }
log = { version = "0.4", optional = true }
tracing-attributes = { path = "../tracing-attributes", version = "0.1.10", optional = true }
cfg-if = "0.1.10"
Expand Down
88 changes: 60 additions & 28 deletions tracing/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -916,9 +916,9 @@ pub mod subscriber;

#[doc(hidden)]
pub mod __macro_support {
pub use crate::callsite::Callsite as _;
pub use crate::callsite::Callsite;
use crate::stdlib::sync::atomic::{AtomicUsize, Ordering};
use crate::{subscriber::Interest, Callsite, Metadata};
use crate::{subscriber::Interest, Metadata};
use tracing_core::Once;

/// Callsite implementation used by macro-generated code.
Expand Down Expand Up @@ -947,59 +947,91 @@ pub mod __macro_support {
/// without warning.
pub const fn new(meta: &'static Metadata<'static>) -> Self {
Self {
interest: AtomicUsize::new(0),
interest: AtomicUsize::new(0xDEADFACED),
meta,
registration: Once::new(),
}
}

/// Returns `true` if the callsite is enabled by a cached interest, or
/// by the current `Dispatch`'s `enabled` method if the cached
/// `Interest` is `sometimes`.
/// Registers this callsite with the global callsite registry.
///
/// If the callsite is already registered, this does nothing.
///
/// /!\ WARNING: This is *not* a stable API! /!\
/// This method, and all code contained in the `__macro_support` module, is
/// a *private* API of `tracing`. It is exposed publicly because it is used
/// by the `tracing` macros, but it is not part of the stable versioned API.
/// Breaking changes to this module may occur in small-numbered versions
/// without warning.
#[inline(always)]
pub fn is_enabled(&self) -> bool {
let interest = self.interest();
if interest.is_always() {
return true;
}
if interest.is_never() {
return false;
#[inline(never)]
// This only happens once (or if the cached interest value was corrupted).
#[cold]
pub fn register(&'static self) -> Interest {
self.registration
.call_once(|| crate::callsite::register(self));
match self.interest.load(Ordering::Relaxed) {
0 => Interest::never(),
2 => Interest::always(),
_ => Interest::sometimes(),
}

crate::dispatcher::get_default(|current| current.enabled(self.meta))
}

/// Registers this callsite with the global callsite registry.
///
/// If the callsite is already registered, this does nothing.
/// Returns the callsite's cached Interest, or registers it for the
/// first time if it has not yet been registered.
///
/// /!\ WARNING: This is *not* a stable API! /!\
/// This method, and all code contained in the `__macro_support` module, is
/// a *private* API of `tracing`. It is exposed publicly because it is used
/// by the `tracing` macros, but it is not part of the stable versioned API.
/// Breaking changes to this module may occur in small-numbered versions
/// without warning.
#[inline(always)]
pub fn register(&'static self) {
self.registration
.call_once(|| crate::callsite::register(self));
}

#[inline(always)]
fn interest(&self) -> Interest {
#[inline]
pub fn interest(&'static self) -> Interest {
match self.interest.load(Ordering::Relaxed) {
0 => Interest::never(),
1 => Interest::sometimes(),
2 => Interest::always(),
_ => Interest::sometimes(),
_ => self.register(),
}
}

pub fn dispatch_event(&'static self, interest: Interest, f: impl FnOnce(&crate::Dispatch)) {
tracing_core::dispatcher::get_current(|current| {
if interest.is_always() || current.enabled(self.meta) {
f(current)
}
});
}

#[inline]
#[cfg(feature = "log")]
pub fn disabled_span(&self) -> crate::Span {
crate::Span::new_disabled(self.meta)
}

#[inline]
#[cfg(not(feature = "log"))]
pub fn disabled_span(&self) -> crate::Span {
crate::Span::none()
}

pub fn dispatch_span(
&'static self,
interest: Interest,
f: impl FnOnce(&crate::Dispatch) -> crate::Span,
) -> crate::Span {
if interest.is_never() {
return self.disabled_span();
}

tracing_core::dispatcher::get_current(|current| {
if interest.is_always() || current.enabled(self.meta) {
return f(current);
}
self.disabled_span()
})
.unwrap_or_else(|| self.disabled_span())
}
}

impl Callsite for MacroCallsite {
Expand Down
Loading

0 comments on commit 2692560

Please sign in to comment.