Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
tracing: ensmallerate assembly generated by macro expansions (#943)
## Motivation Currently, tracing's macros generate a *lot* of code at the expansion site, most of which is not hidden behind function calls. Additionally, several functions called in the generated code are inlined when they probably shouldn't be. This results in a really gross disassembly for code that calls a `tracing` macro. Inlining code _can_ be faster, by avoiding function calls. However, so much code can have some negative impacts. Obviously, the output binary is much larger. Perhaps more importantly, though, this code may optimize much less well --- for example, functions that contain tracing macros are unlikely to be inlined, since they are quite large. ## Solution This branch "outlines" most of the macro code, with the exception of the two hottest paths for skipping a disabled callsite (the global max level and the per-callsite cache). Actually recording the span or event, performing runtime filtering via `Subscriber::enabled`, and registering the callsite for the first time, are behind a function call. This means that the fast paths still don't require a stack frame, but the amount of code they jump past is _significantly_ shorter. Also, while working on this, I noticed that the event macro expansions were hitting the dispatcher thread-local two separate times for enabled events. That's not super great. I folded these together into one `LocalKey::with` call, so that we don't access the thread-local twice. Building `ValueSet`s to actually record a span or event is the path that generates the most code at the callsite currently. In order to put this behind a function call, it was necessary to capture the local variables that comprise the `ValueSet` using a closure. While closures have impacted build times in the past, my guess is that this may be offset a bit by generating way less code overall. It was also necessary to add new `dispatcher`-accessing functions that take a `FnOnce` rather than a `FnMut`. This is because fields are _sometimes_ moved into the `ValueSet` by value rather than by ref, when they are inside of a call to `tracing::debug` or `display` that doesn't itself borrow the value. Not supporting this would be a breaking change, so we had to make it work. Capturing a `FnOnce` into the `FnMut` in an `&mut Option` seemed inefficient, so I did this instead. ## Before & After Here's the disassembly of the following function: ```rust #[inline(never)] pub fn event() { tracing::info!("hello world"); } ``` <details> <summary>On master:</summary> ```assembly event: ; pub fn event() { push r15 push r14 push r12 push rbx sub rsp, 184 ; Relaxed => intrinsics::atomic_load_relaxed(dst), lea rax, [rip + 227515] mov rax, qword ptr [rax] ; Self::ERROR_USIZE => Self::ERROR, add rax, -3 cmp rax, 3 jb 395 <event+0x1b1> lea rbx, [rip + 227427] mov qword ptr [rsp + 96], rbx ; Acquire => intrinsics::atomic_load_acq(dst), mov rax, qword ptr [rip + 227431] ; self.state_and_queue.load(Ordering::Acquire) == COMPLETE cmp rax, 3 ; if self.is_completed() { jne 36 <event+0x63> mov qword ptr [rsp], rbx ; Relaxed => intrinsics::atomic_load_relaxed(dst), mov rax, qword ptr [rip + 227398] ; if interest.is_always() { test rax, -3 je 86 <event+0xa8> mov rdi, rsp ; crate::dispatcher::get_default(|current| current.enabled(self.meta)) call -922 <tracing_core::dispatcher::get_default::h8c0486e541dd0400> ; tracing::info!("hello world"); test al, al jne 84 <event+0xb2> jmp 334 <event+0x1b1> lea rax, [rsp + 96] ; let mut f = Some(f); mov qword ptr [rsp + 136], rax lea rax, [rsp + 136] ; self.call_inner(false, &mut |_| f.take().unwrap()()); mov qword ptr [rsp], rax lea rdi, [rip + 227357] lea rcx, [rip + 219174] mov rdx, rsp xor esi, esi call qword ptr [rip + 226083] mov qword ptr [rsp], rbx ; Relaxed => intrinsics::atomic_load_relaxed(dst), mov rax, qword ptr [rip + 227312] ; if interest.is_always() { test rax, -3 jne -86 <event+0x52> ; 0 => Interest::never(), cmp rax, 2 ; tracing::info!("hello world"); jne 255 <event+0x1b1> ; &self.meta mov rbx, qword ptr [rip + 227295] ; tracing::info!("hello world"); lea r14, [rip + 2528] mov rdi, rbx call r14 lea r12, [rsp + 136] mov rdi, r12 mov rsi, rax call qword ptr [rip + 225926] mov rdi, rbx call r14 mov r14, rax mov r15, rsp mov rdi, r15 mov rsi, r12 call qword ptr [rip + 225926] ; Some(val) => val, cmp qword ptr [rsp + 8], 0 je 194 <event+0x1c0> mov rax, qword ptr [rsp + 32] mov qword ptr [rsp + 128], rax movups xmm0, xmmword ptr [rsp] movups xmm1, xmmword ptr [rsp + 16] movaps xmmword ptr [rsp + 112], xmm1 movaps xmmword ptr [rsp + 96], xmm0 ; Arguments { pieces, fmt: None, args } lea rax, [rip + 218619] mov qword ptr [rsp], rax mov qword ptr [rsp + 8], 1 mov qword ptr [rsp + 16], 0 lea rax, [rip + 165886] mov qword ptr [rsp + 32], rax mov qword ptr [rsp + 40], 0 lea rax, [rsp + 96] ; tracing::info!("hello world"); mov qword ptr [rsp + 72], rax mov qword ptr [rsp + 80], r15 lea rax, [rip + 218570] mov qword ptr [rsp + 88], rax lea rax, [rsp + 72] ; ValueSet { mov qword ptr [rsp + 48], rax mov qword ptr [rsp + 56], 1 mov qword ptr [rsp + 64], r14 lea rax, [rsp + 48] ; Event { mov qword ptr [rsp + 136], rax mov qword ptr [rsp + 144], rbx mov qword ptr [rsp + 152], 1 lea rdi, [rsp + 136] ; crate::dispatcher::get_default(|current| { call -1777 <tracing_core::dispatcher::get_default::h32aa5a3d270b4ae2> ; } add rsp, 184 pop rbx pop r12 pop r14 pop r15 ret ; None => expect_failed(msg), lea rdi, [rip + 165696] lea rdx, [rip + 218426] mov esi, 34 call qword ptr [rip + 226383] ud2 ``` </details> <details> <summary>And on this branch:</summary> ```assembly event: ; pub fn event() { sub rsp, 24 ; Relaxed => intrinsics::atomic_load_relaxed(dst), lea rax, [rip + 219189] mov rax, qword ptr [rax] ; Self::ERROR_USIZE => Self::ERROR, add rax, -3 cmp rax, 3 jb 59 <event+0x53> ; Relaxed => intrinsics::atomic_load_relaxed(dst), mov rcx, qword ptr [rip + 219105] ; 0 => Interest::never(), test rcx, rcx je 47 <event+0x53> mov al, 1 cmp rcx, 1 je 8 <event+0x34> cmp rcx, 2 jne 38 <event+0x58> mov al, 2 lea rcx, [rip + 219077] mov qword ptr [rsp + 16], rcx mov byte ptr [rsp + 15], al lea rdi, [rsp + 15] lea rsi, [rsp + 16] ; tracing_core::dispatcher::get_current(|current| { call -931 <tracing_core::dispatcher::get_current::he31e3ce09e66e5fa> ; } add rsp, 24 ret ; _ => self.register(), lea rdi, [rip + 219041] call qword ptr [rip + 218187] ; InterestKind::Never => true, test al, al ; tracing::info!("hello world"); jne -53 <event+0x34> jmp -24 <event+0x53> nop dword ptr [rax + rax] ``` </details> So, uh...that seems better. Signed-off-by: Eliza Weisman <[email protected]>
- Loading branch information