task: add Tracing instrumentation to spawned tasks #2655

hawkw · 2020-07-11T19:17:51Z

Motivation

When debugging asynchronous systems, it can be very valuable to inspect
what tasks are currently active (see #2510). The tracing crate and
related libraries provide an interface for Rust libraries and
applications to emit and consume structured, contextual, and async-aware
diagnostic information. Because this diagnostic information is
structured and machine-readable, it is a better fit for the
task-tracking use case than textual logging — tracing spans can be
consumed to generate metrics ranging from a simple counter of active
tasks to histograms of poll durations, idle durations, and total task
lifetimes. This information is potentially valuable to both Tokio users
and to maintainers.

Additionally, tracing is maintained by the Tokio project and is
becoming widely adopted by other libraries in the "Tokio stack", such as
hyper, h2, and tonic and in other parts of the broader Rust
ecosystem. Therefore, it is suitable for use in Tokio itself.

Solution

This PR is an MVP for instrumenting Tokio with tracing spans. When the
"tracing" feature flag and the "tokio_unstable" cfg are both enabled,
every spawned future will be instrumented with a tracing span. If both
the cfg flag and the "tracing" feature flag are not set, tokio will
not depend on tracing, and no spans will be generated.

The generated spans are at the TRACE verbosity level, and have the
target "tokio::task", which may be used by consumers to filter whether
they should be recorded. They include fields for the type name of the
spawned future and for what kind of task the span corresponds to (a
standard spawned task, a local task spawned by spawn_local, or a
blocking task spawned by spawn_blocking). Because tracing has
separate concepts of "opening/closing" and "entering/exiting" a span, we
enter these spans every time the spawned task is polled. This allows
collecting data such as:

the total lifetime of the task from spawn to drop
the number of times the task was polled before it completed
the duration of each individual time that the span was polled (and
therefore, aggregated metrics like histograms or averages of poll
durations)
the total time a span was actively being polled, and the total time
it was alive but not being polled
the time between when the task was spawned and the first poll

As an example, here is the output of a version of the chat example
instrumented with tracing:

And, with multiple connections actually sending messages:

I haven't added any tracing spans in the example, only converted the
existing println!s to tracing::info and tracing::error for
consistency. The span durations in the above output are generated by
tracing-subscriber. Of course, Tokio-specific subscriber could
generate even more detailed statistics, but that's follow-up work once
basic tracing support has been added.

Note that the Instrumented type from tracing-futures, which attaches
a tracing span to a future, was reimplemented inside of Tokio to avoid
a dependency on that crate. tracing-futures has a feature flag that
enables an optional dependency on Tokio, and I believe that if another
crate in a dependency graph enables that feature while Tokio's tracing
support is also enabled, it would create a circular dependency that
Cargo wouldn't be able to handle. Also, it avoids a dependency for a
very small amount of code that is unlikely to ever change.

There is, of course, room for plenty of future work here. This might
include:

instrumenting other parts of tokio, such as I/O resources and
channels (possibly via waker instrumentation)
instrumenting the threadpool so that the state of worker threads
can be inspected
writing tracing-subscriber Layers to collect and display
Tokio-specific data from these traces
using track_caller (when it's stable) to record where a task
was spawned from
moving the feature out of "tokio_unstable"

However, this is intended as an MVP to get us started on that path.

Signed-off-by: Eliza Weisman [email protected]

Signed-off-by: Eliza Weisman <[email protected]>

The context shows the currently active task, so we _really_ only need `new` and `close` events. Signed-off-by: Eliza Weisman <[email protected]>

Signed-off-by: Eliza Weisman <[email protected]>

there is no option for open/close only, currently Signed-off-by: Eliza Weisman <[email protected]>

davidbarsky · 2020-07-11T21:02:09Z

tokio/src/macros/cfg.rs

+macro_rules! cfg_trace {
+    ($($item:item)*) => {
+        $(
+            #[cfg(all(feature = "tracing", tokio_unstable))]


We talked about this, but I don't think tracing merits being placed behind the tokio_unstable unstable flag. @carllerche will need to weigh in on this, but unlike the CancelationToken, changes to the tracing output aren't what the Tokio project has historically considered to be a "breaking change". An off-by-default (e.g., excluded from full) feature flag seems like a reasonable default.

I think at one point, @carllerche suggested that any dependency on a pre-1.0 crate should require the tokio_unstable flag...but, in this case, we aren't exposing anything from tracing in Tokio's public APIs. AFAICT, it shouldn't be possible for the tracing dependency to cause any compile time breakage --- the worst failure mode, if there is eventually a breaking change to the tracing-core crate that we can't work around, is that the tracing diagnostics might silently fail to be collected.

IMO, you're right, and an off-by-default optional dep should be fine. That would be pretty similar to what we do for parking_lot, which is also pre-1.0 (and churns versions somewhat frequently!), since it's also not exposed publicly. Of course, if we were to expose tracing types (such as Span or Dispatch) in a public API later on, those APIs should require the unstable flag.

With all of that said: I made the more conservative choice of opening this PR with the tokio_unstable flag required, because I want to bias towards getting this merged sooner rather than later, and continuing to iterate once it merges. We can always remove the flag in a follow-up, but it's harder to add if these changes ship in a published release...

tokio/src/util/trace.rs

jonhoo · 2020-07-12T15:08:27Z

This is really cool, I'm excited!

I'll add to the future that from experience it'd also be really handy to have events for when a resource is notified.

examples/chat.rs

tokio/src/task/blocking.rs

hawkw · 2020-07-12T16:33:53Z

I'll add to the future that from experience it'd also be really handy to have events for when a resource is notified.

@jonhoo Yup, the next major thing to do in Tokio itself is probably going to be adding instrumentation to Wakers, IMO. That should let us track when a task expresses interest in a resource, and when a resource notifies a task.

Signed-off-by: Eliza Weisman <[email protected]>

tokio/src/util/trace.rs

Signed-off-by: Eliza Weisman <[email protected]>

carllerche

Looks great. We can remove the unstable flag.

tokio/Cargo.toml

Signed-off-by: Eliza Weisman <[email protected]>

…liza/trace-spawns Signed-off-by: Eliza Weisman <[email protected]>

- docs: misc improvements (#2572, #2658, #2663, #2656, #2647, #2630, #2487, #2621, #2624, #2600, #2623, #2622, #2577, #2569, #2589, #2575, #2540, #2564, #2567, #2520, #2521, #2572, #2493) - rt: allow calls to `block_on` inside calls to `block_in_place` that are themselves inside `block_on` (#2645) - net: fix non-portable behavior when dropping `TcpStream` `OwnedWriteHalf` (#2597) - io: improve stack usage by allocating large buffers on directly on the heap (#2634) - io: fix unsound pin projection in `AsyncReadExt::read_buf` and `AsyncWriteExt::write_buf` (#2612) - io: fix unnecessary zeroing for `AsyncRead` implementors (#2525) - io: Fix `BufReader` not correctly forwarding `poll_write_buf` (#2654) - coop: returning `Poll::Pending` no longer decrements the task budget (#2549) - io: little-endian variants of `AsyncReadExt` and `AsyncWriteExt` methods (#1915) - io: fix panic in `AsyncReadExt::read_line` (#2541) - task: add [`tracing`] instrumentation to spawned tasks (#2655) - sync: allow unsized types in `Mutex` and `RwLock` (via `default` constructors) (#2615) - net: add `ToSocketAddrs` implementation for `&[SocketAddr]` (#2604) - fs: add `OpenOptionsExt` for `OpenOptions` (#2515) - fs: add `DirBuilder` (#2524) [`tracing`]: https://crates.io/crates/tracing Signed-off-by: Eliza Weisman <[email protected]>

# 0.2.22 (July 2!, 2020) ### Fixes - docs: misc improvements (#2572, #2658, #2663, #2656, #2647, #2630, #2487, #2621, #2624, #2600, #2623, #2622, #2577, #2569, #2589, #2575, #2540, #2564, #2567, #2520, #2521, #2493) - rt: allow calls to `block_on` inside calls to `block_in_place` that are themselves inside `block_on` (#2645) - net: fix non-portable behavior when dropping `TcpStream` `OwnedWriteHalf` (#2597) - io: improve stack usage by allocating large buffers on directly on the heap (#2634) - io: fix unsound pin projection in `AsyncReadExt::read_buf` and `AsyncWriteExt::write_buf` (#2612) - io: fix unnecessary zeroing for `AsyncRead` implementors (#2525) - io: Fix `BufReader` not correctly forwarding `poll_write_buf` (#2654) - io: fix panic in `AsyncReadExt::read_line` (#2541) ### Changes - coop: returning `Poll::Pending` no longer decrements the task budget (#2549) ### Added - io: little-endian variants of `AsyncReadExt` and `AsyncWriteExt` methods (#1915) - task: add [`tracing`] instrumentation to spawned tasks (#2655) - sync: allow unsized types in `Mutex` and `RwLock` (via `default` constructors) (#2615) - net: add `ToSocketAddrs` implementation for `&[SocketAddr]` (#2604) - fs: add `OpenOptionsExt` for `OpenOptions` (#2515) - fs: add `DirBuilder` (#2524) [`tracing`]: https://crates.io/crates/tracing Signed-off-by: Eliza Weisman <[email protected]>

## Motivation When debugging proxy issues, it can be useful to inspect the list of currently spawned Tokio tasks and their states. This can be used similarly to the thread or coroutine dumps provided by other languages' runtimes. ## Solution This branch adds a new endpoint to the proxy's admin server, `/tasks`, that returns a dump of all tasks currently spawned on the Tokio runtime, using the new Tracing instrumentation added in tokio-rs/tokio#2655, and a work-in-progress [`tokio-trace`] crate that provides Tokio-specific Tracing layers. Currently, the `/tasks` admin endpoint records the following information about each task: * Whether it is a normal, local, or blocking task (not relevant to us currently, since Linkerd does not use local or blocking tasks...but we might eventually!) * Whether the task is active (currently being polled) or idle (waiting to be polled) * The type of the future that was spawned * The Tracing span context from which the task was spawned * The total number of times the task has been polled * Timing statistics about the task, including: - The time in nanoseconds between when the task was spawned and when it was first polled (essentially, measuring the Tokio scheduler's latency) - The total time in nanoseconds the task has existed - The task's _busy time_ in nanoseconds (time it was actively being polled) - The tasks _idle time_ in nanoseconds (time it was _not_ being polled) In the future, Tokio will likely expose additional Tracing information, which we'll be able to collect as well. The task dump can be accessed either as an HTML table or as JSON. JSON is returned if the request has an `Accept: application/json` header, or whenever the path `/tasks.json` is requested; otherwise, the data is rendered as an HTML table. Like the `/proxy-log-level` endpoint, access is denied to requests coming from sources other than localhost, to help restrict access to authorized users (since a high volume of requests for task dumps could be used to starve the proxy). Example JSON output (in Firefox Dev Edition's extremely nice GUI JSON viewer): ![Screenshot_20200715_121938](https://user-images.githubusercontent.com/2796466/87598059-b9f68380-c6a7-11ea-8f21-842b57793baa.png) Zoomed in on the timing data for a single task: ![Screenshot_20200715_122047](https://user-images.githubusercontent.com/2796466/87598089-c4b11880-c6a7-11ea-93ac-895f7ecee0f0.png) And HTML: ![Screenshot_20200715_143155](https://user-images.githubusercontent.com/2796466/87598414-fe821f00-c6a7-11ea-93b8-d18e4837346c.png) Because the task data is generated from Tracing spans emitted by Tokio, the task spans must be enabled for it to be used. This can be done by setting a trace filter that enables the `trace` level for the target `tokio::task`, e.g.: ``` tokio::task=trace ``` or ``` tokio=trace ``` ## Notes * This branch depends on unreleased code from upstream, including a Tokio change that has merged to master but not been published, and my unreleased work-in-progress [`tokio-trace`] crate. Therefore, I've pinned these upstreams to fixed Git SHAs, to guard against dependencies changing under us unexpectedly. * I considered requiring a build-time feature flag to enable this feature, the way we do for the mock SO_ORIG_DST implementation for testing. However, this would make it harder to use task tracking to debug issues in proxies not built with the flag. I'm happy to change this code to be feature flagged if we think that's the right approach. [`tokio-trace`]: https://github.com/hawkw/tokio-trace Closes linkerd/linkerd2#3803 Signed-off-by: Eliza Weisman <[email protected]>

hawkw added 6 commits July 10, 2020 17:38

task: instrument spawned tasks with tracing

2f03820

Signed-off-by: Eliza Weisman <[email protected]>

examples: add tracing to chat example

4239548

Signed-off-by: Eliza Weisman <[email protected]>

task: add spans to local and blocking tasks

a190eae

Signed-off-by: Eliza Weisman <[email protected]>

examples: don't need enter/exit,

d4e4aef

The context shows the currently active task, so we _really_ only need `new` and `close` events. Signed-off-by: Eliza Weisman <[email protected]>

fixup local and blocking spans

bd4068e

Signed-off-by: Eliza Weisman <[email protected]>

examples: just use FULL

5136e1f

there is no option for open/close only, currently Signed-off-by: Eliza Weisman <[email protected]>

hawkw requested review from carllerche, davidbarsky and a team July 11, 2020 19:17

davidbarsky reviewed Jul 11, 2020

View reviewed changes

davidbarsky mentioned this pull request Jul 12, 2020

tracing: Instrument std::future::Future in tracing tokio-rs/tracing#808

Merged

jonhoo reviewed Jul 12, 2020

View reviewed changes

examples/chat.rs Outdated Show resolved Hide resolved

jonhoo reviewed Jul 12, 2020

View reviewed changes

tokio/src/task/blocking.rs Show resolved Hide resolved

hawkw added 2 commits July 12, 2020 10:55

examples: document tracing setup

7b40188

Signed-off-by: Eliza Weisman <[email protected]>

hide cfgs from mainline code where practical

093398c

Signed-off-by: Eliza Weisman <[email protected]>

jonhoo approved these changes Jul 12, 2020

View reviewed changes

fix dead code with some feature combinations

468ed2d

jonhoo reviewed Jul 12, 2020

View reviewed changes

tokio/src/util/trace.rs Show resolved Hide resolved

hawkw added 2 commits July 12, 2020 11:40

fix comment

e4f421f

Signed-off-by: Eliza Weisman <[email protected]>

turns out localset requires rt-core *and* rt-util

59fec64

Signed-off-by: Eliza Weisman <[email protected]>

carllerche approved these changes Jul 13, 2020

View reviewed changes

tokio/Cargo.toml Outdated Show resolved Hide resolved

hawkw added 3 commits July 13, 2020 15:55

remove unstable flag

f749552

Signed-off-by: Eliza Weisman <[email protected]>

remove reference to unstable flag in examples

ba638df

Signed-off-by: Eliza Weisman <[email protected]>

Merge branch 'eliza/trace-spawns' of github.com:tokio-rs/tokio into e…

d4391cd

…liza/trace-spawns Signed-off-by: Eliza Weisman <[email protected]>

hawkw merged commit b9e3d2e into master Jul 13, 2020

hawkw deleted the eliza/trace-spawns branch July 13, 2020 23:47

This was referenced Jul 13, 2020

meta: instrument Tokio with tracing. #1186

Closed

Introspection of current running/idle threads and total spawned tasks #2510

Closed

hawkw mentioned this pull request Jul 15, 2020

admin: add an endpoint to dump spawned Tokio tasks linkerd/linkerd2-proxy#595

Merged

hawkw mentioned this pull request Jul 20, 2020

chore: prepare to release 0.2.22 #2672

Merged

Darksonn added A-tokio Area: The main tokio crate C-enhancement Category: A PR with an enhancement or bugfix. M-tracing Tracing support in Tokio labels Jul 25, 2020

inevity mentioned this pull request Aug 1, 2023

Constify {Mutex, Notify, OnceCell, RwLock, Semaphore}::new #5897

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

task: add Tracing instrumentation to spawned tasks #2655

task: add Tracing instrumentation to spawned tasks #2655

hawkw commented Jul 11, 2020

davidbarsky Jul 11, 2020

hawkw Jul 12, 2020

jonhoo commented Jul 12, 2020

hawkw commented Jul 12, 2020

carllerche left a comment

task: add Tracing instrumentation to spawned tasks #2655

task: add Tracing instrumentation to spawned tasks #2655

Conversation

hawkw commented Jul 11, 2020

Motivation

Solution

davidbarsky Jul 11, 2020

Choose a reason for hiding this comment

hawkw Jul 12, 2020

Choose a reason for hiding this comment

jonhoo commented Jul 12, 2020

hawkw commented Jul 12, 2020

carllerche left a comment

Choose a reason for hiding this comment