Initial work on runtime stats #4043

Darksonn · 2021-08-17T10:27:13Z

This PR is an implementation of some parts of #3845. I am mainly looking for feedback on the approach before I proceed.

Darksonn · 2021-08-17T17:41:35Z

I think the main questions I wish to discuss are:

What is the strategy for collecting the information? Do we just put function calls everywhere and hope we don't miss a spot?
Is this the best way to mock it out when the feature is disabled?
What do we do about LocalSet?

carllerche

Looks great to me. The harder part will be figuring out how to track queue depth over time. I would look into that next. The difficulty is we want to avoid putting a concept of time on each worker...

carllerche · 2021-08-17T17:38:23Z

tokio/Cargo.toml

@@ -47,6 +47,7 @@ io-util = ["memchr", "bytes"]
 # stdin, stdout, stderr
 io-std = []
 macros = ["tokio-macros"]
+metrics = []


I wonder if it is worth making this a feature flag at all (vs. always on).

It's going to be pretty slow on platforms that don't have an AtomicU64 as we would then go through this mock.

I don't think it is a big deal. Alternatively, we could use AtomicUsize, let it wrap, and leave it up to the receiver of the data to handle that.

I don't think there are any non-64-bit platforms left that are important for production.

We can always add a feature flag later as well if someone would like it disabled.

Adding it later is tricky, as it is technically a breaking change.

Right, not sure what I was thinking about 😅
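The "let it wrap" alternative mentioned in this thread can be sketched as follows. This is a minimal illustration, not code from the PR: `WrappingCounter` and `delta` are hypothetical names. It shows that a consumer can still recover the number of events between two samples with a wrapping subtraction, provided the counter wraps at most once between samples.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical counter that is allowed to wrap (not a Tokio type).
pub struct WrappingCounter(AtomicUsize);

impl WrappingCounter {
    pub const fn new(start: usize) -> Self {
        WrappingCounter(AtomicUsize::new(start))
    }

    /// `fetch_add` on atomics wraps around on overflow rather than panicking.
    pub fn incr(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    pub fn load(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }
}

/// Number of increments between two samples. The result is correct as long
/// as the counter wrapped at most once between `previous` and `current`.
pub fn delta(previous: usize, current: usize) -> usize {
    current.wrapping_sub(previous)
}
```

On a 32-bit target the counter wraps after roughly 4.3 billion events, so the receiver has to sample often enough for the wrap-at-most-once assumption to hold.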

tokio/src/runtime/metrics/metrics.rs

Matthias247

Looks great to me so far!

tokio/src/runtime/metrics/metrics.rs

tokio/src/runtime/basic_scheduler.rs

Darksonn · 2021-08-25T12:09:25Z

Sorry for breaking the diff since you all last looked at it, but I didn't want it to get too far behind master.

carllerche · 2021-08-25T16:53:52Z

How should the user think about using the "duration since the last two parks" metric? What questions does it answer? What questions does it not answer?

I would consider merging what you had before, as it looked pretty solid. Then we can add new metrics in follow-up PRs and focus the discussion on each individual metric.

carllerche · 2021-08-25T17:23:48Z

tokio/src/runtime/metrics/metrics.rs

+
+/// This type contains methods to retrieve metrics from a Tokio runtime.
+#[derive(Debug)]
+pub struct RuntimeMetrics {


I would consider naming this just Metrics (or something like that). It already is in a runtime module. runtime::RuntimeMetrics is a bit of a stutter.

I would call this RuntimeStats or just Stats. Stats::poll_count() seems pretty natural. I feel like Metrics is decently overloaded (at least in applications it is).

carllerche · 2021-08-25T17:25:06Z

tokio/src/runtime/metrics/metrics.rs

+    }
+
+    /// Returns a slice containing the worker metrics for each worker thread.
+    pub fn workers(&self) -> impl Iterator<Item = &WorkerMetrics> {


I think we could store WorkerMetrics in a slice. We just have to make sure the struct itself is cache padded. That would be a bit of a simpler API.
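As a rough sketch of the slice suggestion, assuming a 128-byte alignment is enough to keep neighbouring workers off each other's cache lines: the types below reuse the names from the diff but are stand-ins written for illustration, not the PR's actual definitions, and `incr_poll_count` and `new` are hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for per-worker metrics. The alignment pads every element of the
/// slice to its own 128-byte block, which avoids false sharing between workers.
#[repr(align(128))]
#[derive(Debug, Default)]
pub struct WorkerMetrics {
    poll_count: AtomicU64,
}

impl WorkerMetrics {
    /// Hypothetical recording hook, called by the owning worker.
    pub fn incr_poll_count(&self) {
        self.poll_count.fetch_add(1, Ordering::Relaxed);
    }

    pub fn poll_count(&self) -> u64 {
        self.poll_count.load(Ordering::Relaxed)
    }
}

/// Stand-in for the runtime-wide handle.
#[derive(Debug)]
pub struct RuntimeMetrics {
    workers: Box<[WorkerMetrics]>,
}

impl RuntimeMetrics {
    pub fn new(worker_count: usize) -> Self {
        RuntimeMetrics {
            workers: (0..worker_count).map(|_| WorkerMetrics::default()).collect(),
        }
    }

    /// Because the element type itself is padded, a plain slice can be
    /// returned instead of `impl Iterator`.
    pub fn workers(&self) -> &[WorkerMetrics] {
        &self.workers
    }
}
```

Returning `&[WorkerMetrics]` gives callers indexing and `len()` for free, at the cost of exposing the padded layout in the element size.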

Matthias247

How should the user think about using the "duration since the last two parks" metric? What questions does it answer? What questions does it not answer?

I share this question. I think this approach stores the latest park-to-park time so that users can fetch it directly.

On the positive side, users no longer have to do any math to get the latest info. The downside is that this relies more heavily on sampling than the accumulating approach that was also discussed: if a user fails to grab the metrics for a sample that had a long execution time, that sample will never be visible. The accumulating approach might therefore be preferable to avoid losing insight, and it makes the metric more consistent with the other monotonically increasing metrics.

Was this approach chosen to make things easier for users, or because updating both a monotonically increasing park counter and a duration would require an atomic u128 if consistency is required? We can maybe find a way around the latter.
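To make the accumulating alternative concrete, here is a minimal sketch under stated assumptions: `ParkTotals`, `Sample`, and `mean_busy_between` are hypothetical names, and the two totals are stored in separate atomics, so a reader can observe them slightly out of sync (the consistency concern raised above).

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical accumulating totals; both values only ever grow.
#[derive(Debug, Default)]
pub struct ParkTotals {
    park_count: AtomicU64,
    busy_nanos: AtomicU64,
}

/// A point-in-time copy of the totals, taken by the metrics consumer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Sample {
    pub park_count: u64,
    pub busy_nanos: u64,
}

impl ParkTotals {
    /// Called by the worker each time it parks, with the time it spent
    /// running since the previous park.
    pub fn record(&self, busy: Duration) {
        self.busy_nanos.fetch_add(busy.as_nanos() as u64, Ordering::Relaxed);
        self.park_count.fetch_add(1, Ordering::Relaxed);
    }

    /// The two loads are not one atomic operation, so the pair may be
    /// slightly inconsistent if the worker records in between.
    pub fn sample(&self) -> Sample {
        Sample {
            park_count: self.park_count.load(Ordering::Relaxed),
            busy_nanos: self.busy_nanos.load(Ordering::Relaxed),
        }
    }
}

/// Mean busy time per park between two samples, or `None` if the worker
/// did not park in that interval.
pub fn mean_busy_between(prev: Sample, cur: Sample) -> Option<Duration> {
    let parks = cur.park_count.wrapping_sub(prev.park_count);
    if parks == 0 {
        return None;
    }
    let busy = cur.busy_nanos.wrapping_sub(prev.busy_nanos);
    Some(Duration::from_nanos(busy / parks))
}
```

With totals, a long execution between two samples still shows up in the next delta, whereas a "latest value" metric only reports it if the consumer happens to read at the right moment.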

Matthias247 · 2021-08-25T18:40:19Z

tokio/Cargo.toml

@@ -47,6 +47,7 @@ io-util = ["memchr", "bytes"]
 # stdin, stdout, stderr
 io-std = []
 macros = ["tokio-macros"]
+metrics = []


I don't think there are any non-64-bit platforms left that are important for production.

tokio/src/runtime/metrics/counter_duration.rs

LucioFranco

LGTM overall

LucioFranco · 2021-08-25T20:32:28Z

tokio/Cargo.toml

@@ -47,6 +47,7 @@ io-util = ["memchr", "bytes"]
 # stdin, stdout, stderr
 io-std = []
 macros = ["tokio-macros"]
+metrics = []


We can always add a feature flag later as well if someone would like it disabled.

LucioFranco · 2021-08-26T12:24:21Z

tokio/src/runtime/metrics/metrics.rs

+
+/// This type contains methods to retrieve metrics from a Tokio runtime.
+#[derive(Debug)]
+pub struct RuntimeMetrics {


I would call this RuntimeStats or just Stats. Stats::poll_count() seems pretty natural. I feel like Metrics is decently overloaded (at least in applications it is).

LucioFranco · 2021-08-26T12:33:31Z

tokio/src/runtime/metrics/metrics.rs

+    ///
+    /// The `u16` is a counter that is incremented by one each time the duration
+    /// is changed. The counter will wrap around when it reaches `u16::MAX`.
+    pub fn park_to_park(&self) -> (u16, Duration) {


This name feels odd. How about park_duration?
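For readers wondering how a `(u16, Duration)` pair could be read consistently without a 128-bit atomic, one possibility is to pack both into a single `AtomicU64`. The sketch below is an assumption about how such a type might look, not the PR's implementation: the 16/48 bit split, the saturation behaviour, and the single-writer restriction are all choices made here for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Low 48 bits hold nanoseconds, high 16 bits hold the change counter.
const NANOS_BITS: u32 = 48;
const NANOS_MASK: u64 = (1 << NANOS_BITS) - 1;

/// Hypothetical packed (counter, duration) cell.
#[derive(Debug, Default)]
pub struct CounterDuration(AtomicU64);

impl CounterDuration {
    /// Stores a new duration and bumps the counter. Assumes a single writer
    /// (the owning worker), so a plain load followed by a store is enough.
    pub fn set_next_duration(&self, dur: Duration) {
        let old = self.0.load(Ordering::Relaxed);
        // The counter wraps around after reaching `u16::MAX`.
        let counter = ((old >> NANOS_BITS) as u16).wrapping_add(1);
        // Saturate durations that do not fit in 48 bits (about 78 hours).
        let nanos = dur.as_nanos().min(NANOS_MASK as u128) as u64;
        self.0
            .store(((counter as u64) << NANOS_BITS) | nanos, Ordering::Relaxed);
    }

    /// Reads the counter and the duration with one atomic load, so the
    /// two values always belong together.
    pub fn get(&self) -> (u16, Duration) {
        let packed = self.0.load(Ordering::Relaxed);
        (
            (packed >> NANOS_BITS) as u16,
            Duration::from_nanos(packed & NANOS_MASK),
        )
    }
}
```

A reader that sees the same counter value on two consecutive reads knows the duration has not been updated in between, modulo the `u16` wrap-around.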

LucioFranco · 2021-08-26T12:34:41Z

tokio/src/runtime/metrics/metrics.rs

+    }
+
+    pub(crate) fn returned_from_park(&mut self) {
+        self.last_park = Instant::now();


My design originally tried to avoid any Instant::now calls. Should we continue to consider that? Do we know what the perf impact might be this deep in the runtime call stack?

I didn't notice this from your RFC, but I've punted this part of the feature for now.

LucioFranco · 2021-08-26T12:36:11Z

tokio/src/runtime/mod.rs

+    pub mod metrics;
+}
+cfg_not_metrics! {
+    pub(crate) mod metrics;


Why do we need mocks?

I think it is easier to isolate the conditional compilation to this module by defining mocks with no-op methods than to put conditional compilation on every single use of the module.
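The trade-off described in this reply can be illustrated with a small sketch. The `stats` feature name, the `WorkerStats` type, and `poll_tasks` are hypothetical; the point is only that the `cfg` attributes live on the two module definitions while the call site stays free of conditional compilation.

```rust
// Real implementation, compiled only when the (hypothetical) `stats`
// feature is enabled.
#[cfg(feature = "stats")]
mod stats {
    use std::sync::atomic::{AtomicU64, Ordering};

    #[derive(Debug, Default)]
    pub struct WorkerStats {
        polls: AtomicU64,
    }

    impl WorkerStats {
        pub fn incr_poll_count(&self) {
            self.polls.fetch_add(1, Ordering::Relaxed);
        }
    }
}

// Mock with the same API and no-op methods, compiled when the feature is off.
#[cfg(not(feature = "stats"))]
mod stats {
    #[derive(Debug, Default)]
    pub struct WorkerStats {}

    impl WorkerStats {
        pub fn incr_poll_count(&self) {}
    }
}

/// Call site: identical in both configurations, with no `cfg` attributes.
pub fn poll_tasks(stats: &stats::WorkerStats, tasks: &[u32]) -> u32 {
    let mut sum = 0;
    for task in tasks {
        stats.incr_poll_count();
        sum += task;
    }
    sum
}
```

With the feature disabled, the mock is a zero-sized type and its empty methods are trivial for the compiler to inline away, so the instrumented call sites should cost nothing.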

LucioFranco · 2021-08-26T12:37:22Z

tokio/src/runtime/tests/loom_queue.rs

@@ -65,10 +67,11 @@ fn steal_overflow() {
        let inject = Inject::new();

        let th = thread::spawn(move || {
+            let mut metrics = WorkerMetricsBatcher::new(0);


If these all start with 0, should we just use a Default impl instead of calling new(0) everywhere?

LucioFranco · 2021-08-26T12:39:48Z

tokio/src/runtime/thread_pool/worker.rs

@@ -483,6 +496,8 @@ impl Context {
            self.worker.shared.notify_parked();
        }

+        core.metrics.returned_from_park();


Should we make this a drop guard or something?
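A drop guard along the lines suggested here might look like the sketch below. It is illustrative only: `ParkGuard`, the `Cell`-based `WorkerStats`, and the `park` function are hypothetical stand-ins, and the PR's `returned_from_park` takes `&mut self` rather than `&self`.

```rust
use std::cell::Cell;

/// Stand-in for the per-worker stats batcher (not the PR's type).
#[derive(Debug, Default)]
pub struct WorkerStats {
    count: Cell<u32>,
}

impl WorkerStats {
    pub fn returned_from_park(&self) {
        self.count.set(self.count.get() + 1);
    }

    pub fn park_returns(&self) -> u32 {
        self.count.get()
    }
}

/// Records the return from park when it goes out of scope, so that every
/// exit path of the parking function is covered.
pub struct ParkGuard<'a> {
    stats: &'a WorkerStats,
}

impl<'a> Drop for ParkGuard<'a> {
    fn drop(&mut self) {
        self.stats.returned_from_park();
    }
}

/// Stand-in for the worker's park routine. Returns `false` on shutdown.
pub fn park(stats: &WorkerStats, shutdown: bool) -> bool {
    let _guard = ParkGuard { stats };
    if shutdown {
        // Early return: the guard still runs `returned_from_park`.
        return false;
    }
    // ... the thread would block here until it is notified ...
    true
}
```

The guard removes the risk of forgetting the call on a newly added early-return path, at the price of making the recording point less visible in the code.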

Darksonn added A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime labels Aug 17, 2021

Darksonn mentioned this pull request Aug 17, 2021

rfc: Runtime stats #3845

Closed

carllerche requested review from hawkw and seanmonstar August 17, 2021 17:36

carllerche reviewed Aug 17, 2021

Matthias247 reviewed Aug 18, 2021

tokio/src/runtime/metrics/metrics.rs (outdated)

tokio/src/runtime/metrics/metrics.rs (outdated)

Matthias247 reviewed Aug 18, 2021

tokio/src/runtime/basic_scheduler.rs (outdated)

Initial work on metrics

bf905c3

Darksonn force-pushed the metrics branch from f21cc4a to bf905c3 Compare August 25, 2021 12:08

Darksonn added 10 commits August 25, 2021 13:34

Add another metric

e2905b7

rustfmt

db15d5b

Fix park_to_park timing

e075fb2

Allow dead code without metrics

37ef131

Add cfg(features = "rt-multi-thread") to incr_steal_count

14f7c89

Simplify formatting

816fad3

typo

34b814f

Fix test

d87027d

Fix more tests

0f7d7ee

rustfmt

2d70494

carllerche reviewed Aug 25, 2021

Matthias247 reviewed Aug 25, 2021

Darksonn and others added 2 commits August 25, 2021 23:05

Simplify set_next_duration

213218b

Remove unused constants

062e18a

LucioFranco reviewed Aug 26, 2021

Rename to stats

366a32c

Darksonn added 2 commits August 26, 2021 13:12

Punt park_to_park

5ac84f9

rustfmt

b6ec8fb

LucioFranco approved these changes Aug 26, 2021

Darksonn added 3 commits August 26, 2021 13:16

Remove things I forgot to remove

3e756d3

remove

b80d764

Add cache padding

cb60bb1

Matthias247 approved these changes Aug 26, 2021

carllerche mentioned this pull request Aug 26, 2021

meta: Runtime metrics stabilization #4073

Open

7 tasks

Darksonn merged commit 98578a6 into master Aug 27, 2021

Darksonn deleted the metrics branch August 27, 2021 09:40

Darksonn added M-metrics Module: tokio/runtime/metrics and removed M-runtime Module: tokio/runtime labels Aug 27, 2021

Matthias247 mentioned this pull request Aug 28, 2021

Add a configuration option to skip the lifo_slot optimization #4051

Closed

Darksonn mentioned this pull request Aug 31, 2021

chore: prepare Tokio v1.11.0 #4083

Merged
