Initial work on runtime stats #4043

Merged
merged 19 commits into from
Aug 27, 2021

Darksonn (Contributor)

This PR is an implementation of some parts of #3845. I am mainly looking for feedback on the approach before I proceed.

Darksonn added the labels A-tokio (Area: The main tokio crate) and M-runtime (Module: tokio/runtime) Aug 17, 2021
Darksonn mentioned this pull request Aug 17, 2021
Darksonn (Contributor Author) commented Aug 17, 2021

I think the main questions I wish to discuss are:

  1. What is the strategy for collecting the information? Just put function calls everywhere and hope we don't miss a spot?
  2. Is this the best way to mock it out when the feature is disabled?
  3. What do we do about LocalSet?

carllerche (Member) left a comment

Looks great to me. The harder part will be figuring out how to track queue depth over time. I would look into that next. The difficulty is that we want to avoid putting a concept of time on each worker...

tokio/Cargo.toml (outdated)
```diff
@@ -47,6 +47,7 @@ io-util = ["memchr", "bytes"]
# stdin, stdout, stderr
io-std = []
macros = ["tokio-macros"]
metrics = []
```

Member

I wonder if it is worth making this a feature flag at all (vs. always on).

Contributor Author

It's going to be pretty slow on platforms that don't have an AtomicU64, as we would then go through this mock.
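For context on why the fallback is slow, here is a minimal sketch of what a lock-based stand-in for AtomicU64 could look like. The type name `MutexAtomicU64` is hypothetical and is not Tokio's actual mock; the point is only that every operation takes a lock instead of compiling to a single atomic instruction.

```rust
use std::sync::atomic::Ordering;
use std::sync::Mutex;

/// Hypothetical stand-in for `AtomicU64` on targets without 64-bit atomics.
/// Every operation takes a lock, which is what makes it slower than a
/// native atomic instruction.
#[derive(Debug, Default)]
pub struct MutexAtomicU64 {
    inner: Mutex<u64>,
}

impl MutexAtomicU64 {
    pub fn new(value: u64) -> Self {
        Self { inner: Mutex::new(value) }
    }

    /// Mirrors `AtomicU64::load`; the ordering is ignored because the
    /// mutex already provides the synchronization.
    pub fn load(&self, _order: Ordering) -> u64 {
        *self.inner.lock().unwrap()
    }

    /// Mirrors `AtomicU64::fetch_add`, wrapping on overflow like the real type.
    pub fn fetch_add(&self, value: u64, _order: Ordering) -> u64 {
        let mut guard = self.inner.lock().unwrap();
        let prev = *guard;
        *guard = prev.wrapping_add(value);
        prev
    }
}
```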

Member

I don't think it is a big deal. Alternatively, we could use AtomicUsize, let it wrap, and leave it to the receiver of the data to handle that.
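A minimal sketch of the wrapping alternative, assuming a hypothetical `PollCounter` type that is not part of this PR: the runtime increments an AtomicUsize (whose `fetch_add` wraps on overflow), and the receiver computes the number of events between two samples with `wrapping_sub`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical counter that is allowed to wrap around instead of
/// requiring a 64-bit atomic.
pub struct PollCounter {
    count: AtomicUsize,
}

impl PollCounter {
    pub fn new() -> Self {
        Self { count: AtomicUsize::new(0) }
    }

    /// Called by the runtime; `fetch_add` on atomics wraps on overflow.
    pub fn increment(&self) {
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    pub fn load(&self) -> usize {
        self.count.load(Ordering::Relaxed)
    }
}

/// Receiver side: the delta between two samples is correct even if the
/// counter wrapped in between, as long as it wrapped at most once.
pub fn events_between(previous: usize, current: usize) -> usize {
    current.wrapping_sub(previous)
}
```

The trade-off is that the receiver must sample often enough that the counter cannot wrap more than once between two samples, which matters mainly on 32-bit targets.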

Contributor

I don't think there are any non-64-bit platforms left that are important for production.

Member

We can always add a feature flag later as well if someone would like it disabled.

Member

Adding it later is tricky, as it is technically a breaking change: code that uses the metrics API today without enabling the flag would stop compiling.

Member

Right, not sure what I was thinking 😅

tokio/src/runtime/metrics/metrics.rs (outdated, resolved)

Matthias247 (Contributor) left a comment

Looks great to me so far!

tokio/src/runtime/metrics/metrics.rs (outdated, resolved)
tokio/src/runtime/metrics/metrics.rs (outdated, resolved)

Darksonn (Contributor Author)

Sorry for breaking the diff since you all last looked at it, but I didn't want it to get too far behind master.

carllerche (Member)

How should the user think about using the "duration since the last two parks" metric? What questions does it answer? What questions does it not answer?

I would consider merging what you had before, as it looked pretty solid. Then we can add new metrics in follow-up PRs and focus the discussion on each individual metric.


```rust
/// This type contains methods to retrieve metrics from a Tokio runtime.
#[derive(Debug)]
pub struct RuntimeMetrics {
```

Member

I would consider naming this just Metrics (or something like that). It is already in the runtime module, so runtime::RuntimeMetrics is a bit of a stutter.

Member

I would call this RuntimeStats or just Stats. Stats::poll_count() seems pretty natural. I feel like "metrics" is a fairly overloaded term (at least in applications).

```rust
}

/// Returns a slice containing the worker metrics for each worker thread.
pub fn workers(&self) -> impl Iterator<Item = &WorkerMetrics> {
```

Member

I think we could store the WorkerMetrics in a slice. We just have to make sure the struct itself is cache-padded. That would make for a somewhat simpler API.
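A sketch of the slice-based API, assuming hypothetical `poll_count`/`park_count` fields and a boxed slice as the storage. `#[repr(align(128))]` pads each element so that neighbouring workers in the slice do not share a cache line, and `workers()` can then return `&[WorkerMetrics]` directly, which also matches the existing doc comment's "Returns a slice".

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical per-worker metrics. `repr(align(128))` rounds the size and
/// alignment up to 128 bytes, a common conservative choice to keep adjacent
/// elements on separate cache lines.
#[repr(align(128))]
#[derive(Debug, Default)]
pub struct WorkerMetrics {
    poll_count: AtomicU64,
    park_count: AtomicU64,
}

impl WorkerMetrics {
    pub fn incr_poll_count(&self) {
        self.poll_count.fetch_add(1, Ordering::Relaxed);
    }

    pub fn poll_count(&self) -> u64 {
        self.poll_count.load(Ordering::Relaxed)
    }

    pub fn park_count(&self) -> u64 {
        self.park_count.load(Ordering::Relaxed)
    }
}

#[derive(Debug)]
pub struct RuntimeMetrics {
    workers: Box<[WorkerMetrics]>,
}

impl RuntimeMetrics {
    pub fn new(num_workers: usize) -> Self {
        let workers = (0..num_workers).map(|_| WorkerMetrics::default()).collect();
        Self { workers }
    }

    /// Returns a slice containing the worker metrics for each worker thread.
    pub fn workers(&self) -> &[WorkerMetrics] {
        &self.workers
    }
}
```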

Matthias247 (Contributor) left a comment

> How should the user think about using the "duration since the last two parks" metric? What questions does it answer? What questions does it not answer?

I share this question. I think this approach stores the latest park-to-park time so that users can fetch it directly.

On the positive side, users no longer have to do any math to get the latest value. The downside is that this is a form of sampling, unlike the other approach that was discussed, which accumulates all durations: if a user misses the window during which a sample with a long execution time is current, that sample is never visible. The accumulating approach might therefore be preferable to avoid losing insight, and it makes the metric consistent with the other monotonically increasing metrics.
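To make the accumulating alternative concrete, here is a sketch under assumptions: hypothetical `ParkMetrics` and `Snapshot` types keep a monotonically increasing park count and a total busy time, and the reader derives the average busy time per park over any interval from two snapshots. A long iteration that happens between samples still shows up in the total.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical accumulating metrics: both values only ever grow.
#[derive(Debug, Default)]
pub struct ParkMetrics {
    park_count: AtomicU64,
    busy_nanos: AtomicU64,
}

impl ParkMetrics {
    /// Called by the worker each time it parks, with the time spent busy
    /// since it last returned from park.
    pub fn record_park(&self, busy: Duration) {
        self.busy_nanos.fetch_add(busy.as_nanos() as u64, Ordering::Relaxed);
        self.park_count.fetch_add(1, Ordering::Relaxed);
    }

    /// The two loads are independent, so a snapshot taken concurrently with
    /// `record_park` may pair a new duration with the old count.
    pub fn snapshot(&self) -> Snapshot {
        Snapshot {
            park_count: self.park_count.load(Ordering::Relaxed),
            busy: Duration::from_nanos(self.busy_nanos.load(Ordering::Relaxed)),
        }
    }
}

#[derive(Debug, Clone, Copy)]
pub struct Snapshot {
    pub park_count: u64,
    pub busy: Duration,
}

/// Reader side: average busy time per park between two snapshots
/// (`prev` must be taken before `curr`). Returns `None` if no park happened.
pub fn average_busy(prev: Snapshot, curr: Snapshot) -> Option<Duration> {
    let parks = curr.park_count - prev.park_count;
    if parks == 0 {
        return None;
    }
    let total = (curr.busy - prev.busy).as_nanos();
    Some(Duration::from_nanos((total / u128::from(parks)) as u64))
}
```

The independent loads in `snapshot` are exactly the consistency issue raised in the next paragraph.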

Was this approach chosen to make things easier for users, or because consistently updating both a monotonically increasing park counter and a duration would require an atomic u128? If it's the latter, we can maybe find a way around it.
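One possible way around the u128 requirement, sketched under assumptions: pack a wrapping u16 update counter and a 48-bit nanosecond duration into a single AtomicU64, so a reader always sees a matching pair. The `CounterDuration` name echoes the counter_duration.rs file in this PR and the `(u16, Duration)` return type of park_to_park, but this is an illustration, not that file's actual contents.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical packed pair: the high 16 bits hold a wrapping update
/// counter and the low 48 bits hold nanoseconds (about 78 hours max).
#[derive(Debug, Default)]
pub struct CounterDuration {
    packed: AtomicU64,
}

const NANOS_BITS: u32 = 48;
const NANOS_MASK: u64 = (1 << NANOS_BITS) - 1;

impl CounterDuration {
    /// Assumes a single writer (the owning worker), so a load followed by a
    /// store is enough; multiple writers would need a compare-and-swap loop.
    pub fn set(&self, duration: Duration) {
        let old = self.packed.load(Ordering::Relaxed);
        let counter = ((old >> NANOS_BITS) as u16).wrapping_add(1);
        // Saturate very long durations instead of truncating their high bits.
        let nanos = duration.as_nanos().min(NANOS_MASK as u128) as u64;
        let new = ((counter as u64) << NANOS_BITS) | nanos;
        self.packed.store(new, Ordering::Relaxed);
    }

    /// Both halves come from a single atomic load, so they always match.
    pub fn get(&self) -> (u16, Duration) {
        let packed = self.packed.load(Ordering::Relaxed);
        let counter = (packed >> NANOS_BITS) as u16;
        (counter, Duration::from_nanos(packed & NANOS_MASK))
    }
}
```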

tokio/src/runtime/metrics/counter_duration.rs (outdated, resolved)

LucioFranco (Member) left a comment

LGTM overall

```rust
///
/// The `u16` is a counter that is incremented by one each time the duration
/// is changed. The counter will wrap around when it reaches `u16::MAX`.
pub fn park_to_park(&self) -> (u16, Duration) {
```

Member

This name feels odd; how about park_duration?

```rust
}

pub(crate) fn returned_from_park(&mut self) {
    self.last_park = Instant::now();
```

Member

My original design tried to avoid any Instant::now calls; should we continue to consider that? Do we know what the performance impact might be this deep in the runtime call stack?
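A quick first number for this question could come from a naive loop that times Instant::now itself. This is a rough sketch, not a rigorous benchmark: the function name is hypothetical, results depend heavily on the platform and clock source, and a proper harness (for example criterion) would give more reliable figures.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Rough estimate of the average cost of one `Instant::now()` call.
/// `black_box` keeps the compiler from optimizing the calls away.
pub fn instant_now_cost(iterations: u32) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(Instant::now());
    }
    start.elapsed() / iterations
}
```

Comparing this figure with the typical time between two parks gives a sense of whether one extra call per park is noticeable.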

Contributor Author

I hadn't noticed that in your RFC, but I've punted on this part of the feature for now.

```rust
pub mod metrics;
}
cfg_not_metrics! {
pub(crate) mod metrics;
```

Member

Why do we need mocks?

Contributor Author

I think it is easier to isolate the conditional compilation to this module by defining mocks with no-op methods than to put conditional compilation on every single use of the module.
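A minimal sketch of that pattern, assuming a hypothetical `WorkerMetrics` type with a single `incr_poll_count` method. The diff switches between the real and mock modules with macros such as `cfg_not_metrics!`; plain `#[cfg(feature = "metrics")]` attributes are used here to keep the example self-contained.

```rust
// Real implementation, compiled only when the feature is enabled.
#[cfg(feature = "metrics")]
mod metrics {
    use std::sync::atomic::{AtomicU64, Ordering};

    #[derive(Debug, Default)]
    pub struct WorkerMetrics {
        poll_count: AtomicU64,
    }

    impl WorkerMetrics {
        pub fn incr_poll_count(&self) {
            self.poll_count.fetch_add(1, Ordering::Relaxed);
        }
    }
}

// Mock with the same names and no-op bodies, compiled otherwise.
#[cfg(not(feature = "metrics"))]
mod metrics {
    #[derive(Debug, Default)]
    pub struct WorkerMetrics;

    impl WorkerMetrics {
        pub fn incr_poll_count(&self) {}
    }
}

use metrics::WorkerMetrics;

/// Call site: no `#[cfg]` is needed here, whichever module was compiled.
pub fn poll_task(metrics: &WorkerMetrics) {
    metrics.incr_poll_count();
    // ... poll the task ...
}
```

With the feature disabled, the mock is a zero-sized type and its empty methods can be optimized away, so the call sites cost nothing.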

```diff
@@ -65,10 +67,11 @@ fn steal_overflow() {
let inject = Inject::new();

let th = thread::spawn(move || {
    let mut metrics = WorkerMetricsBatcher::new(0);
```

Member

If these all start with 0, should we just use a Default impl instead of calling new(0) everywhere?
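A sketch of the suggested Default impl. Only the names WorkerMetricsBatcher and `new(0)` come from the diff; the fields, the assumption that the argument is a worker index, and `incr_poll_count` are hypothetical.

```rust
/// Hypothetical simplified batcher: counts live in plain (non-atomic)
/// fields owned by the worker thread.
#[derive(Debug)]
pub struct WorkerMetricsBatcher {
    worker_index: usize,
    poll_count: u64,
}

impl WorkerMetricsBatcher {
    pub fn new(worker_index: usize) -> Self {
        Self { worker_index, poll_count: 0 }
    }

    pub fn incr_poll_count(&mut self) {
        self.poll_count += 1;
    }
}

/// Lets tests write `WorkerMetricsBatcher::default()` instead of `new(0)`.
impl Default for WorkerMetricsBatcher {
    fn default() -> Self {
        Self::new(0)
    }
}
```

With only zero-initialized fields, `#[derive(Default)]` would produce the same value; delegating to `new(0)` keeps a single constructor in case `new` later does more setup.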

```diff
@@ -483,6 +496,8 @@ impl Context {
    self.worker.shared.notify_parked();
}

core.metrics.returned_from_park();
```

Member

Should we make this a drop guard or something?
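A drop guard could look roughly like this. `returned_from_park` and `last_park` come from the diff; `ParkGuard`, `park_count`, and the `park` function are hypothetical stand-ins for illustration.

```rust
use std::time::Instant;

/// Hypothetical metrics holder with the `returned_from_park` hook.
#[derive(Debug)]
pub struct WorkerMetrics {
    pub last_park: Instant,
    pub park_count: u64,
}

impl WorkerMetrics {
    pub fn returned_from_park(&mut self) {
        self.last_park = Instant::now();
        self.park_count += 1;
    }
}

/// Calls `returned_from_park` when dropped, so every exit path from the
/// park section (including early returns and unwinding panics) records it.
pub struct ParkGuard<'a> {
    metrics: &'a mut WorkerMetrics,
}

impl<'a> ParkGuard<'a> {
    pub fn new(metrics: &'a mut WorkerMetrics) -> Self {
        Self { metrics }
    }
}

impl<'a> Drop for ParkGuard<'a> {
    fn drop(&mut self) {
        self.metrics.returned_from_park();
    }
}

/// Stand-in for the worker's park routine.
pub fn park(metrics: &mut WorkerMetrics, timed_out: bool) -> bool {
    let _guard = ParkGuard::new(metrics);
    if timed_out {
        return false; // the guard's Drop still runs here
    }
    // ... block on the driver ...
    true
}
```

One trade-off: the guard holds a borrow for the whole park section, which fits less easily if the value holding the metrics is moved in and out while parking.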

Darksonn merged commit 98578a6 into master Aug 27, 2021
Darksonn deleted the metrics branch August 27, 2021 09:40
Darksonn added the label M-metrics (Module: tokio/runtime/metrics) and removed M-runtime (Module: tokio/runtime) Aug 27, 2021
Labels
A-tokio Area: The main tokio crate M-metrics Module: tokio/runtime/metrics
4 participants