-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial work on runtime stats #4043
Conversation
I think the main questions I wish to discuss are:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me. The harder part will be figuring out how to track queue depth over time. I would look into that next. The difficulty is we want to avoid putting a concept of time on each worker...
tokio/Cargo.toml
Outdated
@@ -47,6 +47,7 @@ io-util = ["memchr", "bytes"] | |||
# stdin, stdout, stderr | |||
io-std = [] | |||
macros = ["tokio-macros"] | |||
metrics = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it is worth making this a feature flag at all (vs. always on).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's going to be pretty slow on platforms that don't have an AtomicU64
as we would then go through this mock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is a big deal. Alternatively, we use AtomicUsize let it wrap and it is up to the receiver of data to handle that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's any non 64bit platforms anymore that are important for production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can always add a feature flag later as well if someone would like it disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding later is tricky as it technically is a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right not sure what I was thinking about 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me so far!
Sorry for breaking the diff since you all last looked at it, but I didn't want it to get too far behind master. |
How should the user think about using the "duration since the last two parks" metric? What questions does it answer? What questions does it not answer? I would consider merging what you had before as it looked pretty solid. Then we can add new metrics in follow up PRs and focus discussion on the individual metric. |
tokio/src/runtime/metrics/metrics.rs
Outdated
|
||
/// This type contains methods to retrieve metrics from a Tokio runtime. | ||
#[derive(Debug)] | ||
pub struct RuntimeMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider naming this just Metrics
(or something like that). It already is in a runtime
module. runtime::RuntimeMetrics
is a bit of a stutter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call this RuntimeStats
or just Stats
. Stats::poll_count()
seems pretty natural. I feel like Metrics
is decently overloaded (at least in applications they are).
tokio/src/runtime/metrics/metrics.rs
Outdated
} | ||
|
||
/// Returns a slice containing the worker metrics for each worker thread. | ||
pub fn workers(&self) -> impl Iterator<Item = &WorkerMetrics> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could store WorkerMetrics
in a slice. We just have to make sure the struct itself is cache padded. That would be a bit of a simpler API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should the user think about using the "duration since the last two parks" metric? What questions does it answer? What questions does it not answer?
I share this question. I think this approach tries to store the latest park to park time - so users directly can fetch it.
On the positive side, this doesn't require users to do math anymore to get the latest info. The downside is that this adds more sampling than the approach which accumulates all times that was also discussed. If a user missed to grab metrics for the sample which had a long execution time - it will not be visible. The latter might therefore be preferable to avoid losing insight, and makes the metric more consistent with other monotonic incrementing metrics.
Was this approach chosen to make things easier for users - or because updating both a monotonic incrementing park counter and duration would require an atomic u128 if consistency is required? We can maybe find a way around the latter.
tokio/Cargo.toml
Outdated
@@ -47,6 +47,7 @@ io-util = ["memchr", "bytes"] | |||
# stdin, stdout, stderr | |||
io-std = [] | |||
macros = ["tokio-macros"] | |||
metrics = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's any non 64bit platforms anymore that are important for production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall
tokio/Cargo.toml
Outdated
@@ -47,6 +47,7 @@ io-util = ["memchr", "bytes"] | |||
# stdin, stdout, stderr | |||
io-std = [] | |||
macros = ["tokio-macros"] | |||
metrics = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can always add a feature flag later as well if someone would like it disabled.
tokio/src/runtime/metrics/metrics.rs
Outdated
|
||
/// This type contains methods to retrieve metrics from a Tokio runtime. | ||
#[derive(Debug)] | ||
pub struct RuntimeMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call this RuntimeStats
or just Stats
. Stats::poll_count()
seems pretty natural. I feel like Metrics
is decently overloaded (at least in applications they are).
tokio/src/runtime/metrics/metrics.rs
Outdated
/// | ||
/// The `u16` is a counter that is incremented by one each time the duration | ||
/// is changed. The counter will wrap around when it reaches `u16::MAX`. | ||
pub fn park_to_park(&self) -> (u16, Duration) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name feels odd, how about park_duration
?
tokio/src/runtime/metrics/metrics.rs
Outdated
} | ||
|
||
pub(crate) fn returned_from_park(&mut self) { | ||
self.last_park = Instant::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So my design originally tried to avoid any Instant::now
calls, should we continue to consider it? Do we know what this perf impact might be this deep in the runtime call stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't notice this from your RFC, but I've punted this part of the feature for now.
tokio/src/runtime/mod.rs
Outdated
pub mod metrics; | ||
} | ||
cfg_not_metrics! { | ||
pub(crate) mod metrics; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need mocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is easier to isolate the conditional compilation to this module by defining mocks with no-op methods than to put conditional compilation on every single use of the module.
@@ -65,10 +67,11 @@ fn steal_overflow() { | |||
let inject = Inject::new(); | |||
|
|||
let th = thread::spawn(move || { | |||
let mut metrics = WorkerMetricsBatcher::new(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if these all start with 0 should we just use a Default
impl instead of calling new(0)
everywhere?
@@ -483,6 +496,8 @@ impl Context { | |||
self.worker.shared.notify_parked(); | |||
} | |||
|
|||
core.metrics.returned_from_park(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make this a drop guard or something?
This PR is an implementation of some parts of #3845. I am mainly looking for feedback on the approach before I proceed.