[discuss] event loop idle metrics #33026
Comments
I think it's important that what the Node.js API returns is mappable to Prometheus in as high-value a way as possible. I'll take a look at this in the next day or so. |
Is utilization here the wall clock time since measurement starts or the actual CPU time consumed by the thread executing the loop tasks? |
@Flarna It's a bit in depth, but you should read my post on libuv/libuv#2725 (comment) about the differences. TL;DR "idle time" is measured using |
@trevnorris Good post! I think a common use case would be to calculate utilization over regular intervals, not always from loop startup. Therefore I'm not sure whether adding this overall utilization from loop start is that useful. Looks very useful to me. |
Currently this mechanism uses In theory, we could record the moment
Definitely possible but obviously starts to increase the complexity of the API so we need to be careful. |
@Flarna Instead of reporting A simple example:
function loopUtilization(last, current = performance.idleTime()) {
  return (current.idle - last.idle) / (current.active - last.active);
}
const now = performance.idleTime();
const t = Date.now();
while (Date.now() - t < 3000);
console.log(loopUtilization(now));
Though if |
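For readers who want to try the interval idea from the comment above, here is a minimal sketch. It assumes a Node.js version that exposes `performance.nodeTiming.idleTime` (cumulative idle time in milliseconds). The `sample()` helper and the choice to report `1 - idle / elapsed` are assumptions made for illustration, not the API proposed in this thread.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper: snapshot the cumulative idle time and the current
// high-resolution timestamp, both in milliseconds.
function sample() {
  return { idle: performance.nodeTiming.idleTime, now: performance.now() };
}

// Fraction of the interval between two samples in which the loop was NOT idle.
// The result is clamped to [0, 1] to absorb small timing skew between the
// two underlying clocks.
function loopUtilization(last, current = sample()) {
  const elapsed = current.now - last.now;
  if (elapsed <= 0) return 0;
  const idle = current.idle - last.idle;
  return Math.min(1, Math.max(0, 1 - idle / elapsed));
}

const start = sample();
setTimeout(() => {
  // Mostly idle interval, so the printed value should be close to 0.
  console.log(loopUtilization(start));
}, 100);
```

Because each call only subtracts two snapshots, the helper keeps no state of its own, which lets several independent consumers sample at different intervals.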
Yep, so the key question here is: do we provide the calculation of utilization for users or do we just provide the idle time and leave it to them to calculate. |
If it's similar to my example above, to prevent yet another one-liner module we might just as well add it. So maybe it would operate something like so:
// This simply logs out the loop utilization for a three second duration
setTimeout((last) => {
  console.log(performance.loopUtilization(last));
}, 3000, performance.idleTime());
Internally we just store
|
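For comparison, current Node.js releases ship a built-in with a similar shape, `performance.eventLoopUtilization()`, which accepts an earlier result and returns the delta. The sketch below shows one way to use it. The 50 ms busy loop and the 100 ms timer are arbitrary values chosen for illustration.

```javascript
const { performance } = require('perf_hooks');

// Take a baseline reading; it has `idle`, `active` and `utilization` fields.
const before = performance.eventLoopUtilization();

setTimeout(() => {
  // Block the loop for roughly 50 ms so some of the interval counts as active.
  const t = Date.now();
  while (Date.now() - t < 50);

  // Passing the earlier reading yields the delta over the interval.
  const delta = performance.eventLoopUtilization(before);
  console.log(delta.idle, delta.active, delta.utilization);
}, 100);
```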
I'm in favor of an easy, stateless API. If we add state we have to think about multi-user support. |
Yes. It is the same unit as idle time. |
Given the likely complexity of any other option, I'm +1 on the really simple Just Give Me The Idle Time API. We can calculate active from the existing perf milestones, we don't actually need the new API to give us that. |
@jasnell Just as a note here, we should probably make idle time include |
I'm not super familiar with how we'd be able to observe waits but +1 to including those. @trevnorris , have you looked into that at all? |
@jasnell Luckily, I’ve done quite a bit of work around that in the past: https://github.com/v8/v8/blob/1d00b7856f9071c2c62ad66dbdbfdebff6dc370c/include/v8.h#L8867-L8944 Adding support for measuring that timing should be fairly straightforward, we basically register a callback that’s called once when (I’ve also been meaning to add a logging API for those events to Node.js, for better debugging |
To my understanding the time spent in |
@Flarna Well … that’s a good point, but I think it’s a bit more complex than that. Maybe we need to make this customizable. I’m currently working on code that uses |
The metric should only measure idle time in the event provider. Specifically the amount of time idle while it waits for an event to be received (reason for this is explained in libuv/libuv#2725 (comment)). Looking at Well, I guess that actually depends on whether the Worker's event loop has any handles from which it expects to receive an event. If not then That being said, measuring the amount of time spent in So for usability maybe just provide both times as separate values? IMO that would be the most helpful. TL;DR |
With respect to the question of whether to return the raw number accumulated since the idle time metric was enabled or a calculated value over an interval, my preference is to leave it as the raw number. Otherwise it's a slippery slope of people wanting to be able to specify the interval, etc., so it's probably better to let them calculate from the raw number in the first place and keep it as simple as possible. |
Just had a great catch up with @trevnorris... it's been too long since we've chatted! On this specific API, I'm going to move forward with a draft PR that pulls in his libuv changes and adds a field to the existing Performance API nodeTiming.idleTime / (process.hrtime() - nodeTiming.loopStart) We can provide the cumulative Atomics.wait time as a separate property off |
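The expression in the comment above can be tried out as follows. This is only a sketch: it assumes `performance.nodeTiming.idleTime` and `performance.nodeTiming.loopStart` are available, and it uses `performance.now()` in place of `process.hrtime()` because `now()` and `loopStart` are both in milliseconds on the same time origin. The `cumulativeLoopStats()` helper is hypothetical. Note that `idleTime / elapsed` is the idle fraction, so utilization is taken here as its complement.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper (not a Node.js API): cumulative idle ratio and
// utilization since the event loop started.
function cumulativeLoopStats() {
  const { idleTime, loopStart } = performance.nodeTiming;
  // loopStart is -1 until the event loop has actually started.
  if (loopStart < 0) return { idle: 0, idleRatio: 0, utilization: 0 };
  const elapsed = performance.now() - loopStart;
  const idleRatio = elapsed > 0 ? Math.min(1, idleTime / elapsed) : 0;
  return { idle: idleTime, idleRatio, utilization: 1 - idleRatio };
}

// Sample from inside a timer so the loop is guaranteed to be running.
setTimeout(() => console.log(cumulativeLoopStats()), 100);
```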
Now that the patch has landed in libuv v1.39.0 I'll create a PR with this feature |
It has already landed in Node in #34915. EDIT: Ah, I see the original comment was already updated. Ignore this comment 😄 |
PR created: #34938 |
PR #34938 landed. @trevnorris I assume this can be closed now? |
Probably. :) |
@trevnorris is working on landing a change to libuv that will track and report the amount of time the event loop spends idle. It's an extremely useful metric that can give us a measurement of "event loop utilization". In a world of worker threads, monitoring CPU is no longer an effective way of monitoring performance, and event loop delay is not enough on its own, so having a built-in mechanism for measuring event loop utilization would be fantastic. While there is still some work to be done to get the PR landed in libuv and get that new libuv version landed in core, I did want to briefly discuss how the new metric should be exposed in core.
In this comment @trevnorris suggests a simple performance.idleTime() that returns the direct value of this metric, which records the cumulative time spent in idle since the loop was configured to track. To calculate event loop utilization, however, we also need to know how long the event loop has been running (well, to be specific, how long it's been since the loop was configured to collect the data, which can be turned on but not turned off). Assuming we started the loop and started collecting the metric from the start, we do already record the start time of the event loop (using the performance milestones) so someone could calculate the utilization on their own by accessing those values. However, I think it might make more sense for us to just do the calculation for users and provide an API like performance.idleTime() that returns an object with two values, { idle: n, utilization: y }, where idle is the raw idle time and utilization is the calculated utilization value. The API should be very low cost to sample using AliasedArray or AliasedStruct as a backing.
/cc @nodejs/diagnostics @addaleax @mcollina