Clarification needed for ".utilization" metrics convention #819
Comments
@bogdandrutu and @open-telemetry/specs-metrics-approvers regarding open-telemetry/opentelemetry-proto#199
That would make sense to me.
To me, there is still a minor concern. We have argued that SumObserver and UpDownSumObserver should accept cumulative inputs so that they can remain stateless: Observer callbacks do not need to know the last time they were called or remember the last value. Having instrumentation compute CPU utilization from an Observer callback breaks this rule. The callback has to remember the last timestamp it reported and the last value it recorded in order to output the current interval's utilization. The final destination of a
I think it would be great to keep it, but as you mentioned it could be added back in. Which way are you leaning @jmacd?
Are we talking about the SDK instrument to use or the OTLP temporality? My takeaway from today's (Tuesday) meeting was that using a stateful ValueObserver (where the callback saves the last value and call time from its previous call) would be the easiest way to implement this with the SDK. This would send an OTLP gauge, which seems OK to me. Then once we have views, it would be best to calculate this from the This does seem like a common use case though; it's called out in the Metrics API spec a few times: "monotonic instruments are useful for monitoring rate information." Does "calculating" here mean with a view or in the backend? Someone also mentioned that OTEP 88 had a proposal for this interval/delta, but no concrete use cases. Would something like request rate not be an equivalent synchronous example of this (# requests in interval / time delta)?
This was discussed in the 8/18 Metrics SIG (OTLP) meeting. We agreed to address this in the short term by using stateful Observer callbacks that track both their last CPU time measurement and their last timestamp. A side-note was raised relevant to OTLP: If we had a way to encode deltas from observer instruments, it would be natural to do so here. OTLP actually supports this concept, but we have not standardized any form of Delta Observer, and this may be such a special case that we continue to ignore this matter. However, if we had a Delta Observer then it would be natural to encode "CPU time elapsed" measurements. We compute
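For illustration, here is a minimal sketch of what such a stateful callback could look like. It does not use the OpenTelemetry SDK; the class name `CpuUtilizationCallback` and its `cpu_time_fn` / `clock_fn` parameters are hypothetical, and wiring the returned value into a ValueObserver is left out. It only shows the state (last CPU time, last timestamp) that the callback has to carry between calls.

```python
import time
from typing import Callable, Optional


class CpuUtilizationCallback:
    """Hypothetical stateful observer callback (not an OpenTelemetry API).

    Remembers the last CPU time and the last timestamp so that each call
    can report utilization for the interval since the previous call.
    """

    def __init__(
        self,
        cpu_time_fn: Callable[[], float] = time.process_time,
        clock_fn: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cpu_time_fn = cpu_time_fn
        self._clock_fn = clock_fn
        # Seed the state so the first call already covers a real interval.
        self._last_cpu = cpu_time_fn()
        self._last_ts = clock_fn()

    def __call__(self) -> Optional[float]:
        now_cpu = self._cpu_time_fn()
        now_ts = self._clock_fn()
        used = now_cpu - self._last_cpu
        elapsed = now_ts - self._last_ts
        # Update the remembered state before returning.
        self._last_cpu, self._last_ts = now_cpu, now_ts
        if elapsed <= 0:
            # No time has passed; utilization is undefined for this call.
            return None
        return used / elapsed
```

With the defaults, each call returns the fraction of wall-clock time the current process spent on CPU since the previous call. Injecting the two functions is only a convenience for testing with fake clocks.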
What are you trying to achieve?
OTEP #119 specified a convention for metrics ending in ".utilization":
open-telemetry/oteps#119
It's not clear how to implement this in some cases, so clarification may be needed. For a metric such as process.cpu.time, which is emitted as a cumulative value (e.g., from a SumObserver), we can naturally compute a cumulative utilization score, i.e., the total CPU time used divided by the total elapsed time. This number, the lifetime utilization, may not be very useful; it would perhaps be more useful expressed with "Interval" temporality. The ".utilization" metric for cumulative time metrics has the same problem as Summary data points: they are rarely useful in cumulative form. Moreover, they can be derived in a backend.
Should we drop ".utilization" metrics for CPU usage? Should we specify that they be conveyed as Interval summaries (i.e., the difference in cumulative usage divided by the difference in time)? (@aabmass)
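To make the difference concrete, here is a small sketch comparing the two computations on made-up cumulative samples. The function names and the sample values are illustrative assumptions, not part of any spec or SDK; each sample is a (timestamp in seconds, cumulative CPU seconds) pair of the kind a cumulative process.cpu.time series would carry.

```python
from typing import List, Tuple

# (timestamp_seconds, cumulative_cpu_seconds); values are invented for illustration.
Sample = Tuple[float, float]


def lifetime_utilization(start_ts: float, sample: Sample) -> float:
    """Total CPU time used divided by total elapsed time since start_ts."""
    ts, cpu = sample
    return cpu / (ts - start_ts)


def interval_utilization(samples: List[Sample]) -> List[float]:
    """Difference in cumulative usage divided by difference in time,
    for each pair of consecutive samples."""
    return [
        (c1 - c0) / (t1 - t0)
        for (t0, c0), (t1, c1) in zip(samples, samples[1:])
    ]


samples = [(0.0, 0.0), (60.0, 6.0), (120.0, 60.0)]
lifetime = lifetime_utilization(0.0, samples[-1])  # 60 / 120
intervals = interval_utilization(samples)          # 6 / 60, then 54 / 60
```

In this example the lifetime figure is 0.5, while the per-interval figures are roughly 0.1 and 0.9: the cumulative form averages away the jump in load that the interval form shows, and the interval form is derivable from the cumulative samples alone.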