Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Remove Gauge instrument #412

Closed
bogdandrutu opened this issue Jan 17, 2020 · 5 comments
Closed

Proposal: Remove Gauge instrument #412

bogdandrutu opened this issue Jan 17, 2020 · 5 comments

Comments

@bogdandrutu
Copy link
Member

During the last metrics specs meeting we discussed the option to remove from the API the Gauge instruments and the LastValue aggregation support, here are some of the reasons:

  • We could not come with concrete examples where a Gauge instrument will be used instead of Observer (to report things like system metrics: cpu, memory, etc.) or Measure (when a distribution of the instantaneous measurements is needed, what is the average queue length when executing "foo" requests).
  • Reducing/Merging labels for a LastValue aggregation is hard to do it correctly because sometimes a Sum is needed sometimes a LastValue merge aggregation. (E.g. if reporting a cpu-usage metric with the following labels "socket_id", "core_id" (every socket has a chipset with multiple cores), "user_id" (the user of the current request in witch we report the gauge) - if we do report this metric for every request, computing the cpu-usage metric with only "socket_id", "core_id" will be aggregated by removing the "user_id" and applying LastValue aggregation, but if a metric like cpu-usage with only "socket_id" is required then the aggregation needs to remove the "user_id" by applying LastValue aggregation while merging and a Sum aggregation while removing the core_id for the same "socket_id"). This problem applies to Observer as well, but they are not associated with a request so random labels coming from the request context will not be present.
  • Apply the rule of minimal API surface: because there are some unknowns with the Gauge implementation and because we don't have very good use-cases for these type of instruments it is better to avoid adding them in the API for the moment. In case of a clear use-case determined later we should re-evaluate if adding the Gauge instrument back is the right decision or we do have different other solutions.

Please provide examples when a Gauge metric MUST be used instead of the Observer or Measure and explain the use-case.

@jmacd
Copy link
Contributor

jmacd commented Jan 21, 2020

I support this, but only weakly. I wonder if anyone is strongly opposed.

I looked for examples where a Gauge metric would be used (not MUST be used) instead of an Observer metric. The one example that sticks out are situations where in a concurrent environment, an Observer gauge is likely to hit a lock and block collection. When optimizing this scenario, we end up in a "don't call me, I'll call you" situation. We'd like to set metrics at a time when the lock is not contended. Often this means managing your own state and delaying updates until the time is right to record them. At this point, calling Set() is natural. However, you could address this by building on top of Observer instruments. Each Settable-gauge would be backed by one Observer and a last-value. What we potentially lose is an ability to feed these events into other aggregators using a dynamic configuration--but this seems unlikely to matter.

On the opposing side, I feel that the arguments about query support (needing a Sum vs a LastValue) are not bulletproof. Using Observer instruments will encounter the same trouble. The difference, I guess, is that we may assume that all label combinations are set in a single collection period for an Observer, therefore the range of time and label values covered by a Sum aggregation is well defined.

I believe it boils down to advice that: if you will pre-aggregate an observer (or gauge) instrument, be careful and know what you are doing.

I see this as an improvement. It will be nice to be rid of Monotone non-observer gauge logic.

@jmacd
Copy link
Contributor

jmacd commented Jan 22, 2020

I've written up the rationale that I came to while thinking this through and posted OTEP 80.

@jkwatson
Copy link
Contributor

I also think this is a good change. I suspect we'll end up adding it back eventually, but I also like having a concrete example to justify it. @jmacd Your example is very abstract. Can you provide a concrete example from your experience where you ran into this?

@jmacd
Copy link
Contributor

jmacd commented Jan 22, 2020

I added a lot more text based on early feedback. Thanks and PTAL.

@jmacd
Copy link
Contributor

jmacd commented Jan 30, 2020

This was accepted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants