Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Add type and unit metadata labels #39
base: main
Are you sure you want to change the base?
Proposal: Add type and unit metadata labels #39
Changes from 1 commit
25305b0
e56b2be
6cfdbd4
de68600
e0d62d3
619537d
a2e5ed9
75ccddd
d9e0ad3
f1760bd
bce6d9c
0a545c8
94796fb
dec6f78
bc88072
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And PromQL could finally warn reliably about "inappropriate" operations (apply
rate
to a gauge,delta
to a counter, …). (Recently, Prometheus has added an info-level annotation for cases whererate
is applied to a metric that doesn't end on_total
etc., but that's error prone (gauges could have a name ending on_total
, and not all counters end on_total
).)By making unit and type a label (and thus part of the series' identity) it becomes implicitly harder to accidentally query across metrics with different unit or type anyway (there aren't any "mixed series" anymore). Maybe that is something we could work into the goals as preventing the mismatch from happening in the first place is a much more attractive selling point than just stating that we can "return a warning".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say that's a happy side effect of the proposal, but not a requirement :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warning about inappropriate operations sgtm. Added to goals.
In the context of preventing collisions from happening in the first place, i've been thinking about what best practices should be. If users write queries and always include
__unit__
and__type__
filters in queries, it is impossible for mismatch to happen in the first place. It also makes queries more readable, as you can easily tell the type and unit of the metric when reading a query. But it does seem quite verbose, and I'm pessimistic that users would actually do this if we tell them they should.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe the type next to the metric name and labels, but as a colored badge or something (cf. how it's done in the PGW, which utilizes the full exposition data model so it even has the help string, which is of course out of scope here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so this is UI, but what we envision in PromQL response? Should it give those labels back in JSON? If yes, in what order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This goes back to TSDB API too. I assume TSDB API provide those labels as normal labels (worth to clarify?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, the alternative explains it, should we mention that both API and PromQL contains those labels as-is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with the TSDB API. Can you link to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For exposition formats, I guess it means this would allow:
Ideally I do opposite to what's written, so TYPE and UNIT wins here, or this could be blocked even by spec. Essentially I wonder about efficiency of parsers here.
For ingestion protocols, at least it feels metadata should override it (not what is written now) so it's easier for everyone.
Any further concerns around perf of parsing here for this @bboreham?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about what would be on the
/federate
endpoint if the Prometheus server scraped conflicting metrics from applications. Our choices are:#UNIT
metadata, and no__unit__
labels.__unit__
labels, and no#UNIT
metadata.__unit__
labels, and a#UNIT
with one of the units the metric has./federate
endpoint. It isn't a regression from what we have today.#UNIT
.I'm OK with disallowing mixing
__unit__
and#UNIT
, since we could relax that in the future if needed.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed offline, and:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too fond of adding
__unit__
and__type__
as labels in the exposition format. I think they should stay as a single line at the top of the metric family like it is today.The Unit/Type will be the same for all time series in a metric family. If they become labels in the exposition format, it will just be repeated information for all time series. It increases the payload, and we waste CPU time parsing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have "Users see no difference to exposition formats." stated above, but if you put the type and especially the unit in the
# UNIT
part then you need them in the metric name as well, defeating the purpose. You're other option is to put them into the series on each line.Looking at /metrics is for humans and follows what you see is what you get in terms of showing what you can query - implying that if you can query by
__type__
or the API/UI returns the__type__
then you should put them on each line.And the implication of that is that there needs to be at least one more efficient exposition format that is not so verbose for machines to read.
So I'm not sure this would work without either putting the new labels on every line or modifying OpenMetrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course the above argument doesn't stand if we follow @beorn7 's advice and not show
__type__
,__unit__
on the query API (/query
,/query_range
) , although I would love to have it on the metadata queries endpoint like (/series
) for Grafana to be able to know what's a float series and what's a native histogram.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should show
__type__
and__unit__
in the query APIs. I'm just saying, in the zeroth step, we should remove those labels in PromQL operations (like the name is removed).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with showing
__type__
and__unit__
in query APIs, just like we show__name__
.That's a really good catch! This is something we'll probably need to tackle with OM 2.0. Is this what you had in mind with this issue @bwplotka prometheus/OpenMetrics#280?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "Users see no difference to exposition formats", I meant that this proposal does not change how metrics are currently exposed in exposition formats. I'll leave that for follow-up discussions in prometheus/OpenMetrics#280 and elsewhere. I updated the language in 0a545c8.
I am open to ways of querying for type and unit that align better with the existing #TYPE and #UNIT comments, if we want to preserve the "what you see is what you get" experience of Prometheus. We do support querying for
__name__
, so there is at least some precedence.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how safe this is. If PromQL does not hide the return, new labels will suddenly appear in users' queries.
I'm also not sure about the potential impact other than dashboards/alerts having new labels, could that be a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely open to this alternative. I like having the option of showing
__type__
and__unit__
in the UI, but if it has to wait for Prom 4.0 because it is breaking, that seems like a significant downside.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it could be also merge of both, where PromQL only returns those labels based on something e.g. on conflict or when type or unit is selectors.
The downside is that we now have to really deeply design groupings and joins.. should they use those labels or not. With the current proposal the answer is trivial and straightforward. This helps with creating alerts and recording rules etc
I think I would be keen to try the current approach e.g. behind feature flag and see who will use it and how to tell more. cc @juliusv @roidelapluie WDYT from UX point of view for non-UI things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, I like the idea to model OTel's notion of "identifying metadata" as labels in Prometheus (to be distinguished from "non-identifying metadata", which is closer to Prometheus's original metadata idea).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately OTel resource attributes are all considered identifying today (correct me if I'm wrong David).
There is work in progress to replace resource attributes with a new thing called
Entity
. Entity will be able to distinguish, using otel lingo,Identifying
fromDescriptive
attributes, but this will take some years to get popular in the industryThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTel resource attributes are all considered identifying today, but we don't treat them all as identifying when translating from OTel -> Prometheus. Instead, we just treat service.name + service.instance.id + service.namespace as identifying. That is correct if the OTLP originates from an OTel-instrumented application, but not if it wasn't (e.g. comes from an otel collector receiver).
Anyways, what I had in mind was more things that users don't think of as "labels", like schema url, or maybe scope name + version. I think resource attributes make sense as "real" labels, rather than metadata.