Update 0003-measure-metric-type to match current spec #61

Conversation
I'm not sure what good it does to update OTEPs. Do you see an OTEP as a source of valuable information beyond the fact that the feature was approved for implementation? In my mind OTEPs are proposals/discussions of features, and the actual spec is in the specs repository, so a name mismatch is a totally normal situation. I'm happy to approve this PR if needed.
@bogdandrutu asked me to update the OTEPs.
| Field | Description |
|------|-----------|
| Name | A string. |
| Kind | One of Cumulative, Gauge, or Measure. |
| Required Keys | List of always-defined keys in handles for this metric. |
| Recommended Keys | Default aggregation keys. |
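For reference, a minimal Go sketch of a descriptor carrying these fields might look like the following; the type and field names are illustrative only and are not the actual OpenTelemetry API:

```go
// Illustrative only: a minimal descriptor shape matching the table above.
// These names are not taken from the OpenTelemetry metrics API.
type Kind int

const (
	Cumulative Kind = iota
	Gauge
	Measure
)

type Descriptor struct {
	Name            string   // the metric name
	Kind            Kind     // Cumulative, Gauge, or Measure
	RequiredKeys    []string // always defined in handles for this metric
	RecommendedKeys []string // default aggregation keys
}
```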
I need to better understand the difference between recommended vs required keys. It feels to me that the whole metrics API deviated a bit more than expected with the LabelSet and got pushed more towards statsd than I feel comfortable with (this is mostly coming from the Go implementation that I looked at).

Allowing users to call `getHandle` and `add`/`set`/`record` with a `LabelSet` that is larger than the "required keys" removes the optimization where this operation can be turned into a simple atomic increment for the most common case (default aggregation using recommended/required keys).
As an example:

- I want a counter with "recommended keys" -> `queue_topic` (which has 3 possible values: `value_1`, `value_2`, `value_3`).
- Let's assume the user wants to build the metric with only the "recommended keys" - which I expect to be the common case.
- If I call `add` with a `LabelSet` that includes `queue_topic` and `queue_name`, then I need to do an expensive operation to extract the `queue_topic` from the `LabelSet`, then look for the Handle, then update the atomic variable (even if the `LabelSet` includes only the `queue_topic` I need to look for the value and compare strings). This will definitely be extra overhead for the common case (see the sketch after this list).
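To make the concern concrete, here is a rough, hypothetical sketch of the per-call work being described; the function and variable names are made up for illustration and are not the actual SDK code:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// addNaively models the feared per-call cost: on every add, the recommended
// key's value is extracted from the label set (string lookup and comparison),
// then the matching record is found, and only then the atomic increment runs.
// This is a hypothetical sketch, not the opentelemetry-go implementation.
func addNaively(agg map[string]*int64, labels map[string]string, recommendedKey string, delta int64) {
	topic := labels[recommendedKey] // per-call extraction of queue_topic
	record, ok := agg[topic]        // per-call lookup of the handle/record
	if !ok {
		record = new(int64)
		agg[topic] = record
	}
	atomic.AddInt64(record, delta) // the cheap part comes last
}

func main() {
	agg := map[string]*int64{}
	labels := map[string]string{"queue_topic": "value_1", "queue_name": "orders"}
	addNaively(agg, labels, "queue_topic", 1)
	fmt.Println(*agg["value_1"]) // 1
}
```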
I would like to confirm, as I mentioned before, that Prometheus-like performance can be achieved for the "required/recommended keys" and default aggregation with our current API, while allowing more extensibility and custom aggregations.

This comment probably applies to the whole `LabelSet` concept. I feel that the current API spec is focused on the fact that the entire `LabelSet` will just be serialized on the wire, which is different from what I think an optimized metrics implementation should do, which is to pre-aggregate metrics and drop all the extra labels that are not necessary.

Anyway, I don't want to block this PR with this comment, but I want to have this documented and addressed before we move forward with the `LabelSet` idea (which I do like from the perspective of having an implementation that is performant for statsd, but I do not like that it adds unnecessary overhead for something like Prometheus).
> If I call `add` with a `LabelSet` that includes `queue_topic` and `queue_name`, then I need to do an expensive operation to extract the `queue_topic` from the `LabelSet`, then look for the Handle, then update the atomic variable (even if the `LabelSet` includes only the `queue_topic` I need to look for the value and compare strings). This will definitely be extra overhead for the common case.
If you add/set/record with a handle that has more labels than the recommended labels for the instrument in question, you still get an atomic operation (assuming the aggregation supports it; some measure aggregations do not). The atomic operation updates a record that is bound to the pair of (descriptor, labelset). There is no additional computation in the instrumentation code path.

On the collect/export code path, the implementation "groups" all the records for a particular descriptor, meaning it folds results from several label sets into one aggregate result. So you do have additional cost, but it is not in the fast path, it's in the export path. There's a possibility to compute and cache a mapping from (Descriptor, LabelSet) to ordinal recommended-key positions, which would mostly alleviate the computation here. It's a simple matter of running through the label sets, mapping into the unique, ordered recommended-key labels, and then merging aggregates. [I haven't implemented this cache; it's a NOTE in the code.]
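A toy sketch of that collect/export grouping, using made-up `Record` and key-encoding types rather than the actual exporter API: records keyed by their full label sets are projected onto a chosen set of group keys and their sums merged.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Record is a toy stand-in for an exported record: the full label set the
// instrument was updated with, plus its accumulated sum. Not the actual API.
type Record struct {
	Labels map[string]string
	Sum    int64
}

// groupBy folds records into aggregates keyed only by the given group keys
// (e.g. the descriptor's recommended keys). Extra labels are dropped here,
// on the export path, not in the instrumentation fast path.
func groupBy(records []Record, groupKeys []string) map[string]int64 {
	sort.Strings(groupKeys)
	out := map[string]int64{}
	for _, r := range records {
		parts := make([]string, 0, len(groupKeys))
		for _, k := range groupKeys {
			parts = append(parts, k+"="+r.Labels[k])
		}
		out[strings.Join(parts, ",")] += r.Sum
	}
	return out
}

func main() {
	records := []Record{
		{Labels: map[string]string{"queue_topic": "value_1", "queue_name": "orders"}, Sum: 3},
		{Labels: map[string]string{"queue_topic": "value_1", "queue_name": "billing"}, Sum: 2},
		{Labels: map[string]string{"queue_topic": "value_2", "queue_name": "orders"}, Sum: 5},
	}
	// Grouping by the recommended key merges the two value_1 records.
	fmt.Println(groupBy(records, []string{"queue_topic"}))
	// map[queue_topic=value_1:5 queue_topic=value_2:5]
}
```

The choice of group keys is essentially what distinguishes the two `export.Batcher` strategies mentioned later in this thread: grouping by the recommended keys folds extra labels away, while grouping by every key in the label set keeps them.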
Now consider the benefit. You suggested a `LabelSet` that includes both `queue_topic` (the recommended key) and `queue_name` (an additional label). During the collect/export pass, the implementation will merge records from different label sets according to their value for `queue_topic`, but we have new information that could be very useful to the user. We're able to derive, for each of the aggregate results, the distribution of `queue_name`. This could be used to automatically ensure that a unique exemplar was supplied for each distinct `queue_name`, for example. This is a huge benefit, IMO.
Note that the cost of a handle `Add()` in the Go benchmarks stands at ~15ns (regardless of how many labels there are, compared with recommended labels), whereas a direct call to `Add()` with a label set stands at ~200ns. So using a handle is an order of magnitude cheaper than a direct call (also note that a direct call is much cheaper than constructing a label set, for realistic label sets).
It remains to be seen how large the impact of the additional CPU at collection time will be, but we cannot let the future metrics world be constrained by Prometheus, which has a very limited view of how we can use labels.
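For illustration, a toy model of the two call paths being compared, with made-up types rather than the opentelemetry-go API, showing why the bound-handle path can stay a single atomic increment:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Toy model only, not the opentelemetry-go API: a counter keeps one record
// per canonicalized label set.
type counter struct {
	records map[string]*int64 // key: canonicalized label set
}

// Direct call: pays a record lookup (or creation) on every Add.
func (c *counter) add(labelSet string, delta int64) {
	r, ok := c.records[labelSet]
	if !ok {
		r = new(int64)
		c.records[labelSet] = r
	}
	atomic.AddInt64(r, delta)
}

// A handle binds the record once; each Add is then just an atomic increment,
// no matter how many labels the bound label set contains.
type handle struct{ record *int64 }

func (c *counter) acquireHandle(labelSet string) handle {
	c.add(labelSet, 0) // ensure the record exists (toy shortcut)
	return handle{record: c.records[labelSet]}
}

func (h handle) add(delta int64) { atomic.AddInt64(h.record, delta) }

func main() {
	c := &counter{records: map[string]*int64{}}
	ls := "queue_topic=value_1,queue_name=orders"

	c.add(ls, 1) // direct call: lookup every time (~200ns in the benchmark)

	h := c.acquireHandle(ls) // bind once...
	for i := 0; i < 1000; i++ {
		h.add(1) // ...then only the atomic increment (~15ns in the benchmark)
	}
	fmt.Println(*c.records[ls]) // 1001
}
```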
Can I resolve this?
I need to see an implementation (e2e) before convincing myself that we are not making a wrong API decision. But I think we should close this for the moment; we have documented the concern and will revisit once we have an e2e implementation.
Maybe just open an issue with the conversation to track this.
Here's an e2e implementation. There are two `export.Batcher` implementations: one groups by recommended keys, the other groups by all keys in the label set.
open-telemetry/opentelemetry-go#265
@bogdandrutu I'm not sure I could summarize the issue you intend to track. Is it that LabelSets permit additional labels and you're worried about performance?
My 2 cents on this: Sometimes OTEPs contain comprehensive, summarized information in a form that does not exist in the specification. Applying OTEPs to the spec sometimes results in modifications in multiple places in multiple files using many PRs, and the big picture is not visible that way. For such cases I believe it is valuable to have the OTEP as one document and even reference it where appropriate from the spec, so that a particular small detail in an obscure corner of the spec becomes clearer. This is for example the case with my OTLP proposal, where protocol elements will be scattered among many .proto files, plus an explanation will be added in a separate doc in the proto repository (and the big picture will be more difficult to see). For some other OTEPs the entire contents of the OTEP is copied verbatim to the spec, in which case I agree the OTEP should be considered of historical interest only and probably even marked so somehow, with a note saying that the source of truth is now the spec. Updating the OTEP in such cases is, I think, unnecessary. I am not sure which of these cases applies to this particular OTEP; I haven't searched the spec thoroughly to see if the entire content of this OTEP also exists in the spec.
This PR has been open for a long time and it has 3 official approver LGTMs plus a couple of others coming from people interested in metrics. Also, this PR just clarifies some changes we agreed on in the specs. Entering God mode and merging this.
…/oteps#61)

* Propose approval / start round 3 of discussion
* Use Monotonic
* Updates to use Monotonic/NonMonotonic, and NonNegative/Signed
* Update 0003
* Minor
* Revert
* Revert
There were a few changes made to the current metrics spec that should be reflected here, since this OTEP was never set to approved.
The significant changes in this text to match the current spec are:
Meter
I believe this puts the OTEP in line with the spec; there is nothing else new here.