-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[connector/servicegraph] Coalesce different attr sets into single ScopeMetrics metric entry #34070
[connector/servicegraph] Coalesce different attr sets into single ScopeMetrics metric entry #34070
Conversation
Tests were failing because the `buildMetrics()` code was creating multiple metric entries inside a single MetricScope. The `pdatatest` test functionality is not designed to handle this. The code now creates one metric within a scope, but creates one datapoint for each unique set of attributes.
No changelog entry as this is a fix for a flaky unit-test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense 👍🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the following PR has the same purpose as this, but might be a better solution. If this PR here has further improvements, we can consider merging it as well. I'm blocking this, so that we can take that into account before merging.
Restore skipped test introduced in #34120 |
#34076 was not successful in fixing the servicegraph connector flaky test. The root cause of the error is at a higher level than the order of datapoint attributes within a metric. As mentioned earlier
|
Would it make sense to undo #34076 as part of this PR? |
As far as I can tell from #34076, it attempts to sort all datapoint slices based on a canonicalized order of the attributes within each datapoint. The existing
If there is a match, but they are not in the same index order The only way to avoid an OutOfOrderErr and get to the point of comparing the actual values is to ensure that the order of the metric datapoints within a DataPointSlice are exactly the same between the expected and the actual which is what #34076 does by sorting both So no, I don't think #34076 needs to be reverted. It's a useful option, particularly if there are multiple datapoints within a single Resource::Scope::Metric structure, and it's non-deterministic as to which order they will be added. I do see that the merged commit didn't fix the copy/paste error on the function comment for Fixing that comment is not really part of this PR, but I can put it in as an additional commit to avoid opening/merging another PR just for that comment change. |
Thanks for the comments. I'll merge this as is, the comment can be fixed on a follow up PR |
Tests in
connector/servicegraph
were failing because the servicegraphbuildMetrics()
code was creating multiple metric entries for the same metric name within a single MetricScope. Although this may not be forbidden by the Otel specification, I think there is a general assumption that a metric name does not appear more than once within the same MetricScope.Instead, different values (e.g. with different sets of attribute values) should be created as separate datapoints within the same metric.
The
pmetrictest.CompareScopeMetrics
test functionality is not designed to handle multiple metric entries with the sameName()
. Instead, it is assumed that in cases where Order is ignored, the first entry found in the actual metrics which matches the name of the expected metrics must be the metric to compare.This fix changes the
buildMetrics()
code to create one metric within a scope, and instead create multiple datapoints per metric when there are entries where the datapoint attribute set is unique (i.e. all entries in the internal mapsserviceGraphConnector.req{Total,FailedTotal,ServerDurationSecondsCount}
are coalesced into a single named metric as appropriate.)Note: I don't have any past experience working with servicegraphconnector, but just observing that
collectClientLatencyMetrics()
andcollectServerLatencyMetrics()
both range over the same map -p.reqServerDurationSecondsCount
, although the actual values collected incollectClientLatencyMetrics()
are fromp.reqServerDurationSeconds{Count,Sum,BucketCounts}
and the values collected incollectServerLatencyMetrics()
are fromp.reqClientDurationSeconds{Count,Sum,BucketCounts}
.This seems a little asymmetrical, but I don't have enough experience to say whether this is an error or not.
Description: Fixes #33998
Testing: All other unit tests now complete, and the previously failing unit test now works reliably.
Documentation: No documentation added. This is a unit test fix.