-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Messaging-System Metric semantic conventions #1077
Messaging-System Metric semantic conventions #1077
Conversation
Please add an entry in CHANGELOG.md |
brief: 'Connection string.' | ||
examples: ['tibjmsnaming://localhost:7222', 'https://queue.amazonaws.com/80398EXAMPLE/MyQueue'] | ||
constraints: | ||
- any_of: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the work on #1027, the net.*
names should be removed as constraints and added as references like for tracing
|
||
| Name | Instrument | Units | Description | | ||
|----------------------|---------------|--------------|-------------| | ||
| `messaging.consumer.processed.messages` | Counter | messages | Sum of messages processed. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metric name might need altering, or messaging.consumer.messages
needs altering to include "receive" in the name.
Otherwise, this metric could be implied to be a child of the other metric by virtue of the naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, I'll think about how to reword these. My instinct was to have the metric name be messaging.processed.messages
or messaging.processor.messages
, however following the span kind naming convention it says that the processor should use "consumer" as its span kind.
9e89bdf
to
2cd7b4c
Compare
I was able to resolve some of my issues with the semconv generator. There were a few things going on:
While the generator works correctly for me it seems I'm failing the md-check with |
b8c94dc
to
8c0d5b2
Compare
Discussed in the Metrics SIG today: we think it would be best to eliminate the superfluous counters. This may be problematic in the short term, but in the long term we can address these with a Views API, we think. |
@jmacd thanks for the heads up. Removed those. |
@open-telemetry/specs-metrics-approvers this is ready for review! |
1 similar comment
@open-telemetry/specs-metrics-approvers this is ready for review! |
ca01fa0
to
cf9ba47
Compare
cf9ba47
to
4988c5e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The structure, labels, and instrument definitions look good. Just issues with some of the terminology and formatting. Good job 👍
at the metrics sig mtg today talked about this issue's staleness |
@MrAlias I think I addressed almost all of your feedback. I had questions/comments on a couple of them. Thanks! |
aa5f02c
to
8683a2a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I re-read this PR and wonder if the recently-added guidelines on metrics named .io
should be applied here. Instead of having two metrics <something>.bytes
for each direction of travel, we can use <something>.io
with a label to indicate whether it was produced or consumed.
@jmacd interesting. To be a little more concrete you're imagining renaming these metrics:
And then we'd have some labels like Some of my initial reactions to this:
|
@cwildman maybe this could be helpful to decide:
Would a user ever want to aggregate together the produced/consumed bytes into one timeseries (or would it be meaningful)? If not, then I suppose they should be separate.
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
I'd like to reopen this PR and hopefully discuss at the spec meeting this week. |
@cwildman Hi! |
Changes
Add semantic conventions for messaging system metrics.
Related issues:
Fixes #1014
PR Details
The general approach I took here was to take the messaging operations referenced in the tracing semantic conventions (sending, receiving, processing) and break them into 3 distinct metric groups. I created metrics for monitoring throughput and latency within these operations.
I've left individual commits to highlight some open questions/points of interest that I will elaborate on here.
Metric naming with Span Kind
@justinfoote mentioned that the preference is that when name spacing labels and metrics the second field should be the "span kind". The second commit attempts to follow this convention. However the span kind as defined in the tracing semantic conventions of messaging suggest producer/consumer should be used for async message processing, and server/client should be used for synchronous message processing. Should we change the metric names based on whether the application is sync or async? I assumed no. Also there's a bit of ambiguity here because the tracing semantic conventions yaml refers to things like
messaging.consumer.synchronous
instead ofmessaging.client
.Superfluous Counters vs. ease of use
I created a specific Counter to track the messages sent, received and processed. This information is already captured in the
count of the
duration
metric ValueRecorder. Personally I feel like the extra metric is worth the ease of use but I don't know where the OTEL community stands on this.Semconv format and generator
I went ahead and attempted to use the semconv yaml format and generator. Many of these labels are identical to tracing messaging attributes but I went ahead and copied them into a separate yaml for messaging metrics. I could not get the generator to actually work though. Here's an example:
docker run --rm otel/semconvgen --yaml-root ../../opentelemetry-specification/semantic_conventions markdown --markdown-root ../../opentelemetry-specification/specification
Batch metrics
The last commit adds metrics around batching because I see that as a common feature that needs to be monitored. Introducing metrics around batching does create confusion around whether some of the non-batch metrics still exist or what they mean. I'm interested to hear if others feel the behavior is sufficiently explained or if batch metrics should be dropped entirely.
Other metrics
Here's a list of other potential metric categories that I omitted. I'm assuming these are too specific to only a subset of messaging systems to be included here:
@jmacd @bogdandrutu mentioning you as potential reviewers.