-
Notifications
You must be signed in to change notification settings - Fork 894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define span attributes for messaging systems #418
Define span attributes for messaging systems #418
Conversation
2b9f71c
to
c699fdb
Compare
@open-telemetry/specs-approvers PTAL 🙂 |
@open-telemetry/specs-approvers PTAL 🙂 |
@open-telemetry/specs-approvers Please review this PR. It's around semantic conventions, and it augments what we have (in this case, adding a section for messaging Would love to have this merged (or reviewed/tuned) soon ;) |
b7e26b2
to
ba3ebe7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general, It would be great to have some specific examples. Also requested some clarifications
The last question I have left is the |
specification/data-messaging.md
Outdated
| `messaging.temp_destination` | A boolean that is `true` if the message destination is temporary. | If temporary (assumed to be `false` if missing). | | ||
| `messaging.protocol` | The transport protocol such as `AMQP` or `MQTT`. | No | | ||
| `messaging.url` | Connection substring such as `tibjmsnaming://localhost:7222` or `https://queue.amazonaws.com/80398EXAMPLE/MyQueue`. | No | | ||
| `messaging.message_id` | A value used by the messaging system as an identifier for the message, represented as a string. | No | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if messages are batched? I assume message_id is not populated, is it correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you talking about producing, receiving or processing of the messages?
If they are produced one at a time, received in a batch and then processed one at a time, each producing and each processing span should have the respective message ID. The receiver span won't have any. Please see the batch example at the end of the file.
specification/data-messaging.md
Outdated
| `messaging.operation` | A string identifying which part and kind of message consumption this span describes: either `receive` or `process`. (If the operation is `send`, this attribute must not be set: the operation can be inferred from the span kind in that case.) | No | | ||
|
||
The _receive_ span is be used to track the time used for receiving the message(s), whereas the _process_ span(s) track the time for processing the message(s). | ||
Note that one or multiple Spans with `messaging.operation` = `process` may often be the children of a Span with `messaging.operation` = `receive`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you provide reasons/preconditions behind making receive
span a parent of process
? What about receive
duration then, it seems to end before children (which is likely ok)?
Also, in the context of sampling, how receive
is sampled considering it knows linked context after call completes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to track the time spent receiving the message(s) separately from the time spent for processing them. The receiver span might or might not end before the processor spans are started - both would be correct.
I cannot quite follow the last question. If receiving is tracked separately, the context will usually not have been propagated at the time the receiver span is started. Also, in batch receive scenarios, there won't be one single context to act as a parent for the receive span. Therefore, the receive span will usually be a root span. The child spans tracking message processing will link to the message producer spans as shown in the batch example at the bottom of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's likely that consumer writes code like
messages = queue.receive(5);
foreach (message in messages)
{
processSpan = trace.StartActiveSpan("my-queue");
// do stuff
span.end()
}
Logically by the time processing starts, receive is completed and it's span context is not current/may be lost. Do I have to pass it to process span?
I'm on the library instrumentation side (Azure EventHubs and other messaging offerings in Azure) i.e. if I instrument receive, I would have no reasonable means to pass it's context to users so they can not possibly start 'process' span as a child of 'receive'.
I want to understand reasons behind the choice of making process child of receive and perhaps removing mentions of it if this is optional, there is no strong need for it.
Regarding sampling:
OTel samplers check links and if there are links that are sampled in, sampler will also make 'sampled in' decision for span.
But links are not known, as you mentioned, before the 'receive' span starts. This basically means low chance for receive to be consistently sampled with the messages in the batch which makes it useless.
I decided against instrumenting receive at all for the near future, and this is one of the reasons for it.
I've heard some instrumentations solve it through delaying span creation until messages are received to make sampling consistent. Was it considered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logically by the time processing starts, receive is completed and it's span context is not current/may be lost. Do I have to pass it to process span?
This depends on the OTel implementation. In Java, for example, one could create a scope around the foreach in which the receive span is active. Ending the receive span does not "deactivate" it, so it would still be picked up as parent by the process spans. Passing the receive span or its spancontext explicitly would also be an option, yes.
I'm on the library instrumentation side (Azure EventHubs and other messaging offerings in Azure) i.e. if I instrument receive, I would have no reasonable means to pass it's context to users so they can not possibly start 'process' span as a child of 'receive'.
If you're on the library instrumentation side and only instrument the receive call which returns a number of messages without having any further hooks to instrument, then you will not be able to produce the process spans, yes. This is a limitation by the instrumentation approach and it should be fine to just have receive spans without process spans in this case since this is all you can trace this way. How else would you trace the time spent processing messages given these limitations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding sampling
I'd expect the sampler used to sample the receive calls if the information is deemed interesting/valuable based on the span attributes being set. In this case the sampler can deduce from the required span attributes that this is a message-receive span and decide accordingly.
If the receive call is sampled, the process spans would be children of the receive span and bear links to the send/create spans from the spancontext being propagated in the message metadata, if available.
If the receive call is not sampled, the process spans will either be the child of some other span and add the aforementioned links or - if there is no implicit parent - use the spancontext extracted from the message metadata as a parent.
I've heard some instrumentations solve it through delaying span creation until messages are received to make sampling consistent. Was it considered?
The process spans are supposed to be created after the message was received, so this is already solved by having two different spans for receiving and processing.
Co-Authored-By: Sergey Kanzhelev <[email protected]>
Co-Authored-By: Christian Neumüller <[email protected]>
18e240a
to
1cf72e5
Compare
@open-telemetry/specs-approvers who haven't had the chance to review this PR yet, please consider taking a look. Thank you! |
We already prepared a proposal regarding semantic conventions for messaging systems but @kbrockhoff was faster with his PR #395. I sent him our proposal and he's fine with it, so I integrated his suggestions into ours and we're moving forward with this PR now.
This proposal includes the
component
field like the other semantic conventions currently do. There are discussions (#271 and #336 ) for removing that field but there hasn't been any agreement yet.Regarding an attribute describing the service account for connecting to the messaging system (if desired): Should we add a "user" attribute per type (
db.user
,messaging.user
,rpc.user
and the like) or should we define a general attribute for that? (brought up by @bogdandrutu on #395 (comment))