Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify the interpretation of
SpanKind
#337Clarify the interpretation of
SpanKind
#337Changes from 1 commit
c7ff294
6f30bff
dfc3cc9
1b36fbc
e03974d
7595543
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"remote" seemed relevant, why was it removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's nothing to prevent a service from calling itself. Why does the "remote" aspect matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because semantically we (I think) want "client" kind to mean "network call", not in process function call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a fan of such example in the spec, because it is against the traditional instrumentation guidelines that this scenario should have both server and client spans in that same service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote that example to illustrate what I see as an ambiguity and a problem with
SpanKind
generally. I am not familiar with the guidelines you refer to, but it feels odd if we're suggesting that a server should start a child span in order to issue an RPC. This just raises the cost of tracing.As far as I've been able to determine, there are a few uses for
SpanKind
. Currently, I believe SpanKind is ambiguous at best, so I'd like a better understanding.One thing which was part of OpenTracing was this
ChildOf
vsFollowsFrom
distinction. I believe the choice of "producer/consumer" is associated with FollowsFrom and that "client/server" is associated with ChildOf. The important question is should the tracing system assume that the child span is going to complete before the parent span. At LightStep we call client/server spans "well-formed", whereas FollowsFrom spans are not. We can't use FollowsFrom spans for critical path analysis.Another thing I see SpanKind being used for is to detect spans which either start with a remote parent or are themselves a remote parent. At LightStep we refer to these as "ingress" and "egress" spans. They are more interesting for monitoring purposes than internal spans.
My understanding is that
SpanKind
is sort of doing both of these things, telling us whether a child should complete before its parent and whether a parent and child used context propagation rather than an internal relationship.I don't see
SpanKind
actually accomplishing these things, unless the guidelines you refer to are met. I would prefer if we had less ambiguous ways to do this, which do not require extra spans where they are not necessary.Some solutions that I would prefer.
Spans could count the number of times their context is Extracted. Spans with >0 extracted contexts ought to have remote children. These are "egress" spans. Spans that are started from an Injected context ought to have a remote parent. These are "ingress" spans.
The ChildOf vs FollowsFrom distinction ought to be a Link property, since a span can have multiple children. We do not currently use a Link to store the parent span relationship, but conceptually each Link should have an attribute to say whether it is a well-formed child or an asynchronous one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting a child "client" span for outbound network call is how every instrumentation I ever came across operates. If you don't do that, you cannot add any tags describing the outbound call either, because they would end up on the parent "server" span where they don't belong.
The cost of tracing can be controlled by not recording those spans, but if instrumentation doesn't even create them then it is simply not describing the semantics of distributed transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, with this guideline in place the meaning of
SpanKind
makes sense. This guideline should be included in the explanation for SpanKind. I haven't seen it written down anywhere.I think a server with high fanout shouldn't be required to create a child span for each call it makes. I could add an Event to the parent span to describe the call being made. The child span will have the tags I'm interested in, and I don't feel any semantics are lost.
I will update this PR with the guideline and explain how to interpret these values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, high fanout does occasionally present problems (e.g. 100s of calls to Redis). If the application itself knows about it and is conscious of the performance overhead of those tiny spans, then it makes sense for it to just log events. But if the API itself is efficient enough, then the optimization may still be done by the tracer, rather than the application, e.g. by combining some of those spans into summaries.