-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify the interpretation of SpanKind
#337
Clarify the interpretation of SpanKind
#337
Conversation
@jmacd IMO StatusKind is useful precisely because it is narrowly defined and allows backends to learn the semantics of operation and visualize it differently based on the value of the SpanKind. Perhaps there is a specific use case which you have in your mind that is not solvable by the status quo, in that case please describe it. The use case you wrote "This would allow users to perform offline analysis on spans by other designated kinds" seems achievable by just adding a custom attribute to the Span. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Request change" as a way to block the PR.
It is not clear what extra scenarios this opens comparing to custom attribute on a span. It seems like too big of a change for 1.0 milestone and it will negatively affect memory, perf of runtime and spans protocol.
@jmacd if you feel that this is a needed change, let's initiate it from the OTEP and specific scenarios it opens up
@tigrannajaryan there is no loss of value in this regard. There are still four well-known kinds that you can use to tell a backend the semantics / visualization strategy. Backends would continue to recognize those four values and use special visualizations, but backends could simply display the Kind string for other unrecognized values and provide value to the user. Disallowing the use of any other kind of Span prevents users from naturally expressing their system. If they can't use a standard SpanKind value (b/c their use doesn't match one of the four), then they're forced to use a separate span attribute, which will not display any useful information when rendered. Simply put, if my use is not one of those four, there is still value in displaying that this is an "ingress" span, a "background" span, a "watchdog" span, etc. There are millions of useful descriptions for the kind of span that can be imagined, and if SpanKind can be a string, my system can display those to me prominently. |
@jmacd I see your point. I agree that users may have span kinds that don't fit into the 4 categories we defined and we will be artificially forcing them to choose one or leave it unspecified and that may be undesirable in situations when they actually know and want to specify some specific kind. My concern with relaxing the specification would be that the users may put all sorts of unexpected string values in the span kind even when one of the predefined values would be a good match and thus end up with less accurate and less valuable data. Perhaps this is not a big deal, especially if the API encourages the use of one of 4 predefined values and requires a slight extra effort to specify a custom one. |
I would be happier if we removed |
I agree, it may make more sense. |
In OpenTracing is was defined as one of 4 strings, not an arbitrary string:
In OpenCensus it was added as a replacement of Recv. and Sent. prefixes in span names: census-instrumentation/opencensus-proto#51 And originally only had So I'd question the desire to "unseal" the values limit as neither of initial projects had this scenario. Now w.r.t. attribute vs. field - for me there are two considerations:
|
@tigrannajaryan @jmacd I am not sure what an open form will give us in the future - we can always add new types if necessary, there is no real benefit to have this as a free from string. I can see only downsides in having it as a free form string - misspells, backends without proper support because a user just invented a new kind, etc. @SergeyKanzhelev I completely agree with your logic and arguments. I think the decision was made to keep this as a first class citizen of the Span during the merge discussions and we cannot revert that now until we deprecate opencensus and opentracing for backwards compatible reasons. |
It seems this field is not required for a backend to operate correctly, that it's just a hint. @SergeyKanzhelev In the OpenTracing document, those are "Notes and Examples". Nothing says you can't use any string. SpanKind is also ambiguous. What should I use if I'm both a server and a producer? And it's meant for compatibility with a hint for OpenCensus? This is a lot to make up for backward compatibility, when we could just use a span attribute and leave it out of the SpanData. I don't see how we (as vendors) benefit from this annotation, but as a user being limited to exactly four values makes it feel not very useful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion on enum vs. string. In OpenTracing span.kind
tag was a string.
I am not sure if span kind is much more important that all the other data elements for which we define semantic conventions. In strongly typed languages it is easy to make the API accept a typed constant, which prevents people from freely putting garbage values, while still being extensible.
specification/api-tracing.md
Outdated
* `SERVER` Indicates that the span covers server-side handling of an RPC or | ||
other remote request. | ||
* `CLIENT` Indicates that the span describes a request to some remote service. | ||
other request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"remote" seemed relevant, why was it removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's nothing to prevent a service from calling itself. Why does the "remote" aspect matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because semantically we (I think) want "client" kind to mean "network call", not in process function call.
specification/api-tracing.md
Outdated
|
||
`SpanKind` is associated with the activity that started the span, | ||
which helps resolve ambiguity. For example, a server span that | ||
directly calls another server could be described as both a SERVER and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a fan of such example in the spec, because it is against the traditional instrumentation guidelines that this scenario should have both server and client spans in that same service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote that example to illustrate what I see as an ambiguity and a problem with SpanKind
generally. I am not familiar with the guidelines you refer to, but it feels odd if we're suggesting that a server should start a child span in order to issue an RPC. This just raises the cost of tracing.
As far as I've been able to determine, there are a few uses for SpanKind
. Currently, I believe SpanKind is ambiguous at best, so I'd like a better understanding.
One thing which was part of OpenTracing was this ChildOf
vs FollowsFrom
distinction. I believe the choice of "producer/consumer" is associated with FollowsFrom and that "client/server" is associated with ChildOf. The important question is should the tracing system assume that the child span is going to complete before the parent span. At LightStep we call client/server spans "well-formed", whereas FollowsFrom spans are not. We can't use FollowsFrom spans for critical path analysis.
Another thing I see SpanKind being used for is to detect spans which either start with a remote parent or are themselves a remote parent. At LightStep we refer to these as "ingress" and "egress" spans. They are more interesting for monitoring purposes than internal spans.
My understanding is that SpanKind
is sort of doing both of these things, telling us whether a child should complete before its parent and whether a parent and child used context propagation rather than an internal relationship.
I don't see SpanKind
actually accomplishing these things, unless the guidelines you refer to are met. I would prefer if we had less ambiguous ways to do this, which do not require extra spans where they are not necessary.
Some solutions that I would prefer.
Spans could count the number of times their context is Extracted. Spans with >0 extracted contexts ought to have remote children. These are "egress" spans. Spans that are started from an Injected context ought to have a remote parent. These are "ingress" spans.
The ChildOf vs FollowsFrom distinction ought to be a Link property, since a span can have multiple children. We do not currently use a Link to store the parent span relationship, but conceptually each Link should have an attribute to say whether it is a well-formed child or an asynchronous one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it feels odd if we're suggesting that a server should start a child span in order to issue an RPC. This just raises the cost of tracing.
Starting a child "client" span for outbound network call is how every instrumentation I ever came across operates. If you don't do that, you cannot add any tags describing the outbound call either, because they would end up on the parent "server" span where they don't belong.
The cost of tracing can be controlled by not recording those spans, but if instrumentation doesn't even create them then it is simply not describing the semantics of distributed transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, with this guideline in place the meaning of SpanKind
makes sense. This guideline should be included in the explanation for SpanKind. I haven't seen it written down anywhere.
I think a server with high fanout shouldn't be required to create a child span for each call it makes. I could add an Event to the parent span to describe the call being made. The child span will have the tags I'm interested in, and I don't feel any semantics are lost.
I will update this PR with the guideline and explain how to interpret these values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a server with high fanout shouldn't be required to create a child span for each call it makes.
Yes, high fanout does occasionally present problems (e.g. 100s of calls to Redis). If the application itself knows about it and is conscious of the performance overhead of those tiny spans, then it makes sense for it to just log events. But if the API itself is efficient enough, then the optimization may still be done by the tracer, rather than the application, e.g. by combining some of those spans into summaries.
I do agree that we should have an enumerated list of values for For Personally, with I would prefer to have a clearer definition of the purpose of |
SpanKind
Please take another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it! Thank you for adding clarification for a SpanKind
the parent is expected to wait for it to complete under ordinary | ||
circumstances. It can be useful for tracing systems to know this | ||
property, since synchronous Spans may contribute to the overall trace | ||
latency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some RPC frameworks (well, not really 'P' as in 'procedure) support one-way calls, that may or may not be acked on delivery, but certainly do not provide for a result to be returned to the caller (e.g. firing off a UDP message).
What span_kind should the caller use in that case? Producer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this question points out that SpanKind
is still flawed.
The question is whether these two should be independent or not. Can you be a CLIENT/PRODUCER? or a SERVER/CONSUMER? I.e., can you declare that there is a remote parent/child and an asynchronous relationship?
Can you be an internal span that has a PRODUCER/CONSUMER relationship with its child? These are capabilities we found in OpenTracing via ChildOf and FollowsFrom.
I am still skeptical of this field, but at least in the current state of this PR my understanding of the intention is closer to the group's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SergeyKanzhelev what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you that this PR improves current spec, but not addresses the problem of SpanKind expressiveness. Similar to #65 we just stalling on decision as not many consumer uses this information to the full extent.
For me - important thing is to distinguish incoming, internal and outgoing calls. As we had in OpenCensus. Mostly for the in-proc aggregation and analysis.
@open-telemetry/specs-approvers this PR makes spec better by explaining the intent behind SpanKind. Not semantics change. Please review. |
@SergeyKanzhelev OK :) . Maybe we should remove PRODUCER/CONSUMER then. I could imagine using bitwise logic to express more kinds of span, too. The dimensions are: CLIENT/SERVER/INTERNAL these are exclusive properties to indicate whether a remote parent/child or not. one of these values is required to be set. These I perfectly agree with. Now consider this proposal: SYNC/ASYNC these are options, exclusive with each other, to indicate synchronicity, one of these may be set in addition to CLIENT/SERVER/INTERNAL. That leaves 6 possible span kinds, but it doesn't quite help with the problem as I see it, and the problem is that we're mixing descriptions that apply to the parent and to the child. In the existing spec, PRODUCER says "I will not wait for my child", CONSUMER says "My parent will not wait for me". One describes the relationship to a child, one describes the relationship to a parent. If I'm an INTERNAL span, should I prefer to annotate the relationship with my parent, or my child? What if I want to express "I will not wait for my children AND my parent will not wait for me"? (It's a producer and a consumer.) This leads me to think that the two options are really not exclusive, so consider this proposal: ASYNC_PARENT: I will not wait for (all) my children This leads to 9 possible span kinds CLIENT All of this leaves me feeling like the OpenTracing |
^^^ That said, I'd be happy to merge this and continue working on it. |
@jmacd I like your suggestion of separating the different semantic bits. However, if we do that, I would prefer the following
For critical path analysis I don't think it is sufficient to know simply sync/async flag. A span represents a pair of events, start and end. The causality relationships are easy to define in terms of events (via "happens before" relationship), but not as easy between pairs of events. I would argue that knowing causality is more important than knowing sync/async nature, as the latter can be inferred from the former. Microsoft Project has long used 4 types of "task links" to define relationships between tasks, like start-to-finish, finish-to-finish, etc. I am not sure if those are sufficient to express all possible happens-before combinations between parent start/end and child start/end. Ideally, we would come up with a nomenclature that could also be applied to Span Links, which have an additional uncertainty about the start events (in case of parent/child spans, we always know that NB: I don't have objections to merging the current PR, but I think we need to address the causality uncertainty. |
PS: if ingress/egress terminology might be confusing, another one is inbound/outbound/internal. |
Another interesting span kind could be "proxy", which is both inbound/outbound. As the spec if phrased now, it's not clear what span kind would, say, a sidecar generate (for sidecar it does not make sense to be generating a pair of server/client spans). |
@yurishkuro I proposed the same thing some time ago, load_balancer fits into the same category. |
@yurishkuro I feel like you've contradicted yourself, here #337 (comment) and here #337 (comment) I'd be happy to iterate on this, let's get this PR merged. |
Co-Authored-By: Sergey Kanzhelev <[email protected]>
I don't think so. #337 (comment) refers to spans created by a regular microservice. Even if it has 1-1 inbound/outbound spans, the purpose of those spans is generally different. But in case of a proxy or load-balancer, I don't consider its outbound spans as "calling a dependency", nor its inbound spans as "handling an endpoint". One can, of course, model them that way, but in practice nobody needs such overhead and proxies usually create just a single span (or sometimes no span at all, just enrich the existing one from the service through post-processing merge). But it does need a different span kind to represent that (very common for proxies) behavior. |
Thanks. |
* Loosen the SpanKind specification to allow string values * Updates * Updates * Updates * Update specification/api-tracing.md Co-Authored-By: Sergey Kanzhelev <[email protected]>
* Loosen the SpanKind specification to allow string values * Updates * Updates * Updates * Update specification/api-tracing.md Co-Authored-By: Sergey Kanzhelev <[email protected]>
This adds clarification to the
SpanKind
field values.