Clarify the interpretation of `SpanKind` #337

jmacd · 2019-10-24T19:00:33Z

This adds clarification to the SpanKind field values.

tigrannajaryan · 2019-10-24T20:12:53Z

@jmacd IMO StatusKind is useful precisely because it is narrowly defined and allows backends to learn the semantics of operation and visualize it differently based on the value of the SpanKind.
If this becomes an arbitrary string value unclear how the backends will implement the equivalent functionality (perhaps by having semantic conventions around SpanKind strings, but then what's the value of moving from enum to a list of pre-defined strings if any user-defined value is unknown to the backend?).

Perhaps there is a specific use case which you have in your mind that is not solvable by the status quo, in that case please describe it. The use case you wrote "This would allow users to perform offline analysis on spans by other designated kinds" seems achievable by just adding a custom attribute to the Span.

SergeyKanzhelev

"Request change" as a way to block the PR.

It is not clear what extra scenarios this opens comparing to custom attribute on a span. It seems like too big of a change for 1.0 milestone and it will negatively affect memory, perf of runtime and spans protocol.

@jmacd if you feel that this is a needed change, let's initiate it from the OTEP and specific scenarios it opens up

jmacd · 2019-10-24T21:01:57Z

allows backends to learn the semantics of operation and visualize it differently based on the value of the SpanKind.

@tigrannajaryan there is no loss of value in this regard. There are still four well-known kinds that you can use to tell a backend the semantics / visualization strategy. Backends would continue to recognize those four values and use special visualizations, but backends could simply display the Kind string for other unrecognized values and provide value to the user.

Disallowing the use of any other kind of Span prevents users from naturally expressing their system. If they can't use a standard SpanKind value (b/c their use doesn't match one of the four), then they're forced to use a separate span attribute, which will not display any useful information when rendered. Simply put, if my use is not one of those four, there is still value in displaying that this is an "ingress" span, a "background" span, a "watchdog" span, etc. There are millions of useful descriptions for the kind of span that can be imagined, and if SpanKind can be a string, my system can display those to me prominently.

tigrannajaryan · 2019-10-24T21:28:44Z

@jmacd I see your point. I agree that users may have span kinds that don't fit into the 4 categories we defined and we will be artificially forcing them to choose one or leave it unspecified and that may be undesirable in situations when they actually know and want to specify some specific kind.

My concern with relaxing the specification would be that the users may put all sorts of unexpected string values in the span kind even when one of the predefined values would be a good match and thus end up with less accurate and less valuable data. Perhaps this is not a big deal, especially if the API encourages the use of one of 4 predefined values and requires a slight extra effort to specify a custom one.

jmacd · 2019-10-24T21:30:12Z

I would be happier if we removed SpanKind from the spec and specified a conventional span.kind attribute, by the way.

tigrannajaryan · 2019-10-24T21:39:38Z

I would be happier if we removed SpanKind from the spec and specified a conventional span.kind attribute, by the way.

I agree, it may make more sense.

SergeyKanzhelev · 2019-10-24T22:33:48Z

In OpenTracing is was defined as one of 4 strings, not an arbitrary string:

Span tag name	Type	Notes and examples
span.kind	string	Either "client" or "server" for the appropriate roles in an RPC, and "producer" or "consumer" for the appropriate roles in a messaging scenario.

In OpenCensus it was added as a replacement of Recv. and Sent. prefixes in span names: census-instrumentation/opencensus-proto#51 And originally only had Client and Server values. Producer and Consumer were added on projects merge.

So I'd question the desire to "unseal" the values limit as neither of initial projects had this scenario.

Now w.r.t. attribute vs. field - for me there are two considerations:

if this is a field that is required for the proper work of backends - it is beneficial to have a special field rather than risk that other attributes will push this attribute out of the collection.
If in many apps there will be a processing (sampling, aggregation) in-proc or in collector that relies on span kind - it is beneficial to have it strongly typed.

bogdandrutu · 2019-10-24T23:17:18Z

@tigrannajaryan @jmacd I am not sure what an open form will give us in the future - we can always add new types if necessary, there is no real benefit to have this as a free from string. I can see only downsides in having it as a free form string - misspells, backends without proper support because a user just invented a new kind, etc.

@SergeyKanzhelev I completely agree with your logic and arguments.

I think the decision was made to keep this as a first class citizen of the Span during the merge discussions and we cannot revert that now until we deprecate opencensus and opentracing for backwards compatible reasons.

jmacd · 2019-10-25T00:33:19Z

It seems this field is not required for a backend to operate correctly, that it's just a hint. @SergeyKanzhelev In the OpenTracing document, those are "Notes and Examples". Nothing says you can't use any string.

SpanKind is also ambiguous. What should I use if I'm both a server and a producer?

And it's meant for compatibility with a hint for OpenCensus? This is a lot to make up for backward compatibility, when we could just use a span attribute and leave it out of the SpanData.

I don't see how we (as vendors) benefit from this annotation, but as a user being limited to exactly four values makes it feel not very useful.

yurishkuro

I don't have a strong opinion on enum vs. string. In OpenTracing span.kind tag was a string.

I am not sure if span kind is much more important that all the other data elements for which we define semantic conventions. In strongly typed languages it is easy to make the API accept a typed constant, which prevents people from freely putting garbage values, while still being extensible.

yurishkuro · 2019-10-24T19:57:45Z

specification/api-tracing.md

 * `SERVER` Indicates that the span covers server-side handling of an RPC or
-  other remote request.
-* `CLIENT` Indicates that the span describes a request to some remote service.
+  other request.


"remote" seemed relevant, why was it removed?

There's nothing to prevent a service from calling itself. Why does the "remote" aspect matter?

Because semantically we (I think) want "client" kind to mean "network call", not in process function call.

yurishkuro · 2019-10-24T20:01:38Z

specification/api-tracing.md

+
+`SpanKind` is associated with the activity that started the span,
+which helps resolve ambiguity.  For example, a server span that
+directly calls another server could be described as both a SERVER and


not a fan of such example in the spec, because it is against the traditional instrumentation guidelines that this scenario should have both server and client spans in that same service.

I wrote that example to illustrate what I see as an ambiguity and a problem with SpanKind generally. I am not familiar with the guidelines you refer to, but it feels odd if we're suggesting that a server should start a child span in order to issue an RPC. This just raises the cost of tracing.

As far as I've been able to determine, there are a few uses for SpanKind. Currently, I believe SpanKind is ambiguous at best, so I'd like a better understanding.

One thing which was part of OpenTracing was this ChildOf vs FollowsFrom distinction. I believe the choice of "producer/consumer" is associated with FollowsFrom and that "client/server" is associated with ChildOf. The important question is should the tracing system assume that the child span is going to complete before the parent span. At LightStep we call client/server spans "well-formed", whereas FollowsFrom spans are not. We can't use FollowsFrom spans for critical path analysis.

Another thing I see SpanKind being used for is to detect spans which either start with a remote parent or are themselves a remote parent. At LightStep we refer to these as "ingress" and "egress" spans. They are more interesting for monitoring purposes than internal spans.

My understanding is that SpanKind is sort of doing both of these things, telling us whether a child should complete before its parent and whether a parent and child used context propagation rather than an internal relationship.

I don't see SpanKind actually accomplishing these things, unless the guidelines you refer to are met. I would prefer if we had less ambiguous ways to do this, which do not require extra spans where they are not necessary.

Some solutions that I would prefer.

Spans could count the number of times their context is Extracted. Spans with >0 extracted contexts ought to have remote children. These are "egress" spans. Spans that are started from an Injected context ought to have a remote parent. These are "ingress" spans.

The ChildOf vs FollowsFrom distinction ought to be a Link property, since a span can have multiple children. We do not currently use a Link to store the parent span relationship, but conceptually each Link should have an attribute to say whether it is a well-formed child or an asynchronous one.

it feels odd if we're suggesting that a server should start a child span in order to issue an RPC. This just raises the cost of tracing.

Starting a child "client" span for outbound network call is how every instrumentation I ever came across operates. If you don't do that, you cannot add any tags describing the outbound call either, because they would end up on the parent "server" span where they don't belong.

The cost of tracing can be controlled by not recording those spans, but if instrumentation doesn't even create them then it is simply not describing the semantics of distributed transaction.

I see, with this guideline in place the meaning of SpanKind makes sense. This guideline should be included in the explanation for SpanKind. I haven't seen it written down anywhere.

I think a server with high fanout shouldn't be required to create a child span for each call it makes. I could add an Event to the parent span to describe the call being made. The child span will have the tags I'm interested in, and I don't feel any semantics are lost.

I will update this PR with the guideline and explain how to interpret these values.

I think a server with high fanout shouldn't be required to create a child span for each call it makes.

Yes, high fanout does occasionally present problems (e.g. 100s of calls to Redis). If the application itself knows about it and is conscious of the performance overhead of those tiny spans, then it makes sense for it to just log events. But if the API itself is efficient enough, then the optimization may still be done by the tracer, rather than the application, e.g. by combining some of those spans into summaries.

tedsuo · 2019-10-25T01:27:10Z

I do agree that we should have an enumerated list of values for span.kind, and new values should not be getting invented left and right. But I would like to understand how this data is intended to be used, in order to understand how it would be extended, if at all.

For span.kind, It's not totally clear to me why this needs to be an enum, instead of a semantic definition like all of the rest. It would be mildly more efficient to use an enum, a little bit harder to mess up. But it would also more extensible as a string value - with other semantic definitions, it is possible to model new keys and values without forking the API. With an enum, we would have to fork/version the spec every time we make a change. Part of the point of the semantic definitions is that we don't have to do something that drastic every time we want to model a new concept. Constants, helpers, etc, for the semantics are only added to the spec as a convenience.

Personally, with span.kind, I think it is a bit of a bikeshed either way. :) The arguments for how difficult it would be to get the value correct if it we used strings feel a little overblown - after all, the rest of the semantics are going to work this way. At the same time, I don't see us adding values to to span.kind very quickly, if at all. So I'm a bit 🤷‍♂ on all of this in general.

I would prefer to have a clearer definition of the purpose of span.kind. Are we just trying to denote serialization/propagation points in our trace? In that case, three values are enough: sender, receiver, and internal (call them what you will). I don't know that we need to be getting fine grained with this particular type system - i'd rather just see more semantic definitions for various types of span operations. In that sense, the producer and consumer values feel like half measures - what use do they provide?

jmacd · 2019-10-25T21:03:51Z

Please take another look.

SergeyKanzhelev

I like it! Thank you for adding clarification for a SpanKind

yurishkuro · 2019-10-28T19:21:54Z

specification/api-tracing.md

+the parent is expected to wait for it to complete under ordinary
+circumstances.  It can be useful for tracing systems to know this
+property, since synchronous Spans may contribute to the overall trace
+latency.


Some RPC frameworks (well, not really 'P' as in 'procedure) support one-way calls, that may or may not be acked on delivery, but certainly do not provide for a result to be returned to the caller (e.g. firing off a UDP message).

What span_kind should the caller use in that case? Producer?

I think this question points out that SpanKind is still flawed.

The question is whether these two should be independent or not. Can you be a CLIENT/PRODUCER? or a SERVER/CONSUMER? I.e., can you declare that there is a remote parent/child and an asynchronous relationship?

Can you be an internal span that has a PRODUCER/CONSUMER relationship with its child? These are capabilities we found in OpenTracing via ChildOf and FollowsFrom.

I am still skeptical of this field, but at least in the current state of this PR my understanding of the intention is closer to the group's.

@SergeyKanzhelev what do you think?

I agree with you that this PR improves current spec, but not addresses the problem of SpanKind expressiveness. Similar to #65 we just stalling on decision as not many consumer uses this information to the full extent.

For me - important thing is to distinguish incoming, internal and outgoing calls. As we had in OpenCensus. Mostly for the in-proc aggregation and analysis.

SergeyKanzhelev · 2019-11-01T15:52:40Z

@open-telemetry/specs-approvers this PR makes spec better by explaining the intent behind SpanKind. Not semantics change. Please review.

jmacd · 2019-11-01T16:52:19Z

@SergeyKanzhelev OK :) . Maybe we should remove PRODUCER/CONSUMER then.

I could imagine using bitwise logic to express more kinds of span, too. The dimensions are:

CLIENT/SERVER/INTERNAL these are exclusive properties to indicate whether a remote parent/child or not. one of these values is required to be set. These I perfectly agree with.

Now consider this proposal:

SYNC/ASYNC these are options, exclusive with each other, to indicate synchronicity, one of these may be set in addition to CLIENT/SERVER/INTERNAL.

That leaves 6 possible span kinds, but it doesn't quite help with the problem as I see it, and the problem is that we're mixing descriptions that apply to the parent and to the child.

In the existing spec, PRODUCER says "I will not wait for my child", CONSUMER says "My parent will not wait for me". One describes the relationship to a child, one describes the relationship to a parent.

If I'm an INTERNAL span, should I prefer to annotate the relationship with my parent, or my child? What if I want to express "I will not wait for my children AND my parent will not wait for me"? (It's a producer and a consumer.)

This leads me to think that the two options are really not exclusive, so consider this proposal:

ASYNC_PARENT: I will not wait for (all) my children
ASYNC_CHILD: My parent will not wait for me

This leads to 9 possible span kinds

CLIENT
CLIENT/ASYNC_PARENT
CLIENT/ASYNC_PARENT/ASYNC_CHILD
INTERNAL/... (3 total)
SERVER/... (3 total)

All of this leaves me feeling like the OpenTracing FollowsFrom concept was spot on, which is ASYNC_CHILD in the above proposal. Maybe we don't need ASYNC_PARENT.

jmacd · 2019-11-01T16:54:04Z

^^^ That said, I'd be happy to merge this and continue working on it.

yurishkuro · 2019-11-03T23:33:41Z

@jmacd I like your suggestion of separating the different semantic bits. However, if we do that, I would prefer the following

instead of client/server, use egress/ingress/internal
do not mix async-ness into span kind at all

For critical path analysis I don't think it is sufficient to know simply sync/async flag. A span represents a pair of events, start and end. The causality relationships are easy to define in terms of events (via "happens before" relationship), but not as easy between pairs of events. I would argue that knowing causality is more important than knowing sync/async nature, as the latter can be inferred from the former.

Microsoft Project has long used 4 types of "task links" to define relationships between tasks, like start-to-finish, finish-to-finish, etc. I am not sure if those are sufficient to express all possible happens-before combinations between parent start/end and child start/end. Ideally, we would come up with a nomenclature that could also be applied to Span Links, which have an additional uncertainty about the start events (in case of parent/child spans, we always know that parent-start happens-before child-start).

NB: I don't have objections to merging the current PR, but I think we need to address the causality uncertainty.

yurishkuro · 2019-11-03T23:35:16Z

PS: if ingress/egress terminology might be confusing, another one is inbound/outbound/internal.

yurishkuro · 2019-11-03T23:37:38Z

Another interesting span kind could be "proxy", which is both inbound/outbound. As the spec if phrased now, it's not clear what span kind would, say, a sidecar generate (for sidecar it does not make sense to be generating a pair of server/client spans).

bogdandrutu · 2019-11-04T11:14:23Z

@yurishkuro I proposed the same thing some time ago, load_balancer fits into the same category.

specification/api-tracing.md

jmacd · 2019-11-04T18:01:53Z

@yurishkuro I feel like you've contradicted yourself, here #337 (comment) and here #337 (comment)

I'd be happy to iterate on this, let's get this PR merged.

Co-Authored-By: Sergey Kanzhelev <[email protected]>

yurishkuro · 2019-11-04T22:01:18Z

@yurishkuro I feel like you've contradicted yourself...

I don't think so. #337 (comment) refers to spans created by a regular microservice. Even if it has 1-1 inbound/outbound spans, the purpose of those spans is generally different. But in case of a proxy or load-balancer, I don't consider its outbound spans as "calling a dependency", nor its inbound spans as "handling an endpoint". One can, of course, model them that way, but in practice nobody needs such overhead and proxies usually create just a single span (or sometimes no span at all, just enrich the existing one from the service through post-processing merge). But it does need a different span kind to represent that (very common for proxies) behavior.

jmacd · 2019-11-04T22:06:13Z

Thanks.

* Loosen the SpanKind specification to allow string values * Updates * Updates * Updates * Update specification/api-tracing.md Co-Authored-By: Sergey Kanzhelev <[email protected]>

Loosen the SpanKind specification to allow string values

c7ff294

jmacd requested review from AloisReitbauer, bogdandrutu, c24t, carlosalberto, iredelmeier, reyang, SergeyKanzhelev, songy23, tedsuo, tigrannajaryan and yurishkuro as code owners October 24, 2019 19:00

jmacd mentioned this pull request Oct 24, 2019

add WithSpanKind option to span creation open-telemetry/opentelemetry-go#234

Merged

SergeyKanzhelev suggested changes Oct 24, 2019

View reviewed changes

jmacd mentioned this pull request Oct 25, 2019

Clarification on Span Creation API #322

Merged

yurishkuro reviewed Oct 25, 2019

View reviewed changes

elskwid mentioned this pull request Oct 25, 2019

Add http client/server example open-telemetry/opentelemetry-ruby#131

Merged

jmacd added 2 commits October 25, 2019 13:58

Updates

6f30bff

Updates

dfc3cc9

jmacd changed the title ~~Loosen the SpanKind specification to allow string values~~ Clarify the interpretation of SpanKind Oct 25, 2019

Updates

1b36fbc

SergeyKanzhelev approved these changes Oct 28, 2019

View reviewed changes

yurishkuro reviewed Oct 28, 2019

View reviewed changes

Merge branch 'master' into jmacd/loosen_span_kind

e03974d

This was referenced Nov 1, 2019

use LinkedTo rather than ChildOf for PublicEndpoint open-telemetry/opentelemetry-go#272

Merged

Remove "FollowsFrom" relationships open-telemetry/opentelemetry-go#193

Closed

yurishkuro approved these changes Nov 3, 2019

View reviewed changes

carlosalberto reviewed Nov 4, 2019

View reviewed changes

specification/api-tracing.md Outdated Show resolved Hide resolved

carlosalberto approved these changes Nov 4, 2019

View reviewed changes

Update specification/api-tracing.md

7595543

Co-Authored-By: Sergey Kanzhelev <[email protected]>

SergeyKanzhelev merged commit 724bb54 into open-telemetry:master Nov 4, 2019

jmacd mentioned this pull request Nov 4, 2019

Change SpanKind type an integer type open-telemetry/opentelemetry-go#288

Merged

jmacd deleted the jmacd/loosen_span_kind branch November 25, 2019 19:19

jmacd mentioned this pull request Dec 4, 2019

Span.Kind.LOAD_BALANCER (PROXY/SIDECAR) #51

Open

fbogsany mentioned this pull request Feb 28, 2020

Add Rack instrumentation open-telemetry/opentelemetry-ruby#166

Merged

arielvalentin mentioned this pull request Feb 22, 2023

feat: Use Rack::Events for instrumentation open-telemetry/opentelemetry-ruby-contrib#342

Merged

Clarify the interpretation of SpanKind #337

Clarify the interpretation of SpanKind #337

Conversation

jmacd commented Oct 24, 2019 • edited Loading

tigrannajaryan commented Oct 24, 2019

SergeyKanzhelev left a comment

Choose a reason for hiding this comment

jmacd commented Oct 24, 2019

tigrannajaryan commented Oct 24, 2019

jmacd commented Oct 24, 2019

tigrannajaryan commented Oct 24, 2019

SergeyKanzhelev commented Oct 24, 2019

bogdandrutu commented Oct 24, 2019

jmacd commented Oct 25, 2019

yurishkuro left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yurishkuro Oct 25, 2019 • edited Loading

Choose a reason for hiding this comment

tedsuo commented Oct 25, 2019 • edited Loading

jmacd commented Oct 25, 2019

SergeyKanzhelev left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SergeyKanzhelev commented Nov 1, 2019

jmacd commented Nov 1, 2019

jmacd commented Nov 1, 2019

yurishkuro commented Nov 3, 2019

yurishkuro commented Nov 3, 2019

yurishkuro commented Nov 3, 2019 • edited Loading

bogdandrutu commented Nov 4, 2019

jmacd commented Nov 4, 2019

yurishkuro commented Nov 4, 2019

jmacd commented Nov 4, 2019

Clarify the interpretation of `SpanKind` #337

Clarify the interpretation of `SpanKind` #337

jmacd commented Oct 24, 2019 •

edited

Loading

yurishkuro Oct 25, 2019 •

edited

Loading

tedsuo commented Oct 25, 2019 •

edited

Loading

yurishkuro commented Nov 3, 2019 •

edited

Loading