Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Span name: Both low-cardinality (grouping key) and human-readable (display name) #557

Closed
Oberon00 opened this issue Apr 9, 2020 · 39 comments · Fixed by #810
Closed

Span name: Both low-cardinality (grouping key) and human-readable (display name) #557

Oberon00 opened this issue Apr 9, 2020 · 39 comments · Fixed by #810
Assignees
Labels
area:api Cross language API specification issue area:sampling Related to trace sampling priority:p1 Highest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:trace Related to the specification/trace directory

Comments

@Oberon00
Copy link
Member

Oberon00 commented Apr 9, 2020

This came up when I suggested the handler name instead of the route name for the HTTP semantic conventions (#548). I think that the handler name is a better grouping key than the route, but the route is a better display name, as @yurishkuro pointed out. So I think we should relieve span name from this double duty. I'll post ideas as separate comments to allow reactions.

EDIT:

Cf. current spec of span name https://github.com/open-telemetry/opentelemetry-specification/blob/e7fe34ad/specification/trace/api.md#span:

The span name is a human-readable string which concisely identifies the work represented by the Span, for example, an RPC method name, a function name, or the name of a subtask or stage within a larger computation. The span name should be the most general string that identifies a (statistically) interesting class of Spans, rather than individual Span instances. That is, "get_user" is a reasonable name, while "get_user/314159", where "314159" is a user ID, is not a good name due to its high cardinality.

I used the term "grouping key" here because I think using the span name as a key to group / sample similar spans is the reason why a Span name should have a low cardinality.

@Oberon00
Copy link
Member Author

Oberon00 commented Apr 9, 2020

  1. Add a new field "GroupingKey" to the span that defines a grouping key. The span name should be used if this is not set.

@Oberon00
Copy link
Member Author

Oberon00 commented Apr 9, 2020

  1. Add a semantic convention for a span attribute grouping_key.

@Oberon00
Copy link
Member Author

Oberon00 commented Apr 9, 2020

  1. Define a resource semantic convention semconv.isgroupingkey:<ATTRIBUTE_NAME>=true that defines a given span attribute as being viable as a grouping key (Problem: what if multiple exist? Have the value be an integer priority?).

@Oberon00 Oberon00 changed the title Span name is both grouping key and diplay name Span name is both grouping key and display name Apr 9, 2020
@Oberon00
Copy link
Member Author

I forgot to mention this in the SIG meeting today, but this topic was talked about there, so it seems this is an actual issue.

@jkwatson
Copy link
Contributor

jkwatson commented May 5, 2020

Can you define what a grouping key is? I'm not familiar with this term.

@Oberon00
Copy link
Member Author

Oberon00 commented May 6, 2020

@jkwatson Thank you for that question, I think I accidentally coined that term here. I edited the description to hopefully be more clear. The "grouping key" requirement can be found in the spec as a requirement for low cardinality of the span name. I think the intention here was to allow grouping / sampling per span name.

@Oberon00 Oberon00 changed the title Span name is both grouping key and display name Span name: Both low-cardinality (grouping key) and human-readable (display-name) May 6, 2020
@Oberon00 Oberon00 changed the title Span name: Both low-cardinality (grouping key) and human-readable (display-name) Span name: Both low-cardinality (grouping key) and human-readable (display name) May 6, 2020
@arminru
Copy link
Member

arminru commented May 6, 2020

@jkwatson You can think of the proposed grouping key as what an SQL query will look at in a "GROUP BY" clause. All spans with the same grouping key will be grouped together on the backend when being analyzed or presented in the UI.
Eventually, however, it is up to the backend to decide in which cases and for which purposes it applies grouping at all. It could also decide to group using another logic or even custom defined rulesets instead of relying on the grouping key (or span name).

@jkwatson
Copy link
Contributor

jkwatson commented May 6, 2020

Thanks for the answers, that helps a lot!

@yurishkuro
Copy link
Member

I forgot to mention this in the SIG meeting today, but this topic was talked about there, so it seems this is an actual issue.

@Oberon00 can you elaborate? I am not seeing an issue yet, especially since "grouping key" is a concept for machine processing, and why would machines care if they group by extra field or by readable span name (providing there's uniqueness).

@Oberon00
Copy link
Member Author

Oberon00 commented May 6, 2020

@yurishkuro

"grouping key" is a concept for machine processing, and why would machines care if they group by extra field or by readable span name

Exactly! That's the idea behind my suggestions in this issue.

I am not seeing an issue yet

You can see the issue in the PR comment linked in the description (#548 (comment); actually it was your comment 😄). The issue is that if we just want the span name to be a display name, we can drop the requirement for low cardinality and would potentially have a better human-readability (e.g. we could use GET domain.example.com as display name instead of just HTTP GET), or we could drop the requirement for (easy) human readability and would have a better grouping key. The span name needs to satisfy both requirements which obviously means there must be compromises made in some cases.

@yurishkuro
Copy link
Member

Sorry, but I don't see "low cardinality" and "human readable" as conflicting requirements. In Jaeger UI, for example, you can select the Service from a dropdown, and the following dropdown is populated by span names - they have to be both human readable and low cardinality (high cardinality will explode the dropdown and make it unusable).

@Oberon00
Copy link
Member Author

Oberon00 commented May 6, 2020

"HTTP GET" is a very poor display name (it's also a poor grouping key). "GET domain.example.com" is a good display name, but it's an even worse grouping key.
I think this clearly demonstrates that there are cases where this conflicts.

@Oberon00
Copy link
Member Author

Oberon00 commented May 6, 2020

If you want to use your groups for display (as opposed to sampling), you'll need a display name for your group. In that case you can use any one display name from within that group and display something like "'GET domain.example.com' and 32323 similar spans"

@pauldraper
Copy link

pauldraper commented May 7, 2020

To summarize, you dislike the use of span name being both (1) the name for the individual span "instance" and (2) the (low-cardinally) name of the span "class".

I agree. If I'm looking at a single trace, I'd prefer to see

HTTP GET /accounts/123
    HTTP GET /users/123
    HTTP GET /users/456
    HTTP GET /users/789

instead of

HTTP GET /accounts/{accountId}
    HTTP GET /users/{userId}
    HTTP GET /users/{userId}
    HTTP GET /users/{userId}

The former is a more informative summary of the span.

Of course the latter form still needed for grouping (e.g. Jaeger dropdown UI).

Counterargument:

Span attributes are the place to store instance-specific details.

Counter-counterargument:

There are more attributes than can be displayed for all spans in a trace, and there are more span attributes than every backend can be required to understand significant vs. insignificant. If you want something looks like the former trace, you need the instrumentation to name it.


For compatibility I propose the opposite: the span "name" should continue to be the low-carnality span "class." And there should be a standardized attribute name or instance that summarizes the high-carnality span "instance."

@yurishkuro
Copy link
Member

And there should be a standardized attribute name or instance that summarizes the high-carnality span "instance."

I think having an optional display.name attribute is a reasonable idea (I've seen it being proposed in some Jaeger tickets), but it's certainly not a required field. Tracing UIs have wide flexibility in how they want to display the spans, e.g. many APM vendor tools prominently show the HTTP status code without having to drill down into span details, but surely we're not suggesting that status code should be a recommended part of the display name. Because each span already captures a lot of other attributes, UIs can be intelligent enough to decide which display format is the most useful, and it would not require the application code to duplicate the information it already provides in the span.

@carlosalberto
Copy link
Contributor

Tracing UIs have wide flexibility in how they want to display the spans, e.g. many APM vendor tools prominently show the HTTP status code without having to drill down into span details

My same feeling.

I like the idea of having an additional, optional attribute (such as display.name) that is defined in the semantic conventions, and that UIs may want to use for display instead of Span name.

@pauldraper
Copy link

pauldraper commented May 8, 2020

optional

Agreed. No reason to make it mandatory: there is an abundently obvious fallback in span name.


To add a further example for this idea:

Since HTTP clients don't have to pattern-match URLs, they frequently are left without a great low-cardinality names, and the instrumentation is forced to choose something poor like HTTP GET or HTTP GET github.com. The marginal utility of an instance-specific name is large in this case.


Bike shed: The thing I don't like about display.name is "display;" the instrumentation shouldn't be defining display.color or display.lines or such.

@Oberon00
Copy link
Member Author

Oberon00 commented May 8, 2020

So back to the initial description: Will this display.nane allow me to merge #548 then?

@iNikem
Copy link
Contributor

iNikem commented May 11, 2020

"HTTP GET" is a very poor display name (it's also a poor grouping key). "GET domain.example.com" is a good display name, but it's an even worse grouping key.
I think this clearly demonstrates that there are cases where this conflicts.

Why "GET domain.example.com" is a bad grouping key?

@jmacd
Copy link
Contributor

jmacd commented May 11, 2020

My $0.02 was given verbally in the last SIG meeting. I'd like to see a span name with message-formatting syntax, like "GET {http.url}". This is both descriptive and low cardinality, and when it comes time to display, the system can format the message.

@iNikem
Copy link
Contributor

iNikem commented May 11, 2020

If you care for one more opinion: span name is certainly a "grouping key" for me. A value that define a "class" of spans, as specification states. It is certainly has less information that possible display name of a single span. Every UI vendor can may his own decisions (probably even different ones for different users of that UI) what specific attribute of the span to use in their UI. Span name defines a unit of work, not a particular invocation of that work.

Taking that into account I think the original proposal of #548 is a good one and we should go with it.

@pauldraper
Copy link

pauldraper commented May 11, 2020 via email

@iNikem
Copy link
Contributor

iNikem commented May 11, 2020

Correct. Backends have to understand attributes anyway if they hope to present useful information to their users. "Useful" as in "relevant to users' use-cases". You don't make decisions about UI in sql queries.

@Oberon00
Copy link
Member Author

Oberon00 commented May 11, 2020

@iNikem

Why "GET domain.example.com" is a bad grouping key?

Because it can have a very high cardinality. See #270 (comment) EDIT: And https://github.com/open-telemetry/opentelemetry-specification/pull/416/files#r369591004

@iNikem
Copy link
Contributor

iNikem commented May 11, 2020

Ah, I thought you were talking about "GET ". Using full URL is certainly bad for grouping. Even using path portion is very bad, agreed.

@iNikem
Copy link
Contributor

iNikem commented Jul 13, 2020

And I again want to express my support for #548. What else has to be done before that PR can be revived and merged?

For the record, I have since changed my opinion and think that http route information, if available, makes for a better span name than handler name. My current opinion is based on the following example.

Imagine we have a servlet container running Spring MVC application and having auto instrumentation agent attached. When we create SERVER span from servlet request we have full url, which is too high cardinality for span name, and servlet handler name, like MainServlet.doGet. Later Spring MVC kicks in and detects that requests should be handled to a PersonController.showDetails based on the request route /api/v2/persons/{personId}.

In the situation above I think it is sensible to update SERVER span name to route-based name /api/v2/persons/{personId}. But not PersonController.showDetails because that would be strange to servlet instrumentation created span.

@tedsuo
Copy link
Contributor

tedsuo commented Jul 23, 2020

Hi all, I've made a display hint proposal to help resolve this, please have a look: #730

@yurishkuro
Copy link
Member

"HTTP GET" is a very poor display name (it's also a poor grouping key).

I want to address this argument. For server-side span, there is no excuse in using "HTTP GET" as a span name. A server always has some kind of mapping from URLs to handlers, that mapping most commonly called "route". The route is good both as display string and grouping key, satisfying the span name requirements.

The situation where "HTTP GET" names become necessary are for the client-side spans emitted by generic instrumentation of HTTP frameworks, which are removed from business logic and only have the (potentially high-cardinality) URL to work with. It is often not that interesting to build aggregations of those spans, so while certainly a poor grouping key those names do not affect the analysis that much. Without aggregation requirement, these spans are only useful to be shown in the Gantt-chart style views, where we're dealing with a single span and the UI can replace/augment the span name with the full URL.

@pauldraper
Copy link

pauldraper commented Jul 24, 2020

I think that depends on how you choose to do it.

IMO there isn't a fundamental difference between client tracing and server tracing:

client.programming.function
   HTTP GET client
       HTTP GET server
           server.programming.function

What is instrumented by whom depends.

If you are using a SOAP client, there an obvious automatic name for the operation. If you are using ApacheHttpClient you will have to choose and add the name yourself.

If you are using a Play server, there is an obvious automatic name for the operation. If you are using com.sun.net.httpserver.HttpServer you will have to choose and add the name yourself.

@Oberon00
Copy link
Member Author

@open-telemetry/dotnet-approvers are you aware of this issue? It seems you already made a choice here that runs counter the current direction of this discussion when naming the Name DisplayName (see #730 (comment))

@SergeyKanzhelev
Copy link
Member

I suggest to move this issue for Future milestone. This may be added after GA, not a breaking change.

@Oberon00
Copy link
Member Author

The issue here is that the purpose of the existing span name is not clearly defined. Sure, you can add a display.name later on (should it be required). But you cannot as easily undo the compromises made in semantic conventions to make the existing span name a more useful display name and a worse grouping key for that.

@Oberon00
Copy link
Member Author

Oberon00 commented Aug 14, 2020

See spec

The span name is a human-readable string which concisely identifies the work represented by the Span, for example, an RPC method name, a function name, or the name of a subtask or stage within a larger computation. The span name should be the most general string that identifies a (statistically) interesting class of Spans, rather than individual Span instances.

the issue is that optimizing a span name for human-readability will sometimes yield different results from optimizing for "most general string", as demonstrated in #548. In that PR, I agree that when "naming" an HTTP span, the route name is better for human consumption, but I maintain that the handler name is a more general string and still identifies a statistically interesting class of spans.

@austinlparker
Copy link
Member

austinlparker commented Aug 14, 2020

Outside of the spec, the span name is generally referred to as the 'grouping key' for operations and instrumenters are advised to avoid high cardinality span names. I agree with your (@Oberon00) overall point, but I would suggest that #548 could just as simply be resolved by turning the handler class/method name into a semantic attribute on a server span.

+1 to Sergey's point about tabling this until post-GA.

@SergeyKanzhelev
Copy link
Member

@Oberon00 this problem exists on many platforms already and every platform takes its route to resolve it. OpenTelemetry semantic convention made emphasize on (more) guaranteed grouping. Every app owner/vendor may adjust the span name with custom modules or custom code.

Also our requirement is to have a support for features in some backend. This is not a simple attribute and additional semantic is forced on it. So it would be great to have Jaeger or other backend to demonstrate how to use it.

@Oberon00
Copy link
Member Author

I'm perfectly fine with tabling #548 and #730, but this issue is not resolved by either IMHO.

So it would be great to have Jaeger or other backend to demonstrate how to use it.

How to use what?

I think I'll propose a one sentence PR for this issue before more confusion arises 😉

@Oberon00
Copy link
Member Author

I proposed #810 to clarify what I think we should resolve before GA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue area:sampling Related to trace sampling priority:p1 Highest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging a pull request may close this issue.