Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow adding links after span creation #2278

Closed

Conversation

pyohannes
Copy link
Contributor

Fixes #454

In a discussion in the specification SIG, it was deemed feasible to revise a further decision about not allowing adding links to spans after span creation.

Adding links after span creation allows to create more meaningful traces in certain scenarios (several of those scenarios were identified by the messaging workgroup). Best-effort sampling based on links is still possible, only based on links added at creation time (similar to attributes).

Changes

This moves the section about adding span links from "Span creation" to "Span operations", removing mentions of the limitation that links can only be added at creation time. The name AddLink is proposed, wording is kept consistent with the corresponding section for adding attributes.

@pyohannes pyohannes requested review from a team January 19, 2022 23:35
@bogdandrutu
Copy link
Member

My 2cents: By approving this PR we are allowing instrumentation to use the new API, instead of trying to pass all the links at start time. This will cause instrumentations to not even consider grouping all the links at the start which will mean in lots of cases the instrumentation will produce incomplete traces since the sampler is not working in this case.

We are opening a pandora box without thinking of implications, and this may cause troubles on short to medium term until a proper solution for the samplers to work is proposed.

@pyohannes
Copy link
Contributor Author

pyohannes commented Jan 20, 2022

By approving this PR we are allowing instrumentation to use the new API, instead of trying to pass all the links at start time. This will cause instrumentations to not even consider grouping all the links at the start which will mean in lots of cases the instrumentation will produce incomplete traces since the sampler is not working in this case.

That's a valid point: this might change the shape of some traces when one is using custom samplers (none of the specified samplers use links for now).

We are opening a pandora box without thinking of implications, and this may cause troubles on short to medium term until a proper solution for the samplers to work is proposed.

I think it would be beneficial to make a decision as soon as possible (whether to allow links after creation time or not to) for the following reasons:

  1. It will give us a clear framework to work with when coming up with semantic conventions. If we now decide to go with a sub-optimal solution that is based on the current limitation, we will be stuck with this sub-optimal solution (as we want to stabilize some semantic conventions this quarter), even when the spec will remove the limitation at a later point.
  2. When we want to remove this limitation, it's probably better to do it know than later. Links are not widely used now, but probably will be used more and more with time. The longer we wait with a decision, the harder it will be to remove the limitation.

@pyohannes
Copy link
Contributor Author

This will cause instrumentations to not even consider grouping all the links at the start which will mean in lots of cases the instrumentation will produce incomplete traces since the sampler is not working in this case.

Sampling in connection with links is a very tricky thing, as I found out by implementing a link-based sampler myself. Even with the current model, it's impossible to avoid incomplete traces: how will you sample a span that has a non-sampled parent, but sampled links? The only way to guarantee complete traces would be to limit adding links only to root spans of a trace.

Producing incomplete traces with link-based samplers is not a problem that is introduced with this change, although this change increases the scope of this problem.

@carlosalberto
Copy link
Contributor

We are opening a pandora box without thinking of implications, and this may cause troubles on short to medium term until a proper solution for the samplers to work is proposed.

The same happens for attributes, but we allow that anyway. I feel like a 'big' warning should be put in the related sections, mentioning that instrumentation code SHOULD try its best to set links/attributes at Span build time.

@jmacd
Copy link
Contributor

jmacd commented Jan 20, 2022

I want to add a few remarks about sampling and "completeness", since that is a major topic in #2047.

By approving this PR we are allowing instrumentation to use the new API, instead of trying to pass all the links at start time. This will cause instrumentations to not even consider grouping all the links at the start which will mean in lots of cases the instrumentation will produce incomplete traces since the sampler is not working in this case.

Challenging this statement @bogdandrutu: if there's a real problem created as a result of the new API, that is to say that many developers opt to use AddLink() instead of passing to Start(), then users will have "incompleteness" and they may be motivated to ask why the instrumented code did not use the Start() API. If there's a will and a way, the right thing will happen. OTOH, the reason for this change is that sometimes there is literally not a way to know your links when you start, so we should allow this new API.

I also want to challenge the idea that anyone using links actually expects a guarantee of "completeness" when they are sampling. Actually, the term "completeness" is not defined the way it's being used in this discussion. To me, the point of a Link is to connect one trace to another, and there is no notion of completeness for trace and its linked-to-traces that we have defined (see #1929). Completeness is about whether all the spans for a given TraceID have been collected (per #2047).

I do not expect linked-to traces to be collected with a 100% guarantee if I'm sampling. One reason to use links is to avoid huge or never-ending traces, so I am usually explicitly not interested in having them all collected. There are two directions a user could go if they want to ensure a sampled trace has its linked-to traces also collected.

  1. Non-probabilistically: if the cost of tracing is not a concern, making a non-probabilistic decision to sample a trace whenever any of its provided-at-start-time linked-to contexts is also being traced. If the cost of tracing is a concern, it's not clear how to rate-limit this decision.
  2. Probabilistically: if probability sampling is being used, then each linked-to trace has an independent probability of being collected. If you have 1000 links and use 1/2 sampling for the linked-to traces, you should expect approximately 500 of them to be collected. (This, to me, is working as intended.)

@jmacd
Copy link
Contributor

jmacd commented Jan 20, 2022

how will you sample a span that has a non-sampled parent, but sampled links? The only way to guarantee complete traces would be to limit adding links only to root spans of a trace.

#2047 defines a notion of sub-trace completeness that can be used here. Generally speaking, with sampling you have the potential for incompleteness, think of it as a feature not a bug. 😁

@jmacd
Copy link
Contributor

jmacd commented Jan 21, 2022

several of those scenarios were identified by the messaging workgroup

It may be helpful to link to a summary of those scenarios.

the reason for this change is that sometimes there is literally not a way to know your links when you start

One of the reasons I've wanted to add links after a span starts is:

a. Span A and B start
b. Span A acquires a shared resource
c. Span B tries to acquire a shared resource, but is blocked by Span A
d. Instrumentation adds a link to Span B ("blocked by")
e. Instrumentation adds a link to Span A ("blocking")

@pyohannes
Copy link
Contributor Author

The same happens for attributes, but we allow that anyway. I feel like a 'big' warning should be put in the related sections, mentioning that instrumentation code SHOULD try its best to set links/attributes at Span build time.

@carlosalberto This is mentioned (I took this over from attributes), I'm not sure if it should be emphasized more:

Whenever possible, users SHOULD set any already known links at span creation instead of calling AddLink later.

@pyohannes
Copy link
Contributor Author

several of those scenarios were identified by the messaging workgroup

It may be helpful to link to a summary of those scenarios.

One is briefly sketched here: #454 (comment)

We also plan to present our current design for messaging instrumentation to a wider audience, this will be covered there: #454 (comment)

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 28, 2022
@pyohannes pyohannes force-pushed the create-span-link-after-creation branch from 9dc2cdd to 5fcc01f Compare January 29, 2022 01:37
@pyohannes
Copy link
Contributor Author

I will bring this up in the next specification SIG meeting, to discuss next steps for resolving open questions.

@@ -402,30 +407,6 @@ A `SpanContext` cannot be set as active in a `Context` directly, but by
[wrapping it into a Span](#wrapping-a-spancontext-in-a-span).
For example, a `Propagator` performing context extraction may need this.

#### Specifying links
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep this section? I don't think that we can remove the existing API for specifying links at creation time. Instead, we should simply remove the last sentence of the first paragraph here:

Links cannot be added after Span creation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we can remove the existing API for specifying links at creation time.

Right, the API for creating a span is still required to accept a list of links (see above). In addition, a method for adding links after span creation is added as a should.

I tried to structure this in a way similar to attributes, where the situation is similar (attributes can be added at creation time and after span creation). I'm open to change this, if it's clearer in another way.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to jump on the same train, but I've found that the diff is misleading. By reading the whole document (without diff context), the order of the document introduces the creation first and adding links comes later and is discouraged in the span creation section.

My only concern is that the "Link"s link in the "Spans encapsulate:" part jumps straight to the "Add Links" section which may be misleading. Perhaps we can create a separate "Links" subsection that describes what a link is (arguably the same could be done for Events however Events don't make sense at create time)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can create a separate "Links" subsection that describes what a link is (arguably the same could be done for Events however Events don't make sense at create time)

Any other opinions on this? Otherwise, I'll go ahead the change the PR according to the suggestion.

specification/trace/api.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
specification/trace/api.md Outdated Show resolved Hide resolved
specification/trace/api.md Show resolved Hide resolved
@pyohannes
Copy link
Contributor Author

pyohannes commented Feb 15, 2022

These are the diagrams regarding messaging use cases presented in today's spec SIG. For the pull scenario solution presented below, it would be necessary to add links after span creation.

Details and related discussions are captured in this draft OTEP: open-telemetry/oteps#192

Push scenarios

const consumer = kafka.consumer({ groupId: 'my-group' })
await consumer.run({
 eachBatch: async ({ batch, offset, heartbeat, co, uco, isRunning, isStale }) => {
     for (let message of batch.messages) {
         console.log({
             key: message.key.toString(),
             value: message.value.toString(),
             headers: message.headers,
         })
     }
 },
})

Messaging consumer scenarios (1)

Pull scenarios

const res = await client.receive();
res.messages.forEach(() => console.log('operation with side effect'));

Messaging consumer scenarios

@bogdandrutu
Copy link
Member

@jmacd

One of the reasons I've wanted to add links after a span starts is:
a. Span A and B start
b. Span A acquires a shared resource
c. Span B tries to acquire a shared resource, but is blocked by Span A
d. Instrumentation adds a link to Span B ("blocked by")
e. Instrumentation adds a link to Span A ("blocking")

This is a great example, but in this case the "link" has also associated a "relation". Which I think is important to have if we start allowing links after.

The "default" case that we had so far was batching and the child of relation. With this new proposal we open the possibilities to way more relations and we probably need to think about this and add that to the API probably.

As per @pyohannes picture for the "Pull scenarios", the relation between the Span and the "linked context" is no longer a child of, so I think we need to document that in the link. The relation can be a semantic convention, but I think is important to have it.

@pyohannes for the "Pull scenarios":

Somehow I don't like the proposed way to model this, the reason is that (independent of adding links at start/end) I would prefer to have another Span Execute that wraps the 2 console.log ops as a sibling to Receive. That would feel much more natural.

@pyohannes
Copy link
Contributor Author

@bogdandrutu

The "default" case that we had so far was batching and the child of relation. With this new proposal we open the possibilities to way more relations and we probably need to think about this and add that to the API probably.

I think in this context, even links in batching use cases modelled "child-of" relations: it's the case where one child has multiple parents.

As per @pyohannes picture for the "Pull scenarios", the relation between the Span and the "linked context" is no longer a child of, so I think we need to document that in the link. The relation can be a semantic convention, but I think is important to have it.

Yes, links could greatly benefit from additional semantic context. In the messaging use cases, we'd always have links between one span with kind producer, and one span with kind consumer. This way semantic context could be inferred from the spans that are linked. But having it explicitly attached to a link. As it is already possible to attach attributes to a link, it's might just be a matter of coming up with semantic conventions for attributes on links?

@pyohannes for the "Pull scenarios":

Somehow I don't like the proposed way to model this, the reason is that (independent of adding links at start/end) I would prefer to have another Span Execute that wraps the 2 console.log ops as a sibling to Receive. That would feel much more natural.

As mentioned, this is independent of adding links after span creation. In the messaging workgroup, we were discussing "Process" spans (same as "Execute" spans). The aim was to come up with a model that can easily be extended by "Process" spans. However, we're not yet clear about all the details, and we'd also like to clear up whether we'd be able to create a "Receive" span as sketched above, as "Process" spans might look different in a different model.

@reyang reyang added area:api Cross language API specification issue spec:trace Related to the specification/trace directory labels Mar 29, 2022
@github-actions
Copy link

github-actions bot commented Apr 6, 2022

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Apr 6, 2022
@pyohannes pyohannes force-pushed the create-span-link-after-creation branch from 152e5b6 to 8aa3d5d Compare April 7, 2022 23:43
@pyohannes
Copy link
Contributor Author

This is still being discussed with @bogdandrutu in the Thursday messaging workgroup.

@github-actions github-actions bot removed the Stale label Apr 13, 2022
@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Apr 20, 2022
@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Apr 27, 2022
@nerochiaro
Copy link

I came here looking for a resolution of this issue, but it seems that the last significant comment was back in April

This is still being discussed with @bogdandrutu in the Thursday messaging workgroup.

I am curious to understand if the discussion has simply stalled and has been abandoned or if there is a possibility that at some point opentelemetry will allow adding links to spans that have already been created.

@pyohannes
Copy link
Contributor Author

I am curious to understand if the discussion has simply stalled and has been abandoned or if there is a possibility that at some point opentelemetry will allow adding links to spans that have already been created.

The discussion has stalled but is not yet settled, issue #454 is still open. @nerochiaro Maybe you could add your use case for adding span links after span creation to this issue?

@nerochiaro
Copy link

I added more comments to issue #454, but in short the use case would be when using the lambda automatic instrumentation.

A span for the lambda function is created automatically by the instrumentation's function wrapper, and there is no way to tell it to add links (for example, a link to the span created by a lambda authorizer that ran before the function itself).

@Oberon00
Copy link
Member

For that use case you don't need to add links after creation, if you can tell the wrapper to add the links right away.

@nerochiaro
Copy link

nerochiaro commented Oct 18, 2022

For that use case you don't need to add links after creation, if you can tell the wrapper to add the links right away.

Except that, as far as I can tell, the wrapper creates the span then runs the lambda, without accepting any input.
I would have to write my own wrapper which duplicates all the code of the default one, and add code to create links to that.

(Or possibly the wrapper could accept a configuration option that specifies a "hook" function that given the event and context would generate extra links and attributes to add to the lambda, but it seems convoluted and with drawbacks. see my comment on #454 for more details.)

I'm not saying it's not feasible, but certainly not ideal, especially when lambdas running behind API Gateway with an authorizer lambda are a pretty common case. Maybe this scenario could be modeled some other way that I'm not seeing, without the use of links ?

@joaopgrassi joaopgrassi removed the Stale label Oct 19, 2022
@joaopgrassi
Copy link
Member

joaopgrassi commented Oct 19, 2022

I removed the stale label, but I wonder if we should focus the discussion on the issue or here in this PR. If here, then maybe re-opening the PR to get attention again?

@carlosalberto
Copy link
Contributor

I'd recommend discussing this in the issue itself (e.g. sometimes there's more than one possible PR for a given issue, so general discussion should be in the issue itself).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Please (re)-allow recording links after Span creation time