Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add context propagation requirements to HTTP conventions #1783

Closed
wants to merge 6 commits into from

Conversation

lmolkova
Copy link
Contributor

@lmolkova lmolkova commented Jun 29, 2021

Blocked by #1811

Changes

  • add a requirement to propagate context created for HTTP client request
  • if a context is already injected, back-off and don't instrument

Motivation

  1. Context propagation is an essential part of tracing and de-facto done by every http instrumentation
  2. Multiple layers of instrumentation can potentially legidly (or by mistake) co-exit. In this case, the best solution is to let higher-level instrumentation win:

From #1767 (comment)

Backoff if the context is injected. If the context is already injected on the request (HTTP/gRPC/anything else), there is no other good option than to back off. Options are:

  • re-instrument, i.e. create a new span and inject header (replace or add another value)? Then the logic that injected the header (and created a previous span) will be broken. There is no way to suppress that span.
  • re-instrument, but not inject header? Then this instrumentation layer is broken - there is no reason to export this span
  • don't instrument: ok, someone above already instrumented this request and perhaps created a span, nothing else to do. It seems nothing is broken and we didn't even create a span

This approach also means that the user's manual instrumentation always wins, which seems like a good default to have in terms of supportability. This is really short-term mitigation for a subset of double-instrumentation problems.

Related issues #
#530
#1767

@lmolkova lmolkova requested review from a team June 29, 2021 21:58
@lmolkova
Copy link
Contributor Author

lmolkova commented Jun 29, 2021

Open questions:

  • do we want to be specific on which propagator should be used? (arguably it's up to downstream service and instrumentation)
  • do we want to make optimization in propagator API to check if a context is already created? We can do it with extract, but it will require unnecessary context parsing/population when there are multiple layers of instrumentation. We can also leave it up to the language to decide.

@@ -111,6 +112,11 @@ from the `net.peer.name`
used to look up the `net.peer.ip` that is actually connected to.
In that case it is strongly recommended to set the `net.peer.name` attribute in addition to `http.host`.

### Context propagation

- context created for HTTP client span MUST be injected on outgoing request using configured [propagator](../../context/api-propagators.md)
Copy link
Member

@cijothomas cijothomas Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this addition can be for not just http, but every instrumentation dealing with out-of-proc communication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, if this direction is supported by the community, I can update other relevant specs, I can also update propagators doc to mention it.

CHANGELOG.md Outdated Show resolved Hide resolved
specification/trace/semantic_conventions/http.md Outdated Show resolved Hide resolved
@Oberon00 Oberon00 changed the title Add context propagation requriements to HTTP conventions Add context propagation requirements to HTTP conventions Jun 30, 2021
### Context propagation

- context created for HTTP client span (if valid) MUST be injected on outgoing request header using configured [propagator](../../context/api-propagators.md)
- Exception: if outgoing HTTP request already has valid context (for configured propagator), it cannot be changed and new span MUST NOT be recorded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I oppose this change. #1738 made in clear and explicit that nested CLIENT spans are allowed. There are totally valid cases for having them: e.g. DB call which used HTTP as its transport. In this case I want both a span with DB semantic convention and spans with HTTP semantic conventions.

I totally agree that only the outer-most or first CLIENT span must have its context propagated.

Copy link
Contributor Author

@lmolkova lmolkova Jul 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point and agree nested client spans should be allowed.
This change does not prohibit nested client spans. This is an edge case of 2 HTTP instrumentations fighting for the same request. Arguably there is no real use-case behind 2 identical spans created on different API levels.
At the same time, DB spans should not inject context into http requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is worth clarifying in this spec that only HTTP instrumentation should inject context into http request? I.e. if OTel instrumentation injects headers into HTTP requests it MUST also create an HTTP span?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is worth clarifying in this spec that only HTTP instrumentation should inject context into http request? I.e. if OTel instrumentation injects headers into HTTP requests it MUST also create an HTTP span?

Why? Let's take another example: messaging system with HTTP transport. PRODUCER span is created by it and it definitely has to be injected and propagated. Hm, but it probably will be injected into the message and not http request... But then do we want for http instrumentation to inject its context into http headers? Probably not?

Arguably there is no real use-case behind 2 identical spans created on different API levels

Agree

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then do we want for http instrumentation to inject its context into http headers? Probably not?

We probably don't need it but it probably won't hurt if we have it in both HTTP and the message.

As a special case, for AWS messaging systems (SQS, SNS), setting the right HTTP header will actually end up being translated to a message property. https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-message-metadata.html#sqs-message-system-attributes

@lmolkova
Copy link
Contributor Author

lmolkova commented Jul 6, 2021

With messaging, you have to allow separate context on message and transport. Reasons:

  • messaging systems allow batching messages on send
  • they allow buffering and sending absolutely independent messages in the background
  • retries happen and you can change the transport span (create a new one), but not the message span (or you'll end up with multiple contexts for the same message, which may be hard to debug)

Service meshes that handle messaging are the ultimate example of all the above.

Why?

Assuming there is instrumentation for all almost HTTP clients in nearly every language, having consistent high-quality spans created for HTTP requests will bring a better user experience - vendors always know how to visualize it, build metrics and alerts based on it.
Letting DB instrumentation instrument HTTP requests and HTTP instrumentation only create a span (retries again? complex multi-request operations?) removes the assumption that under HTTP client span there is a server HTTP span and complicates experience.

@lmolkova
Copy link
Contributor Author

lmolkova commented Jul 9, 2021

@iNikem @bogdandrutu please take a look.


`Context` created for HTTP client span (if valid) MUST be injected on outgoing request headers using configured [propagator](../../context/api-propagators.md). Instrumentation that injects `Context` into HTTP request headers MUST also emit a span that complies with other client requirements in this specification.

**Exception**: if outgoing HTTP request already has valid `Context` (for configured propagator), it cannot be changed and new span MUST NOT be recorded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this be determined using the current OpenTelemetry API? Would all http client instrumentation be required to query the propagator, grab the fields() and check to see if they exist already? How would that work if the http request object was being re-used and the fields were left over from a previous request? (see the documentation on the fields() function of the propagator to see what I'm referring to).

Copy link
Contributor Author

@lmolkova lmolkova Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this be determined using the current OpenTelemetry API?

I was thinking about having this logic within the propagator and have 2 options:

  1. Sub-optimal, no propagator change requires: instrumentation calls extract and checks if it got valid context. It's cheap when there is one layer of instrumentation, but if there are multiple, it involves extra context parsing and allocation.
  2. Optimal: add CanExtract (naming is always hard) method that only checks for context presence and may do minimum validity checks.

This way decision on context presence and validity stays within the propagator and multiple HTTP (and other protocol) instrumentations don't have to do it.

I suggest starting with 1 as perf hit affects relatively rare scenarios (multiple instrumentation layers) and is not very big in the general case of single transparent. Optimization (option 2) can come later.
I'm also open to starting with option 2 right away add a new method on propagators if there is any consensus around it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would that work if the http request object was being re-used and the fields were left over from a previous request?

Great point! I believe the current spec requires to clean up the context for the reused carriers, Is there an assumption that it's not always possible?

I think this is also worth mentioning here that context should be cleaned up after HTTP try if the HTTP client allows reusing the same request instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec says "successive calls should clear these fields first.", so your proposed call to extract would still be operating on the previous contents of the headers, meaning that the resulting Context would always be valid, so I don't think this is a feasible approach, with either options 1 or 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, you're right. what's the feeling about changing it to clean up after rather than before? this way everyone cleans up for what they've done and never breaks anything else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh. I have no idea...I don't think I've ever seen a web client that re-uses request objects like this, so I don't have the context for what it's even referring to. :)

Copy link
Contributor Author

@lmolkova lmolkova Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems it was there from the very beginning (#147) and there was no discussion/explanation around this choice. It also seems that at least golang http client allows to reuse http request instances between tries. I'll raise it as a separate issue and until then this PR is blocked.

@iNikem
Copy link
Contributor

iNikem commented Jul 12, 2021

Ok, I agree with you about messaging use-case. Different contexts propagated via message and via http connection make sense. Although I am not sure our current ecosystem (both instrumentations and backends) handles this correctly.

But I am not convinced about tying together "should I create a new CLIENT span" and "should I propagate this span". This proposal forbids creating new CLIENT span if it will not be propagated. Take the following use-case.

  • User space high level http client (e.g. reactive WebClient in Java) makes an http request.
  • Lower level "transport" layer library (Netty in this case) is invoked several times by WebClient to handle retries or redirects or circuit-breaker whatever.

Both libraries have auto-instrumentations in Java. I argue that the valid outcome should be the following:

  • CLIENT span with http semantic conventions should be created by WebClient instrumentation.
  • Its SpanContext should, obviously, be injected into the request
  • Nested CLIENT spans with http semantic conventions should be created by Netty instrumentation. At least in order to expose that lower-level transport layer information to the developer.

This proposal forbids creating those nested spans by Netty. I think it is wrong.

@ahayworth
Copy link
Contributor

My understanding of this PR is that there is some desire to handle competing "http" client spans; to collapse them into one sensible span rather than multiple duplicate spans.

We've been exploring this in opentelemetry-ruby, trying to de-duplicate http client spans in our auto-instrumentation. We have an approach that works well for us, by modifying the context around a block of code. Then, our HTTP auto-instrumentation libraries merge those context values into the attributes of the outgoing client span. The result is one outgoing span, annotated with relevant attributes from a higher-level instrumentation.

In pseudocode, it looks like:

# Higher-level HTTP-like instrumentation
OpenTelemetry::Common::HTTP::ClientContext.with_attributes({ foo: "bar" }) do
  # do work, usually calling `super`
end
# Lower-level HTTP client instrumentation
attributes = OpenTelemetry::Common::HTTP::ClientContext.attributes
tracer.in_span("GET", kind: :client, attributes: attributes.merte({ other: "attributes" }) do |span|
  # do more work
end

You can see this in practice in the "higher-level" koala instrumentation (a facebook HTTP client, I believe), and the "lower-level" Net::HTTP instrumentation (standard ruby HTTP client, used by many things). The context modification bits can be found here.

I would submit this as an alternate approach to prohibiting nested spans; this allows instrumentation that knows it is just an extremely thin wrapper around an HTTP call to decorate spans with relevant attributes without confusing duplication. This approach has downsides and does not solve all of the things you're setting out to do, but I think it is worth considering.

@lmolkova
Copy link
Contributor Author

@iNikem thanks for the useful input!

Lower level "transport" layer library (Netty in this case) is invoked several times by WebClient to handle retries or redirects or circuit-breaker whatever.

I think there is an assumption that retries and redirects should be represented as a nested span but have to share the same context on the wire. Why so?

Redirects may or may not be traceable and users configure HTTP clients to allow/prevent auto-redirects. So we should assume any HTTP client instrumentation will create new spans and propagate new context for redirects (if app logic handles them). For the sake of consistency, we should make auto-instrumentation do the same.

Retries are part of application logic in the majority of HTTP clients. It's not feasible to instrument most of the HTTP clients in the way to have a higher-level span to group retries without the user's help. So I believe we have to agree each retry has to have its own span AND context to bring consistent and understandable experience.

@lmolkova
Copy link
Contributor Author

@ahayworth this PR attempts to address corner case: what happens if http request is already instrumented. There is a broader discussion here #1767 with context-based instrumentation suppression. Thanks for sharing your approach!

@ahayworth
Copy link
Contributor

ahayworth commented Jul 13, 2021

@lmolkova Yes, I understand - and I think the context propagation requirements in this PR are useful. I mentioned the context-based span augmentation that the ruby SDK does because I feel as though it can address this part of the PR:

Exception: if outgoing HTTP request already has valid Context (for configured propagator), it cannot be changed and new span MUST NOT be recorded.

I just wanted to raise that an alternate way to address it would be through context-based augmentation: while it creates an implicit assumption that lower-level instrumentation will be present, at least in the ruby world we presume that low-level HTTP implementation will be something that many folks want. Taking that approach, there is no already-instrumented request in progress; requests are only instrumented at the lowest level, and it solves that part of the problem you've raised.

Put differently: we could instead advise instrumentation not to do this in the first place, and would perhaps avoid some of the difficulties and problems that others have raised.

(edit: I will engage in the PR you mentioned! 😄 )

@lmolkova
Copy link
Contributor Author

lmolkova commented Jul 13, 2021

I'm closing this PR based on @trask feedback on #1811 (comment) - reusable requests are much more common than I expected and requiring context clean up will bring perf hit on the happy popular case of no-retries.

I'm trying to come up with yet another context marker proposal, but I believe we need more than a terminal marker (we just don't want two layers of the same e.g. HTTP instrumentation but we want to allow deeper layers of different kinds of instrumentations: DNS/streaming messages/TCP/profiling, etc).

@lmolkova lmolkova closed this Jul 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants