Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Add HTTP Correlation protocol #18887

Merged
merged 3 commits into from
Jul 6, 2017

Conversation

lmolkova
Copy link

We created HTTP Correlation protocol as a part of correlation feature: it describes Id format and generation (that is implemented in System.Diagnostics.Activity) and HTTP headers that carry Activity.Id and Activity.Baggage (used in System.Net.Http and Asp.Net Core).

The protocol will be submitted to IETF, but until it's officially published as RFC, we want to store it in corefx repo unless someone has a better suggestion.

/cc @vancem @SergeyKanzhelev

@cwe1ss
Copy link
Contributor

cwe1ss commented Apr 26, 2017

Could you please share some information about the planned process of this standard.

My main questions are:

  • Which industry parties have been involved with this proposed standard so far?
  • Why do you implement it in the .NET framework before it has been approved?
  • How will the .NET framework deal with breaking changes that come up during standardization?

Since this involves interoperability with other systems/languages, I'd like to ping some people who are involved in open source tracers and the opentracing project - maybe someone could share his opinion about this proposed standard.

@adriancole (Zipkin) @basvanbeek (Zipkin) @bhs (Lightstep) @pavolloffay (Hawkular) @wu-sheng (Sky-Walking) @yurishkuro (Jaeger) (feel free to ping other people)

PS: As always, I'm excited with the intention to get distributed tracing features in .NET. My main concern here is that a) this standard will not be adopted by anyone outside of Microsoft and b) hard-coding this standard will make it very hard to integrate .NET apps with existing tracers. Also, any changes during standardization will lead to more legacy code in the .NET framework.

I'd prefer to see all of this shipped in a later version of .NET but with a well discussed standard.

This document provide guidance for implementations of [HTTP Correlation Protocol](HttpCorrelationProtocol.md) without [Hierarchical Request-Id](HierarchicalRequestId.md) support or interoperability with services that do not support it.

We strongly recommend every implementation to support [Hierarchical Request-Id](HierarchicalRequestId.md) wherever possible. If implementation do not support it, it still MUST ensure essential requirements are met:
* `Request-Id` uniquely identifies every HTTP request involved in operation processing and MUST be generated for every incoming and outgoing request

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this include parent ids? (since they are noted in above docs and also later here in the examples)

Copy link
Author

@lmolkova lmolkova Apr 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request-Id is generated for every outgoing request and becomes parent for incoming request on downstream service.

Message Component Name Context
user starts request to service-a user
incoming request service-a Request-Id=abc; Parent-Request-Id=; Id=123
request to service-b service-a Request-Id=def; Parent-Request-Id=abc, Id=123
incoming request service-b Request-Id=ghi; Parent-Request-Id=def; Id=123
response service-b Request-Id=ghi; Parent-Request-Id=def; Id=123
response from service-b service-a Request-Id=def; Parent-Request-Id=abc; Id=123
response service-a Request-Id=abc; Parent-Request-Id=; Id=123
response from service-a user

So, logging system logs Request-Id from the incoming request headers as Parent-Request-Id and logs new unique Request-Id.

- [HTTP Header encoding RFC5987](https://tools.ietf.org/html/rfc5987)
- De-facto overall HTTP headers size is limited to several kilobytes (depending on a web server)

# Industry standards

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for some definition of standard, there's a context propagation format you can see (if for nothing else another example) w3c/trace-context#1 There are other issues available on the same repo as well, including many parties who are exploring it w3c/trace-context#4 cc @bogdandrutu

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! this is really good to know. So this is standardization of Zipkin format, correct?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmolkova it's part of a broader proposal that a couple parties including Zipkin and Google are working on for distributed trace propagation and submission. We're planning on reaching out more widely once we have more documentation and code samples available, but I'd be glad to lay out the plan and longer-term goals if you're interested.

Implementations SHOULD support hierarchical structure for the Request-Id, described in [Hierarchical Request-Id document](HierarchicalRequestId.md).
See [Flat Request-Id](FlatRequestId.md) for non-hierarchical Request-Id requirements.

## Correlation-Context

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is created once then passed read-only afterwards. Interesting.. I think what is interesting in general is that your formats seem to include, but are not limited to tracing

cc @nbmorgan who recently asked about an agnostic place to do exactly what's described here

Copy link
Author

@lmolkova lmolkova Apr 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly meant to control logging: service may pass sampling rate/flags there that could be used by downstream services. It's also valid to propagate other tracing-related information: feature flags, some Ids that could be useful to group traces by.

We discourage anyone from using it for non-tracing purposes. Headers have significant size limitations, parsing, propagating and just storing Correlation-Context in memory involved performance overhead.
Service in general does not know which web-server is used somewhere downstream and exceeding header size limit may create a problem that is hard to find.

Also, depending on the implementation, service may have little control or no control at all over where Correlation-Context is propagated to: it could be external systems, cloud storage, etc. Even if service trust immediate callee, callee will propagate it further. So it also creates security concerns.

And finally, there are better ways and cleaner designs to pass non-tracing data between the services :)

@codefromthecrypt
Copy link

codefromthecrypt commented Apr 27, 2017 via email

@codefromthecrypt
Copy link

codefromthecrypt commented Apr 27, 2017 via email

@codefromthecrypt
Copy link

codefromthecrypt commented Apr 27, 2017 via email

@lmolkova
Copy link
Author

@cwe1ss Thanks for sharing it!

I believe we already touched this topic in #16393: There is no worldwide standard (at least for now), that's why we are creating this one. And in any case we want to support interoperability with other tracing systems in .NET.

Regarding the plans for the standard: we are considering publishing it as Informational RFC first in a next few months.
We will follow general .NET rules to deal with breaking changes if needed.

@karelz karelz added area-System.Diagnostics.Tracing documentation Documentation bug or enhancement, does not impact product or test code labels Apr 27, 2017
@karelz
Copy link
Member

karelz commented Apr 27, 2017

FYI: @davidsh @CIPop @Priya91 @geoffkizer

@tedsuo
Copy link

tedsuo commented Apr 27, 2017

Coming here from OpenTracing. Glad to see .Net and MSFT interested in this.

What types of interoperability are expected to be solved by this standard? I would like to have more context so I can attune my feedback correctly. For example, is this expected to enable blind interoperability between tracing systems? What does interoperability mean in this case?

While these headers look like a reasonable implementation for an "indexed log" style of tracer, hierarchical request ids and uncompressed baggage have severe limitations. How are tracing systems with more effective implementations expected to interact with this standard?

@tedsuo
Copy link

tedsuo commented Apr 28, 2017

FYI: @JonathanMace could you see Tracing Plane working with these headers? Or are they separate?

@lmolkova
Copy link
Author

@tedsuo @adriancole and everyone else,

We really appreciate your feedback and want to address any concerns you have. Unfortunately I'm quite busy with some other things now. I'll get back to you in the next few days, sorry for the delay.

@danmoseley
Copy link
Member

@lmolkova is this part of the 2.0 release, or not part of that?

@lmolkova
Copy link
Author

lmolkova commented May 5, 2017

@danmosemsft
it is a part of 2.0, however it is pure documentation update and could be merged after 2.0 freeze.

@tedsuo @adriancole
Sorry, it takes longer than I expected to answer your questions. I did not forget about you.

@codefromthecrypt
Copy link

codefromthecrypt commented May 5, 2017 via email

@karelz karelz removed the Post-ZBB label May 18, 2017
@karelz karelz modified the milestone: 2.1.0 May 19, 2017
@karelz
Copy link
Member

karelz commented May 31, 2017

@lmolkova any update on this? Can you please move the PR forward towards merge?
Do we need the documentation in 2.0 branch, or are docs in master (2.1) sufficient?

@lmolkova
Copy link
Author

lmolkova commented Jun 2, 2017

@tedsuo,

This standard does not intend to solve interoperability issues. We made it quite flexible, so we allow almost ANY format on the incoming Request-Id.
Tracing system that uses implementation of this protocol may have compatibility layer that supports multiple schemes for incoming requests thus supporting some schemes (e.g. Zipkin). If all tracing systems follow the same approach, there will be no need for compatibility on outgoing layer; it would be inefficient to provide the same Ids under different header names, and formats may not match.
In presence of widely adopted standard, there will be no need in compatibility on incoming layer as well.

This protocol and it's implementation is the first iteration and the minimum required to correlate telemetry. We may change things based on your and other community feedback. We will also review TraceContext and the proposal Zipkin and Google are working on (as @mtwo mentioned) and see what we can do to interoperate with it or support it.

@lmolkova
Copy link
Author

lmolkova commented Jun 2, 2017

@adriancole

I added more details on 'parent' Id logging and Correlation-Context limitations/misuse, thanks!

Would be great to discuss it on workshop, unfortunately I will be on the trip during it and probably without internet access. Will try to attend remote sessions when it will be possible. Thanks for the information anyway!

@lmolkova
Copy link
Author

lmolkova commented Jun 2, 2017

@karelz

any update on this? Can you please move the PR forward towards merge?
Do we need the documentation in 2.0 branch, or are docs in master (2.1) sufficient?

Finally moving it towards merge. I'm fine with having docs in master (2.1) only.

@karelz
Copy link
Member

karelz commented Jun 11, 2017

Ping?

@tedsuo
Copy link

tedsuo commented Jun 13, 2017

@lmolkova thanks for your response. Personally, I like many things about this but I'm not totally sold on the HierarchicalRequestId approach. It seems like you need to either have short ids, few hops between services, or hit the header limit. In practice, most systems will work by indexing on the trace id, request id and parent id, so the rest of the hierarchy will probably not be necessary in practice. Another issue is accounting for joins, when codepaths have multiple parents. I don't need answers for any of these questions, just food for thought when you work on future iterations.

Cheers,
-T

@karelz
Copy link
Member

karelz commented Jun 29, 2017

@lmolkova ping? If you're not ready yet, let's close the PR and reopen it when you are ready to finish it off ...

@SergeyKanzhelev
Copy link

@karelz liudmila is on vacation. Can this PR wait till after the weekend?

@karelz
Copy link
Member

karelz commented Jun 29, 2017

Sure. If you won't have time to push it further in a week or two, please close it and reopen when you are ready. Thanks!

Hierarchical Request-Id look like `|<root-id>.<local-id1>.<local-id2>.` (e.g. `|9e74f0e5-efc4-41b5-86d1-3524a43bd891.bcec871c_1.`) and holds all information needed to trace whole operation and particular request.
Root-id serves as common identifier for all requests involved in operation processing and local-ids represent internal activities (and requests) done within scope of this operation.

[CorrelationVector](https://osgwiki.com/wiki/CorrelationVector) is valid hierarchical Request-Id, except it does not start with "|". Implementation SHOULD allow other schemes for incoming request identifiers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this link is available from outside of MS. Please remove or copy the content publicly.

CC: @jacpull

@lmolkova lmolkova force-pushed the lmolkova/HttpCorrelationProtocol branch from bab520c to 6569f0a Compare July 4, 2017 00:58
@lmolkova
Copy link
Author

lmolkova commented Jul 4, 2017

@SergeyKanzhelev private link is removed

@karelz
In this PR we have received a lot of useful feedback that we should use for the next iterations. For now, I addressed everything that was possible and this protocol is used by System.Diagnostics.Activity, System.Net.HttpClient and ASP.NET Core Hosting layer.
So I believe the review is completed and PR should be merged.

@karelz
Copy link
Member

karelz commented Jul 4, 2017

@lmolkova in that case please work with area owners to finalize the review and squash/merge the PR. Thanks!

@lmolkova
Copy link
Author

lmolkova commented Jul 5, 2017

@vancem are you ok to merge it?

@vancem vancem merged commit e1c2b4d into dotnet:master Jul 6, 2017
@lmolkova lmolkova deleted the lmolkova/HttpCorrelationProtocol branch October 11, 2018 01:37
macrogreg pushed a commit to open-telemetry/opentelemetry-dotnet-instrumentation that referenced this pull request Sep 24, 2020
…relationProtocol

Add HTTP Correlation protocol

Commit migrated from dotnet/corefx@e1c2b4d
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…relationProtocol

Add HTTP Correlation protocol

Commit migrated from dotnet/corefx@e1c2b4d
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Diagnostics.Tracing documentation Documentation bug or enhancement, does not impact product or test code
Projects
None yet
Development

Successfully merging this pull request may close these issues.