Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NH-2313 Add basic TraceState handling and W3C trace context propagation #11

Merged
merged 78 commits into from
May 9, 2022

Conversation

tammy-baylis-swi
Copy link
Contributor

@tammy-baylis-swi tammy-baylis-swi commented Apr 6, 2022

Add basic TraceState handling and W3C trace context propagation

Summary of changes, with details described in below sections:

  1. Adds custom Propagator
  2. Updates custom Sampler
  3. Adds custom ResponsePropagator
  4. Adds helper classes
  5. Meets some Acceptance Criteria

Out of scope / What I'd much prefer put in a future PR:

  • Handling of invalid headers
  • Customizing tracing_mode, sample_rate, trigger_tracing_mode_disabled
  • Thorough testing of all Acceptance Criteria
  • Thorough testing of every unsigned vs signed TT behaviour in the TT spec
  • Conforming to Service Entry Internal spec because that's NH-11236

1. Add custom Propagator

SolarWindsPropagator is new. It must be used in composite after two existing OTel Python propagators, tracecontext and baggage. Currently this is configured on the customer side (i.e. the testbed) as OTEL_PROPAGATORS: tracecontext,baggage,solarwinds.

SolarWindsPropagator performs extract of incoming HTTP request headers into the OTel context so that they're available to the Sampler. This is how sampling decision can be made using x-trace-options and x-trace-options-signature if provided. Note that the OTel Python propagator for tracecontext already handles extraction and injection of traceparent and tracestate headers. extract is called by OTel Python after a request arrives at an instrumented service.

SolarWindsPropagator performs inject of our values into outgoing HTTP request headers. Specifically, we set up tracestate with the KV sw=<span_id>_<trace_flags>. This is how we propagate a SolarWinds sampling decision that was made by the Sampler for the last span in a service to the next service in a distributed trace. inject is called by OTel Python when an instrumented service is preparing to make a request. Note that OTel Python is equipped with a TraceState class that has predefined methods to add/update KVs.

2. Update custom Sampler

ParentBasedSwSampler replaces the ParentBasedAoSampler placeholder that Martin created. It extends OTel Python's ParentBased sampler to let OTel Python delegate decision-making depending on whether the future span***:

  1. is a root (e.g. Service A, span1 in below diagram)
  2. has a remote parent span that is sampled (e.g. Service B, span1)
  3. has a remote parent span that is not sampled (e.g. Service B, span1)

Copied from TraceState Handling:

Service A
  (sample?) span1
            span2
             \------------- RPC call ---------> Service B
            span3                                 (sample?) span1
                                                            span2

For future spans with local parent spans (e.g. ServiceB, span2), we use the OTel Python ParentBased defaults for ALWAYS_ON/ALWAYS_OFF.

***"Future spans?" OTel Python calls should_sample before creating a new span in an instrumented service. Note that unlike the Propagator that can be set up as a composite of many, there is only one Sampler for the custom-distro.

_SwSampler replaces the _AoSampler placeholder that Martin created and it used by ParentBasedSwSampler for the 3 cases above. When should_sample is called it makes a decision using SWIG'd liboboe. It also calculates tracestate and attributes based on that decision for custom SW behaviour:

  1. _SwSampler.calculate_liboboe_decision uses parent span information from parent_span_context and header values from parent_context as inputs for SWIG'd liboboe Context.getDecisions.
  2. _SwSampler.calculate_trace_state uses the liboboe decision outputs to create a new tracestate, or update the tracestate of the parent_span_context if it exists, for propagation.
  3. _SwSampler.calculate_attributes calculates sw.tracestate_parent_id and sw.w3c.tracestate that are used for span creation and event export to the backend, without being propagated.

3. Add custom ResponsePropagator

SolarWindsTraceResponsePropagator is new. It relies on what Cloud Native is currently calling an "experimental propagator that injects tracecontext into HTTP responses". Hopefully it becomes official and better-documented this year; then we won't have to customize entrypoints for individual OTel Python instrumentations of Flask, Django, ASGI, etc.

SolarWindsTraceResponsePropagator extends the OTel Python abstract class ResponsePropagator so we can customize the x-trace and x-trace-options-response headers that we inject into responses from instrumented services after incoming HTTP requests.

Special note: Like the Java custom-distro, x-trace-options-response depends on a tracestate piggyback mechanism where extracted headers are added to tracestate at should_sample then injected for context propagation. This is because by the time ResponsePropagator.inject is called, the span(s) for that instrumented service has already been created and the initial options header and liboboe decision output are no longer in the OTel context. Also like Java, switching delimiters between W3C-unfriendly [,, =] and the unlikely-to-be-used-meaningfully [...., ####] lets us store options in propagated tracestate to ensure that the correct x-trace-options-response is calculated for each trace and without adding a complex storage of these values to the distro. Thank you Eric for coming up with this.

4. Add helper classes

XTraceOptions digests incoming x-trace-options and x-trace-options-signature request headers into values usable by SWIG'd liboboe for decision-making.

W3CTransformer has commonly-used methods to convert integers from various sources to hex strings for decision-making and propagation.

5. Meets some Acceptance Criteria

Please see this page for test methods and results: https://swicloud.atlassian.net/wiki/spaces/NIT/pages/2886959329/2022-04-06+testing

@tammy-baylis-swi
Copy link
Contributor Author

Update: I've responded to all comments from the second wave, except for this one about environment variables. I'm working on it now. #11 (comment)

@tammy-baylis-swi
Copy link
Contributor Author

Update: I've responded to all comments from the second wave, except for this one about environment variables. I'm working on it now. #11 (comment)

This has now been addressed: #11 (comment)

Copy link
Contributor

@cheempz cheempz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the revisit @tammy-baylis-swi! LGTM :)

@tammy-baylis-swi tammy-baylis-swi merged commit 7652590 into main May 9, 2022
@tammy-baylis-swi tammy-baylis-swi deleted the add-custom-propagator branch May 9, 2022 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants