Controlling context propagation boundary #1633

pmm-sumo · 2021-04-20T17:55:37Z

What are you trying to achieve?

There are several use cases where two organisations employing OpenTelemetry communicate with each other, such as: any 3rd party API calls, synthetics monitoring, webhooks, etc. In some cases (e.g. synthetics monitoring) it is anticipated that the called organisation records the identifier of the originating call and samples the trace (if OpenTelemetry is employed) so it could be later used for diagnosing purposes.

When standard trace context propagation approach is being employed, this leads to several side effects, which might or might not be anticipated:

spans recorded on both of them share the same trace ID,
each of the organisations has access only to spans recorded on their side; which also means that callee (Organization 2) is not having access to the root span,
the originating organisation sampling decision will be passed,
baggage might be unwillingly shared between the organisations, which might cause leaking sensitive information


 Organization 1         Organization 2
    (caller)               (callee)
    
 ┌───────────┐          ┌────API────┐
 │           │          │           │
 │ Service A ├────?────►│ Service B │
 │           │          │           │
 └─────┬─────┘          └─────┬─────┘
       │                      │
     Spans                  Spans
       │                      │
 ┌─────▼─────┐          ┌─────▼─────┐
 │ Backend 1 │          │ Backend 2 │
 └───────────┘          └───────────┘

One way to approach it is simple to leave the status quo. I.e. assume that it's fine to deal with the listed side effects and consider it should be a duty of the organisation making the call to external service that the baggage must not contain sensitive information.

However, perhaps the Tracing API/SDK could be extended to handle such case gracefully and e.g. explicitly filter baggage context as well as provide means for controlling how the trace context should be propagated when making a call to an external organisation. There are several approaches how to handle this, to start with some:

Drop trace context in such case and instead identify the caller somehow via baggage. This information could be incorporated on the callee side and persisted, e.g. via storing as a span attribute.
Drop trace context and generate a new trace ID for the callee (essentially, include a new context without parent span ID). That way, the caller would have the information that allows to identify the trace on callee side.
Leverage tracestate (or trace-flags) and include additional information there. Perhaps a special field could describe that the request is passing the boundary which requires starting a new trace and using linking capability to store information about the relationship with the callee.
Define a boundary on the callee side (e.g. via API), so whenever context is being extracted there, instead of continuing the trace, start a new one and store context as a linked span
...

Additional context.

Perhaps this should followup with OTEPS describing a proposal

Related issues: #1255, #1337, #867, #510

The text was updated successfully, but these errors were encountered:

eyjohn · 2021-04-21T08:01:53Z

Thanks for posting this Przemek,

One thing to consider that I'd like to add is data sensitivity, in some cases, I do not wish third parties who may continue the trace to be aware of the true origin on the request and related request by being able to correlate either on a traceId/spanId/TraceState basis. In fact, we may want to obscure the traceIds by generating a new one at the boundary point or having a transformation scheme. In any case, I think this point should be called out during the formation of any recommendations/semantic conventions regarding cross-organisation tracing.

Oberon00 · 2021-04-21T08:24:00Z

Define a boundary on the callee side (e.g. via API), so whenever context is being extracted there, instead of continuing the trace, start a new one

For this approach, see also #1188 "Support restarting the trace with a different trace ID"

(quite a lot related issues here)

Oberon00 · 2021-04-21T08:26:55Z

Also for

each of the organisations has access only to spans recorded on their side; which also means that callee (Organization 2) is not having access to the root span,

Not only the root span is a problem but any parent span can be. I consider this a design problem of W3C traceparent, but it can be worked around by storing the parent span ID in the tracestate under a unique key, see #366 (comment)

reyang · 2021-04-23T15:29:12Z

Discussed during the 04/23 triage meeting:

we want to see if there is a relatively high demand for this
if there is a high demand, we want to see if this is something that should be addressed by the W3C TraceContext specification, rather than having OpenTelemetrying providing a workaround / additional layer.
the same question might apply to other propagation protocols (e.g. Zipkin B3).

yurishkuro · 2021-04-23T16:43:07Z

we want to see if this is something that should be addressed by the W3C TraceContext specification

I doubt it. W3C TC defines the format, not the implementation. It calls out that baggage is meant to propagate only within trust boundaries (https://github.com/w3c/baggage/blob/master/baggage/privacy.md):

The main purpose of this header is to provide additional system-specific information to other systems within the same trust-boundary. The baggage header may contain any opaque value in any of the keys. As such, the baggage header can contain user-identifiable data. Systems MUST ensure that the baggage header does not leak beyond defined trust boundaries and they MUST ensure that the channel that is used to transport potentially user-identifiable data is secured.

This is not to say that OTEL SDK is the place to enforce trust boundaries. It's certainly possible to implement in the SDK but since it would require special configuration it is less reliable than other solutions, like not allowing your backend services to talk to external services without going through some DMZ which can sanitize the requests.

pmm-sumo · 2021-04-23T17:06:47Z

we want to see if this is something that should be addressed by the W3C TraceContext specification

I doubt it. W3C TC defines the format, not the implementation. It calls out that baggage is meant to propagate only within trust boundaries (https://github.com/w3c/baggage/blob/master/baggage/privacy.md):

This is more in context of open-telemetry/semantic-conventions#1230 , but I think W3C TraceContext could be extended with one of the following (among other solutions):

ability to set parent-id to zeroes, which would identify that a new trace-id should start (and the caller would know what will be this trace ID) - that would solve the problem of having parentless traces in the example I posted above
defining a new flag (e.g. external) which could identify a hint that the context passes a boundary - that would allow to handle it accordingly on the other side - e.g. start a new trace and link the one provided in traceparent rather than continuing it; this could actually work in conjuction with Support restarting the trace with a different trace ID #1188 (i.e. such behavior could be controlled via a hint, client library config or both)

The main purpose of this header is to provide additional system-specific information to other systems within the same trust-boundary. The baggage header may contain any opaque value in any of the keys. As such, the baggage header can contain user-identifiable data. Systems MUST ensure that the baggage header does not leak beyond defined trust boundaries and they MUST ensure that the channel that is used to transport potentially user-identifiable data is secured.

This is not to say that OTEL SDK is the place to enforce trust boundaries. It's certainly possible to implement in the SDK but since it would require special configuration it is less reliable than other solutions, like not allowing your backend services to talk to external services without going through some DMZ which can sanitize the requests.

I think that maybe the risk of leaking the baggage should be emphasized somehow in the docs. Also, I think that at least auto-instrumentation based libraries might want to eventually provide some means of controlling it

brunobat · 2023-05-08T08:49:23Z

We have an issue on the Java instrumentation project that relates to this: open-telemetry/opentelemetry-java-instrumentation#8038.
This has concerning security implications. Could you please increase the priority of this issue?

annettejanewilson · 2024-05-03T11:00:00Z

We are interested in this for baggage. At present we see the following options for preventing baggage from propagating to third parties:

Explicitly clear the baggage at every point we talk to a third party in every service. This seems very painful, difficult to automate and easy to miss some usages.
Set up every communication with a third party to go through Envoy or something similar. We already do this for internal calls in an Istio service mesh, but it's a serious undertaking to cover all calls to third parties too. My understanding is that this requires changing all outgoing calls to use HTTP so that Envoy can get at the headers before making an HTTPS call to the external service, and could be a pain for usages wrapped in libraries.
In our OTel configuration, specify an allowlist for destination authorities that we will propagate baggage to. E.g. *.skyscanner.io:* might be a wildcard matching our internal destinations inside the service mesh.

The first and second options seem a lot less desirable to us than the third option, were it to be supported. At present we don't believe this is possible. Propagators don't have access to anything that would allow them to make a decision about whether or not to propagate. (Whether that's an authority directly or some kind of protocol-agnostic categorisation determined by configuration.)

I think we would be happy with something as blunt as either propagating everything or nothing for a particular authority. As far as I'm aware we're only using HTTP(S) and gRPC as protocols - I'm not sure what this would look like for other protocols, especially if they're going to have authorities that don't look like hostname:port.

From what I understand we probably want the same behaviour for tracing as for baggage, it's just that baggage could have a wider array of kinds of data.

danielgblanco · 2024-05-03T11:51:56Z

I see the Tracing API/SDK was mentioned here, but I believe this to be more of an issue of the Propagators API if I understand the issue correctly. Currently, the Propagators API has no knowledge of system boundaries and has no way to allow propagators to receive any information about propagation boundaries from their callees. I think tackling this would require propagators to be configured with information regarding what "destinations" are safe or not to inject context to, and for callers to provide that information when calling inject.

I commented open-telemetry/oteps#255 because that could be an addition if conditional propagation of Baggage items is required depending on sensitivity settings.

annettejanewilson · 2024-09-11T08:44:53Z

I'm wondering if we could have something like this: a mechanism to identify different classes of destination, and the ability to enable or disable propagators separately for each class.

Instrumentations could perhaps define their own destination classes, but recognise shared classes like "host_port". Definition might look something like this, with the first matching class in the list applying for a given communication.

destination_classes:
  - class_name: skyscanner_internal
    destinations: 
      - host_port: *.skyscanner.io:*
      - host_port: *.skyscanner.net:*
      - aws_sqs: arn:aws:sqs:*:111122223333:*
  - class_name: external
    destinations:
      - everything: true

Then when applying propagators:

propagators:
  - name: tracecontext
    applies_to: [skyscanner_internal]
  - name: baggage
    applies_to: [skyscanner_internal]

Something like this would be greatly helpful to us. Not only are we concerned about leaking information, we've also found some third parties to be quite sensitive to unexpected HTTP headers. (This is more of a problem with the ottrace propagator which it seems we're still running, and which can create multiple headers with differing names, but still, we would like the ability to be more disciplined in what we send to third parties.)

However, maybe this doesn't mesh 100% with the OP's needs? We don't have a particular need to send modified tracecontext or baggage to particular destinations, we just want to be able to complete suppress them for destinations other than those we consider "internal".

stevesea · 2024-09-18T01:28:45Z

Another aggravating nuance -- baggage can contain multiple entries, some may be sensitive but not others.

In addition to configuring "should I propagate propagate baggage to <URI>?", it might be nice to identify which entries are sensitive.
Maybe a similar configuration to the baggage span processor? https://github.com/open-telemetry/opentelemetry-java-contrib/tree/main/baggage-processor

To clarify our use case a little... in Baggage, we pass around a tenancy context and other flags. All entries are namespaced under a corp.* prefix (where corp is our stock ticker).

baggage for a requests might look like:
baggage: corp.tenancy=mydomain.com/prod,corp.customerid=123456
or, when we're doing testing in production:
baggage: corp.tenancy=mydomain.com/sandbox;featflag=<value>,corp.customerid=123456

On incoming requests to our edge, we may want to propagate some entries but not others.
On outgoing requests, there are some entries we'd always want to discard

stevesea · 2024-09-25T00:02:00Z

Similar to other commenters, we also want to configure propagation in certain circumstances

outgoing requests to 3rd parties
internal requests to certain backends.
- for example, the Hashicorp Vault agent sidecar considers all HTTP headers in its cache key . When an auto-instrumented application sends requests to the vault agent, the presence of the traceparent/tracestate/baggage headers means every request is 'unique' and the vault proxy considers it a cache miss. In our case, it'd be nice if we could configure the Propagators to inject headers on calls to localhost:8200.

pmm-sumo added the spec:context Related to the specification/context directory label Apr 20, 2021

pmm-sumo mentioned this issue Jul 9, 2024

Define a standard way to identify synthetics request open-telemetry/semantic-conventions#1230

Open

reyang added the area:api Cross language API specification issue label Apr 23, 2021

pmm-sumo mentioned this issue Jun 6, 2021

REQUEST: New membership for pmm-sumo open-telemetry/community#749

Closed

6 tasks

spencerwilson mentioned this issue Mar 29, 2022

External API calls baggage behaviour? #2127

Open

trask mentioned this issue May 5, 2023

Allow restricting propagation of OpenTelemetry trace headers open-telemetry/opentelemetry-java-instrumentation#8038

Open

mateuszrzeszutek mentioned this issue Jun 22, 2023

Allow overriding propagation of tracing headers open-telemetry/opentelemetry-java-instrumentation#8786

Closed

vmarchaud mentioned this issue Aug 10, 2023

Avoid propagating baggage across unsafe boundaries open-telemetry/opentelemetry-js#4055

Open

svrnm mentioned this issue Nov 14, 2023

Document that baggage is sent to external APIs by automatic instrumetation open-telemetry/opentelemetry.io#3530

Merged

robotadam mentioned this issue Mar 22, 2024

Control baggage propagation in automatic instrumentation libraries #3957

Open

tedsuo added the triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted label Apr 5, 2024

danielgblanco mentioned this issue May 3, 2024

Sensitive Data Redaction open-telemetry/oteps#255

Draft

danielgblanco added triage:accepted:needs-sponsor Ready to be implemented, but does not yet have a specification sponsor and removed triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted labels Sep 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Controlling context propagation boundary #1633

Controlling context propagation boundary #1633

pmm-sumo commented Apr 20, 2021 •

edited

Loading

eyjohn commented Apr 21, 2021

Oberon00 commented Apr 21, 2021 •

edited

Loading

Oberon00 commented Apr 21, 2021

reyang commented Apr 23, 2021

yurishkuro commented Apr 23, 2021

pmm-sumo commented Apr 23, 2021 •

edited

Loading

brunobat commented May 8, 2023

annettejanewilson commented May 3, 2024

danielgblanco commented May 3, 2024

annettejanewilson commented Sep 11, 2024

stevesea commented Sep 18, 2024 •

edited

Loading

stevesea commented Sep 25, 2024

Controlling context propagation boundary #1633

Controlling context propagation boundary #1633

Comments

pmm-sumo commented Apr 20, 2021 • edited Loading

eyjohn commented Apr 21, 2021

Oberon00 commented Apr 21, 2021 • edited Loading

Oberon00 commented Apr 21, 2021

reyang commented Apr 23, 2021

yurishkuro commented Apr 23, 2021

pmm-sumo commented Apr 23, 2021 • edited Loading

brunobat commented May 8, 2023

annettejanewilson commented May 3, 2024

danielgblanco commented May 3, 2024

annettejanewilson commented Sep 11, 2024

stevesea commented Sep 18, 2024 • edited Loading

stevesea commented Sep 25, 2024

pmm-sumo commented Apr 20, 2021 •

edited

Loading

Oberon00 commented Apr 21, 2021 •

edited

Loading

pmm-sumo commented Apr 23, 2021 •

edited

Loading

stevesea commented Sep 18, 2024 •

edited

Loading