Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Controlling context propagation boundary #1633

Open
pmm-sumo opened this issue Apr 20, 2021 · 12 comments
Open

Controlling context propagation boundary #1633

pmm-sumo opened this issue Apr 20, 2021 · 12 comments
Labels
area:api Cross language API specification issue spec:context Related to the specification/context directory triage:accepted:needs-sponsor Ready to be implemented, but does not yet have a specification sponsor

Comments

@pmm-sumo
Copy link
Contributor

pmm-sumo commented Apr 20, 2021

What are you trying to achieve?

There are several use cases where two organisations employing OpenTelemetry communicate with each other, such as: any 3rd party API calls, synthetics monitoring, webhooks, etc. In some cases (e.g. synthetics monitoring) it is anticipated that the called organisation records the identifier of the originating call and samples the trace (if OpenTelemetry is employed) so it could be later used for diagnosing purposes.

When standard trace context propagation approach is being employed, this leads to several side effects, which might or might not be anticipated:

  • spans recorded on both of them share the same trace ID,
  • each of the organisations has access only to spans recorded on their side; which also means that callee (Organization 2) is not having access to the root span,
  • the originating organisation sampling decision will be passed,
  • baggage might be unwillingly shared between the organisations, which might cause leaking sensitive information

 Organization 1         Organization 2
    (caller)               (callee)
    
 ┌───────────┐          ┌────API────┐
 │           │          │           │
 │ Service A ├────?────►│ Service B │
 │           │          │           │
 └─────┬─────┘          └─────┬─────┘
       │                      │
     Spans                  Spans
       │                      │
 ┌─────▼─────┐          ┌─────▼─────┐
 │ Backend 1 │          │ Backend 2 │
 └───────────┘          └───────────┘

One way to approach it is simple to leave the status quo. I.e. assume that it's fine to deal with the listed side effects and consider it should be a duty of the organisation making the call to external service that the baggage must not contain sensitive information.

However, perhaps the Tracing API/SDK could be extended to handle such case gracefully and e.g. explicitly filter baggage context as well as provide means for controlling how the trace context should be propagated when making a call to an external organisation. There are several approaches how to handle this, to start with some:

  1. Drop trace context in such case and instead identify the caller somehow via baggage. This information could be incorporated on the callee side and persisted, e.g. via storing as a span attribute.
  2. Drop trace context and generate a new trace ID for the callee (essentially, include a new context without parent span ID). That way, the caller would have the information that allows to identify the trace on callee side.
  3. Leverage tracestate (or trace-flags) and include additional information there. Perhaps a special field could describe that the request is passing the boundary which requires starting a new trace and using linking capability to store information about the relationship with the callee.
  4. Define a boundary on the callee side (e.g. via API), so whenever context is being extracted there, instead of continuing the trace, start a new one and store context as a linked span
  5. ...

Additional context.

Perhaps this should followup with OTEPS describing a proposal

Related issues: #1255, #1337, #867, #510

@eyjohn
Copy link

eyjohn commented Apr 21, 2021

Thanks for posting this Przemek,

One thing to consider that I'd like to add is data sensitivity, in some cases, I do not wish third parties who may continue the trace to be aware of the true origin on the request and related request by being able to correlate either on a traceId/spanId/TraceState basis. In fact, we may want to obscure the traceIds by generating a new one at the boundary point or having a transformation scheme. In any case, I think this point should be called out during the formation of any recommendations/semantic conventions regarding cross-organisation tracing.

@Oberon00
Copy link
Member

Oberon00 commented Apr 21, 2021

  1. Define a boundary on the callee side (e.g. via API), so whenever context is being extracted there, instead of continuing the trace, start a new one

For this approach, see also #1188 "Support restarting the trace with a different trace ID"

(quite a lot related issues here)

@Oberon00
Copy link
Member

Also for

  • each of the organisations has access only to spans recorded on their side; which also means that callee (Organization 2) is not having access to the root span,

Not only the root span is a problem but any parent span can be. I consider this a design problem of W3C traceparent, but it can be worked around by storing the parent span ID in the tracestate under a unique key, see #366 (comment)

@reyang
Copy link
Member

reyang commented Apr 23, 2021

Discussed during the 04/23 triage meeting:

  1. we want to see if there is a relatively high demand for this
  2. if there is a high demand, we want to see if this is something that should be addressed by the W3C TraceContext specification, rather than having OpenTelemetrying providing a workaround / additional layer.
  3. the same question might apply to other propagation protocols (e.g. Zipkin B3).

@reyang reyang added the area:api Cross language API specification issue label Apr 23, 2021
@yurishkuro
Copy link
Member

we want to see if this is something that should be addressed by the W3C TraceContext specification

I doubt it. W3C TC defines the format, not the implementation. It calls out that baggage is meant to propagate only within trust boundaries (https://github.com/w3c/baggage/blob/master/baggage/privacy.md):

The main purpose of this header is to provide additional system-specific information to other systems within the same trust-boundary. The baggage header may contain any opaque value in any of the keys. As such, the baggage header can contain user-identifiable data. Systems MUST ensure that the baggage header does not leak beyond defined trust boundaries and they MUST ensure that the channel that is used to transport potentially user-identifiable data is secured.

This is not to say that OTEL SDK is the place to enforce trust boundaries. It's certainly possible to implement in the SDK but since it would require special configuration it is less reliable than other solutions, like not allowing your backend services to talk to external services without going through some DMZ which can sanitize the requests.

@pmm-sumo
Copy link
Contributor Author

pmm-sumo commented Apr 23, 2021

we want to see if this is something that should be addressed by the W3C TraceContext specification

I doubt it. W3C TC defines the format, not the implementation. It calls out that baggage is meant to propagate only within trust boundaries (https://github.com/w3c/baggage/blob/master/baggage/privacy.md):

This is more in context of open-telemetry/semantic-conventions#1230 , but I think W3C TraceContext could be extended with one of the following (among other solutions):

  • ability to set parent-id to zeroes, which would identify that a new trace-id should start (and the caller would know what will be this trace ID) - that would solve the problem of having parentless traces in the example I posted above
  • defining a new flag (e.g. external) which could identify a hint that the context passes a boundary - that would allow to handle it accordingly on the other side - e.g. start a new trace and link the one provided in traceparent rather than continuing it; this could actually work in conjuction with Support restarting the trace with a different trace ID #1188 (i.e. such behavior could be controlled via a hint, client library config or both)

The main purpose of this header is to provide additional system-specific information to other systems within the same trust-boundary. The baggage header may contain any opaque value in any of the keys. As such, the baggage header can contain user-identifiable data. Systems MUST ensure that the baggage header does not leak beyond defined trust boundaries and they MUST ensure that the channel that is used to transport potentially user-identifiable data is secured.

This is not to say that OTEL SDK is the place to enforce trust boundaries. It's certainly possible to implement in the SDK but since it would require special configuration it is less reliable than other solutions, like not allowing your backend services to talk to external services without going through some DMZ which can sanitize the requests.

I think that maybe the risk of leaking the baggage should be emphasized somehow in the docs. Also, I think that at least auto-instrumentation based libraries might want to eventually provide some means of controlling it

@brunobat
Copy link
Contributor

brunobat commented May 8, 2023

We have an issue on the Java instrumentation project that relates to this: open-telemetry/opentelemetry-java-instrumentation#8038.
This has concerning security implications. Could you please increase the priority of this issue?

@annettejanewilson
Copy link

We are interested in this for baggage. At present we see the following options for preventing baggage from propagating to third parties:

  1. Explicitly clear the baggage at every point we talk to a third party in every service. This seems very painful, difficult to automate and easy to miss some usages.
  2. Set up every communication with a third party to go through Envoy or something similar. We already do this for internal calls in an Istio service mesh, but it's a serious undertaking to cover all calls to third parties too. My understanding is that this requires changing all outgoing calls to use HTTP so that Envoy can get at the headers before making an HTTPS call to the external service, and could be a pain for usages wrapped in libraries.
  3. In our OTel configuration, specify an allowlist for destination authorities that we will propagate baggage to. E.g. *.skyscanner.io:* might be a wildcard matching our internal destinations inside the service mesh.

The first and second options seem a lot less desirable to us than the third option, were it to be supported. At present we don't believe this is possible. Propagators don't have access to anything that would allow them to make a decision about whether or not to propagate. (Whether that's an authority directly or some kind of protocol-agnostic categorisation determined by configuration.)

I think we would be happy with something as blunt as either propagating everything or nothing for a particular authority. As far as I'm aware we're only using HTTP(S) and gRPC as protocols - I'm not sure what this would look like for other protocols, especially if they're going to have authorities that don't look like hostname:port.

From what I understand we probably want the same behaviour for tracing as for baggage, it's just that baggage could have a wider array of kinds of data.

@danielgblanco
Copy link
Contributor

I see the Tracing API/SDK was mentioned here, but I believe this to be more of an issue of the Propagators API if I understand the issue correctly. Currently, the Propagators API has no knowledge of system boundaries and has no way to allow propagators to receive any information about propagation boundaries from their callees. I think tackling this would require propagators to be configured with information regarding what "destinations" are safe or not to inject context to, and for callers to provide that information when calling inject.

I commented open-telemetry/oteps#255 because that could be an addition if conditional propagation of Baggage items is required depending on sensitivity settings.

@annettejanewilson
Copy link

I'm wondering if we could have something like this: a mechanism to identify different classes of destination, and the ability to enable or disable propagators separately for each class.

Instrumentations could perhaps define their own destination classes, but recognise shared classes like "host_port". Definition might look something like this, with the first matching class in the list applying for a given communication.

destination_classes:
  - class_name: skyscanner_internal
    destinations: 
      - host_port: *.skyscanner.io:*
      - host_port: *.skyscanner.net:*
      - aws_sqs: arn:aws:sqs:*:111122223333:*
  - class_name: external
    destinations:
      - everything: true

Then when applying propagators:

propagators:
  - name: tracecontext
    applies_to: [skyscanner_internal]
  - name: baggage
    applies_to: [skyscanner_internal]

Something like this would be greatly helpful to us. Not only are we concerned about leaking information, we've also found some third parties to be quite sensitive to unexpected HTTP headers. (This is more of a problem with the ottrace propagator which it seems we're still running, and which can create multiple headers with differing names, but still, we would like the ability to be more disciplined in what we send to third parties.)

However, maybe this doesn't mesh 100% with the OP's needs? We don't have a particular need to send modified tracecontext or baggage to particular destinations, we just want to be able to complete suppress them for destinations other than those we consider "internal".

@danielgblanco danielgblanco added triage:accepted:needs-sponsor Ready to be implemented, but does not yet have a specification sponsor and removed triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted labels Sep 12, 2024
@stevesea
Copy link

stevesea commented Sep 18, 2024

Another aggravating nuance -- baggage can contain multiple entries, some may be sensitive but not others.

In addition to configuring "should I propagate propagate baggage to <URI>?", it might be nice to identify which entries are sensitive.
Maybe a similar configuration to the baggage span processor? https://github.com/open-telemetry/opentelemetry-java-contrib/tree/main/baggage-processor

To clarify our use case a little... in Baggage, we pass around a tenancy context and other flags. All entries are namespaced under a corp.* prefix (where corp is our stock ticker).

baggage for a requests might look like:
baggage: corp.tenancy=mydomain.com/prod,corp.customerid=123456
or, when we're doing testing in production:
baggage: corp.tenancy=mydomain.com/sandbox;featflag=<value>,corp.customerid=123456

On incoming requests to our edge, we may want to propagate some entries but not others.
On outgoing requests, there are some entries we'd always want to discard

@stevesea
Copy link

Similar to other commenters, we also want to configure propagation in certain circumstances

  • outgoing requests to 3rd parties
  • internal requests to certain backends.
    • for example, the Hashicorp Vault agent sidecar considers all HTTP headers in its cache key . When an auto-instrumented application sends requests to the vault agent, the presence of the traceparent/tracestate/baggage headers means every request is 'unique' and the vault proxy considers it a cache miss. In our case, it'd be nice if we could configure the Propagators to inject headers on calls to localhost:8200.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue spec:context Related to the specification/context directory triage:accepted:needs-sponsor Ready to be implemented, but does not yet have a specification sponsor
Projects
Status: Spec - Priority Backlog
Development

No branches or pull requests

10 participants