Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document that baggage is sent to external APIs by automatic instrumetation #3530

Merged
merged 4 commits into from
Nov 22, 2023

Conversation

swar8080
Copy link
Contributor

With the current wording, it might not be clear that baggage is sent to external APIs. Calling this out directly seems worth the extra words because of the risk of leaking sensitive data

@swar8080 swar8080 requested a review from a team November 13, 2023 02:27
@svrnm
Copy link
Member

svrnm commented Nov 13, 2023

@swar8080 that's new to me that automatic instrumentation is doing that, can you point me to an example where this is the case?

@swar8080
Copy link
Contributor Author

@svrnm i've tested this with http requests to localhost URLs with java auto-instrumentation. Also, i'm assuming the auto-instrumentation can't distinguish between my.internal.service.com and some.external.api.com when deciding to propagate the trace ID and baggage

Here's a sanity test where i've added baggage to a span making a HttpUrlConnection request:
image

@svrnm
Copy link
Member

svrnm commented Nov 13, 2023

hm... I start to understand what you are trying to say: you may (independent of automatic instrumentation or not) expose the baggage to a service outside of your network (I guess you could safe-guard yourself even in your code example by inspecting carrier, but the danger is always given)

@swar8080
Copy link
Contributor Author

Yep, and with automatic instrumentation this isn't something you opt-in to - Baggage and trace id is sent in all HTTP requests by default. The above sample code is actually from the java auto-instrumentation repo.

This is what it looks like with manual instrumentation: https://opentelemetry.io/docs/instrumentation/java/manual/#context-propagation

Feel free to change the documentation wording if you think it could be clearer

@cartermp
Copy link
Contributor

Hrm. I'm a little confused about why this clarification is needed? Autoinstrumentation, by design, propagates this information, which can be exported to another backend. I don't think there's anywhere we claim that baggage is a way to propagate internal-only information.

Is it the case that you and/or others may expect baggage wouldn't propagate?

FWIW I think the changes are harmless, I'm just worried that others may get an impression about how data propagates that isn't true.

@swar8080
Copy link
Contributor Author

@cartermp At least for me, the risk wasn't immediately obvious. Only when writing integration tests to verify baggage propagation to an internal service did I realize that third-party APIs would get the data as well.

It's implied with the current wording but seems too subtle given the consequences of leaking sensitive data:

potentially exposing
to anyone who inspects your network traffic. This is because it's stored in HTTP
headers alongside the current context

That assumes the reader understands context, propagation, and that auto-instrumentation will auto-propagate context.

I'm just worried that others may get an impression about how data propagates that isn't true.

Is this a concern about the wording or our understanding of how data actually propagates?

@svrnm
Copy link
Member

svrnm commented Nov 14, 2023

@cartermp At least for me, the risk wasn't immediately obvious. Only when writing integration tests to verify baggage propagation to an internal service did I realize that third-party APIs would get the data as well.

This may be true with your tracing data in general not only baggage, i.e. your service instrumented with OpenTelemetry may send trace context to a service outside your boundaries:

open-telemetry/opentelemetry-specification#1633

It's implied with the current wording but seems too subtle given the consequences of leaking sensitive data:

potentially exposing
to anyone who inspects your network traffic. This is because it's stored in HTTP
headers alongside the current context

That assumes the reader understands context, propagation, and that auto-instrumentation will auto-propagate context.

See my comment above, this is not only true for baggage, so this may be something worth commenting on but the place here is complicated, for @cartermp's worry that others may get an impression about how data propagates that isn't true: a big misconception about baggage is that it will be automatically set as span attributes and send to a backend, we try to clarify this in the document, but this is an issue that persists. The wording "and may be sent to external APIs by automatic instrumentation" will make people think "ah, the data is send to MY external API = observability backend"

So, to summarize: I am not against that change, I think we should phrase it differently, or even rethink that whole paragraph:

OTel Baggage should be used for data that you're okay with potentially exposing
to anyone who inspects your network traffic. This is because it's stored in HTTP
headers alongside the current context. If your relevant network traffic is
entirely within your own network, then this caveat may not apply.

@swar8080
Copy link
Contributor Author

@svrnm got it, so I tried changing the wording so it's clearer that baggage could be sent to unintended resources like third-party APIs. Also, being specific about baggage as an HTTP header hopefully help distinguish it from span attributes

@cartermp
Copy link
Contributor

@swar8080 thanks for the updates. I think this is pretty good. If you comment /fix:format then it'll run the formatter in CI to fix format issues.

@swar8080
Copy link
Contributor Author

/fix:format

@svrnm
Copy link
Member

svrnm commented Nov 22, 2023

/fix:format

that didn't work because patch-1 is a branch name that also exists on opentelemetry.io, we need to fix this manually. You can do this yourself by locally running npm run fix:format

@cartermp cartermp merged commit 5c7cfa6 into open-telemetry:main Nov 22, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants