Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect validation of baggage values #2934

Closed
sk- opened this issue Sep 15, 2022 · 16 comments · Fixed by #3058
Closed

Incorrect validation of baggage values #2934

sk- opened this issue Sep 15, 2022 · 16 comments · Fixed by #3058
Assignees
Labels
bug Something isn't working

Comments

@sk-
Copy link
Contributor

sk- commented Sep 15, 2022

Describe your environment
python: 3.10.6
opentelemetry-api: 1.12.0
opentelemetry-sdk: 1.12.0

Steps to reproduce

from opentelemetry.baggage import _is_valid_pair
_is_valid_pair("sentry-transaction", "GET%20%2Fapi%2F%2Freport") # OK

from opentelemetry.baggage.propagation import W3CBaggagePropagator
W3CBaggagePropagator().extract({"baggage": "sentry-transaction=GET%20%2Fapi%2Freport"})
# Returns empty and raises warning Invalid baggage entry: `sentry-transaction=GET%20%2Fapi%2Freport`

What is the expected behavior?
Opentelemetry should be able to parse baggage values which are percent encoded, as that's indicated by the W3C spec

A value contains a string whose character encoding MUST be UTF-8 [Encoding]. Any characters outside of the baggage-octet range of characters MUST be percent-encoded. Characters which are not required to be percent-encoded MAY be percent-encoded.

What is the actual behavior?
No baggage values were extracted and the following warning was raised:

Invalid baggage entry: `sentry-transaction=GET%20%2Fapi%2Freport`

Additional context
The problem is happening here:

Note also that the indicated code path seems to be not be too optimized, as the call to set_baggage will validate both the key and values again. And additionally, for each baggage value a new context will be created. Maybe it'd be better to first collect all baggage values and set them only once in the context.

@sk- sk- added the bug Something isn't working label Sep 15, 2022
@sk-
Copy link
Contributor Author

sk- commented Sep 21, 2022

@srikanthccv @ocelotl what do you think?

@srikanthccv
Copy link
Member

This looks like a legitimate bug. I haven't gotten time to triage this. I will check.

@ocelotl
Copy link
Contributor

ocelotl commented Sep 26, 2022

Checking...

@ocelotl
Copy link
Contributor

ocelotl commented Sep 26, 2022

Ok, I think this is what's happening here:

The definition for the baggage value says:

value                  =  *baggage-octet
baggage-octet          =  %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                          ; US-ASCII characters excluding CTLs,
                          ; whitespace, DQUOTE, comma, semicolon,
                          ; and backslash

That is why our regex value is this:

_VALUE_FORMAT = r"[\x21\x23-\x2b\x2d-\x3a\x3c-\x5b\x5d-\x7e]*"

That regex does not allow space characters, the reported string has one in %20. The spec for value says:

Leading and trailing whitespaces (OWS) are allowed and are not considered to be a part of the value.

I think that means we should first strip any leading and trailing whitespace characters from the value string, then apply the regex (we don't do that so I think we do have a bug).

The space character in "GET%20%2Fapi%2F%2Freport" is not leading nor trailing so I think we are doing the right thing by not accepting this string.

What do you think, @srikanthccv?

@sk-
Copy link
Contributor Author

sk- commented Sep 26, 2022

@ocelotl in the same spec you are citing you are missing the important part

Any characters outside of the baggage-octet range of characters MUST be percent-encoded.

@bruce-y
Copy link

bruce-y commented Nov 1, 2022

We are running into a similar issue. FWIW, The OTEL client for NodeJS handles values with spaces so long as they are percent-encoded.

@ocelotl
Copy link
Contributor

ocelotl commented Nov 17, 2022

I'm not sure extract should accept percent-encoded strings that have not been percent-encoded themselves, from here:

Because the percent ("%") character serves as the indicator for
percent-encoded octets, it must be percent-encoded as "%25" for that
octet to be used as data within a URI.

I think extract is doing the right thing by rejecting a string that comes with a %20 character because it expects that string to come from the result of an inject method who should have percent-encoded that %20 (leading to %2520). What is happening here is that a percent-encoded string is being passed to extract without first passing it to inject, which leads to an invalid string which does not have its percent-encoded characters percent-encoded themselves. When the same string is passed first to inject the string is handled right:

from opentelemetry.baggage.propagation import W3CBaggagePropagator
from opentelemetry.context import get_current
from opentelemetry.baggage import set_baggage

carrier = {}
w3c_baggage_propagator = W3CBaggagePropagator()

context = set_baggage(
    "baggage",
    "sentry-transaction=GET%20%2Fapi%2Freport",
    context=get_current()
)

w3c_baggage_propagator.inject(carrier, context)

print(carrier)

context = w3c_baggage_propagator.extract(carrier)

print(context)
{'baggage': 'baggage=sentry-transaction%3DGET%2520%252Fapi%252Freport'}
{'baggage-b037abe3-81b2-4195-86e6-929af2d86e99': {'baggage': 'sentry-transaction=GET%20%2Fapi%2Freport'}}

It is impossible for extract to handle percent-encoded strings that have not been percent-encoded themselves before, because how can extract tell if a percent symbol comes from the original string (then expect should leave it as it is) or from the encoding done by inject (then expect should decode it)?

I don't think this is a bug, WDYT @srikanthccv?

@ocelotl ocelotl self-assigned this Nov 17, 2022
@sk-
Copy link
Contributor Author

sk- commented Nov 17, 2022

@ocelotl I see your point, you are saying that the spec is ambiguos because given that the baggage-octet range includes the '%' symbol and the language around when to encode string, then there's no way to distinguish whether a%20b means a%20b, which would be fine because all characters are within the range, or whether it means a b, which would also be fine as the space is outside the range, and hence must be percent encoded.

Note however, that in Java, the value is always percent encoded, so the case above would not be ambiguous, as in the first case it would be encoded as a%2520b and in the second as a%20b. See https://cs.github.com/open-telemetry/opentelemetry-java/blob/0e41b1469d1fbc20c4fc5e7b59df0f4ac54330b6/api/all/src/main/java/io/opentelemetry/api/baggage/propagation/W3CBaggagePropagator.java#L68

In Java, it also seems there's no validation happening for the value, and the value is always being decoded. See https://cs.github.com/open-telemetry/opentelemetry-java/blob/0e41b1469d1fbc20c4fc5e7b59df0f4ac54330b6/api/all/src/main/java/io/opentelemetry/api/baggage/propagation/Parser.java#L128 and the function that calls it.

Note though, that your example should be:

from opentelemetry.baggage.propagation import W3CBaggagePropagator
from opentelemetry.context import get_current
from opentelemetry.baggage import set_baggage

carrier = {}
w3c_baggage_propagator = W3CBaggagePropagator()

context = set_baggage(
    "sentry-transaction",
    "GET /api/report",
    context=get_current()
)

print(context)

w3c_baggage_propagator.inject(carrier, context)

print(carrier)

context = w3c_baggage_propagator.extract(carrier)

print(context)

which unfortunately does not work, because you are restricting the values that can be used in the baggage, which is no correct.

@ocelotl
Copy link
Contributor

ocelotl commented Nov 17, 2022

@ocelotl I see your point, you are saying that the spec is ambiguos because given that the baggage-octet range includes the '%' symbol and the language around when to encode string, then there's no way to distinguish whether a%20b means a%20b, which would be fine because all characters are within the range, or whether it means a b, which would also be fine as the space is outside the range, and hence must be percent encoded.

What I am saying is that for a baggage value to be parseable by extract it has to be percent-encoded first. In your example above you mentioned this:

from opentelemetry.baggage.propagation import W3CBaggagePropagator
W3CBaggagePropagator().extract({"baggage": "sentry-transaction=GET%20%2Fapi%2Freport"})
# Returns empty and raises warning Invalid baggage entry: `sentry-transaction=GET%20%2Fapi%2Freport`

That string "sentry-transaction=GET%20%2Fapi%2Freport" is not percent-encoded and that is why you are getting that error. Out of curiosity, why would you pass a non-percent encoded string to extract?

which unfortunately does not work, because you are restricting the values that can be used in the baggage, which is no correct.

Sorry, how are we restricting the values? 🤔

@sk-
Copy link
Contributor Author

sk- commented Nov 18, 2022

@ocelotl that value is percent encoded correctly, and corresponds to the value GET /api/report. That is a baggage value generated by Sentry, and it is encoded correctly according to the spec.

The problem with your example is that you are misunderstanding how percent encoding works. Please refer to the code used in the java package.

As I mentioned originally, the problem is that the validation of the value is happening after decoding it. But, you should first validate the encoded value conforms to the character range and then decode its value.

You are restricting the values, because the spec, which I have cited many times already, says

A value contains a string whose character encoding MUST be UTF-8 [Encoding]. Any characters outside of the baggage-octet range of characters MUST be percent-encoded. Characters which are not required to be percent-encoded MAY be percent-encoded.

So you are allowed to have any string as a value, but the function set_baggage rejects any string with characters outside the baggage-octet range.

Pleas also note that there are at least 2 other open-telemetry implementations that handle this.

@srikanthccv could you please take a look

@ocelotl
Copy link
Contributor

ocelotl commented Nov 18, 2022

Ah, ok, now I can understand what you mean, fixing...

@srikanthccv
Copy link
Member

@ocelotl Would you like to assign this yourself?

@lzchen
Copy link
Contributor

lzchen commented Nov 21, 2022

Related, decoding resource attributes from otlpresourcedetector: #3046

@ocelotl
Copy link
Contributor

ocelotl commented Nov 28, 2022

@ocelotl Would you like to assign this yourself?

I did already.

ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Nov 29, 2022
ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Nov 29, 2022
ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Nov 29, 2022
ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Nov 30, 2022
ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Dec 1, 2022
ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Dec 13, 2022
@ghandic
Copy link

ghandic commented Dec 20, 2022

What is the status of this? Major issue for production release if we cant contain spaces

@lzchen
Copy link
Contributor

lzchen commented Jan 9, 2023

@ghandic

There is a PR open pending reviews for this issue. Once that is merged, it should go into the next release.

ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Jan 18, 2023
ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Jan 18, 2023
ocelotl added a commit that referenced this issue Jan 20, 2023
* Fix validation of baggage values

Fixes #2934

* Remove test case
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants