Incorrect validation of baggage values #2934

sk- · 2022-09-15T20:13:27Z

Describe your environment
python: 3.10.6
opentelemetry-api: 1.12.0
opentelemetry-sdk: 1.12.0

Steps to reproduce

from opentelemetry.baggage import _is_valid_pair
_is_valid_pair("sentry-transaction", "GET%20%2Fapi%2F%2Freport") # OK

from opentelemetry.baggage.propagation import W3CBaggagePropagator
W3CBaggagePropagator().extract({"baggage": "sentry-transaction=GET%20%2Fapi%2Freport"})
# Returns empty and raises warning Invalid baggage entry: `sentry-transaction=GET%20%2Fapi%2Freport`

What is the expected behavior?
Opentelemetry should be able to parse baggage values which are percent encoded, as that's indicated by the W3C spec

A value contains a string whose character encoding MUST be UTF-8 [Encoding]. Any characters outside of the baggage-octet range of characters MUST be percent-encoded. Characters which are not required to be percent-encoded MAY be percent-encoded.

What is the actual behavior?
No baggage values were extracted and the following warning was raised:

Invalid baggage entry: `sentry-transaction=GET%20%2Fapi%2Freport`

Additional context
The problem is happening here:

opentelemetry-python/opentelemetry-api/src/opentelemetry/baggage/propagation/__init__.py

Line 92 in 75313b6

value = unquote_plus(value).strip()

Note also that the indicated code path seems to be not be too optimized, as the call to set_baggage will validate both the key and values again. And additionally, for each baggage value a new context will be created. Maybe it'd be better to first collect all baggage values and set them only once in the context.

The text was updated successfully, but these errors were encountered:

sk- · 2022-09-21T00:22:33Z

@srikanthccv @ocelotl what do you think?

srikanthccv · 2022-09-21T06:05:32Z

This looks like a legitimate bug. I haven't gotten time to triage this. I will check.

ocelotl · 2022-09-26T12:09:29Z

Checking...

ocelotl · 2022-09-26T12:56:42Z

Ok, I think this is what's happening here:

The definition for the baggage value says:

value                  =  *baggage-octet
baggage-octet          =  %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                          ; US-ASCII characters excluding CTLs,
                          ; whitespace, DQUOTE, comma, semicolon,
                          ; and backslash

That is why our regex value is this:

_VALUE_FORMAT = r"[\x21\x23-\x2b\x2d-\x3a\x3c-\x5b\x5d-\x7e]*"

That regex does not allow space characters, the reported string has one in %20. The spec for value says:

Leading and trailing whitespaces (OWS) are allowed and are not considered to be a part of the value.

I think that means we should first strip any leading and trailing whitespace characters from the value string, then apply the regex (we don't do that so I think we do have a bug).

The space character in "GET%20%2Fapi%2F%2Freport" is not leading nor trailing so I think we are doing the right thing by not accepting this string.

What do you think, @srikanthccv?

sk- · 2022-09-26T13:21:49Z

@ocelotl in the same spec you are citing you are missing the important part

Any characters outside of the baggage-octet range of characters MUST be percent-encoded.

bruce-y · 2022-11-01T19:44:46Z

We are running into a similar issue. FWIW, The OTEL client for NodeJS handles values with spaces so long as they are percent-encoded.

ocelotl · 2022-11-17T15:58:47Z

I'm not sure extract should accept percent-encoded strings that have not been percent-encoded themselves, from here:

Because the percent ("%") character serves as the indicator for
percent-encoded octets, it must be percent-encoded as "%25" for that
octet to be used as data within a URI.

I think extract is doing the right thing by rejecting a string that comes with a %20 character because it expects that string to come from the result of an inject method who should have percent-encoded that %20 (leading to %2520). What is happening here is that a percent-encoded string is being passed to extract without first passing it to inject, which leads to an invalid string which does not have its percent-encoded characters percent-encoded themselves. When the same string is passed first to inject the string is handled right:

from opentelemetry.baggage.propagation import W3CBaggagePropagator
from opentelemetry.context import get_current
from opentelemetry.baggage import set_baggage

carrier = {}
w3c_baggage_propagator = W3CBaggagePropagator()

context = set_baggage(
    "baggage",
    "sentry-transaction=GET%20%2Fapi%2Freport",
    context=get_current()
)

w3c_baggage_propagator.inject(carrier, context)

print(carrier)

context = w3c_baggage_propagator.extract(carrier)

print(context)

{'baggage': 'baggage=sentry-transaction%3DGET%2520%252Fapi%252Freport'}
{'baggage-b037abe3-81b2-4195-86e6-929af2d86e99': {'baggage': 'sentry-transaction=GET%20%2Fapi%2Freport'}}

It is impossible for extract to handle percent-encoded strings that have not been percent-encoded themselves before, because how can extract tell if a percent symbol comes from the original string (then expect should leave it as it is) or from the encoding done by inject (then expect should decode it)?

I don't think this is a bug, WDYT @srikanthccv?

sk- · 2022-11-17T19:00:35Z

@ocelotl I see your point, you are saying that the spec is ambiguos because given that the baggage-octet range includes the '%' symbol and the language around when to encode string, then there's no way to distinguish whether a%20b means a%20b, which would be fine because all characters are within the range, or whether it means a b, which would also be fine as the space is outside the range, and hence must be percent encoded.

Note however, that in Java, the value is always percent encoded, so the case above would not be ambiguous, as in the first case it would be encoded as a%2520b and in the second as a%20b. See https://cs.github.com/open-telemetry/opentelemetry-java/blob/0e41b1469d1fbc20c4fc5e7b59df0f4ac54330b6/api/all/src/main/java/io/opentelemetry/api/baggage/propagation/W3CBaggagePropagator.java#L68

In Java, it also seems there's no validation happening for the value, and the value is always being decoded. See https://cs.github.com/open-telemetry/opentelemetry-java/blob/0e41b1469d1fbc20c4fc5e7b59df0f4ac54330b6/api/all/src/main/java/io/opentelemetry/api/baggage/propagation/Parser.java#L128 and the function that calls it.

Note though, that your example should be:

from opentelemetry.baggage.propagation import W3CBaggagePropagator
from opentelemetry.context import get_current
from opentelemetry.baggage import set_baggage

carrier = {}
w3c_baggage_propagator = W3CBaggagePropagator()

context = set_baggage(
    "sentry-transaction",
    "GET /api/report",
    context=get_current()
)

print(context)

w3c_baggage_propagator.inject(carrier, context)

print(carrier)

context = w3c_baggage_propagator.extract(carrier)

print(context)

which unfortunately does not work, because you are restricting the values that can be used in the baggage, which is no correct.

ocelotl · 2022-11-17T19:42:35Z

@ocelotl I see your point, you are saying that the spec is ambiguos because given that the baggage-octet range includes the '%' symbol and the language around when to encode string, then there's no way to distinguish whether a%20b means a%20b, which would be fine because all characters are within the range, or whether it means a b, which would also be fine as the space is outside the range, and hence must be percent encoded.

What I am saying is that for a baggage value to be parseable by extract it has to be percent-encoded first. In your example above you mentioned this:

from opentelemetry.baggage.propagation import W3CBaggagePropagator
W3CBaggagePropagator().extract({"baggage": "sentry-transaction=GET%20%2Fapi%2Freport"})
# Returns empty and raises warning Invalid baggage entry: `sentry-transaction=GET%20%2Fapi%2Freport`

That string "sentry-transaction=GET%20%2Fapi%2Freport" is not percent-encoded and that is why you are getting that error. Out of curiosity, why would you pass a non-percent encoded string to extract?

which unfortunately does not work, because you are restricting the values that can be used in the baggage, which is no correct.

Sorry, how are we restricting the values? 🤔

sk- · 2022-11-18T00:22:22Z

@ocelotl that value is percent encoded correctly, and corresponds to the value GET /api/report. That is a baggage value generated by Sentry, and it is encoded correctly according to the spec.

The problem with your example is that you are misunderstanding how percent encoding works. Please refer to the code used in the java package.

As I mentioned originally, the problem is that the validation of the value is happening after decoding it. But, you should first validate the encoded value conforms to the character range and then decode its value.

You are restricting the values, because the spec, which I have cited many times already, says

A value contains a string whose character encoding MUST be UTF-8 [Encoding]. Any characters outside of the baggage-octet range of characters MUST be percent-encoded. Characters which are not required to be percent-encoded MAY be percent-encoded.

So you are allowed to have any string as a value, but the function set_baggage rejects any string with characters outside the baggage-octet range.

Pleas also note that there are at least 2 other open-telemetry implementations that handle this.

@srikanthccv could you please take a look

ocelotl · 2022-11-18T08:00:43Z

Ah, ok, now I can understand what you mean, fixing...

srikanthccv · 2022-11-18T19:34:50Z

@ocelotl Would you like to assign this yourself?

lzchen · 2022-11-21T23:03:32Z

Related, decoding resource attributes from otlpresourcedetector: #3046

ocelotl · 2022-11-28T20:27:48Z

@ocelotl Would you like to assign this yourself?

I did already.

Fixes open-telemetry#2934

ghandic · 2022-12-20T03:44:20Z

What is the status of this? Major issue for production release if we cant contain spaces

lzchen · 2023-01-09T21:43:03Z

@ghandic

There is a PR open pending reviews for this issue. Once that is merged, it should go into the next release.

Fixes open-telemetry#2934

* Fix validation of baggage values Fixes #2934 * Remove test case

sk- added the bug Something isn't working label Sep 15, 2022

ocelotl self-assigned this Nov 17, 2022

ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Nov 29, 2022

Fix validation of baggage values

ec445e4

Fixes open-telemetry#2934

ocelotl mentioned this issue Nov 29, 2022

Fix validation of baggage values #3058

Merged

ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Nov 29, 2022

Fix validation of baggage values

4f56ea0

Fixes open-telemetry#2934

ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Nov 29, 2022

Fix validation of baggage values

8ec1462

Fixes open-telemetry#2934

ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Nov 30, 2022

Fix validation of baggage values

4a8cabd

Fixes open-telemetry#2934

ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Dec 1, 2022

Fix validation of baggage values

291c6c4

Fixes open-telemetry#2934

ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Dec 13, 2022

Fix validation of baggage values

7b28f3f

Fixes open-telemetry#2934

ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Jan 18, 2023

Fix validation of baggage values

ff0843b

Fixes open-telemetry#2934

ocelotl added a commit to ocelotl/opentelemetry-python that referenced this issue Jan 18, 2023

Fix validation of baggage values

e0df4e8

Fixes open-telemetry#2934

tonyo mentioned this issue Jan 18, 2023

Error while parsing Baggage header with percent-encoded whitespace open-telemetry/opentelemetry-go#3601

Closed

ocelotl closed this as completed in #3058 Jan 20, 2023

ocelotl added a commit that referenced this issue Jan 20, 2023

Fix validation of baggage values (#3058)

a99cc21

* Fix validation of baggage values Fixes #2934 * Remove test case

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect validation of baggage values #2934

Incorrect validation of baggage values #2934

sk- commented Sep 15, 2022 •

edited

Loading

sk- commented Sep 21, 2022

srikanthccv commented Sep 21, 2022

ocelotl commented Sep 26, 2022

ocelotl commented Sep 26, 2022

sk- commented Sep 26, 2022

bruce-y commented Nov 1, 2022

ocelotl commented Nov 17, 2022

sk- commented Nov 17, 2022

ocelotl commented Nov 17, 2022

sk- commented Nov 18, 2022

ocelotl commented Nov 18, 2022

srikanthccv commented Nov 18, 2022

lzchen commented Nov 21, 2022

ocelotl commented Nov 28, 2022

ghandic commented Dec 20, 2022

lzchen commented Jan 9, 2023

Incorrect validation of baggage values #2934

Incorrect validation of baggage values #2934

Comments

sk- commented Sep 15, 2022 • edited Loading

sk- commented Sep 21, 2022

srikanthccv commented Sep 21, 2022

ocelotl commented Sep 26, 2022

ocelotl commented Sep 26, 2022

sk- commented Sep 26, 2022

bruce-y commented Nov 1, 2022

ocelotl commented Nov 17, 2022

sk- commented Nov 17, 2022

ocelotl commented Nov 17, 2022

sk- commented Nov 18, 2022

ocelotl commented Nov 18, 2022

srikanthccv commented Nov 18, 2022

lzchen commented Nov 21, 2022

ocelotl commented Nov 28, 2022

ghandic commented Dec 20, 2022

lzchen commented Jan 9, 2023

sk- commented Sep 15, 2022 •

edited

Loading