Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java: Audit the Otel implementation of Baggage using Baggage Test Suite #138

Open
kalyanaj opened this issue Jul 30, 2024 · 6 comments
Open
Assignees
Milestone

Comments

@kalyanaj
Copy link
Contributor

This is based on the exit criteria for the Candidate Recommendation State: "During the Candidate Recommendation stage, the Working Group will audit:
the OpenTelemetry implementations of Baggage to make sure they are following the W3C Baggage spec."

Target Milestone: Oct 31 2024 so that we can proceed with the next steps for getting Baggage into recommendation state.

@kalyanaj kalyanaj added this to the Oct31-2024 milestone Sep 10, 2024
@basti1302
Copy link
Contributor

basti1302 commented Oct 8, 2024

I took our own test suite (https://github.com/w3c/baggage/blob/main/test/test_baggage.py) and converted it to Java here: https://github.com/basti1302/opentelemetry-java/blob/w3c-baggage-issue-138-audit-otel-java-sdk/api/all/src/test/java/io/opentelemetry/api/baggage/propagation/W3cSpecTest.java

TL;DR: I think the OTel Java SDK implementation of baggage is not compliant with this spec when properties/metadata are involved.

This is due to one fundamental difference between the OTel baggage spec and this spec. The W3C spec allows additional/optional properties on list members. There properties itself can also be key-value pairs (like list members). Like so:

baggage: key1=value1;property1, key2=value2;property2=propertyValue2

which translates to two list members:

  • key1: value1 with a flag-like property property1 (without value)
  • key2: value2 with a property property2: propertyValue2.

In contrast, the OTel baggage spec supports additional properties on list members under the term "metadata", but metadata is one opaque string, and it is not interpreted or parsed. See https://opentelemetry.io/docs/specs/otel/baggage/api/#set-value.

Coming back to the Java version of our reference test suite:

  • Test cases that only involve simple key-value pairs (without properties) all work as expected.
  • ... with the obvious difference that when our test suite expects multiple properties or key-value shaped properties, the OTel API does not provide that),
  • Test cases where the properties-part of a key-value pair does not include inner white space and does not include key-value properties also work as expected.

But when the properties-part of a list member includes OWS or is a key value pair (and thus includes the equals character), the OTel Java SDK will

  • treat the whole property as one opaque string
  • percent-encode it when propagating it downstream.

Consider the following baggage header:

baggage: SomeKey=SomeValue;ValueProp \t = \t PropVal

We would expect this to be understood as one list member, with key SomeKey, value SomeValue and one property ValueProp: PropVal.

However, the Java OTel SDK will propagate this downstream as follows (more precisely, this is the string after one extract & inject roundtrip):

baggage: SomeKey=SomeValue;ValueProp%20%09%20%3D%20%09%20PropVal"

(See https://github.com/basti1302/opentelemetry-java/blob/w3c-baggage-issue-138-audit-otel-java-sdk/api/all/src/test/java/io/opentelemetry/api/baggage/propagation/W3cSpecTest.java#L74-L102 which exemplifies this behavior.)

That is, the inner whitespace and the equals sign in the property are being percent-encoded. Which is probably the correct behavior according to OTel's way of interpreting metadata as one opaque string.

Now, when this header would be parsed by a spec-compliant baggage implementation, this would probably be parsed as follows: One list member, with key SomeKey, value SomeValue, and with one property which no longer has key-value shape but is now a flag-like property ValueProp%20%09%20%3D%20%09%20PropVal (or the percent-decoded variant of this, but still as a single string).

At least this is what our reference implementation would do. To verify this, I added this test case to the W3C test suite locally on my machine:

    def test_parse_property_percent_encoded(self):
        baggage = Baggage().from_string(
            "SomeKey=SomeValue;ValueProp%20%09%20%3D%20%09%20PropVal")
        self.assertEqual(len(baggage.entries), 1)
        entry1 = baggage.entries[0]
        self.assertEqual(entry1.key, "SomeKey")
        self.assertEqual(entry1.value, "SomeValue")
        self.assertEqual(entry1.properties[0].key, "ValueProp%20%09%20%3D%20%09%20PropVal")
        self.assertEqual(entry1.properties[0].value, None)

The test passes like this.

Thus, the way the OTel SDK treats metadata as one opaque string conflicts with the following sentence in the W3C spec:

Leading and trailing OWS is allowed and is not considered to be a part of the property key or value.

Because after having gone through the OTel Java SDK, the OWS has become part of the property key due being percent encoded.

I strongly suspect that other OTel SDKs will exhibit the same behavior. I think the underlying issue is not an implementation issue in the Java OTel SDK but really the difference between the W3C baggage spec and the OTel baggage spec.

Additional note: AFAIK, our spec does not spell out whether or not properties are subject to percent encoding when propagating downstream. This is only specified for values. We should spell this out explicitly (and I assume at least property values should also be subject to percent encoding?).

@dyladan
Copy link
Member

dyladan commented Oct 8, 2024

Hmm. I remember when this decision was made it was specifically because the metadata had no meaning or purpose yet. When OTel was implementing the draft spec, we didn't know what metadata might turn into so we decided to treat it as opaque to leave options open for the future. I will have to test this specifically, but I think the JS implementation doesn't do the percent encoding trick, but just passes it directly unchanged.

@basti1302
Copy link
Contributor

basti1302 commented Oct 9, 2024

We discussed this in the working group meeting yesterday and came to the following consensus: We will file this as an implementation bug with the Java OTel SDK. Neither our spec nor the OTel baggage spec mandates to percent-encode properties/metadata. In fact, the OTel spec mandates to treat metadata as an opaque string (and not process it at all). We will also check other OTel SDKs whether they exhibit similar behavior.

Tangentially related, we are also considering changing the MUST for percent encoding values of list-members. Instead of prescribing a specific processing, we probably should only specify the allowed characters without mandating any specific mechanism for encoding not-allowed characters. (Percent-encoding could be mentioned as a non-normative hint as one possible option.) /Edit: This aspect is discussed here now: #145

@basti1302
Copy link
Contributor

The Java OTel SDK issue has been filed here: open-telemetry/opentelemetry-java#6771

@yurishkuro
Copy link
Member

Hold on - the spec absolutely must mandate encoding, otherwise the consumer never knows how to interpret the data, whether to decode it or not. We had this ambiguity in jaeger baggage format and it was very painful to handle

@basti1302
Copy link
Contributor

basti1302 commented Oct 9, 2024

@yurishkuro I created a new issue for this part of the discussion -> #145. Sorry, I should have linked to this new issue here as well, my bad.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants