Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Appendix for percent-encoding concerns (3.0.4) #3859

Merged
merged 9 commits into from
Jun 13, 2024

Conversation

handrews
Copy link
Member

@handrews handrews commented May 26, 2024

Rendering with all currently open 3.0.4 changes (as of the posting of this PR) can be found here.

If you feel this is too much detail, you should have seen the first seven drafts I wrote... do feel free to help edit this down, but it's the sort of mind-numbingly tedious thing that is ideal for an appendix as most people will not care, but the people who care will need it spelled out because it doesn't make any sense and requires slogging through compelx IETF/W3C/WHATWG history to even figure out which specs might be relevant.

Fixes:

Percent-encoding is a minefield, although in practice it mostly works.

This appendix attempts to acknowledge the concerns and then define enough terminology and link to enough other specifications that interested readers will be able to research further details.

@handrews handrews added clarification requests to clarify, but not change, part of the spec param serialization Issues related to parameter and/or header serialization media and encoding Issues regarding media type support and how to encode data (outside of query/path params) labels May 26, 2024
@handrews handrews added this to the v3.0.4 milestone May 26, 2024
@handrews handrews requested a review from a team May 26, 2024 23:39
@handrews handrews changed the title Appendex for percent-encoding concerns (3.0.4) Appendix for percent-encoding concerns (3.0.4) May 27, 2024
versions/3.0.4.md Outdated Show resolved Hide resolved
versions/3.0.4.md Outdated Show resolved Hide resolved
Copy link
Contributor

@ralfhandl ralfhandl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, with minor nits

ralfhandl
ralfhandl previously approved these changes May 28, 2024
However, other specifications can define special meanings, requiring percent-encoding for those characters outside of the additional special meanings.

The `form-urlencoded` media type defines special meanings for `=` and `&` as delimiters, and `+` as the replacement for the space character (instead of its percent-encoded form of `%20`).
This means that while these three characters are reserved-but-allowed in query strings by RFC3986, the must be percent-encoded in `form-urlencoded` query strings except when used for their `form-urlencoded` purposes; see [Appendix C](#usingRFC6570Implementations) for an example of handling `+` in form values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/the must/they must/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks- fixed!

@notEthan
Copy link
Contributor

notEthan commented Jun 3, 2024

This is good information and I wish I'd had it many years ago when I was deciphering such rules from the RFCs and how tools I was using handled (or mishandled) implementing them.

ralfhandl
ralfhandl previously approved these changes Jun 4, 2024
@ralfhandl ralfhandl requested a review from notEthan June 4, 2024 08:28
@handrews
Copy link
Member Author

handrews commented Jun 8, 2024

I have expanded this to address the rest of #1663, as well as #1322 and #3849, as doing them in a separate PR would have resulted in a nightmarish set of merge conflicts. (I actually wrote them separately and tried the merge and decided... nope, let's not do that).

@notEthan and @ralfhandl , the new changes are in a separate commit if you'd like to review just them.

Here is the commit message of that new commit:


Further clarify style+explode examples

This aligns all examples with RFC6570 operator prefixing behavior,
which was previously only shown for matrix and label. The
non-RFC6570 styles (spaceDelimited, pipeDelimited, and
deepObject) are treated as analogues of form and therefore
prefixed with a ?. The lack of suitablity of this for
cookie parameters has been addressed with an appendix in
another change, and the appendix has been stubbed out here to
ensure that the link is valid.


The following table shows examples, as would be shown with the `example` or `examples` keywords, of the different serializations for each value.

* The value _e/s_ denotes the empty string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems more readable to me to use the word empty. n/a is a common abbreviation but I don't think I've ever seen e/s before. Italics seems good to differentiate it from the other values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@notEthan I've reworked the language in the bullet points, changed a column heading, and used empty - all of this is in a separate commit that you can look at through the commits menu.

ralfhandl
ralfhandl previously approved these changes Jun 10, 2024
@handrews
Copy link
Member Author

The latest force-push rebases on top of the stubbed-out appendix headings PR and avoids having to merge the release branch into the PR branch for conflict resolution.

It also restores the Encoding Object's restriction that allowReserved only applies to application/x-www-form-urlencoded which was accidentally removed in some other recent PR. I think. It's there in 3.0.3 so I put it back.

@handrews
Copy link
Member Author

The last commit pushed updates the Header Object's fields, which are now copied from the Parameter Object, to match how this PR changes the copied fields. It also includes changes from the Examples PR because it and the Header fields PR got merge in quick succession and I forgot I'd need to do this. It's... somewhat related to encoding? Really I just don't want to juggle another PR for such a small, closely related change.

ralfhandl
ralfhandl previously approved these changes Jun 11, 2024
@ralfhandl ralfhandl requested a review from a team June 11, 2024 07:49
handrews and others added 8 commits June 12, 2024 16:12
Percent-encoding is a minefield, although in practice it mostly works.

This appendix attempts to acknowledge the concerns and then define enough
terminology and link to enough other specifications that interested
readers will be able to research further details.
"serialization" matches other discussions, "rendering" does not.
Co-authored-by: Ralf Handl <[email protected]>
This is to make sure PR checks pass even though the PR adding
Appendix C has not yet been merged.
This aligns all examples with RFC6570 operator prefixing behavior,
which was previously only shown for `matrix` and `label`.  The
non-RFC6570 styles (`spaceDelimited`, `pipeDelimited`, and
`deepObject`) are treated as analogues of `form` and therefore
prefixed with a `?`.  The lack of suitablity of this for
cookie parameters has been addressed with an appendix in
another change, and the appendix has been stubbed out here to
ensure that the link is valid.
This was confusing, and my initial use of _e/s_ was too novel.
Switch the column heading to "undefined" to align with RFC6570
and make clear that it is not about `allowEmptyValue`
Copy link
Member Author

@handrews handrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a few notes to explain the various parts of the large PR.

<a name="encodingAllowReserved"></a>allowReserved | `boolean` | When this is true, parameter values are serialized using reserved expansion, as defined by [RFC6570](https://datatracker.ietf.org/doc/html/rfc6570#autoid-20), which allows [RFC3986's reserved character set](https://datatracker.ietf.org/doc/html/rfc3986#autoid-13), as well as percent-encoded triples, to pass through unchanged, while still percent-encoding all other disallowed characters (including `%` outside of percent-encoded triples). Applications are still responsible for percent-encoding reserved characters that are [not allowed in the query string](https://datatracker.ietf.org/doc/html/rfc3986#autoid-24) (`[`, `]`, `#`), or have a special meaning in `application/x-www-form-urlencoded` (`-`, `&`, `+`); see Appendices [C](#usingRFC6570Implementations) and [E](#percentEncodingAndFormMediaTypes) for details. This property only applies to parameters with an `in` value of `query`. The default value is `false`.
<a name="encodingStyle"></a>style | `string` | Describes how a specific property value will be serialized depending on its type. See [Parameter Object](#parameterObject) for details on the [`style`](#parameterStyle) property. The behavior follows the same values as `query` parameters, including default values. Note that the initial `?` using in query strings is not used in `applcation/x-www-form-urlencoded` message bodies, and MUST be removed (if using an RFC6570 implementation) or simply not added (if constructing the string manually). This property SHALL be ignored if the request body media type is not `application/x-www-form-urlencoded`.
<a name="encodingExplode"></a>explode | `boolean` | When this is true, property values of type `array` or `object` generate separate parameters for each value of the array, or key-value-pair of the map. For other types of properties this property has no effect. When [`style`](#encodingStyle) is `form`, the default value is `true`. For all other styles, the default value is `false`. Note that despite `false` being the default for `deepObject`, the combination of `false` with `deepObject` is undefined. This property SHALL be ignored if the request body media type is not `application/x-www-form-urlencoded`.
<a name="encodingAllowReserved"></a>allowReserved | `boolean` | When this is true, parameter values are serialized using reserved expansion, as defined by [RFC6570](https://datatracker.ietf.org/doc/html/rfc6570#autoid-20), which allows [RFC3986's reserved character set](https://datatracker.ietf.org/doc/html/rfc3986#autoid-13), as well as percent-encoded triples, to pass through unchanged, while still percent-encoding all other disallowed characters (including `%` outside of percent-encoded triples). Applications are still responsible for percent-encoding reserved characters that are [not allowed in the query string](https://datatracker.ietf.org/doc/html/rfc3986#autoid-24) (`[`, `]`, `#`), or have a special meaning in `application/x-www-form-urlencoded` (`-`, `&`, `+`); see Appendices [C](#usingRFC6570Implementations) and [E](#percentEncodingAndFormMediaTypes) for details. This property only applies to parameters with an `in` value of `query`. The default value is `false`. This property SHALL be ignored if the request body media type is not `application/x-www-form-urlencoded`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only change to this line is the restoration of

This property SHALL be ignored if the request body media type is not application/x-www-form-urlencoded.

at the end, which got removed by accident somewhere along the way.

Comment on lines -2370 to +2476
<a name="headerExample"></a>example | Any | Example of the header's potential value. The example SHOULD match the specified schema and encoding properties if present. The `example` field is mutually exclusive of the `examples` field. Furthermore, if referencing a `schema` that contains an example, the `example` value SHALL _override_ the example provided by the schema. To represent examples of media types that cannot naturally be represented in JSON or YAML, a string value can contain the example with escaping where necessary.
<a name="headerExamples"></a>examples | Map[ `string`, [Example Object](#exampleObject) \| [Reference Object](#referenceObject)] | Examples of the header's potential value. Each example SHOULD contain a value in the correct format as specified in the header encoding. The `examples` field is mutually exclusive of the `example` field. Furthermore, if referencing a `schema` that contains an example, the `examples` value SHALL _override_ the example provided by the schema.
<a name="headerExample"></a>example | Any | Example of the header's potential value; see [Working With Examples](#working-with-examples).
<a name="headerExamples"></a>examples | Map[ `string`, [Example Object](#exampleObject) \| [Reference Object](#referenceObject)] | Examples of the header's potential value; see [Working With Examples](#working-with-examples).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes would have been part of the examples PR but it was merged within hours of the PR that added this explicit fixed fields table for the Header Object and I didn't remember that I needed to update whichever went second. So I just incorporated it here.

versions/3.0.4.md Show resolved Hide resolved
Comment on lines +1138 to +1171
form | false | ?color= | ?color=blue | ?color=blue,black,brown | ?color=R,100,G,200,B,150
form | true | ?color= | ?color=blue | ?color=blue&color=black&color=brown | ?R=100&G=200&B=150
spaceDelimited | false | _n/a_ | _n/a_ | ?color=blue%20black%20brown | ?color=R%20100%20G%20200%20B%20150
spaceDelimited | true | _n/a_ | _n/a_ | _n/a_ | _n/a_
pipeDelimited | false | _n/a_ | _n/a_ | ?color=blue%7Cblack%7Cbrown | ?color=R%7C100%7CG%7C200%7CB%7C150
pipeDelimited | true | _n/a_ | _n/a_ | _n/a_ | _n/a_
deepObject | false | _n/a_ | _n/a_ | _n/a_ | _n/a_
deepObject | true | _n/a_ | _n/a_ | _n/a_ | ?color%5BR%5D=100&color%5BG%5D=200&color%5BB%5D=150
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ? here is to be more explicit with how RFC6570-form-style expansion works, which I extended to pipeDelimited, spaceDelimited, and deepObject as they are alternatives to form.

There is a matching note in the Encoding Object that the ? needs to be removed or not added for form-urlencoded message bodies.

<a name="encodingStyle"></a>style | `string` | Describes how a specific property value will be serialized depending on its type. See [Parameter Object](#parameterObject) for details on the [`style`](#parameterStyle) property. The behavior follows the same values as `query` parameters, including default values. This property SHALL be ignored if the request body media type is not `application/x-www-form-urlencoded`.
<a name="encodingExplode"></a>explode | `boolean` | When this is true, property values of type `array` or `object` generate separate parameters for each value of the array, or key-value-pair of the map. For other types of properties this property has no effect. When [`style`](#encodingStyle) is `form`, the default value is `true`. For all other styles, the default value is `false`. This property SHALL be ignored if the request body media type is not `application/x-www-form-urlencoded`.
<a name="encodingAllowReserved"></a>allowReserved | `boolean` | When this is true, parameter values are serialized using reserved expansion, as defined by [RFC6570](https://datatracker.ietf.org/doc/html/rfc6570#autoid-20), which allows [RFC3986's reserved character set](https://datatracker.ietf.org/doc/html/rfc3986#autoid-13), as well as percent-encoded triples, to pass through unchanged, while still percent-encoding all other disallowed characters (including `%` outside of percent-encoded triples). Applications are still responsible for percent-encoding reserved characters that are [not allowed in the query string](https://datatracker.ietf.org/doc/html/rfc3986#autoid-24) (`[`, `]`, `#`), or have a special meaning in `application/x-www-form-urlencoded` (`-`, `&`, `+`); see Appendices [C](#usingRFC6570Implementations) and [E](#percentEncodingAndFormMediaTypes) for details. This property only applies to parameters with an `in` value of `query`. The default value is `false`.
<a name="encodingStyle"></a>style | `string` | Describes how a specific property value will be serialized depending on its type. See [Parameter Object](#parameterObject) for details on the [`style`](#parameterStyle) property. The behavior follows the same values as `query` parameters, including default values. Note that the initial `?` using in query strings is not used in `applcation/x-www-form-urlencoded` message bodies, and MUST be removed (if using an RFC6570 implementation) or simply not added (if constructing the string manually). This property SHALL be ignored if the request body media type is not `application/x-www-form-urlencoded`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the line addressing how RFC6570 form-style expansion includes the ?, which needs to be stripped for form-urlencoded message bodies. This pairs with the addition of the ? to the style examples table.

That style is not usable for the Header Object anyway.
* The behavior of combinations marked _n/a_ is undefined
* The `undefined` replaces the `empty` column in previous versions of this specification in order to better align with [RFC6570 §2.3](https://www.rfc-editor.org/rfc/rfc6570.html#section-2.3) terminology, which describes certain values including but not limited to `null` as "undefined" values with special handling; notably, the empty string is _not_ undefined
* For `form` and the non-RFC6570 query string styles `spaceDelimited`, `pipeDelimited`, and `deepObject`, each example is shown prefixed with `?` as if it were the only query parameter; see [Appendix C](#usingRFC6570Implementations) for more information on constructing query strings from multiple parameters, and [Appendix D](#serializingHeadersAndCookies) for warnings regarding `form` and cookie parameters
* Note that the `?` prefix is not appropriate for serializing `application/x-www-form-urlencoded` HTTP message bodies, and MUST be stripped or (if constructing the string manually) not added when used in that context; see the [Encoding Object](#encodingObject) for more information
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the line addressing how RFC6570 form-style expansion includes the ?, which needs to be stripped for form-urlencoded message bodies. This pairs with the addition of the ? to the style examples table.

Suggested change
* Note that the `?` prefix is not appropriate for serializing `application/x-www-form-urlencoded` HTTP message bodies, and MUST be stripped or (if constructing the string manually) not added when used in that context; see the [Encoding Object](#encodingObject) for more information
* Note that the `?` prefix is not appropriate for serializing `application/x-www-form-urlencoded` HTTP message bodies, and MUST be stripped or (if constructing the string manually) not added when used in that context; see the [Encoding Object](#encodingObject) for more information
Suggested change
* Note that the `?` prefix is not appropriate for serializing `application/x-www-form-urlencoded` HTTP message bodies, and MUST be stripped or (if constructing the string manually) not added when used in that context; see the [Encoding Object](#encodingObject) for more information
* Note that the `?` prefix is not appropriate for serializing `application/x-www-form-urlencoded` HTTP message bodies, and MUST be stripped or (if constructing the string manually) not added when used in that context; see the [Encoding Object](#encodingObject) for more information

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea why this shows up as a "suggested change" at all, much less twice! It's just a comment.

Copy link
Contributor

@ralfhandl ralfhandl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent summary of the current mess 😄

Copy link
Contributor

@lornajane lornajane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed in TDC and didn't raise any major objections

@lornajane lornajane merged commit eec1906 into OAI:v3.0.4-dev Jun 13, 2024
1 check passed
@handrews handrews deleted the ser-percent-304 branch June 13, 2024 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clarification requests to clarify, but not change, part of the spec media and encoding Issues regarding media type support and how to encode data (outside of query/path params) param serialization Issues related to parameter and/or header serialization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants