Commit

Issue #18 URI path processing
moved TODOs to PR comments
gregw committed Oct 7, 2021
1 parent bd12ce5 commit fabaadd
Showing 1 changed file with 0 additions and 23 deletions.
23 changes: 0 additions & 23 deletions spec/src/main/asciidoc/servlet-spec-body.adoc
@@ -1324,15 +1324,7 @@ A fragment in the path is indicated by the first occurrence of a `\#` character.
==== Decoding of non-special characters
Characters other than `/`, `;` and `%` that are encoded in `%nn` form are decoded, and the resulting octet sequence is treated as UTF-8 and converted to a character sequence.
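A minimal sketch of this stage, assuming a hypothetical helper named `decodeNonSpecial` (it is not part of the Servlet API, and containers may implement the step differently). It decodes each `%nn` triplet unless the decoded octet is `/`, `;` or `%`, and converts runs of decoded octets from UTF-8. Malformed UTF-8 is silently replaced here, which is an assumption of the sketch; a real container may instead reject the request.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class NonSpecialDecoder {
    // Hypothetical helper: decodes %nn triplets except %2F ('/'), %3B (';') and %25 ('%').
    static String decodeNonSpecial(String path) {
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream octets = new ByteArrayOutputStream();
        int i = 0;
        while (i < path.length()) {
            int octet = -1; // stays -1 unless a well-formed %nn triplet starts at i
            if (path.charAt(i) == '%' && i + 2 < path.length()) {
                int hi = Character.digit(path.charAt(i + 1), 16);
                int lo = Character.digit(path.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    octet = hi * 16 + lo;
                }
            }
            boolean special = octet == '/' || octet == ';' || octet == '%';
            if (octet >= 0 && !special) {
                // Buffer the octet so multi-byte UTF-8 sequences are converted together.
                octets.write(octet);
                i += 3;
                continue;
            }
            flush(octets, out);
            if (special) {
                // Leave the encoded special character untouched for a later stage.
                out.append(path, i, i + 3);
                i += 3;
            } else {
                out.append(path.charAt(i));
                i++;
            }
        }
        flush(octets, out);
        return out.toString();
    }

    // Converts any buffered octets from UTF-8 and appends them to the output.
    private static void flush(ByteArrayOutputStream octets, StringBuilder out) {
        if (octets.size() > 0) {
            out.append(new String(octets.toByteArray(), StandardCharsets.UTF_8));
            octets.reset();
        }
    }
}
```

With this sketch, `/caf%C3%A9` decodes to `/café`, while `/a%2Fb%3Bc%25d` is returned unchanged.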

> Note that special characters cannot be part of a multi-byte UTF-8 sequence, as every octet in such a sequence has its high bit set (and is therefore negative when held as a signed byte).

> Note that these are not the reserved characters as defined by RFC 3986, as that set does not include `%` and includes many characters we don't care about. Avoiding a second decoding is worthwhile.

==== Collapse sequences of multiple `"/"` characters
> **WARNING** Swapping the order of stage 3 and stage 4 may be significant. Consider `"/aaa/bbb//../"`.

> **TODO** Are we sure we don't want to do this in the other order?

Any sequence of more than one `"/"` character in the URI must be replaced with a single `"/"`.
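As an illustration only, the collapse rule can be sketched with a regular expression. The class and method names are hypothetical and not taken from the specification or any container.

```java
public class SlashCollapse {
    // Hypothetical helper: replaces every run of two or more '/' with a single '/'.
    static String collapseSlashes(String path) {
        return path.replaceAll("/{2,}", "/");
    }

    public static void main(String[] args) {
        // The path "/aaa/bbb//../" becomes "/aaa/bbb/../" before dot-segment removal.
        System.out.println(collapseSlashes("/aaa/bbb//../"));
    }
}
```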

==== Remove dot-segments+
@@ -1344,8 +1336,6 @@ Any sequence of more than one `"/"` character in the URI must be replaced with a
==== Removal of path parameters
A path segment containing the `";"` character is split at the first occurrence of `";"`. The segment is replaced by the character sequence preceding the `";"`. The characters following the `";"` are considered path parameters and may be preserved by the container for later processing (e.g. `jsessionid`).
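A rough sketch of the rule as written, using a hypothetical `removePathParameters` helper. It discards the parameters rather than preserving them, which is a simplification. Note that a segment consisting only of `";"` becomes empty, so `/foo/;/bar` yields `/foo//bar` under this literal reading.

```java
public class PathParameters {
    // Hypothetical helper: truncates each segment at its first ';'.
    static String removePathParameters(String path) {
        // A limit of -1 keeps trailing empty segments, so a trailing '/' survives.
        String[] segments = path.split("/", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                out.append('/');
            }
            String segment = segments[i];
            int semi = segment.indexOf(';');
            out.append(semi < 0 ? segment : segment.substring(0, semi));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removePathParameters("/foo;jsessionid=123/bar")); // prints "/foo/bar"
        System.out.println(removePathParameters("/foo/;/bar"));              // prints "/foo//bar"
    }
}
```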

> TODO How do we handle URIs like `/foo/;/bar`? I think as currently written we end up with `/foo//bar` ?

==== Decoding of remaining `%nn` sequences
Any remaining `%nn` sequences in the path should be decoded. Some containers may be configured to leave some specific characters encoded (e.g. the characters '/' and '%' may be left encoded by some container configuration).

@@ -1363,19 +1353,6 @@ If suspicious sequences are discovered during the prior steps, the request must

A container or context may be configured to have a different set of rejected sequences.

> TODO how do we define control characters? < 0x20? Is 0x7F (DEL) OK?

> TODO should we also by default reject '\' and '%5c' ?

> TODO should we also by default reject non visible and/or control characters ?

> TODO if %2F is allowed, we may now have double '/', '/./' and '/../' segments in the URL, should stage 3 and 4 be re-run if this is allowed?






=== Request Path Elements

The request path that leads to a servlet
