URI Path Canonicalization
Once agreed, the following text will be added to the Servlet specification at the start of section 12. See also issue 18 for background and discussion.
The process described here adapts and extends the URI canonicalization process described in RFC 3986 to create a standard Servlet URI path canonicalization process that ensures that URIs can be mapped to Servlets, Filters and security constraints in an unambiguous manner. It is also intended to provide information to reverse proxy implementations so they are aware of how requests they pass to servlet containers will be processed.
Servlet containers may implement the standard Servlet URI path canonicalization in any manner they see fit as long as the end result is identical to the end result of the process described here. Servlet containers may provide container specific configuration options to vary the standard canonicalization process. Any such variations may have security implications and both Servlet container implementors and users are advised to be sure that they understand the implications of any such container specific canonicalization options.
The URI is extracted from the request-target as defined by RFC 7230. URIs in origin-form or asterisk-form are passed unchanged to stage 2. URIs in absolute-form have the protocol and authority removed to convert them to origin-form and are then passed to stage 2. URIs in authority-form are outside of the scope of this specification.
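The handling of the four request-target forms could be sketched as follows. This is only an illustration, not a proposed implementation: the function name to_origin_form is hypothetical, and treating an absolute-form URI with an empty path as "/" is an assumption the text above does not state.

```python
def to_origin_form(request_target):
    """Reduce an HTTP/1.1 request-target to origin-form (hypothetical helper)."""
    # origin-form and asterisk-form are passed on unchanged.
    if request_target == "*" or request_target.startswith("/"):
        return request_target

    # absolute-form: drop everything up to and including the authority.
    _scheme, separator, rest = request_target.partition("://")
    if not separator:
        # What remains is authority-form (e.g. "example.com:443"),
        # which is outside the scope of this process.
        raise ValueError("authority-form is not handled")

    for index, char in enumerate(rest):
        if char in "/?":
            tail = rest[index:]
            # Assumption: an empty path before the query is treated as "/".
            return tail if tail.startswith("/") else "/" + tail
    # Assumption: an absolute-form URI with no path maps to "/".
    return "/"


print(to_origin_form("http://example.com/a/b?x=1"))  # /a/b?x=1
```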
The URI is the :path pseudo header as defined by RFC 7540 and is passed unchanged to stage 2.
Containers may support other protocols. Containers should extract an appropriate URI for the request from the protocol and pass it to stage 2.
Characters encoded in %nn form, other than those identified as reserved by RFC 3986 2.2 are decoded as octet sequences. Reserved characters are left in the %nn form.
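A minimal sketch of this partial decoding, assuming the path is handled as a byte string, might look like the following. The name decode_non_reserved is hypothetical, and the reserved set is the gen-delims plus sub-delims of RFC 3986 2.2.

```python
import re

# gen-delims and sub-delims from RFC 3986 2.2
RESERVED = b":/?#[]@!$&'()*+,;="
_PCT_ENCODED = re.compile(rb"%([0-9A-Fa-f]{2})")


def decode_non_reserved(path):
    """Decode %nn escapes unless they encode a reserved character."""
    def replace(match):
        octet = bytes([int(match.group(1), 16)])
        # Reserved characters stay in their %nn form.
        return match.group(0) if octet in RESERVED else octet

    return _PCT_ENCODED.sub(replace, path)


print(decode_non_reserved(b"/%61%2Fb%3Bc/%7Euser"))  # b'/a%2Fb%3Bc/~user'
```

Read literally, "%" is not in the reserved set, so this sketch also decodes %25 at this point. Whether that is intended is not settled by the text above.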
WARNING Swapping the order of stage 3 and stage 4 may be significant. Consider "/aaa/bbb//../".
Any sequence of more than one "/" character in the URI must be replaced with a single "/".
URIs that contain segments of the following forms must be rejected with a 400 response:
".." sub-delim *(pchar)
"." sub-delim *(pchar)
Sequences of the form "/./" must be replaced with "/".
Sequences of the form "/" segment "/../" must be replaced with "/". If there is no preceding segment for a ".." segment then return a 400 response.
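The slash-collapsing and dot-segment rules, together with the earlier warning about their order, can be illustrated with a short sketch. It follows the rules as literally as possible. The names collapse_slashes, remove_dot_segments and BadRequest are hypothetical, and BadRequest merely stands in for whatever mechanism a container uses to send a 400 response.

```python
import re

SUB_DELIMS = "!$&'()*+,;="  # sub-delims from RFC 3986 2.2
_DOT_THEN_SUB_DELIM = re.compile(r"\.{1,2}[" + re.escape(SUB_DELIMS) + "]")


class BadRequest(Exception):
    """Hypothetical stand-in for answering with a 400 response."""


def collapse_slashes(path):
    # Any run of more than one "/" becomes a single "/".
    return re.sub(r"/{2,}", "/", path)


def remove_dot_segments(path):
    # Reject segments of the form "." or ".." followed by a sub-delim.
    for segment in path.split("/"):
        if _DOT_THEN_SUB_DELIM.match(segment):
            raise BadRequest("dot segment with parameters: " + segment)

    # "/./" -> "/"; repeat because matches can overlap ("/././").
    while "/./" in path:
        path = path.replace("/./", "/")

    # "/" segment "/../" -> "/", always resolving the leftmost ".." first.
    while "/../" in path:
        index = path.index("/../")
        if index == 0:
            raise BadRequest("no preceding segment for '..'")
        start = path.rindex("/", 0, index)
        path = path[:start] + "/" + path[index + 4:]
    return path


uri = "/aaa/bbb//../"
print(remove_dot_segments(collapse_slashes(uri)))  # /aaa/
print(collapse_slashes(remove_dot_segments(uri)))  # /aaa/bbb/
```

With the steps swapped, the ".." consumes the empty segment between the two slashes rather than "bbb", so the two orders produce different paths for the same request.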
Sequences of the form "/" *(unreserved / pct-encoded / ":" / "@") sub-delim *(pchar) "/" must have the characters from and including the sub-delim to the end of the segment removed.
TODO How do we handle URIs like /foo/;/bar? I think as currently written we end up with /foo//bar?
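The path parameter rule and the TODO above can be checked with a small sketch. It follows the rule as written, so a segment is truncated at the first sub-delim of any kind, not only at ";". The name strip_path_parameters is hypothetical. Applying the rule to the final segment as well, which has no trailing "/", is an assumption.

```python
import re

# sub-delims from RFC 3986 2.2
_SUB_DELIM = re.compile("[" + re.escape("!$&'()*+,;=") + "]")


def strip_path_parameters(path):
    """Cut each segment at its first sub-delim (hypothetical helper)."""
    segments = []
    for segment in path.split("/"):
        match = _SUB_DELIM.search(segment)
        # Keep only the part of the segment before the sub-delim, if any.
        segments.append(segment[:match.start()] if match else segment)
    return "/".join(segments)


print(strip_path_parameters("/foo;a=b/bar"))  # /foo/bar
print(strip_path_parameters("/foo/;/bar"))    # /foo//bar
```

The second call confirms the concern in the TODO: the segment ";" becomes empty, which leaves a double "/" after the step that collapses slashes has already run. A reserved character that is still in %nn form, such as %3B, is not treated as a delimiter here.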
Any remaining %nn sequences should be decoded, although some containers may be configured to leave some specific characters encoded (e.g. the characters '/' and '%' may be left encoded by some container configurations). The resulting octet sequence is converted to a character sequence using UTF-8 decoding.
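As an illustration only, the final decoding might be sketched as below. The function name decode_remaining and its keep_encoded parameter are hypothetical. keep_encoded models the container-specific option to leave some characters in %nn form. How a container should react to octets that are not valid UTF-8 is not specified above, so this sketch simply lets the decoding error propagate.

```python
import re

_PCT_ENCODED = re.compile(rb"%([0-9A-Fa-f]{2})")


def decode_remaining(path, keep_encoded=b""):
    """Decode the remaining %nn escapes, then interpret the octets as UTF-8."""
    def replace(match):
        octet = bytes([int(match.group(1), 16)])
        # Characters listed in keep_encoded stay in their %nn form.
        return match.group(0) if octet in keep_encoded else octet

    # Raises UnicodeDecodeError if the octets are not valid UTF-8.
    return _PCT_ENCODED.sub(replace, path).decode("utf-8")


raw = b"/caf\xc3\xa9%2Fx%25"
print(decode_remaining(raw))                       # /café/x%
print(decode_remaining(raw, keep_encoded=b"/%"))   # /café%2Fx%25
```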
The decoded path is used to map the request to a context and resource within the context. This form of the URI path is used for all subsequent mapping (web applications, servlets, filters and security constraints).
If suspicious sequences are discovered during the prior decoding steps, the request can be rejected with a 400 Bad Request response using the error handling of the matched context.
By default the set of rejected sequences must include:
- %2F, %2f
- /.;
- /..;
- /%2E, %2e
- /%2E%2E, %2E%2e, %2e%2E, %2e%2e
A container or context may be configured to have a different set of rejected sequences.
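A check for the default rejected sequences could be sketched as follows. This is a sketch under several assumptions: the names DEFAULT_REJECTED and find_suspicious are hypothetical, the check is a plain substring match against the path as received (before any decoding), the upper and lower case variants are folded together by lower-casing, and the case variants of the encoded dot sequences are assumed to share the leading "/".

```python
# Assumption: lower-cased forms stand for all case variants in the list above,
# and every encoded-dot variant carries the leading "/".
DEFAULT_REJECTED = ("%2f", "/.;", "/..;", "/%2e", "/%2e%2e")


def find_suspicious(raw_path, rejected=DEFAULT_REJECTED):
    """Return the rejected sequences found in the undecoded path."""
    lowered = raw_path.lower()
    return [sequence for sequence in rejected if sequence.lower() in lowered]


print(find_suspicious("/a/..;/b"))   # ['/..;']
print(find_suspicious("/a%2Fb"))     # ['%2f']
print(find_suspicious("/a/b"))       # []
```

Passing a different tuple as rejected models a container or context that has been configured with its own set of sequences.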