Prevent decoding of percent-encoded ASCII characters in URL's path
This CL is part of the URL interop 2023 effort. "Intent to Implement and Ship" is [1].

Currently, when Chrome parses a URL, it decodes percent-encoded ASCII characters in the URL's path. However, this behavior doesn't align with the URL Standard [2].

The CL fixes this behavior to retain percent-encoded ASCII characters in the URL's path.

Before:

> const url = new URL("http://example.com/%41");
> url.href
"http://example.com/A"

After:

> const url = new URL("http://example.com/%41");
> url.href
"http://example.com/%41"

Interoperability:

- Chrome isn't compliant, while Firefox and Safari are compliant.
- I've tested URL APIs in non-browser environments and libraries, such as Deno's `URL` implementation [3] and Rust's `url` crate [4], both of which are standard-compliant.

Background:

The existing behavior seems to be a result of past decisions. The comment in `url_canon_path.cc` states:

> // This table was used to be designed to match exactly what IE did
> // with the characters.

Impact:

Regarding implementation, web-exposed URL APIs, GURL, and KURL share the same URL parser and canonicalization backend. Given that these URL classes are widely used both internally and externally, predicting all possible consequences and risks is challenging.

Given the very low usage metrics [5], we received approval to land [1], but with a kill switch in place.

UMA:

Usage: 0.000071% (URL.Path.UnescapeEscapedChar [5], as of Aug 2023)

This number isn't specific to any particular use case and represents an upper bound. The actual impact is likely lower.

Interaction with web servers:

Before: When a user enters "https://example.com/%41" in the address bar or clicks a link like <a href="https://example.com/%41">, Chrome sends "/A" to the server.

After: Chrome now sends "/%41" to the server, without decoding, similar to Safari and Firefox.

Note that Chrome's address bar will still display "https://example.com/A" because the address bar formats URLs in its own way.

For websites, how to handle percent-encoded characters in a URL's path is up to each website. Since they can receive such URLs from various clients, not just Chrome, this isn't a new issue for most websites. They typically decode a URL's path before processing.

Another concern relates to Chromium's internal code or developers who rely on the current behavior, intentionally or not. For example, this CL might lead to issues in cases like:

```
const hash = {};
const url1 = new URL("http://example.com/%41");
hash[url1.href] = "v1";
// ...
const url2 = new URL("http://example.com/A");
hash[url2.href]  // Assumed that "v1" is retrieved, but this is no longer true.
```

According to the URL Standard, `url1` and `url2` are not equivalent [6], but some clients might depend on Chrome's current behavior as a feature. This presents a risk.

Additional notes:

- This change only affects the URL's path. Other parts like the host are not impacted.
- There was a discussion about Chrome's behavior [7]. The consensus is that Chrome's behavior should be fixed for better interoperability.
- There's a proposal to add a normalization interface [8] to URL.

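Code that relied on `%41` and `A` producing the same key, as in the `hash` example above, could normalize the path explicitly before using it as a key. The following is a minimal sketch and is not part of this CL. `decodeUnreservedInPath` is a hypothetical helper name, and the choice of the RFC 3986 unreserved set (letters, digits, `-`, `.`, `_`, `~`) is an assumption that may not match the table Chrome used previously.

```javascript
// Hypothetical helper (not a URL API): decodes %XX triplets in the path
// only when they encode an RFC 3986 unreserved character. Everything
// else, such as %2F, is left percent-encoded.
function decodeUnreservedInPath(input) {
  const url = new URL(input);
  url.pathname = url.pathname.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) => {
    const ch = String.fromCharCode(parseInt(hex, 16));
    // Assumption: only the unreserved set is treated as safe to decode.
    return /[A-Za-z0-9\-._~]/.test(ch) ? ch : match;
  });
  return url.href;
}

// Usage: both spellings are stored and looked up under one key.
const hash = {};
hash[decodeUnreservedInPath("http://example.com/%41")] = "v1";
const value = hash[decodeUnreservedInPath("http://example.com/A")];
```

Because the helper decodes explicitly, it should give the same key whether or not the parser itself decodes the path, so it does not depend on the state of the kill switch.
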
- [1] https://groups.google.com/a/chromium.org/g/blink-dev/c/1L8vW_Xo8eY/m/3Otq2TkvAwAJ
- [2] https://url.spec.whatwg.org/#url-parsing
- [3] https://deno.land/[email protected]?s=URL
- [4] https://docs.rs/url/latest/url/
- [5] https://uma.googleplex.com/p/chrome/timeline_v2/?sid=1bb9e227dc4889fd2efbf5755d256c62
- [6] https://url.spec.whatwg.org/#url-equivalence
- [7] whatwg/url#606
- [8] whatwg/url#729

Bug: 1252531
Change-Id: I135b4efbe76bc58ba5b6c5ce652ed0aa72002249
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4607744
Reviewed-by: Daniel Cheng <[email protected]>
Reviewed-by: James Lee <[email protected]>
Reviewed-by: Avi Drissman <[email protected]>
Reviewed-by: Emily Stark <[email protected]>
Commit-Queue: Hayato Ito <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1191900}
NOKEYCHECK=True
GitOrigin-RevId: 21947c4c384cd129c20862475364ec6d430998ea