Function `normalize_url` doubly encodes query string in some cases. #231

Simon-Will · 2023-02-15T14:34:49Z

The function normalize_url does not correctly handle some query strings. From what I understand, the function is supposed to rearrange the query parameters so that URLs with different order of query parameters match.

However, the combination of urlencode and URL.with_query doubly encodes some characters. I think that it's exactly those characters that can legally occur in a query string, but not in any URL part, such as : and /. urlencode of course will encode them and URL.with_query will then pick up the % that the urlencode unnecessarily produced.

I propose to just drop the parse_qsl and urlencode resulting in the simplified_normalize_url in the following example code:

from yarl import URL
from aioresponses.compat import normalize_url

# %3A = :  (Does NOT need to be encoded in query string)
# %2F = /  (Does NOT need to be encoded in query string)
# %26 = &  (MUST be encoded in query string)
url_str = 'https://example.com/?var=foo%3Abar%2Fbaz%26gaz'


def simplified_normalize_url(url):
    url = URL(url)
    return url.with_query(sorted(url.query.items()))


print(f'Before: {url_str}')
print(f'After normalize_url: {normalize_url(url_str)}')
print(f'After simplified_normalize_url: {simplified_normalize_url(url_str)}')

Running it on my machine (Xubuntu 22.04.1 with Python 3.9.6) produces the following output:

Before: https://example.com/?var=foo%3Abar%2Fbaz%26gaz
After normalize_url: https://example.com/?var=foo%253Abar%252Fbaz%2526gaz
After simplified_normalize_url: https://example.com/?var=foo:bar/baz%26gaz

I can create a small pull request with tests for this tomorrow.

The text was updated successfully, but these errors were encountered:

Simon-Will linked a pull request Feb 16, 2023 that will close this issue

Fix double-query-encoding bug #232

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Function `normalize_url` doubly encodes query string in some cases. #231

Function `normalize_url` doubly encodes query string in some cases. #231

Simon-Will commented Feb 15, 2023 •

edited

Loading

Function normalize_url doubly encodes query string in some cases. #231

Function normalize_url doubly encodes query string in some cases. #231

Comments

Simon-Will commented Feb 15, 2023 • edited Loading

Function `normalize_url` doubly encodes query string in some cases. #231

Function `normalize_url` doubly encodes query string in some cases. #231

Simon-Will commented Feb 15, 2023 •

edited

Loading