Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function normalize_url doubly encodes query string in some cases. #231

Open
Simon-Will opened this issue Feb 15, 2023 · 0 comments · May be fixed by #232
Open

Function normalize_url doubly encodes query string in some cases. #231

Simon-Will opened this issue Feb 15, 2023 · 0 comments · May be fixed by #232

Comments

@Simon-Will
Copy link

Simon-Will commented Feb 15, 2023

The function normalize_url does not correctly handle some query strings. From what I understand, the function is supposed to rearrange the query parameters so that URLs with different order of query parameters match.

However, the combination of urlencode and URL.with_query doubly encodes some characters. I think that it's exactly those characters that can legally occur in a query string, but not in any URL part, such as : and /. urlencode of course will encode them and URL.with_query will then pick up the % that the urlencode unnecessarily produced.

I propose to just drop the parse_qsl and urlencode resulting in the simplified_normalize_url in the following example code:

from yarl import URL
from aioresponses.compat import normalize_url

# %3A = :  (Does NOT need to be encoded in query string)
# %2F = /  (Does NOT need to be encoded in query string)
# %26 = &  (MUST be encoded in query string)
url_str = 'https://example.com/?var=foo%3Abar%2Fbaz%26gaz'


def simplified_normalize_url(url):
    url = URL(url)
    return url.with_query(sorted(url.query.items()))


print(f'Before: {url_str}')
print(f'After normalize_url: {normalize_url(url_str)}')
print(f'After simplified_normalize_url: {simplified_normalize_url(url_str)}')

Running it on my machine (Xubuntu 22.04.1 with Python 3.9.6) produces the following output:

Before: https://example.com/?var=foo%3Abar%2Fbaz%26gaz
After normalize_url: https://example.com/?var=foo%253Abar%252Fbaz%2526gaz
After simplified_normalize_url: https://example.com/?var=foo:bar/baz%26gaz

I can create a small pull request with tests for this tomorrow.

@Simon-Will Simon-Will linked a pull request Feb 16, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant