The function normalize_url does not correctly handle some query strings. From what I understand, the function is supposed to rearrange the query parameters so that URLs with different order of query parameters match.
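To make the intended behaviour concrete, here is a minimal sketch of an order-insensitive URL comparison using only the standard library. The helper name `same_ignoring_query_order` is hypothetical and is not part of aioresponses or yarl; it only illustrates the goal described above.

```python
from urllib.parse import parse_qsl, urlsplit


def same_ignoring_query_order(a: str, b: str) -> bool:
    """Hypothetical helper: compare two URLs while ignoring query parameter order."""
    sa, sb = urlsplit(a), urlsplit(b)
    # Scheme, host and path must match exactly.
    if (sa.scheme, sa.netloc, sa.path) != (sb.scheme, sb.netloc, sb.path):
        return False
    # Compare the decoded (key, value) pairs as sorted lists.
    pairs_a = sorted(parse_qsl(sa.query, keep_blank_values=True))
    pairs_b = sorted(parse_qsl(sb.query, keep_blank_values=True))
    return pairs_a == pairs_b


print(same_ignoring_query_order(
    'https://example.com/?a=1&b=2',
    'https://example.com/?b=2&a=1',
))
```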
However, the combination of urlencode and URL.with_query double-encodes some characters. I think the affected characters are exactly those that may legally appear unencoded in a query string but not in every other part of a URL, such as : and /. urlencode percent-encodes them anyway, and URL.with_query then encodes the % signs that urlencode unnecessarily produced.
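The double encoding can be reproduced with the standard library alone. The sketch below uses `urllib.parse.quote` as a stand-in for the second encoding pass; that is an assumption made for illustration and says nothing about how yarl implements `URL.with_query` internally.

```python
from urllib.parse import quote, urlencode

# Decoded query pairs, as they would look after parsing the query string.
pairs = [('var', 'foo:bar/baz&gaz')]

# First pass: urlencode percent-encodes ':', '/' and '&' in the value.
encoded_once = urlencode(pairs)

# Second pass (stand-in for a later encoding step): every '%' becomes '%25'.
# '=' and '&' are kept as-is so the key=value structure survives.
encoded_twice = quote(encoded_once, safe='=&')

print(encoded_once)   # var=foo%3Abar%2Fbaz%26gaz
print(encoded_twice)  # var=foo%253Abar%252Fbaz%2526gaz
```

The second printed value has the same shape as the broken query string in the report: each `%` from the first pass has itself been encoded as `%25`.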
I propose simply dropping the parse_qsl and urlencode calls, which gives the simplified_normalize_url in the following example code:
    from yarl import URL
    from aioresponses.compat import normalize_url

    # %3A = : (Does NOT need to be encoded in query string)
    # %2F = / (Does NOT need to be encoded in query string)
    # %26 = & (MUST be encoded in query string)
    url_str = 'https://example.com/?var=foo%3Abar%2Fbaz%26gaz'


    def simplified_normalize_url(url):
        url = URL(url)
        return url.with_query(sorted(url.query.items()))


    print(f'Before: {url_str}')
    print(f'After normalize_url: {normalize_url(url_str)}')
    print(f'After simplified_normalize_url: {simplified_normalize_url(url_str)}')
Running it on my machine (Xubuntu 22.04.1 with Python 3.9.6) produces the following output:
Before: https://example.com/?var=foo%3Abar%2Fbaz%26gaz
After normalize_url: https://example.com/?var=foo%253Abar%252Fbaz%2526gaz
After simplified_normalize_url: https://example.com/?var=foo:bar/baz%26gaz
I can create a small pull request with tests for this tomorrow.