Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move from urlparse to parse_url for prepending schemes #5917

Merged
merged 1 commit into from
Dec 29, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 15 additions & 6 deletions requests/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
import zipfile
from collections import OrderedDict
from urllib3.util import make_headers
from urllib3.util import parse_url

from .__version__ import __version__
from . import certs
Expand Down Expand Up @@ -963,15 +964,23 @@ def prepend_scheme_if_needed(url, new_scheme):

:rtype: str
"""
scheme, netloc, path, params, query, fragment = urlparse(url, new_scheme)

# urlparse is a finicky beast, and sometimes decides that there isn't a
# netloc present. Assume that it's being over-cautious, and switch netloc
# and path if urlparse decided there was no netloc.
parsed = parse_url(url)
scheme, auth, host, port, path, query, fragment = parsed

# A defect in urlparse determines that there isn't a netloc present in some
# urls. We previously assumed parsing was overly cautious, and swapped the
# netloc and path. Due to a lack of tests on the original defect, this is
# maintained with parse_url for backwards compatibility.
netloc = parsed.netloc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhhh, what? This doesn't make any sense to me, is this something that's only needed for urlparse?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is only a quirk for urlparse. I'd thought I repro'd with parse_url initially, but I'm unable to now.

I'll reword the whole comment for clarity. I'm not confident the test suite covers all of the initial issue though. I think I'm still in favor of leaving the swap in place as a fallback for anything we're missing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sethmlarson I've reworded things to be a bit clearer.

if not netloc:
netloc, path = path, netloc

return urlunparse((scheme, netloc, path, params, query, fragment))
if scheme is None:
scheme = new_scheme
if path is None:
path = ''

return urlunparse((scheme, netloc, path, '', query, fragment))


def get_auth_from_url(url):
Expand Down
1 change: 1 addition & 0 deletions tests/test_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -601,6 +601,7 @@ def test_parse_header_links(value, expected):
'value, expected', (
('example.com/path', 'http://example.com/path'),
('//example.com/path', 'http://example.com/path'),
('example.com:80', 'http://example.com:80'),
))
def test_prepend_scheme_if_needed(value, expected):
assert prepend_scheme_if_needed(value, 'http') == expected
Expand Down