Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle @ characters in the userinfo section. #62

Closed
wants to merge 3 commits into from
Closed

Handle @ characters in the userinfo section. #62

wants to merge 3 commits into from

Conversation

tomchristie
Copy link
Contributor

This PR allows handling URLs in the form https://[email protected]:[email protected]

Currently the rfc3986 package raises an InvalidAuthority exception when calling authority_info() on a URL formatted in this way. Or returns None when eg. .host/.userinfo are accessed.

Before this change...

>>> u = rfc3986.api.uri_reference("https://[email protected]:[email protected]")
>>> u.authority
'[email protected]:[email protected]'
>>> u.userinfo  # None
>>> u.host  # None

The WHATWG URL spec treats @ in the userinfo section as a non-parser terminating validation error, and URL encodes the character. https://url.spec.whatwg.org/#authority-state

After this change, we follow the WHATWG behavior, URL encoding the character if it incorrectly occurs in the userinfo...

>>> u = rfc3986.api.uri_reference("https://[email protected]:[email protected]")
>>> u.authority
'[email protected]:[email protected]'
>>> u.userinfo
'user%40gmail.com:password'
>>> u.host
'google.com'

Raised via encode/httpx#328

@sigmavirus24
Copy link
Collaborator

I believe you're referring to the "at sign" not ampersand. Also you're asking for auto-normalization. I believe doing:

u = rfc3986.api.uri_reference("https://[email protected]:[email protected]").normalize()

That should do what you want.

@tomchristie tomchristie changed the title Handle ampersand characters in the userinfo section. Handle @ characters in the userinfo section. Jan 14, 2020
@tomchristie
Copy link
Contributor Author

I believe you're referring to the "at sign" not ampersand.

Ug. Brain freeze moment. 🤣

Also you're asking for auto-normalization

Possibly, yes. It's not obvious to me if this "non-terminating validation error" case is the same category as normalization of valid URLs?

I believe doing [...]

The current behavior doesn't capture that behavior right now, no:

>>> rfc3986.api.uri_reference("https://[email protected]:[email protected]").normalize()
URIReference(scheme='https', authority=None, path=None, query=None, fragment=None)

If .normalize() is the right place to handle this, then I think any PR along those lines would need to handle both .uri_reference(...).normalize() and also .iri_reference(...).encode(), since...

>>> rfc3986.api.iri_reference("https://[email protected]:[email protected]").encode()
URIReference(scheme='https', authority=None, path=None, query=None, fragment=None)

@codecov-io
Copy link

codecov-io commented Jan 14, 2020

Codecov Report

Merging #62 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master    #62   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          13     13           
  Lines         793    795    +2     
=====================================
+ Hits          793    795    +2
Impacted Files Coverage Δ
src/rfc3986/_mixin.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 76e8ffb...1d79e63. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants