Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPv6 addresses are not handled #263

Closed
ohad-ivix opened this issue Apr 24, 2022 · 6 comments · Fixed by #298
Closed

IPv6 addresses are not handled #263

ohad-ivix opened this issue Apr 24, 2022 · 6 comments · Fixed by #298

Comments

@ohad-ivix
Copy link

For URLs using IPv4 addresses, the host address gets extracted correctly using .domain but .fqdn gives the empty string:

>>> tldextract.extract("https://127.0.0.1:1234/foobar").domain
'127.0.0.1'
>>> tldextract.extract("https://127.0.0.1:1234/foobar").fqdn
''

For URLs using IPv6 addresses, neither method extracts the host address correctly:

>>> tldextract.extract("https://[::1]:1234/foobar").domain
'['
>>> tldextract.extract("https://[::1]:1234/foobar").fqdn
'' 
>>> tldextract.extract("https://[FEC0:0000:0000:0000:0000:0000:0000:0001]:1234/foobar").domain
'[FEC0'
>>> tldextract.extract("https://[FEC0:0000:0000:0000:0000:0000:0000:0001]:1234/foobar").fqdn
''

This was tested using tldextract version 3.2.1 on Python 3.9.12

@john-kurkowski john-kurkowski changed the title IP addresses are not handled IPv6 addresses are not handled May 3, 2022
@john-kurkowski
Copy link
Owner

Related: #74, #156

@john-kurkowski
Copy link
Owner

Agreed that those IPv6 outputs are garbage, like returning '[' or '[FEC0' for the domain. IPv6 has come up a couple times before. This library supports IPv4. It seems not too much extra work to support IPv6. I'm for it. 👍

@john-kurkowski
Copy link
Owner

Re: IPv4, your quoted output is intended.

Think of domain as the body of the IP address's ExtractResult; an IP has no subdomain or suffix.

fqdn is a computed property, "if there is a proper domain/suffix," or else it's the falsy, empty string you observed. If the extraction is an IP, do you think it makes sense for the computed property to contain some other (fully qualified domain) "name" for a number?

@ohad-ivix
Copy link
Author

Ah, I was just naively assuming that "domain" < "fully-qualified domain". My bad for taking a random code snippet from SE and not really reading the docstring 😅.
I did read the README, so maybe adding the examples there would have helped in my case…

I guess I wouldn't change how fqdn behaves in the case of IPs either. It's probably best to think of localhost as interchangeable with IPv4 127.0.0.1 or IPv6 [::1], so the empty string would be right here too.

@ohad-ivix
Copy link
Author

ohad-ivix commented May 3, 2022

What I needed in my use case was a way to get just the hostname (without user credentials or port number) that would work for all URLs, and fqdn kinda looked like what I wanted. Currently I'm doing post-processing on the result of urllib.parse.urlsplit instead.

Reading the code, netloc seems like what I was looking for, but I'm not sure if exposing it as a property is really in the mission statement for this library.

@john-kurkowski
Copy link
Owner

I hear you on the README examples. 👍

What I needed in my use case was a way to get just the hostname (without user credentials or port number) that would work for all URLs … Currently I'm doing post-processing on the result of urllib.parse.urlsplit instead.

Yup, urllib.parse.urlsplit(some_url).netloc is what you're looking for in the standard library. In this library, domain is equivalent for IPv4 (or, more generally, rejoining the entire tuple). Just need a fix for IPv6 to be treated the same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants