IPv6 addresses are not handled #263

ohad-ivix · 2022-04-24T14:13:52Z

For URLs using IPv4 addresses, the host address gets extracted correctly using .domain but .fqdn gives the empty string:

>>> tldextract.extract("https://127.0.0.1:1234/foobar").domain
'127.0.0.1'
>>> tldextract.extract("https://127.0.0.1:1234/foobar").fqdn
''

For URLs using IPv6 addresses, neither method extracts the host address correctly:

>>> tldextract.extract("https://[::1]:1234/foobar").domain
'['
>>> tldextract.extract("https://[::1]:1234/foobar").fqdn
'' 
>>> tldextract.extract("https://[FEC0:0000:0000:0000:0000:0000:0000:0001]:1234/foobar").domain
'[FEC0'
>>> tldextract.extract("https://[FEC0:0000:0000:0000:0000:0000:0000:0001]:1234/foobar").fqdn
''

This was tested using tldextract version 3.2.1 on Python 3.9.12

The text was updated successfully, but these errors were encountered:

john-kurkowski · 2022-05-03T01:19:37Z

Related: #74, #156

john-kurkowski · 2022-05-03T01:22:56Z

Agreed that those IPv6 outputs are garbage, like returning '[' or '[FEC0' for the domain. IPv6 has come up a couple times before. This library supports IPv4. It seems not too much extra work to support IPv6. I'm for it. 👍

john-kurkowski · 2022-05-03T01:36:09Z

Re: IPv4, your quoted output is intended.

Think of domain as the body of the IP address's ExtractResult; an IP has no subdomain or suffix.

fqdn is a computed property, "if there is a proper domain/suffix," or else it's the falsy, empty string you observed. If the extraction is an IP, do you think it makes sense for the computed property to contain some other (fully qualified domain) "name" for a number?

ohad-ivix · 2022-05-03T08:57:32Z

Ah, I was just naively assuming that "domain" < "fully-qualified domain". My bad for taking a random code snippet from SE and not really reading the docstring 😅.
I did read the README, so maybe adding the examples there would have helped in my case…

I guess I wouldn't change how fqdn behaves in the case of IPs either. It's probably best to think of localhost as interchangeable with IPv4 127.0.0.1 or IPv6 [::1], so the empty string would be right here too.

ohad-ivix · 2022-05-03T08:58:50Z

What I needed in my use case was a way to get just the hostname (without user credentials or port number) that would work for all URLs, and fqdn kinda looked like what I wanted. Currently I'm doing post-processing on the result of urllib.parse.urlsplit instead.

Reading the code, netloc seems like what I was looking for, but I'm not sure if exposing it as a property is really in the mission statement for this library.

john-kurkowski · 2022-05-03T18:36:16Z

I hear you on the README examples. 👍

What I needed in my use case was a way to get just the hostname (without user credentials or port number) that would work for all URLs … Currently I'm doing post-processing on the result of urllib.parse.urlsplit instead.

Yup, urllib.parse.urlsplit(some_url).netloc is what you're looking for in the standard library. In this library, domain is equivalent for IPv4 (or, more generally, rejoining the entire tuple). Just need a fix for IPv6 to be treated the same.

john-kurkowski changed the title ~~IP addresses are not handled~~ IPv6 addresses are not handled May 3, 2022

elliotwutingfeng mentioned this issue Jul 7, 2023

Add support for IPv6 addresses #298

Merged

john-kurkowski closed this as completed in #298 Jul 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IPv6 addresses are not handled #263

IPv6 addresses are not handled #263

ohad-ivix commented Apr 24, 2022

john-kurkowski commented May 3, 2022

john-kurkowski commented May 3, 2022

john-kurkowski commented May 3, 2022

ohad-ivix commented May 3, 2022

ohad-ivix commented May 3, 2022 •

edited

Loading

john-kurkowski commented May 3, 2022

IPv6 addresses are not handled #263

IPv6 addresses are not handled #263

Comments

ohad-ivix commented Apr 24, 2022

john-kurkowski commented May 3, 2022

john-kurkowski commented May 3, 2022

john-kurkowski commented May 3, 2022

ohad-ivix commented May 3, 2022

ohad-ivix commented May 3, 2022 • edited Loading

john-kurkowski commented May 3, 2022

ohad-ivix commented May 3, 2022 •

edited

Loading