elliotwutingfeng changed the title from "1,2,3-octet hostnames detected as IPv4 addresses" to "1,2,3-octet/hexadecimal hostnames detected as IPv4 addresses" on May 21, 2023
> It can be fixed by using socket.inet_pton() in looks_like_ip() instead of socket.inet_aton(). However, socket.inet_pton() is only available on Unix, Unix-like, and Windows systems, and some of those systems do not provide it.
> A more portable fix would be using ipaddress.IPv4Address, though it is much slower.

Maybe try socket.inet_pton, and if it's unavailable for the system, fall back to ipaddress.IPv4Address?
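A minimal sketch of that fallback, assuming a hypothetical helper named `looks_like_ipv4_strict` (this is not tldextract's actual `looks_like_ip()`): it prefers `socket.inet_pton` when the platform exposes it and otherwise falls back to `ipaddress.IPv4Address`.

```python
import ipaddress
import socket


def looks_like_ipv4_strict(maybe_ip: str) -> bool:
    """Hypothetical helper: True only for dotted-quad IPv4 text."""
    if hasattr(socket, "inet_pton"):
        try:
            socket.inet_pton(socket.AF_INET, maybe_ip)
            return True
        except (OSError, ValueError):
            # OSError: the text is not a valid IPv4 address.
            # ValueError: the text cannot be passed to the C call
            # (for example, it contains an embedded null character).
            return False
    # Fallback for platforms without inet_pton.
    try:
        ipaddress.IPv4Address(maybe_ip)
        return True
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
```

Both branches reject the 1-, 2-, and 3-part shorthand and hexadecimal parts; treatment of zero-padded octets can depend on the platform's `inet_pton` and on the Python version of `ipaddress`.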
…th unicode dots. (#292)
- IPv4 addresses with unicode dots are now recognized. Closes #287
- IPv4 addresses must have 4 decimal octets. Closes #290
---------
Co-authored-by: John Kurkowski <[email protected]>
The following inputs are recognized as IPv4 addresses due to the use of socket.inet_aton().

- 1.1.1 -> domain parsed as 1.1.1
- 1.1 -> domain parsed as 1.1
- 1 -> domain parsed as 1

(output is still correct nonetheless)

The above is legacy behavior from UNIX's inet_aton for classful networks, a network addressing architecture made obsolete in 1993.
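The shorthand expansion can be observed directly. This is a small sketch, assuming a platform whose C library implements the traditional `inet_aton` rules (glibc and the BSDs do); the helper name `expand_with_inet_aton` is made up for illustration.

```python
import socket


def expand_with_inet_aton(text: str) -> str:
    """Parse with inet_aton, then print the result as a dotted quad."""
    return socket.inet_ntoa(socket.inet_aton(text))


# With fewer than four parts, the last part fills all remaining bytes.
print(expand_with_inet_aton("1.1.1"))  # 1.1.0.1
print(expand_with_inet_aton("1.1"))    # 1.0.0.1
print(expand_with_inet_aton("1"))      # 0.0.0.1
```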
- 01.01.01.01 -> domain parsed as 01.01.01.01
- 01.01.01 -> domain parsed as 01.01.01
- 01.01 -> domain parsed as 01.01
- 01 -> domain parsed as 01

(output is still correct nonetheless)

- 0x1.0x1.0x1.0x1 -> domain parsed as 0x1.0x1.0x1.0x1
- 0x1.0x1.0x1 -> domain parsed as 0x1.0x1.0x1
- 0x1.0x1 -> domain parsed as 0x1.0x1
- 0x1 -> domain parsed as 0x1

(output is still correct nonetheless)

Given that tldextract's regex-based ipv4() function only recognizes IPv4 addresses with 4 decimal octets without zero padding, this is probably a bug.
It can be fixed by using socket.inet_pton() in looks_like_ip() instead of socket.inet_aton(). However, socket.inet_pton() is only available on Unix, Unix-like, and Windows systems, and some of those systems do not provide it.
A more portable fix would be using ipaddress.IPv4Address, though it is much slower.
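To check the speed difference on a given machine, a rough micro-benchmark along these lines could be used. It assumes `socket.inet_pton` is available; the function names and sample inputs are illustrative only, and timings will vary by platform and Python version.

```python
import ipaddress
import socket
import timeit


def via_inet_pton(text: str) -> bool:
    """Validate with the C library's inet_pton."""
    try:
        socket.inet_pton(socket.AF_INET, text)
        return True
    except (OSError, ValueError):
        return False


def via_ipaddress(text: str) -> bool:
    """Validate with the pure-Python ipaddress module."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


SAMPLES = ["1.1.1.1", "1.1.1", "0x1.0x1.0x1.0x1", "example.com"]


def bench(check, number: int = 2000) -> float:
    """Seconds taken to validate every sample `number` times."""
    return timeit.timeit(lambda: [check(s) for s in SAMPLES], number=number)


print(f"inet_pton: {bench(via_inet_pton):.4f}s")
print(f"ipaddress: {bench(via_ipaddress):.4f}s")
```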
If suffix_index == len(labels) == 4, are there any edge cases not covered by IP_RE?
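One way to probe for such edge cases is to compare candidate inputs against a strict pattern. The sketch below uses a hypothetical `STRICT_IPV4_RE`, written for illustration and not copied from tldextract's `IP_RE`, that accepts exactly four decimal octets in the range 0 to 255 with no zero padding.

```python
import re

# One octet: 0-255, decimal only, no leading zeros.
_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"

# Hypothetical pattern for illustration; not tldextract's IP_RE.
STRICT_IPV4_RE = re.compile(rf"{_OCTET}(?:\.{_OCTET}){{3}}")


def is_strict_ipv4(text: str) -> bool:
    """True only if the whole string is a strict dotted quad."""
    return STRICT_IPV4_RE.fullmatch(text) is not None


for candidate in ["1.1.1.1", "1.1.1", "01.01.01.01", "0x1.0x1.0x1.0x1"]:
    print(candidate, is_strict_ipv4(candidate))
```

Using `fullmatch` rather than `match` with a `$` anchor avoids accepting a trailing newline.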