allowing bidi domain names to start with a digit #496
Closed
This pull request resolves issue #489.
Currently rust-url rejects URLs whose domain names contain bidirectional characters (domains that mix right-to-left and left-to-right characters) if any label of the domain starts with a digit.
As an example, take this URL:
http://mail.163.com.xn----9mcjf9b4dbm09f.com/iloystgnjfrgthteawvo/indexx.php
Attempting to parse this URL returns the error "invalid international domain name", even though the URL is valid and did exist (it hosted malicious content and is no longer online).
Meanwhile, the URLs http://mail.com.xn----9mcjf9b4dbm09f.com/iloystgnjfrgthteawvo/indexx.php and http://mail.163.com/iloystgnjfrgthteawvo/indexx.php are both accepted.
I guess there is some contradiction between different RFCs. This crate implements the bidi check (the rules for bidirectional Unicode characters) based on RFC-5893 https://tools.ietf.org/html/rfc5893#section-2 which says that the first character of a label must be a left-to-right character, a right-to-left character, or an "Arabic letter", but not a "European Number".
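To make the first-character rule concrete, here is a minimal sketch in Rust. It is not the crate's actual code: `coarse_bidi_class` and `first_char_passes_rule_1` are hypothetical names, and the classifier is a deliberate approximation that only knows ASCII letters, ASCII digits, basic Hebrew letters and basic Arabic letters. A real implementation would look the class up in the Unicode bidi data.

```rust
/// Coarse bidi classes, only as many as this sketch needs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BidiClass {
    L,     // left-to-right
    R,     // right-to-left
    AL,    // Arabic letter
    EN,    // European number
    Other, // everything this sketch does not classify
}

/// Hypothetical, heavily simplified classifier (an assumption of this
/// example, not a real Unicode lookup).
fn coarse_bidi_class(c: char) -> BidiClass {
    match c {
        '0'..='9' => BidiClass::EN,
        'a'..='z' | 'A'..='Z' => BidiClass::L,
        '\u{05D0}'..='\u{05EA}' => BidiClass::R, // Hebrew letters
        '\u{0621}'..='\u{064A}' => BidiClass::AL, // Arabic letters
        _ => BidiClass::Other,
    }
}

/// Checks only the first-character condition described above:
/// the label must start with a character of class L, R or AL.
fn first_char_passes_rule_1(label: &str) -> bool {
    match label.chars().next() {
        Some(c) => matches!(
            coarse_bidi_class(c),
            BidiClass::L | BidiClass::R | BidiClass::AL
        ),
        None => false,
    }
}
```

Under this sketch the label `mail` passes and the label `163` fails, which is the behaviour that rejects the example URL once the domain is treated as a bidi domain.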
RFC-1123 from 1989, however, which specifies "Requirements for Internet Hosts", explicitly says that a digit is a permitted first character of a domain name: https://tools.ietf.org/html/rfc1123#section-2
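For comparison, the host name syntax that RFC-1123 relaxes can be sketched as below. This is an illustrative check only, with hypothetical function names (`is_rfc1123_label`, `is_rfc1123_hostname`), assuming the usual limits of 63 octets per label and 255 for the whole name. It operates on the ASCII (Punycode) form of the name and performs no bidi checks at all.

```rust
/// Returns true if `label` is a plausible RFC 1123 host name label:
/// 1 to 63 ASCII characters, letters, digits or hyphens only,
/// starting and ending with a letter or a digit.
fn is_rfc1123_label(label: &str) -> bool {
    let bytes = label.as_bytes();
    if bytes.is_empty() || bytes.len() > 63 {
        return false;
    }
    let first = bytes[0];
    let last = bytes[bytes.len() - 1];
    first.is_ascii_alphanumeric()
        && last.is_ascii_alphanumeric()
        && bytes.iter().all(|b| b.is_ascii_alphanumeric() || *b == b'-')
}

/// Applies the label check to every dot-separated label of `host`.
fn is_rfc1123_hostname(host: &str) -> bool {
    host.len() <= 255 && host.split('.').all(is_rfc1123_label)
}
```

With this check, both `37signals.com` and `mail.163.com.xn----9mcjf9b4dbm09f.com` are accepted, since a leading digit is allowed in every label.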
So labels that start with a digit should be allowed inside bidirectional domain names; otherwise we should treat any domain name that starts with a digit (for example http://37signals.com/) as invalid.