allowing bidi domain names to start with a digit #496

Closed
rtf-const wants to merge 1 commit


@rtf-const rtf-const commented May 4, 2019

This pull request resolves issue #489.

Currently rust-url rejects URLs whose domain contains bidirectional characters (that is, a domain that mixes RTL and LTR characters) if any of its labels starts with a digit.
For example, take the URL
http://mail.163.com.xn----9mcjf9b4dbm09f.com/iloystgnjfrgthteawvo/indexx.php
An attempt to parse this URL returns the error "invalid international domain name", even though the URL is valid and did exist (it hosted malicious content and no longer exists).
Meanwhile, the URLs http://mail.com.xn----9mcjf9b4dbm09f.com/iloystgnjfrgthteawvo/indexx.php and http://mail.163.com/iloystgnjfrgthteawvo/indexx.php are both accepted.
I guess there is a contradiction between different RFCs. This crate implements the bidi check (the rules for bidirectional Unicode characters) based on RFC-5893 https://tools.ietf.org/html/rfc5893#section-2, which says that the first character of a label must be a left-to-right character, a right-to-left character, or an "Arabic letter", but not a "European Number".
Meanwhile RFC-1123 from 1989, which specifies "Requirements for Internet Hosts", explicitly says that a digit is a permitted first character of a domain name: https://tools.ietf.org/html/rfc1123#section-2

the restriction on the first character is relaxed to allow either a
letter or a digit. Host software MUST support this more liberal
syntax.
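As a rough illustration of the relaxed syntax quoted above, here is a minimal sketch of a label check in plain Rust. The function names `is_rfc1123_label` and `is_rfc1123_hostname` are hypothetical and are not part of rust-url; the 63-byte label limit and the "no leading or trailing hyphen" rule are assumptions taken from the usual hostname conventions rather than from the quoted sentence itself.

```rust
/// Hypothetical helper: checks one label against the relaxed hostname syntax.
/// Letters, digits and '-' are allowed; the first and last byte must be a
/// letter or a digit (so a leading digit is fine).
fn is_rfc1123_label(label: &str) -> bool {
    let bytes = label.as_bytes();
    // Assumed limit: a label is 1..=63 bytes long.
    if bytes.is_empty() || bytes.len() > 63 {
        return false;
    }
    let edges_ok = bytes[0].is_ascii_alphanumeric()
        && bytes[bytes.len() - 1].is_ascii_alphanumeric();
    edges_ok && bytes.iter().all(|b| b.is_ascii_alphanumeric() || *b == b'-')
}

/// Hypothetical helper: every dot-separated label must pass the check above.
fn is_rfc1123_hostname(host: &str) -> bool {
    host.split('.').all(is_rfc1123_label)
}
```

Under these assumptions both `37signals.com` and `mail.163.com` are accepted, since a label may begin with a digit.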

So labels that start with a digit should be allowed inside bidirectional domain names; otherwise we should treat any domain name that starts with a digit (for example http://37signals.com/) as invalid.
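For reference, the RFC 5893 first-character condition can be sketched as below. This is only an illustration, not the code rust-url uses: `toy_class` is a hypothetical, deliberately tiny stand-in for a real Unicode Bidi_Class lookup, the input is assumed to be the Unicode (already Punycode-decoded) form of the domain, and only the first of the six conditions of the Bidi Rule is checked. Note that RFC 5893 applies the rule only to "Bidi domain names", meaning names that contain at least one RTL label.

```rust
#[derive(Clone, Copy)]
enum BidiClass { L, R, AL, AN, EN, Other }

/// Toy classifier (assumption): covers ASCII letters and digits, the basic
/// Hebrew and Arabic letters, and Arabic-Indic digits. A real implementation
/// would use the full Unicode Bidi_Class data.
fn toy_class(c: char) -> BidiClass {
    match c {
        'a'..='z' | 'A'..='Z' => BidiClass::L,
        '0'..='9' => BidiClass::EN,
        '\u{05D0}'..='\u{05EA}' => BidiClass::R,
        '\u{0627}'..='\u{064A}' => BidiClass::AL,
        '\u{0660}'..='\u{0669}' => BidiClass::AN,
        _ => BidiClass::Other,
    }
}

/// An RTL label contains at least one R, AL or AN character.
fn is_rtl_label(label: &str) -> bool {
    label
        .chars()
        .any(|c| matches!(toy_class(c), BidiClass::R | BidiClass::AL | BidiClass::AN))
}

/// A Bidi domain name contains at least one RTL label.
fn is_bidi_domain(domain: &str) -> bool {
    domain.split('.').any(is_rtl_label)
}

/// Condition 1 of the Bidi Rule: the first character must be L, R or AL.
fn first_char_ok(label: &str) -> bool {
    matches!(
        label.chars().next().map(toy_class),
        Some(BidiClass::L | BidiClass::R | BidiClass::AL)
    )
}

/// Condition 1 is enforced on every label, but only in Bidi domain names.
fn passes_condition_1(domain: &str) -> bool {
    !is_bidi_domain(domain) || domain.split('.').all(first_char_ok)
}
```

In this sketch `37signals.com` and `mail.163.com` pass because neither contains an RTL label, while a name such as `mail.163.com.\u{05E9}\u{05DC}\u{05D5}\u{05DD}.com` (a made-up Hebrew label, written here as Rust escapes) fails because the label `163` starts with a European Number inside a Bidi domain name.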



@rtf-const
Author

It's my fault: I misinterpreted the RFC.

@rtf-const rtf-const closed this May 5, 2019
@SimonSapin
Member

Please note that this library implements the specification at https://url.spec.whatwg.org/ (and specifically https://url.spec.whatwg.org/#concept-domain-to-ascii, which is defined based on https://www.unicode.org/reports/tr46/#ToASCII).
