-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perhaps do not apply ToASCII for ASCII-only input #267
Comments
FYI, Node.js after nodejs/node#11549 (released in v7.7.2) errors out on all test cases in https://annevankesteren.nl/2017/02/idna-toascii-differences. |
FYI: @rmisev discovered some issues with ASCII fast paths: #309 (comment). An ASCII fast path is probably still an acceptable intermediate solution though, but I'd rather define something better in the standard. |
So my recommendation is that we go with a fourth solution:
In particular make use of the new CheckHyphens proposed in http://www.unicode.org/reports/tr46/tr46-18.html#ToASCII. I'm not entirely sure how to update the standard though potentially we could refer to the proposed version. I think we've done that in the past as well. I filed these bugs (I hope someone can help with Node.js, they seem well represented):
I wrote these tests (quite a few from @rmisev): web-platform-tests/wpt#5976. |
Sure 😆 To be clear, am I correct in saying this decision means: <li><p>Let <var>result</var> be the result of running <a abstract-op lt=ToASCII>Unicode ToASCII</a> with
<i>domain_name</i> set to <var>domain</var>, <i>UseSTD3ASCIIRules</i> set to false,
- <i>processing_option</i> set to <i>Nontransitional_Processing</i>, and <i>VerifyDnsLength</i> set
- to false.
+ <i>processing_option</i> set to <i>Nontransitional_Processing</i>, <i>VerifyDnsLength</i> set
+ to false, and <i>CheckHyphens</i> set to false. If so, we have nodejs/node#12966 already which would fulfill this. |
Yeah, we might also make CheckJoiners false, but that's a separate discussion. |
Fixes #53 and fixes #267 by no longer breaking on on hyphens in the 3rd and 4th position of a domain label. This is known to break YouTube: r3---sn-2gb7ln7k.googlevideo.com. This is done by setting the proposed CheckHyphens flag to false. Fixes #110 by clarifying that BIDI and CONTEXTJ checks are to be done by setting the proposed CheckBidi and CheckJoiners flags to true. Follow-up #313 is filed to remove the proposed bits once Unicode is updated.
Tests: web-platform-tests/wpt#5976. Fixes #53 and fixes #267 by no longer breaking on on hyphens in the 3rd and 4th position of a domain label. This is known to break YouTube: r3---sn-2gb7ln7k.googlevideo.com. This is done by setting the proposed CheckHyphens flag to false. Fixes #110 by clarifying that BIDI and CONTEXTJ checks are to be done by setting the proposed CheckBidi and CheckJoiners flags to true. Follow-up #313 is filed to remove the proposed bits once Unicode is updated.
Many implementations currently skip ToASCII if domain is ASCII-only, but as discovered in [1] and [2], this can result in some undesirable behavior. Adding a note prevents implementors from making the mistake of thinking ToASCII is a no-op if the input is ASCII, and also provides a recommendation on how to properly optimize the ToASCII step. [1]: whatwg#267 [2]: whatwg#309 (comment)
Many implementations currently skip ToASCII if domain is ASCII-only, but as discovered in [1] and [2], this can result in some undesirable behavior. Adding a note prevents implementors from making the mistake of thinking ToASCII is a no-op if the input is ASCII, and also provides a recommendation on how to properly optimize the ToASCII step. [1]: whatwg#267 [2]: whatwg#309 (comment)
Many implementations currently skip ToASCII if domain is ASCII-only, but as discovered in [1] and [2], this can result in some undesirable behavior. Adding a note prevents implementors from making the mistake of thinking ToASCII is a no-op if the input is ASCII, and also provides a recommendation on how to properly optimize the ToASCII step. [1]: whatwg#267 [2]: whatwg#309 (comment)
Many implementations currently skip ToASCII if domain is ASCII-only, but as discovered in #267 and #309 (comment), this can result in some undesirable behavior. Adding a note prevents implementers from making the mistake of thinking ToASCII is a no-op if the input is ASCII, and also provides a recommendation on how to properly optimize the ToASCII step.
See https://lists.w3.org/Archives/Public/www-archive/2017Feb/0006.html
The possible solutions are:
The text was updated successfully, but these errors were encountered: