Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url.domainToASCII inconsistent with WHATWG spec? #41343

Closed
armanbilge opened this issue Dec 28, 2021 · 9 comments
Closed

url.domainToASCII inconsistent with WHATWG spec? #41343

armanbilge opened this issue Dec 28, 2021 · 9 comments
Labels
whatwg-url Issues and PRs related to the WHATWG URL implementation.

Comments

@armanbilge
Copy link

armanbilge commented Dec 28, 2021

Version

v17.3.0

Platform

Darwin new-host-3.home 20.6.0 Darwin Kernel Version 20.6.0: Tue Oct 12 18:33:42 PDT 2021; root:xnu-7195.141.8~1/RELEASE_X86_64 x86_64

Subsystem

No response

What steps will reproduce the bug?

url.domainToASCII('42')

How often does it reproduce? Is there a required condition?

No response

What is the expected behavior?

My expected output is '42'.

This is based on my understanding of the WHATWG spec: https://url.spec.whatwg.org/#idna

Which in particular says:

If beStrict is false, domain is an ASCII string, and strictly splitting domain on U+002E (.) does not produce any item that starts with "xn--", this step is equivalent to ASCII lowercasing domain.

What do you see instead?

'0.0.0.42'

Additional information

I suspect url.domainToUnicode has the same problem.

@VoltrexKeyva VoltrexKeyva added the whatwg-url Issues and PRs related to the WHATWG URL implementation. label Dec 28, 2021
@MoonBall
Copy link
Member

MoonBall commented Jan 1, 2022

@armanbilge Node.js try to resovle the single number to a ipv4 address by the code. So input '42' is equivalent to input '0.0.0.42'.

I think this is a feature, not a bug. Since '42' is not a valid domain, but it will be valid if it represents a ipv4 address number.

Another tip: Node.js's url.domainToASCII(domain) maybe not conform to url spec. From Node.js's document, the url spec is not mentioned here.

Returns the Punycode ASCII serialization of the domain.

@armanbilge
Copy link
Author

@MoonBall thanks for looking into this.

I think this is a feature, not a bug. Since '42' is not a valid domain, but it will be valid if it represents a ipv4 address number.

That may very well be the case, but the docs currently say:

Returns the Punycode ASCII serialization of the domain. If domain is an invalid domain, the empty string is returned.

https://nodejs.org/dist/latest/docs/api/url.html#urldomaintoasciidomain

Furthermore, this behavior is inconsistent with the spec and many (all?) other implementations of domainToASCII in other programming languages. So at the very least, this is extremely surprising and should be documented as such.

Another tip: Node.js's url.domainToASCII(domain) maybe not conform to url spec. From Node.js's document, the url spec is not mentioned here.

I may have misunderstood, but the table-of-contents appears to list url.domainToASCII methods under the "The WHATWG URL API" section in the docs: https://nodejs.org/dist/latest/docs/api/url.html

For the record, punycode.toASCII('42') will return '42'.

@MoonBall
Copy link
Member

MoonBall commented Jan 2, 2022

I am wrong. The URL module conforms to url spec of whatwg.

I may have misunderstood, but the table-of-contents appears to list url.domainToASCII methods under the "The WHATWG URL API" section in the docs: https://nodejs.org/dist/latest/docs/api/url.html

I find that Chrome also treat a number to a valid ipv4 address. Maybe it's a conventional implement, even though the ipv4 spec doesn't include it.

image

@MoonBall
Copy link
Member

MoonBall commented Jan 2, 2022

Need a more professional guy to answer if '40' should be a valid ipv4 address. 😊

@armanbilge
Copy link
Author

armanbilge commented Jan 2, 2022

Right, but in those screenshots you are invoking the new URL constructor. Apologies if I'm missing something, but how does this relate to domainToASCII?

@MoonBall
Copy link
Member

MoonBall commented Jan 2, 2022

Right, but in those screenshots you are invoking the new URL constructor. Apologies if I'm missing something, but how does this relate to domainToASCII?

Since I don't find domainToASCII in Chrome, but new URL will execute algorithm of domain to ascii .
image

@armanbilge
Copy link
Author

So I looked into this a bit more. Actually, it seems these methods were removed from the WHATWG spec in:

It's very possible that Node.js is the only "real" implementation of these methods in the wild :) but this means it is difficult/impossible to find another WHATWG implementation of these methods to compare to. Also, there's the philosophical question of whether it makes sense to change the implementation in Node.js to correctly adhere to a spec that's been retired/deprecated (assuming that there even is a bug here, which we have yet to establish anyway).

@annevk
Copy link

annevk commented Jan 3, 2022

As originally defined these methods would go through https://url.spec.whatwg.org/#concept-host-parser first, which does the number-to-IPv4 conversion. So this seems correct.

@armanbilge
Copy link
Author

@annevk thank you for clarifying this, much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

No branches or pull requests

4 participants