Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for native URL parsing if node API is unavailable #149

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

jesseditson
Copy link

Fixes #148

@@ -1,7 +1,19 @@

var URL = require('url');
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the intent would be clearer to rather test if the global URL is set and require it otherwise?

Or we could also rely directly on it since it is available since node@10.

What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense - you'd probably know better than me if there are folks depending on this using ancient node versions, but as long as this came in a major bump IMO it'd be simplest to just use the global URL since it is available in all maintenance+ releases of node

@@ -97,7 +109,7 @@ module.exports = function extractHostname(value) {
url = '//' + url;
}

var parts = URL.parse(url, null, true);
var parts = parseURL(url);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

Also discovers & resolves a few URL related issues:
- WHATWG parses any potentially valid integer string as an IP - add explicit IP detection to avoid this case and preserve existing parsing functionality
- WHATWG does not fuzzy parse ipv6 addresses, so add simple regex-based ipv6 extraction to handle this case and preserve existing functionality
- WHATWG does not parse scheme-less URLs so change prefixing behavior to include a scheme when falling back to URL-based parsing
Copy link
Author

@jesseditson jesseditson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHATWG URL parsing turns out to be quite subtly different. I did a pass on the unit tests to get it all passing but there are potentially some concessions here that are undesirable (mainly using some regexes to try to align with the legacy parsing behavior).

Comment on lines -92 to 82
var needsTrimming = checkTrimmingNeeded(url);
if (needsTrimming) {
url = url.trim();
}

var needsLowerCase = checkLowerCaseNeeded(url);
if (needsLowerCase) {
url = url.toLowerCase();
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff is a little funky, but I just moved the lowercase check to before the ipv6 check since the later IP checks expect a lowercased string.

Comment on lines -109 to +101
url = '//' + url;
url = 'https://' + url;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHATWG URL does not support scheme-less URLs so I had to add a scheme to get it to parse.

Comment on lines +111 to +113
if (ipLikeRE.test(parts.hostname) && !ipLikeRE.test(value)) {
return value;
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHATWG URL will parse the string http://42 as the URL http://0.0.0.42, which is not how the legacy URL parser worked (it would fail to parse). This just aligns them. A slight difference in this is that when value is not a string, we will no longer coerce it to one when returning it. I believe this is closer to the documented API but is technically a breaking change. LMK if you'd like me to restore the test and the string-coercion.

expect(tld.extractHostname(42)).to.equal('42');
expect(tld.extractHostname(42)).to.equal(42);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above - "return the initial value" feels like it should return an integer, not a string. But this is a change in behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support environments other than node.js that have a global URL class
2 participants