Add more restrictions to the host parser #159

gijsk · 2016-11-02T15:48:52Z

AFAICT Safari and Chrome already fail to parse a semicolon in the hostname, both when clicking links to e.g. "http://www.;.com/" and when using the URL constructor to create that URL.

Edge and Firefox both seem to think it's valid. IE shows an error page, but it's not clear to me what it thinks, and it doesn't seem to implement the URL constructor.


valenting · 2016-11-07T00:57:50Z

(In reply to zeusex81 from comment #8)

";" was just an example.
Here are the characters that only Firefox accepts:
http://www.$*+=!;,<>|&~^`'"(){}.com/

We should consider whether these characters should also be valid in a hostname.

achristensen07 · 2016-12-01T19:56:11Z

I noticed Safari doesn't accept '<' or '>' but Firefox does, and Chrome percent-encodes these characters.

According to https://tools.ietf.org/html/rfc3986, we should only accept ALPHA, DIGIT, -._~!$&'()*+,;= or properly percent-encoded values, which would have been decoded at this point. Where did the current list of forbidden characters come from? I would support a change to match RFC 3986 more closely by having a list of allowed characters instead of a list of forbidden characters.
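A minimal sketch of the allowlist idea, assuming the RFC 3986 `reg-name` production (section 3.2.2) is checked against the raw host string before any percent-decoding. The function name `is_rfc3986_reg_name` is hypothetical and this is not how any browser's host parser is structured; it only shows which characters the RFC lets through.

```python
import re

# RFC 3986, section 3.2.2:
#   reg-name    = *( unreserved / pct-encoded / sub-delims )
#   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
#   pct-encoded = "%" HEXDIG HEXDIG
_UNRESERVED = r"A-Za-z0-9\-._~"
_SUB_DELIMS = r"!$&'()*+,;="
_REG_NAME = re.compile(rf"(?:[{_UNRESERVED}{_SUB_DELIMS}]|%[0-9A-Fa-f]{{2}})*")


def is_rfc3986_reg_name(host: str) -> bool:
    """Return True if host matches reg-name (an empty host also matches)."""
    return _REG_NAME.fullmatch(host) is not None


print(is_rfc3986_reg_name("www.;.com"))   # True: ";" is a sub-delim
print(is_rfc3986_reg_name("www.<>.com"))  # False: "<" and ">" are not allowed
print(is_rfc3986_reg_name("%7Dx"))        # True: a percent-encoded "}" passes
```

Note that the RFC still admits ";" and, through percent-encoding, effectively any byte, so an RFC 3986 allowlist on its own would not reject "http://www.;.com/".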

annevk · 2016-12-09T01:53:21Z

The current list is basically maximally liberal while avoiding deeply problematic code points. I did this because in theory hosts have essentially no restrictions on what they can contain. But yeah, we should probably move in the other direction given what user agents already do.

Note that since RFC3986 allows percent-encoded characters they do allow basically anything.

annevk · 2017-01-04T13:45:49Z

I created a test, and Safari TP appears to follow the URL Standard, apart from the host setter not failing properly for # and ?.

Firefox is wrong for U+000B, U+000C, and %.

Chrome is all over the map. Given that Firefox and Safari TP are this close I'm a little more hesitant to make changes, but I'm open to suggestions.

annevk · 2017-01-04T13:50:37Z

Chrome's results do mean that moving to a safelist as @achristensen07 suggests is a potential option, although Chrome does allow a lot more through setters. If we discount Chrome's setter results, the post-IDNA safelist would be a-z, 0-9, +, -, ., and _, which is fairly small.

achristensen07 · 2017-01-04T14:44:15Z

Don't let Safari TP sway your decision here. I just blindly copied the spec. I would still support changing the spec.

achristensen07 · 2017-01-04T15:20:09Z

The reason I would still support changing the spec is that NSURL's parser follows rfc3986. I would like WebKit's URL parser to not give NSURL any "valid" URLs it considers to be invalid. I'm not sure what would happen if you did a DNS lookup with invalid characters.

annevk · 2017-01-04T15:35:59Z

My analysis of Chrome discounted the fact that while Chrome throws for a lot more hosts, it also happily emits percent-encoded hosts sometimes. So https://}x/ becomes https://%7Dx/ and is not treated as an error.
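A hypothetical sketch of the "encode instead of reject" behaviour described above, using Python's `urllib.parse.quote`. The pass-through set (ASCII letters, digits, and "-._~", which is what `quote` leaves alone when `safe=""`) is an assumption for illustration only, not Chrome's actual set; `encode_host` is a made-up name.

```python
from urllib.parse import quote


def encode_host(host: str) -> str:
    """Percent-encode (as UTF-8) every character outside A-Z, a-z, 0-9
    and "-._~" instead of treating it as a parse error."""
    # safe="" so that "/" is encoded too; quote() always leaves
    # letters, digits and "-._~" unencoded.
    return quote(host, safe="")


print(encode_host("}x"))  # %7Dx, matching the https://%7Dx/ example
```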

If we wanted to make a change, I would remove the + from the earlier safelist (it fails in stable Safari, at least), which leaves us with the ASCII code points that I am certain we need to support.

Namely: a-z, 0-9, -, ., and _ (of these, only _ is not technically allowed in domain names, but it is sometimes used in hostnames around the web, so browsers have to special-case their HTTPS code around it).
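A minimal sketch of the proposed conservative check, assuming it runs on a host that has already been through IDNA processing and lowercasing, and that no percent-escapes are allowed. `HOST_SAFELIST` and `passes_safelist` are hypothetical names, not part of the URL Standard.

```python
import string

# Hypothetical post-IDNA safelist from the proposal: lowercase ASCII
# letters, digits, "-", "." and "_" (the last for hostnames seen in the wild).
HOST_SAFELIST = frozenset(string.ascii_lowercase + string.digits + "-._")


def passes_safelist(ascii_host: str) -> bool:
    """Return True if the (already IDNA-processed, lowercased) host is
    non-empty and every code point is in the safelist. "%" is not in the
    safelist, so percent-escapes are rejected."""
    return bool(ascii_host) and all(c in HOST_SAFELIST for c in ascii_host)


print(passes_safelist("my_host.example.com"))  # True
print(passes_safelist("www.;.com"))            # False
print(passes_safelist("%7dx"))                 # False
```

Compared with a denylist, anything not explicitly listed fails, which is why this is the most restrictive option discussed in the thread.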

annevk · 2017-01-04T15:37:14Z

That would be the most conservative we can go with host names (for special schemes). If that sounds reasonable I'll propose a commit to the test and the standard.

sleevi · 2017-01-04T21:20:01Z

@annevk Not sure I understand your comment on conservatism.

It sounds like you're proposing requiring the host component of the authority (for non-special schemes) to follow the LDH rule (plus _ for historic compat) of DNS? And to not allow any escape sequences? Is that a fair read?

annevk · 2017-01-05T09:13:56Z

That is a fair read. (It's conservative with respect to how much input ends up being parsed into something. More would end up rejected.)

annevk · 2017-01-10T13:23:09Z

@valenting @achristensen07 @sleevi shall we go with the proposal in #159 (comment)?

valenting · 2017-01-10T16:34:56Z

👍 The simpler the better.

sleevi · 2017-01-10T19:33:53Z

@annevk I would say that switching the host parser to observe DNS rules is, right now, a non-goal for us. It's something that I think we'd be very unlikely to implement in Chrome, in part because there are more ways than DNS to name things. That's not to say "No, never," but it is a change with known sharp edge cases for our usage, so it would require a lot more time and energy working with internal teams to determine an appropriate suggestion.

In this space, at least, the 'obsoleted' IETF RFC was far more lenient with respect to DNS, and as much as possible, I think we'd want to avoid coupling URLs to DNS for the time being, including any format rules.

(I'm not sure about any thoughts with respect to whether encoding would be an acceptable compromise)

annevk · 2017-01-24T13:16:13Z

So given that non-DNS systems are still in use here and there and that at least Google still needs to support them, I'm not going to change the current approach in the standard until we have more data in some way.

If browsers want to be more restrictive in their address bar they can already do so, as long as parsing for <a> and such is not affected.

annevk · 2017-02-08T13:21:05Z

Closing this per the above comment due to lack of further feedback. #218 tweaks host parsing a little further, but doesn't really increase the number of restrictions.

annevk changed the title ~~Semi-colon (;) should be illegal in URL hostname parsing~~ Add more restrictions to the host parser Dec 19, 2016

annevk mentioned this issue Dec 19, 2016

Disallow "!" in host #98

Closed

annevk added topic: parser and removed topic: parser labels Dec 20, 2016

annevk added a commit to web-platform-tests/wpt that referenced this issue Jan 4, 2017

URL: test the host parser

15aa999

See whatwg/url#159 for context.

annevk mentioned this issue Jan 4, 2017

URL: test the host parser web-platform-tests/wpt#4504

Closed

annevk mentioned this issue Jan 23, 2017

Forbidden host code points #214

Closed

annevk closed this as completed Feb 8, 2017

annevk added a commit to web-platform-tests/wpt that referenced this issue Feb 9, 2017

URL: test the host parser

ddb154d

See whatwg/url#159 for context.

annevk mentioned this issue Jun 8, 2018

DNS (and other naming systems) vs the host parser #397

Open

annevk mentioned this issue Nov 29, 2024

URL: Update IdnaTestV2 to UTS46 16.0.0 web-platform-tests/wpt#48301

Open
