-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more restrictions to the host parser #159
Comments
(In reply to zeusex81 from comment #8)
We should consider whether these characters should also be valid in a hostname. |
I noticed Safari doesn't accept '<' or '>' but Firefox does, and Chrome percent-encodes these characters. According to https://tools.ietf.org/html/rfc3986 we should only accept ALPHA DIGIT -._~!$&'()*+,;= or properly percent-encoded values, which would have been decoded at this point. Where did the current list of forbidden characters come from? I would support a change to match rfc3986 more closely by having a list of allowed characters instead of a list of forbidden characters. |
The current list is basically maximally liberal while avoiding deeply problematic code points. I did this because in theory hosts basically have no restrictions on what they can contain. But yeah, we should probably move into the other direction given what user agents already do. Note that since RFC3986 allows percent-encoded characters they do allow basically anything. |
See whatwg/url#159 for context.
I created a test and Safari TP appears to follow the URL Standard, apart from setting to Firefox is wrong for U+000B, U+000C, and Chrome is all over the map. Given that Firefox and Safari TP are this close I'm a little more hesitant to make changes, but I'm open to suggestions. |
Chrome's results do mean that moving to a safelist as @achristensen07 suggests is a potential option, although Chrome does allow a lot more through setters... If we discount Chrome's setter results, the post-IDNA safelist would be a-Z, 0-9, |
Don't let Safari TP sway your decision here. I just blindly copied the spec here. I would still support changing the spec |
The reason I would still support changing the spec is that NSURL's parser follows rfc3986. I would like WebKit's URL parser to not give NSURL any "valid" URLs it considers to be invalid. I'm not sure what would happen if you did a DNS lookup with invalid characters. |
My analysis of Chrome discounted the fact that while Chrome throws for a lot more hosts, it also happily emits percent-encoded hosts sometimes. So If we wanted to make a change, I would remove the Namely: a-Z, 0-9, |
That would be the most conservative we can go with host names (for special schemes). If that sounds reasonable I'll propose a commit to the test and the standard. |
@annevk Not sure I understand your comment on conservatism. It sounds like you're proposing requiring the host component of the authority (for non-special schemes) to follow the LDH rule (plus |
That is a fair read. (It's conservative with respect to how much input ends up being parsed into something. More would end up rejected.) |
@valenting @achristensen07 @sleevi shall we go with the proposal in #159 (comment)? |
👍 The simpler the better. |
@annevk I would say that switching the host parser to observe DNS rules is, right now, for us, a non-goal. It's something that I think we'd be very unlikely to implement in Chrome, in part, because there's more ways than DNS to name things. That's not to say "No, never," but one with known sharp edge cases for our usage, so would require a lot more time and energy to work with internal teams to determine an appropriate suggestion. In this space, at least, the 'obsoleted' IETF RFC was far more lenient with respect to DNS, and as much as possible, I think we'd want to avoid coupling URLs to DNS for the time being, including any format rules. (I'm not sure about any thoughts with respect to whether encoding would be an acceptable compromise) |
So given that non-DNS systems are still in use here and there and that at least Google still needs to support them, I'm not going to change the current approach in the standard until we have more data in some way. If browsers want to be more restrictive in their address bar they can already do so, as long as parsing for |
Closing this per the above comment due to lack of further feedback. #218 tweaks host parsing a little further, but doesn't really increase the number of restrictions. |
See whatwg/url#159 for context.
AFAICT Safari and Chrome already fail to parse this, both when trying to click links to e.g. "http://www.;.com/" and when trying to use the
URL
constructor to create that URL.Edge and Firefox both seem to think it's valid. IE shows an error page but it's not clear to me what it thinks, and it doesn't seem to implement the
URL
constructor.The text was updated successfully, but these errors were encountered: