-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disallow all C0 control and DEL codepoints in hosts #673
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not adding percent-encoding of additional code points to opaque hosts at the same time. I.e., restrict this change to additional failure cases of domains.
What I was envisioning is that we have two constants:
- Forbidden host code points (this can be renamed to forbidden domain code points). We would modify this as you did.
- Forbidden opaque-host code points (this would be the old forbidden host code points).
One thing I'd like to add (before I'm too full of "festive cheer" and forget it): Failing on certain forbidden host code-points can unlock optimisation opportunities. For example, currently when parsing the hostname portion of a URL string, we have to precisely track whether we're inside or outside of a square bracket (in order to determine whether a colon is part of an IPv6 address or is a hostname/port delimiter). This slows down all URL strings. I've been experimenting with an approach that simply splits on the first colon after a closing square bracket. So for the hostname
Currently that's fine (I think?). Since the hostname doesn't begin with an opening bracket, it won't be parsed as IPv6, and ultimately the opaque host parser will reject them both because square brackets are forbidden. If we changed to percent-encode forbidden code-points, this input would potentially be accepted. Perhaps this could be limited to the setter function? |
Ok. So it makes sense to investigate this more thoroughly; how percent coding opaque hosts could/ should interact with the ipv6 address syntax, and make it a separate issue (as was suggested). Maybe we can take care of that first? I’m fine with handling this one first, but I myself just can’t really get behind having two distinct forbidden host codepoints sets… it may just be a minor thing, I realise. But still… Is failing on all controls in opaque hosts really not an option? Maybe there are there solutions? |
I expect we'll take #685 at which point we should probably close this. |
Of course. I’m happy @karwa picked this up. |
Sorry, this was just blocking some interop stuff that I was working on, and since there seemed to be agreement on the idea of blocking C0s and delete, I figured I'd just get it done and take the opportunity to make the tests more comprehensive. |
Oh no it’s excellent. I felt like I was blocking the progress here, and now this issue is taken care of and the opaque hosts can maybe be looked at later. I’ll close the PR. |
A first pass at fixing #627 and #319
This adds the C0 controls and delete to the forbidden host codepoints, as discussed in #627.
It also adapts the opaque-host-parser to percent-encode rather than fail on these code points.
For this I added an opaque-host percent encode set that refers to the forbidden host codepoints.
Please review.
Preview | Diff