Disallow all C0 control and DEL codepoints in hosts #673

alwinb · 2021-11-22T13:02:56Z

A first pass at fixing #627 and #319

At least two implementers are interested (and none opposed):
- Safari
- Chrome already implement this behaviour
Tests are written and can be reviewed and commented upon at:
- …
Implementation bugs are filed:
- Chrome: …
- Firefox: …
- Safari: …

This adds the C0 controls and delete to the forbidden host codepoints, as discussed in #627.
It also adapts the opaque-host-parser to percent-encode rather than fail on these code points.

For this I added an opaque-host percent encode set that refers to the forbidden host codepoints.
Please review.

Preview | Diff

Fixes #627

annevk

I would prefer not adding percent-encoding of additional code points to opaque hosts at the same time. I.e., restrict this change to additional failure cases of domains.

What I was envisioning is that we have two constants:

Forbidden host code points (this can be renamed to forbidden domain code points). We would modify this as you did.
Forbidden opaque-host code points (this would be the old forbidden host code points).

karwa · 2021-12-24T10:09:39Z

One thing I'd like to add (before I'm too full of "festive cheer" and forget it):

Failing on certain forbidden host code-points can unlock optimisation opportunities. For example, currently when parsing the hostname portion of a URL string, we have to precisely track whether we're inside or outside of a square bracket (in order to determine whether a colon is part of an IPv6 address or is a hostname/port delimiter). This slows down all URL strings. I've been experimenting with an approach that simply splits on the first colon after a closing square bracket.

So for the hostname "][::][:80", the split is:

Method	Host	Port
Std	`"][::]["`	`"80"`
Mine	`"]["`	`":][:80"`

Currently that's fine (I think?). Since the hostname doesn't begin with an opening bracket, it won't be parsed as IPv6, and ultimately the opaque host parser will reject them both because square brackets are forbidden. If we changed to percent-encode forbidden code-points, this input would potentially be accepted.

Perhaps this could be limited to the setter function?

alwinb · 2021-12-27T10:29:50Z

Ok. So it makes sense to investigate this more thoroughly; how percent coding opaque hosts could/ should interact with the ipv6 address syntax, and make it a separate issue (as was suggested).

Maybe we can take care of that first? I’m fine with handling this one first, but I myself just can’t really get behind having two distinct forbidden host codepoints sets… it may just be a minor thing, I realise. But still…

Is failing on all controls in opaque hosts really not an option? Maybe there are there solutions?

annevk · 2022-01-25T15:58:18Z

I expect we'll take #685 at which point we should probably close this.

alwinb · 2022-01-25T20:20:16Z

Of course. I’m happy @karwa picked this up.

karwa · 2022-01-25T20:53:44Z

Sorry, this was just blocking some interop stuff that I was working on, and since there seemed to be agreement on the idea of blocking C0s and delete, I figured I'd just get it done and take the opportunity to make the tests more comprehensive.

alwinb · 2022-01-25T22:13:44Z

Oh no it’s excellent. I felt like I was blocking the progress here, and now this issue is taken care of and the opaque hosts can maybe be looked at later. I’ll close the PR.

Disallow all C0 control and DEL codepoints in hosts

993aa2a

Fixes #627

annevk reviewed Nov 22, 2021

View reviewed changes

karwa mentioned this pull request Jan 25, 2022

Split forbidden host/domain code points, and add all C0 controls and add U+007F to the latter #685

Merged

3 tasks

alwinb closed this Jan 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disallow all C0 control and DEL codepoints in hosts #673

Disallow all C0 control and DEL codepoints in hosts #673

alwinb commented Nov 22, 2021 •

edited by pr-preview bot

Loading

annevk left a comment

karwa commented Dec 24, 2021

alwinb commented Dec 27, 2021

annevk commented Jan 25, 2022

alwinb commented Jan 25, 2022

karwa commented Jan 25, 2022

alwinb commented Jan 25, 2022

Disallow all C0 control and DEL codepoints in hosts #673

Disallow all C0 control and DEL codepoints in hosts #673

Conversation

alwinb commented Nov 22, 2021 • edited by pr-preview bot Loading

annevk left a comment

Choose a reason for hiding this comment

karwa commented Dec 24, 2021

alwinb commented Dec 27, 2021

annevk commented Jan 25, 2022

alwinb commented Jan 25, 2022

karwa commented Jan 25, 2022

alwinb commented Jan 25, 2022

alwinb commented Nov 22, 2021 •

edited by pr-preview bot

Loading