Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow protocol setter to switch between special and non-special schemes #674

Open
achristensen07 opened this issue Nov 23, 2021 · 17 comments
Labels
compat Standard is not web compatible or proprietary feature needs standardizing topic: api

Comments

@achristensen07
Copy link
Collaborator

Chrome, Firefox, and Safari all agree on this behavior:

u = new URL("custom-scheme://host?initially-no-path");
u.protocol = "https";
u.protocol = "custom-scheme";
u.href; // custom-scheme://host/?initially-no-path

Even though it is a little strange, I have reason to believe that people are using the URL protocol setter to do this already.

@annevk
Copy link
Member

annevk commented Nov 23, 2021

Isn't this rather problematic as it changes the nature of the host? How do you deal with that, reparse? Query also currently branches on "is special".

@achristensen07
Copy link
Collaborator Author

There is a whole host of problems here (pun definitely intended) but I'm just suggesting that this is the reality of URLs and the spec might want to reflect that. We could make the protocol setter re-parse the serialized URL or something like that.

@annevk annevk added compat Standard is not web compatible or proprietary feature needs standardizing topic: api labels Nov 23, 2021
@annevk
Copy link
Member

annevk commented Nov 23, 2021

Yeah, I guess that is what we would have to do.

cc @TimothyGu

@TimothyGu
Copy link
Member

@achristensen07 would you be able to describe the algorithm that Safari uses for the protocol setter, including the reparse step?

@achristensen07
Copy link
Collaborator Author

The current algorithm we use is as follows ( See URL::setProtocol at https://trac.webkit.org/browser/webkit/trunk/Source/WTF/wtf/URL.cpp#L390 ):

  1. Take the value and remove any colon and anything after it
  2. If the value is empty, does not start with an ASCII alpha character, or contains anything but ASCII alpha characters, '+', '-', or '.' (currently implemented in URLParser::maybeCanonicalizeScheme), return the unchanged URL.
  3. If the "URL" is an invalid URL (an HTML anchor tag) then try parsing value then ':' then the invalid URL string (this step is strange and seems worth removing and not standardizing)
  4. If value is "file" and the URL has credentials or a port then return the unchanged URL. (This is an implementation of Restrict setting protocol to "file" #269 in a different place)
  5. If the URL's protocol is already "file" and has a host, return the unchanged URL. (This seems related to step 4, but I'm not sure what the history is.)
  6. Serialize the current URL to a string, remove all characters before the first colon, then parse value with a colon and that string.

It seems like there are restrictions around "file", reparsing the whole URL, and removal of stuff after the colon that are common to all browsers.

@karwa
Copy link
Contributor

karwa commented Nov 29, 2021

I think this is possible, but can be quite tricky. IMO this operation should avoid anything which would cause other URL components to be interpreted with a different meaning, and that may require a lot of testing/fuzzing/etc to figure out. But I think it's possible.

A few cases I can think of:

  • Empty hostnames should be checked before re-parsing. They're not allowed in special URLs (except file) anyway, so it's fine to bail, but sc:////////////notahost would be reparsed to http://notahost/ which seems dangerous.

  • Unescaped backslashes should be checked and maybe percent-encoded? sc://host/some/path\..\..\etc\passwd gets reparsed as http://host/etc/passwd.

@TimothyGu
Copy link
Member

I like the idea of ensuring the parts don't get reinterpreted too much before allowing a non-special URL to switch to a special scheme. As an example, to allow switching to file, I think it's reasonable to require that:

  • The source URL must not have an opaque path.

To switch to http, we should additionally require:

  • The source URL must have a host. (Technically this implies the first requirement.)

As an additional example, Firefox and Safari currently have the following behavior:

u = new URL('javascript:!!notahost()');
u.protocol = 'http:';
console.assert(u.href, 'http://!!notahost()/');
// console.assert(u.href, 'http://%21%21notahost%28%29/'); // Chrome

which I don't think is useful or worth keeping.


It also sounds like escaping \ would be quite useful for this proposal. I filed #675 to track this.

@alwinb
Copy link
Contributor

alwinb commented Jan 5, 2022

Not that complicated, and no need to reparse the URL.

@alwinb
Copy link
Contributor

alwinb commented Jan 25, 2022

The alternative for switching to http for URLs with an empty or absent host, without serialising and reparsing the URL is to remove all empty path segments up to and including the first non-empty path segment, then parse that non-empty segment as an authority and adjust the username/password/host/port properties accordingly (or fail otherwise).

You’d have to adjust for the scheme dependent percent coding though.

That may sound complex, but there’s a lot of advantages to defining that as an operation on URL records. It is key in recovering a formal grammar and reference resolution as per RFC 3986 for WHATWG URLs.

@ljharb
Copy link

ljharb commented Aug 27, 2023

I don’t have any way to take an object of properties and turn them into a URL, except Object.assign(new URL(foo), properties) - and there’s not a good blank URL to use that’s also special, so that properties can be any URL. Why arbitrarily prevent transitions if it’s mutable at all?

@valenting
Copy link
Collaborator

Hi folks,

We've finally shipped the special scheme to non-special scheme restriction in bug 1347459 in Firefox 117, but a regression has been reported: Bug 1850954 - Cant change URL.protocol since v117

Chrome has also landed this recently in https://bugs.chromium.org/p/chromium/issues/detail?id=1416018 , which isn't yet in release.

Is there a better way for sites to change the scheme other than this?

new URL("customprotocol://" + oldUrl.host + oldUrl.pathname + oldUrl.search + oldUrl.hash)

Do we commit to this restriction or is there any intent to allow special/non-special protocol changes?

@karlcow
Copy link
Member

karlcow commented Aug 31, 2023

The relevant webkit bug https://bugs.webkit.org/show_bug.cgi?id=229427

@annevk
Copy link
Member

annevk commented Sep 1, 2023

@ricea thoughts?

Is there a better way for sites to change the scheme other than this?

This is how we would have to do it as well, given that host fundamentally changes. So setting protocol would essentially be setting href with some pre-work.

@ricea
Copy link

ricea commented Sep 1, 2023

@hayatoito is handling this in Chromium and has more context than me.

@hayatoito
Copy link
Member

I received a question on https://bugs.chromium.org/p/chromium/issues/detail?id=1416018 and posted a reply on https://bugs.chromium.org/p/chromium/issues/detail?id=1416018#c11.

I just assumed that the standard intentionally included this limitation.

@annevk
Copy link
Member

annevk commented Sep 4, 2023

It's definitely intentional as changes to the scheme would end up changing other components as well, which is rather unexpected.

Having said that, https://url.spec.whatwg.org/#potentially-strip-trailing-spaces-from-an-opaque-path does so as well so there is precedent now, but that's a very limited impact. Changing the scheme however would end up changing the host, which is central to the authority of a URL. I'd rather not do that personally if we can get away with it.

@hayatoito
Copy link
Member

Thanks.

I currently don't intend to undo the Chrome fix, at least for now. The fix will be part of the M118 release to make Chrome align with the current URL standard, unless we encounter a significant issue.

If we decide to permit setting protocols, please inform me by updating the issue or here.

I thought this was a straightforward bug fix, and wasn't aware that there is on-going discussion to change the standard here. :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compat Standard is not web compatible or proprietary feature needs standardizing topic: api
Development

No branches or pull requests

10 participants