Why is the host dropped if the path contains a Windows drive letter? #302

Closed
felixfbecker opened this issue Apr 28, 2017 · 15 comments

Comments

@felixfbecker

felixfbecker commented Apr 28, 2017

Nothing in the definition of the file: URL scheme says that the URL cannot have a host:

A file URI takes the form of
file://host/path
where host is the fully qualified domain name of the system on which the path is accessible, and path is a hierarchical directory path of the form directory/directory/.../name. If host is omitted, it is taken to be "localhost", the machine from which the URL is being interpreted.

https://en.wikipedia.org/wiki/File_URI_scheme

It looks like the spec says in https://url.spec.whatwg.org/#path-state 1.4.1.1 that a host is to be dropped if the path contains a Windows drive letter:

  • If url’s scheme is "file", url’s path is empty, and buffer is a Windows drive letter, then:
    • If url’s host is neither the empty string nor null, validation error, set url’s host to the empty string.
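A minimal sketch of the quoted step, assuming a simplified URL record. The object shape `{ scheme, host, path }` and the name `appendPathSegment` are hypothetical and only illustrate this one rule; this is not the parser from the spec or from jsdom/whatwg-url.

```javascript
// A Windows drive letter in the URL Standard is two code points:
// an ASCII letter followed by ":" or "|".
function isWindowsDriveLetter(segment) {
  return /^[A-Za-z][:|]$/.test(segment);
}

// Hypothetical helper: applies the quoted step to one path segment (`buffer`)
// just before it is appended to url.path (an array of segments).
function appendPathSegment(url, buffer) {
  if (url.scheme === 'file' && url.path.length === 0 && isWindowsDriveLetter(buffer)) {
    if (url.host !== '' && url.host !== null) {
      // The spec flags a validation error here and discards the host.
      url.host = '';
    }
  }
  url.path.push(buffer);
  return url;
}

// The host survives an ordinary first segment but not a drive letter.
console.log(appendPathSegment({ scheme: 'file', host: 'host', path: [] }, 'foo').host); // 'host'
console.log(appendPathSegment({ scheme: 'file', host: 'host', path: [] }, 'c:').host);  // ''
```

Only the first path segment is affected, because the step requires url's path to be empty.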

This means it's impossible for a Windows file URL to have any host other than the empty string:

const uri = new URL('file://host/foo')
uri.host // 'host'
uri.pathname = '/c:/foo' // a path that starts with a drive letter
uri.host // ''

new URL('/c:/baz/qux', 'file://host/foo/bar').href
// 'file:///c:/baz/qux'

This can be seen in jsdom/whatwg-url. Chrome actually behaves differently and keeps the host. NodeJS 7's bundled URL implementation also keeps the host.

It is possible with a Unix-style file URL, though. Is there any reason for this?
It should be possible to refer to a file on a remote Windows machine just like on a remote Unix machine.
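To check how a particular runtime treats these inputs, a small probe along the following lines could be used. `probeFileHost` is a name made up for this sketch; it relies only on the standard `URL` constructor. No output is shown because the results differ between implementations and versions.

```javascript
// Report how this runtime's URL parser handles a host in front of a drive letter.
// Nothing is assumed about the result; it is only printed.
function probeFileHost(input, base) {
  const url = base === undefined ? new URL(input) : new URL(input, base);
  return { input, base, host: url.host, href: url.href };
}

const cases = [
  ['file://host/c:/baz/qux'],              // absolute URL with host and drive letter
  ['/c:/baz/qux', 'file://host/foo/bar'],  // drive-letter path resolved against a base with a host
  ['file://host/foo'],                     // Unix-style path, for comparison
];

for (const [input, base] of cases) {
  console.log(probeFileHost(input, base));
}
```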

@annevk
Member

annevk commented Apr 28, 2017

What does Edge do?

And just to be clear, you mean that on Windows file://somehost/C:/test/ can be a legit location?

@felixfbecker
Author

felixfbecker commented Apr 28, 2017

Edge keeps the host, but throws an Invalid URL error upon serializing with href or toString():

image

So it doesn't really follow the spec either, but if they changed it to support this it wouldn't be a breaking change.

There is no indication why it would not be a valid location, and why it would be valid for remote Unix machines but not for remote Windows machines.

Imagine you have an application running on machine A that identifies files by file URIs (e.g. an editor), and then passes these in some way to another server or container B (e.g. a language intelligence server). For the external server, the files are no longer on localhost but on a different host. To prevent it from trying to read the files from its own local disk, a host component must be added to the URI. This use case works fine if A is running Unix, but not when A is running Windows, and I don't see a reason why it shouldn't. The URI just says "Hey, the file is on the file system of this host under this path".
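The "add a host component" step could be sketched as below. `withFileHost` and the host name `machine-a` are hypothetical; the helper uses plain string handling on purpose, so its result does not depend on how any particular URL parser treats drive letters.

```javascript
// Hypothetical helper: replace the (possibly empty) host of a file URI.
// Works on the string form only: "file://" + host + path.
function withFileHost(fileUri, host) {
  const match = /^file:\/\/([^\/?#]*)(\/.*)?$/i.exec(fileUri);
  if (match === null) {
    throw new TypeError(`Not a file URI with an authority part: ${fileUri}`);
  }
  const path = match[2] === undefined ? '/' : match[2];
  return `file://${host}${path}`;
}

// Machine A stamps its own name on the URIs before handing them to B.
console.log(withFileHost('file:///c:/foo/bar.txt', 'machine-a'));  // file://machine-a/c:/foo/bar.txt
console.log(withFileHost('file:///home/me/bar.txt', 'machine-a')); // file://machine-a/home/me/bar.txt
```

The Windows and Unix cases go through the same code path here, which is the symmetry the comment argues for.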

@annevk
Member

annevk commented Apr 28, 2017

Well if Windows doesn't support it I don't think it necessarily makes sense to generate such URLs since these are all special cases for Windows.

How did you run into this?

@felixfbecker
Author

felixfbecker commented Apr 28, 2017

Shouldn't the spec decide what implementations support, not the support decide what is specced?
What do you mean by "if Windows doesn't support it"? Of course you cannot just type a URL like file://120.53.57.76/c:/foo/bar.txt into Edge, hit enter and have the content of that file on that IP shown in the browser. But that doesn't mean you shouldn't be able to create or parse such a URL for identification purposes, internal application logic, etc. It's up to the application how to retrieve the content and whether the content ever needs to be retrieved. And it doesn't violate the URL or file: URL specs/conventions.

The way I ran into it is actually the example I gave above. I was refactoring a language server to use the WHATWG URL API instead of Node's url.parse()/url.format(), and couldn't get some tests to pass because of this behaviour. The server is written in NodeJS and can run on any host and OS, but the text document URIs are picked by the client, which are typically file: URIs (and if the client runs Windows, they will have drive letters). If the host of the file URI is missing or localhost, the server should be able to assume that it can read the file from disk. Right now, it can't if the client is running Windows (but can if it is running Linux).
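The rule "missing host or localhost means the file is on this machine" might look like the following sketch. `fileUriHost` and `isLocalFileUri` are hypothetical names, and the host is read from the string form so that the check does not depend on the parser behaviour under discussion.

```javascript
// Hypothetical helper: extract the host of a file URI without a URL parser.
// Returns null for anything that is not a file URI.
function fileUriHost(fileUri) {
  const match = /^file:(?:\/\/([^\/?#]*))?/i.exec(fileUri);
  if (match === null) return null;
  return (match[1] || '').toLowerCase();
}

// A server could call this before trying to read the file from its own disk.
function isLocalFileUri(fileUri) {
  const host = fileUriHost(fileUri);
  return host === '' || host === 'localhost';
}

console.log(isLocalFileUri('file:///c:/foo/bar.txt'));             // true
console.log(isLocalFileUri('file://localhost/c:/foo/bar.txt'));    // true
console.log(isLocalFileUri('file://120.53.57.76/c:/foo/bar.txt')); // false
```

If the parser drops the host before such a check runs, the last URI becomes indistinguishable from the first, which is the failure described above.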

@annevk
Member

annevk commented Apr 28, 2017

"Shouldn't the spec decide what implementations support, not the support decide what is specced?"

Not when it comes to legacy features, really.

@felixfbecker
Author

How is this legacy though?

@alwinb
Contributor

alwinb commented May 4, 2017

I've run into this. I think it is cleaner to confine the deviant normalization rules for drive letters to the path only. What is the motivation for dropping the host?

@annevk
Member

annevk commented May 5, 2017

I don't recall. I think I'd probably take a PR that changes this, but I'm not actively working on URL issues myself for a bit.

@annevk
Member

annevk commented May 6, 2020

Per https://jsdom.github.io/whatwg-url/#url=ZmlsZTovL3NvbWVob3N0L0M6L3Rlc3Qv&base=YWJvdXQ6Ymxhbms= it seems both Chrome and Safari preserve the host, even on macOS. I think that means we should change this.

@domenic
Member

domenic commented May 6, 2020

Hmm, on Windows I see Chrome giving the same results as the URL Standard.

@sleevi

sleevi commented May 6, 2020

Yeah, I'm trying to make sense of why that is. It's supposed to be the same on all platforms, but I haven't figured out where the host is getting stripped yet.

Edit: Found it - special Windows-only logic that nukes the host

@annevk
Member

annevk commented May 6, 2020

I tried finding the rationale in https://code.google.com/archive/p/google-url/source/default/commits but have not been successful thus far. Not sure what we want to do here now.

@sleevi

sleevi commented May 6, 2020

Do we know what historic IE did? Much of the file behavior is trying to match IE5.5/IE6 URL parsing quirks.

By nature of the feature, and the behavior on other Chrome platforms, I feel like we can say this is a Chrome Windows-ism to support Enterprises. If IE changed behavior, it makes a compelling argument that Enterprises don’t need this and it should be “easy” to remove. If IE matches the current behavior, we can still pursue removing, but would probably need metrics in place to measure how often this happens (and potentially the host being localhost or . as special cases) to evaluate removing the Windows-specific behavior.

@annevk
Member

annevk commented May 10, 2020

I tested IE6 (and Chrome 49) on XP using BrowserStack via Live DOM Viewer (one of the few pages that still works, hurray) with

<a href="file://test/d:/testtest;'?"></a>
<script>
w(document.getElementsByTagName("a")[0].href)
</script>

as my input and verified that .href does show post-parse output using file:/s as input (becomes file:///s). IE6 preserves the host, Chrome drops it.

(In "real" Edge 18 input such as file://aa/d:/bb yields file://aad:/bb (indeed, d: eats a prior slash). Microsoft clearly didn't have a lot of regression tests for this.)

@alwinb
Contributor

alwinb commented May 14, 2020

I have made a bunch of screenshots with BrowserStack (initially for #405) and transcribed from them the results that are relevant to this issue. The results may also be relevant to #515.

The following applies only to file URLs.

Observations:

  • Firefox always sets the host to the empty string.
  • IE, Edge, Chrome and Safari prior to version 10 preserve the host, but with exceptions:
    • IE and Edge append the drive letter to the host, if one is present. I assume this is a bug.
    • Chrome/Windows sets the host to the empty string if the path contains a drive letter.
    • IE, Edge and Safari replace localhost with the empty string.
  • Safari versions 10 and up seem to drop the host from the base URL if the input is an absolute file path, but the exact behaviour is unclear to me.

Browser Tests:

Input: /..//localhost/pig against file://lion/
  IE8+ and Edge     file://lion//localhost//pig
  Chrome/Windows    file://lion//localhost//pig
  Chrome/Mac        file://lion//localhost//pig
  Safari 5 - 9      file://lion//localhost//pig
  Safari 10 - 13    file:////localhost//pig
  Firefox           file:////localhost//pig

Input: file://localhost//a//../..//
  IE8+ and Edge     file:///.//
  Chrome/Windows    file://localhost///
  Chrome/Mac        file://localhost///
  Safari 5 - 9      file://///
  Safari 10 - 13    file://///
  Firefox           file://///

Input: file://localhost////foo
  IE8+ and Edge     file://foo
  Chrome/Windows    file://localhost////foo
  Chrome/Mac        file://localhost////foo
  Safari 5 - 9      file://////foo
  Safari 10 - 13    file://////foo
  Firefox           file://////foo

Input: file://somehost/C:/test
  IE8+ and Edge     file://somehostc:/test
  Chrome/Windows    file:///C:/test
  Chrome/Mac        file://somehost/C:/test
  Safari 5 - 9      file://somehost/C:/test
  Safari 10 - 13    file://somehost/C:/test
  Firefox           file:///C:/test

Input: file://host2/
  IE8+ and Edge     file://host2/
  Chrome/Windows    file://host2/
  Chrome/Mac        file://host2/
  Safari 5 - 9      file://host2/
  Safari 10 - 13    file://host2/
  Firefox           file:///

Input: file:file2 against file://host/D:/dir1/file1
  IE8+ and Edge     file://hostd:/dir1/file2
  Chrome/Windows    file:///D:/dir1/file2
  Chrome/Mac        file://host/D:/dir1/file2
  Safari 5 - 9      file://host/D:/dir1/file2
  Safari 10 - 13    file://host/D:/dir1/file2
  Firefox           file:///D:/dir1/file2

Input: file:/file2 against file://host/D:/dir1/file1
  IE8+ and Edge     file://hostd:/file2
  Chrome/Windows    file:///D:/file2
  Chrome/Mac        file://host/file2
  Safari 5 - 9      file:///file2
  Safari 10 - 13    file:///D:/file2
  Firefox           file:///file2

Input: file:c:/dir2/ against file://host/D:/dir1/file1
  IE8+ and Edge     file://hostd:/c:/dir2/
  Chrome/Windows    file:///C:/dir2/
  Chrome/Mac        file://host/D:/dir1/c:/dir
  Safari 5 - 9      file://host/D:/dir1/c:/dir
  Safari 10 - 13    file:///c:/dir2/
  Firefox           file:///D:/dir1/c:/dir2

  • IE6 and 7 are hard to test as they do not implement the URL constructor, and the href property on anchor elements only returns a parsed result in specific cases. So this remains to be investigated.
  • IE8+ and Edge results are from IE 8, 9, 10 and 11 and Edge 15, 16, 17, and 18.
  • Firefox results are from versions 62 and 76 on Mac and from version 47 on Windows, and they all agree. (Others timed out on BrowserStack.)
  • Chrome/Windows results are from versions 49 and 71, and they agree.
  • Chrome/Mac results are from version 71.
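
As a rough way to read the file://somehost/C:/test results, the sketch below classifies each reported serialization by what appears between "file://" and the next "/". The data is copied from the results above; the `classify` helper and its three labels are made up for this illustration.

```javascript
// Reported serializations of the input file://somehost/C:/test.
const results = {
  'IE8+ and Edge': 'file://somehostc:/test',
  'Chrome/Windows': 'file:///C:/test',
  'Chrome/Mac': 'file://somehost/C:/test',
  'Safari 5 - 9': 'file://somehost/C:/test',
  'Safari 10 - 13': 'file://somehost/C:/test',
  'Firefox': 'file:///C:/test',
};

// Hypothetical classifier: compare the authority part with the host we put in.
function classify(href, expectedHost) {
  const authority = /^file:\/\/([^\/]*)/.exec(href)[1];
  if (authority === expectedHost) return 'host preserved';
  if (authority === '') return 'host dropped';
  return 'host mangled';
}

for (const [browser, href] of Object.entries(results)) {
  console.log(`${browser}: ${classify(href, 'somehost')}`);
}
```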
