Url.to_file_path() doesn't produce UNC path on Windows #450

Open
mykmelez opened this issue May 9, 2018 · 5 comments

@mykmelez

mykmelez commented May 9, 2018

Url.to_file_path() doesn't produce a UNC path on Windows, even when the Url was initialized with a UNC path via Url.from_file_path(). Thus this program:

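```rust
extern crate url; // Windows-only repro; on other platforms from_file_path() rejects this path

use std::path::Path;
use url::Url;

fn main() {
	let unc_path = Path::new(r"\\?\C:\Windows\System"); // extended-length (VerbatimDisk) path
	let url = Url::from_file_path(unc_path).expect("url"); // maps to file:///C:/Windows/System
	let abs_path_buf = url.to_file_path().expect("path"); // comes back without the \\?\ prefix
	let abs_path = abs_path_buf.as_path();
	assert_eq!(unc_path, abs_path);
}
```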

Fails with:

```
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `"\\\\?\\C:\\Windows\\System"`,
right: `"C:\\Windows\\System"`', src\main.rs:11:2
```

It seems like Url.to_file_path() should produce a UNC path, at least when the Url was initialized with one; and perhaps in all cases, for compatibility with std::fs::canonicalize(), which always produces UNC paths on Windows (although this is controversial, per rust-lang/rust#42869).

@Screwtapello

As I understand it, the output of std::fs::canonicalize() is not exactly a UNC path. A UNC path has the form \\server\share\path\to\file, and its canonicalized form would be \\?\UNC\server\share\path\to\file. The best name I've been able to find for the \\?\ kind of path is "extended-length path".
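To make the distinction concrete, here is a minimal sketch that rewrites a basic absolute path into the extended-length form described above. `to_extended_length` is a hypothetical helper, not part of the `url` crate or `std`. It works on plain strings so it runs on any OS, and it deliberately does not normalize `.`, `..` or forward slashes, which a real conversion would have to do first because Windows passes `\\?\` paths through without that processing.

```rust
/// Hypothetical helper: rewrite a basic absolute Windows path into its
/// extended-length (`\\?\`) spelling. String-based, so it runs on any OS.
fn to_extended_length(path: &str) -> Option<String> {
    if path.starts_with(r"\\?\") {
        // Already extended-length; leave it alone.
        return Some(path.to_string());
    }
    if let Some(rest) = path.strip_prefix(r"\\") {
        // Device paths such as \\.\COM2 are not UNC paths.
        if rest.starts_with(r".\") {
            return None;
        }
        // UNC: \\server\share\... becomes \\?\UNC\server\share\...
        return Some(format!(r"\\?\UNC\{}", rest));
    }
    let b = path.as_bytes();
    if b.len() >= 3 && b[0].is_ascii_alphabetic() && b[1] == b':' && b[2] == b'\\' {
        // Disk: C:\... becomes \\?\C:\...
        return Some(format!(r"\\?\{}", path));
    }
    // Relative or drive-relative paths are not handled by this sketch.
    None
}

fn main() {
    assert_eq!(
        to_extended_length(r"\\server\share\path\to\file").as_deref(),
        Some(r"\\?\UNC\server\share\path\to\file")
    );
    assert_eq!(
        to_extended_length(r"C:\Windows\System32").as_deref(),
        Some(r"\\?\C:\Windows\System32")
    );
}
```

Note that the UNC case gains an extra `UNC\` segment rather than just a `\\?\` prefix, which is why calling the canonicalized form a "UNC path" is imprecise.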

URL syntax doesn't really have a way to represent the distinction between Windows' extended length paths and "normal" paths, so I think it's reasonable that canonicalized paths cannot round-trip through URL encoding.

@SimonSapin
Member

I’m very unfamiliar with UNC, so it would help a lot if someone could write up or point to something about the different kinds of paths that exist on Windows and how they should map to file: URLs and back.

Note, however, that “this Url was created with Url::from_file_path” is not the kind of information that I think we should track separately, so the behavior of .to_file_path() should be the same whenever .as_str() is the same.

@Screwtapello

As I understand it, all the actual conventions about Windows paths and file: URLs are already in the WHATWG's URL spec.

Basic absolute Windows paths like C:\Windows\System32 (which begin with what Rust calls a Disk prefix) and extended-length paths like \\?\C:\Windows\System32 (which begin with what Rust calls a VerbatimDisk prefix) both become URLs like file:///C:/Windows/System32.

UNC Windows paths like \\server\share\path\to\file.txt (which begin with what Rust calls a UNC prefix) and extended-length UNC paths like \\?\UNC\server\share\path\to\file.txt (which begin with what Rust calls a VerbatimUNC prefix) both become URLs like file://server/share/path/to/file.txt.

Device paths like \\.\COM2 for the second serial port (that begin with what Rust calls a DeviceNS prefix) probably shouldn't correspond to any URL at all.

Rust also supports a generic Verbatim prefix, which is presumably there for forward compatibility in case Microsoft introduces a path that is neither a local nor a UNC path, but the URL library probably doesn't need to care about any of that.

Google's Project Zero has an unofficial guide to Windows path syntaxes that mentions a \??\ path prefix. Rust doesn't know anything about it, so it's probably too obscure to care about, but I figured I'd mention it for completeness.
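A rough sketch of how the prefixes listed above can be told apart. The variant names copy `std::path::Prefix`, but `std::path` only parses these prefixes when compiled for Windows, so this hypothetical `classify` function matches the prefix strings by hand. It is a simplification: it does not model the `\??\` form or the rarer cases in the Project Zero guide.

```rust
/// Hypothetical, string-based mirror of the `std::path::Prefix` variants.
#[derive(Debug, PartialEq)]
enum WinPrefix {
    Verbatim,     // \\?\ followed by something other than a drive or UNC
    VerbatimUNC,  // \\?\UNC\server\share
    VerbatimDisk, // \\?\C:
    DeviceNS,     // \\.\COM2
    UNC,          // \\server\share
    Disk,         // C:
}

fn classify(path: &str) -> Option<WinPrefix> {
    // A drive prefix is an ASCII letter followed by a colon.
    let is_drive = |s: &str| {
        let b = s.as_bytes();
        b.len() >= 2 && b[0].is_ascii_alphabetic() && b[1] == b':'
    };
    if let Some(rest) = path.strip_prefix(r"\\?\") {
        if rest.starts_with(r"UNC\") {
            Some(WinPrefix::VerbatimUNC)
        } else if is_drive(rest) {
            Some(WinPrefix::VerbatimDisk)
        } else {
            Some(WinPrefix::Verbatim)
        }
    } else if path.starts_with(r"\\.\") {
        Some(WinPrefix::DeviceNS)
    } else if path.starts_with(r"\\") {
        Some(WinPrefix::UNC)
    } else if is_drive(path) {
        Some(WinPrefix::Disk)
    } else {
        // Relative path, or a syntax this sketch does not model.
        None
    }
}

fn main() {
    assert_eq!(classify(r"\\?\C:\Windows\System"), Some(WinPrefix::VerbatimDisk));
    assert_eq!(classify(r"\\.\COM2"), Some(WinPrefix::DeviceNS));
}
```

The order of the checks matters: `\\?\` and `\\.\` must be tested before the plain `\\` UNC case, since all three start with two backslashes.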

TL;DR: it's probably not worth bothering with anything beyond the Disk, UNC, VerbatimDisk and VerbatimUNC prefixes.
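A minimal sketch of the four-prefix mapping summarized above. `windows_path_to_file_url` is hypothetical and is not the `url` crate's implementation: it does plain string handling only, with no percent-encoding or validation. It shows how the basic and extended-length spellings of the same location collapse to one URL, while device and generic verbatim paths get none.

```rust
/// Hypothetical sketch: map Disk, VerbatimDisk, UNC and VerbatimUNC paths
/// to file: URLs. No percent-encoding; other prefixes yield None.
fn windows_path_to_file_url(path: &str) -> Option<String> {
    // Strip the prefix; `unc` records whether a server name follows.
    let (rest, unc) = if let Some(r) = path.strip_prefix(r"\\?\UNC\") {
        (r, true) // VerbatimUNC
    } else if let Some(r) = path.strip_prefix(r"\\?\") {
        (r, false) // VerbatimDisk, or a generic Verbatim rejected below
    } else if path.starts_with(r"\\.\") {
        return None; // DeviceNS: no URL equivalent
    } else if let Some(r) = path.strip_prefix(r"\\") {
        (r, true) // UNC
    } else {
        (path, false) // expected to be Disk
    };

    let slashed = rest.replace('\\', "/");
    if unc {
        // \\server\share\dir -> file://server/share/dir
        Some(format!("file://{}", slashed))
    } else {
        let b = slashed.as_bytes();
        let is_disk = b.len() >= 3 && b[0].is_ascii_alphabetic() && b[1] == b':' && b[2] == b'/';
        // C:\dir -> file:///C:/dir; anything else is unsupported here
        if is_disk {
            Some(format!("file:///{}", slashed))
        } else {
            None
        }
    }
}

fn main() {
    let a = windows_path_to_file_url(r"C:\Windows\System32");
    let b = windows_path_to_file_url(r"\\?\C:\Windows\System32");
    assert_eq!(a, b); // both spellings produce the same URL
}
```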

@SimonSapin
Member

Thanks @Screwtapello! Do you have thoughts on the other conversion, from file: URL to Path? It sounds like the two cases there are with or without a host name. In particular, what about the case in this issue’s description? If Rust’s Disk and VerbatimDisk prefixes map to the same URLs in from_file_path, we can’t distinguish between them in to_file_path.
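For the reverse direction, a minimal sketch under the same assumptions (hypothetical `file_url_to_windows_path`, no percent-decoding, no `url` crate): a URL without a host becomes a Disk path and a URL with a host becomes a UNC path. Since `C:\Windows\System` and `\\?\C:\Windows\System` serialize to the same URL, the function has nothing to choose by and returns basic syntax, which matches the `C:\Windows\System` value in the assertion failure from the issue description.

```rust
/// Hypothetical sketch: map a file: URL back to a basic-syntax Windows path.
/// No host -> Disk path; host -> UNC path. No percent-decoding.
fn file_url_to_windows_path(url: &str) -> Option<String> {
    let rest = url.strip_prefix("file://")?;
    // Split "host/path" at the first slash; an empty host means a local path.
    let (host, path) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, ""),
    };
    let backslashed = path.replace('/', "\\");
    if host.is_empty() {
        // file:///C:/dir -> C:\dir (drop the leading separator)
        let p = backslashed.strip_prefix('\\')?;
        let b = p.as_bytes();
        if b.len() >= 2 && b[0].is_ascii_alphabetic() && b[1] == b':' {
            Some(p.to_string())
        } else {
            None
        }
    } else {
        // file://server/share/dir -> \\server\share\dir
        Some(format!(r"\\{}{}", host, backslashed))
    }
}

fn main() {
    // Whichever spelling produced this URL, the verbatim prefix is gone.
    let path = file_url_to_windows_path("file:///C:/Windows/System");
    assert_eq!(path.as_deref(), Some(r"C:\Windows\System"));
}
```

Returning extended-length syntax instead would only mean changing the two `Some(...)` branches; the URL itself still carries no hint about which spelling the caller started with.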

@Screwtapello

It's hard to recommend a general plan.

On one hand, the basic path syntax is the simplest, most recognizable, most compatible with other programs (for example, in a config file, or on the command-line), and allows shortcuts like "." and ".." to navigate within the path hierarchy which some URLs may expect. Also, this is the only path syntax available on non-Windows platforms.

On the other hand, the extended-length syntax allows paths much longer than the 260-character MAX_PATH limit (so for some URLs you'll need extended-length syntax no matter what the default is), and it is immune to the classic "forbidden filenames" problem, where trying to read or write a file named "COM1", "AUX", "LPT2", etc. will try to access a device and usually hang forever.

The easiest and safest thing would probably be to always use extended-length path syntax on Windows, shielding apps from most of the rough corner-cases of Windows path handling.

A nicer system would be to use basic syntax on Windows unless the path exceeds MAX_PATH or mentions one of the forbidden filenames listed in the unofficial guide I linked to, but that kind of blacklisting is fairly complex and probably deserves its own crate.
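One way to sketch that "nicer system" under simplifying assumptions: the hypothetical `needs_extended_length` flags a basic path whose UTF-16 length reaches MAX_PATH (260 units, including the terminating NUL) or that contains one of the classic reserved device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9), with or without an extension. Real Windows rules have more corner cases (trailing dots and spaces, superscript digits, per-API limits), which supports the point that this belongs in a dedicated crate.

```rust
/// Hypothetical helper: should this basic absolute path use `\\?\` syntax?
/// Simplified: checks MAX_PATH and the classic reserved device names only.
fn needs_extended_length(path: &str) -> bool {
    // MAX_PATH counts UTF-16 code units including the terminating NUL,
    // so a basic path may be at most 259 units long.
    const MAX_PATH: usize = 260;
    const RESERVED: [&str; 4] = ["CON", "PRN", "AUX", "NUL"];

    if path.encode_utf16().count() >= MAX_PATH {
        return true;
    }
    path.split('\\').any(|component| {
        // "COM1.txt" is reserved just like "COM1", so compare only the
        // part before the first dot, case-insensitively.
        let stem = component.split('.').next().unwrap_or("").to_ascii_uppercase();
        let numbered_device = (stem.starts_with("COM") || stem.starts_with("LPT"))
            && stem.len() == 4
            && matches!(stem.as_bytes()[3], b'1'..=b'9');
        RESERVED.contains(&stem.as_str()) || numbered_device
    })
}

fn main() {
    assert!(!needs_extended_length(r"C:\Windows\System32"));
    assert!(needs_extended_length(r"C:\logs\com1.txt"));
}
```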

4 participants