Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for std::path::absolute #92750

Closed
1 of 3 tasks
ChrisDenton opened this issue Jan 10, 2022 · 25 comments
Closed
1 of 3 tasks

Tracking Issue for std::path::absolute #92750

ChrisDenton opened this issue Jan 10, 2022 · 25 comments
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@ChrisDenton
Copy link
Member

ChrisDenton commented Jan 10, 2022

Feature gate: #![feature(absolute_path)]

This is a tracking issue for std::path::absolute for making relative paths absolute without touching the filesystem. It is a drop-in replacement for std::fs::canonicalize.

Public API

let absolute = std::path::absolute("foo/./bar")?;

Steps / History

Unresolved Questions

  • on Windows this is currently implemented using GetFullPathNameW, which isn't quite a "pure" function (in some special cases, e.g. D:file.txt, it can use hidden environment variables, set by cmd.exe, to resolve the path's root).
  • Should we recommend using absolute(a.join(b)), where a is an alternative current directory to using std::env::current_dir?
  • Is the avoidance of stripping .. actually the right call? It does seem consistent with the POSIX docs, but may not be the most intuitive behavior (particularly given that Windows doesn't do that).
    • I think making the path absolute is inherently a platform-specific function. It can only really work according to the rules of the platform.
@ChrisDenton ChrisDenton added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jan 10, 2022
@ChrisDenton
Copy link
Member Author

ChrisDenton commented Jan 10, 2022

Ok, so cards on the table. My personal motivation for this function is to expose the Windows function GetFullPathNameW (or something similar) into the standard library in a way that can be used cross-platform. This is already used by the standard library for making verbatim paths (absolute paths that aren't limited by MAX_PATH). I've also heard from Linux users who want a way to make absolute paths that won't break on Windows. Previous related discussions include these internals thread where I explore other possibilities:

I'd like to address the unresolved issues of not resolving .. on *nix: it was felt that resolving this is undesirable as it would change the meaning of the path. Whereas on Windows it does not. This is a fundamental difference between POSIX and Windows. On Windows, .. is resolved lexically (aka without reference to the filesystem) whereas on POSIX systems it's essentially a symlink that can take you anywhere in the filesystem. It's not possible to resolve this dissonance so maybe we shouldn't hide it.

@ChrisDenton
Copy link
Member Author

ChrisDenton commented Jan 11, 2022

Should we prepend the current directory, or skip that step? The user can typically join with the current directory afterwards.

I feel this is mistaken. On Windows, when you pass a path to File::open or other fs operation, there is a fair amount of parsing done to get the "real" absolute path which is then passed to the kernel. It is often desirable to know exactly what path is actually being used to avoid surprises.

Here are some examples of the potential differences between using absolute and Path::join. I'm using cmd.exe which initially sets the current directory to C:\Users\Chris. I then change the current directory to D:\Chris\rust.

Input std::path::absolute current_dir()?.join
aux \\.\aux D:\Chris\rust\aux
C:file.txt C:\Users\Chris\file.txt C:file.txt
folder.\file... .. D:\Chris\rust\folder\file D:\Chris\rust\folder.\file.. ..

The idea of absolute is that it gives you the path the kernel will see so there are no surprises. Some of the Windows behaviour (in particular the legacy ones) can be very unintuitive. The absolute function avoids having to know much of the weirdness involved with user paths because it "knows" how to parse it the same way the OS does.

@golddranks
Copy link
Contributor

golddranks commented Aug 15, 2022

whereas on POSIX systems it's essentially a symlink that can take you anywhere in the filesystem.

Could you elaborate on that? I get that /foo/a and /bar/a could point to the same directory a if, for example /bar/a is a symlink; do you mean that stripping the .. in /bar/a/.. is required to take you to the "physical" parent of the directory, /foo, thus requiring a filesystem access to resolve?

@ChrisDenton
Copy link
Member Author

Yes. sorry I worded that badly. The absolute function doesn't use the file system to resolve paths so it can't know which directory .. should resolve to. Instead, canonicalize must be used to properly resolve such paths. It's the same reason why Path::components normalizes . but not ...

@Zoxc
Copy link
Contributor

Zoxc commented Mar 20, 2023

Is there a particular reason this is added to the path module and not on the Path type like other path methods?

@ChrisDenton
Copy link
Member Author

Mainly just to be more or less a drop-in replacement for fs::canonicalize. And to avoid issues like: should Path::absolute take a base path? That said, I don't personally have any objection to it being a Path method.

@Zoxc
Copy link
Contributor

Zoxc commented Mar 20, 2023

The path module doesn't have any other path function currently, so I think adding it to Path as a drop-in for Path::canonicalize makes sense.

Resolving .. seems like a really useful usecase too, but it's probably better to make fs::canonicalize behave better on Windows for that.

@Zoxc
Copy link
Contributor

Zoxc commented Mar 24, 2023

Thinking things over a bit I think we may want 2 functions.

  • absolute which does only what is necessary to convert the path into an absolute form, so no resolving of . and ... It's intended to be a fast and as pure as possible path manipulation function with no filesystem access. This way it behaves in a consistent manner with regards to ...
  • resolve which converts the path to absolute form and only fails if absolute would. Additional it resolves . and .. by best effort using filesystem access if necessary, but it keeps . and .. instead of failing. It may resolve symlinks. This would use a realpath call per .. component and GetFullPathName on Windows. This is more expensive and is intended to produce more readable paths.

@jyn514
Copy link
Member

jyn514 commented Apr 2, 2023

What advantage would resolve have over canonicalize? "more readable paths" on Windows seems like it would be served just as well by absolute, and on Unix you can just use canonicalize.

@jyn514 jyn514 added the A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` label Apr 2, 2023
@Zoxc
Copy link
Contributor

Zoxc commented Apr 2, 2023

What advantage would resolve have over canonicalize?

It would work on Windows in more scenarios and on paths which may partially exist or not exist at all. It can avoid filesystem access if there's no .. component on Unix. It should perhaps do case and Unicode normalization however.

"more readable paths" on Windows seems like it would be served just as well by absolute, and on Unix you can just use canonicalize.

You seem to suggest that end user programs should call different functions on Windows and Unix. That's not great for a portable API.

@ChrisDenton
Copy link
Member Author

The path::absolute function, as it stands, gives an "OS view" of the path. By which I mean, you see essentially the same path as the OS sees right before it starts doing filesystem stuff (e.g. resolving symlinks, etc). This is useful because it helps to ensure the application and OS are interpreting the path similarly. In some situations it can also be a useful diagnostic to log the path before symlinks are resolved, albeit cleaned up as much as possible.

Libraries could build on this primitive to provide a number of other useful functions. They can't however unbuild it. If std only provides higher level functions, then it will not allow experimenting outside of std without those libraries having to reimplement it themselves.

@BurntSushi
Copy link
Member

This is useful because it helps to ensure the application and OS are interpreting the path similarly. In some situations it can also be a useful diagnostic to log the path before symlinks are resolved, albeit cleaned up as much as possible.

I agree this is useful. I want something close to this for implementing support for hyperlinks in ripgrep. Say, for example, a user is in /home/andrew and finds a result in /home/andrew/foo/bar.txt. The user then clicks on the hyperlinked /home/andrew/foo/bar.txt, but it winds up opening the path /home/andrew/some/deeply/nested/directory/bar.txt instead. It's the same file, but because symlinks are resolved, the path that is opened is different from the one I clicked on. This leads to a surprise, and may even lead to undesirable behavior (albeit not necessarily).

But in order for absolute to work for the hyperlink use case, I think it would need to resolve ...

@ChrisDenton
Copy link
Member Author

ChrisDenton commented Sep 21, 2023

The Unix case is tricky because there's no (safe) way to cleanup .. without asking the OS to resolve that part of the path. And if you did use e.g. readlink to resolve the .. components then you're doing a kind of hybrid canonicalization of the path anyway so I'm not sure if there's much of a difference with full canonicalization.

Can you say why .. would need to be resolved? Presumably ripgrep itself wouldn't be inserting .. components into the path? And if they come from the user then it doesn't seem like a bad thing to display paths the way the user typed it in?

@BurntSushi
Copy link
Member

Yeah, that's a good point. Apparently some stuff on Windows at least doesn't deal with .. in links while some stuff does. Of course, std::path::absolute would therefore be sufficient for that case, but it's not clear whether apps on Unix will also "not work" with .. too. Looks like it might be app specific? Not sure.

@BurntSushi
Copy link
Member

@ChrisDenton Do you think this is ready for FCP? I think there are a couple things I'd like to see:

  • A clearer answer to whether this should take a base. If it doesn't, what use cases are we rejecting? Or are we not rejecting any use cases and instead making some use cases less convenient?
  • A crisp and succinct use case for why someone would want to use this method. Ideally, it would be suitable as an example in the docs for this routine. In particular, canonicalize has a somewhat clear use case, but this routine is murkier. Can we connect the dots for users so that it's easier for them to make a judgment call about whether this is the appropriate routine to use or now?

@ChrisDenton
Copy link
Member Author

ChrisDenton commented Apr 12, 2024

The first point is something I've been thinking about recently but it requires a bit of explanation to understand my thinking.

Win32 has two legacy types of relative paths which are nonetheless still supported:

Type Example Explanation
Root relative \file Relative to the root of the current directory.
Drive relative D:file Relative to the current directory of the D drive. The concept of drives storing their own current directory is pretty archaic but it's still emulated by the command prompt (using "hidden" environment variables) and this is respected by Win32 filesystem APIs.

That second one is a pain point. It uses hidden state, which is fine when you're just asking the OS to figure out what the path really is but is very surprising when you asked it to join two paths together. Making it worse is that D:file is ambiguous. It can also be used to access a data stream called file on a file called D. E.g. C:\path\to\D:file is a perfectly valid path.

So I was thinking that something like the below would make sense, so it's clearer what you mean to happen:

// This is just a sketch
pub fn absolute(base: Option<&Path>, path: &Path) -> io::Result<PathBuf> {
    if let Some(base) = base {
        if base.is_relative() || path.is_absolute() { return err }
        // Note: `join_subpath` is a hypothetical function that, unlike `join`
        // never overrides the base path
        impl::absolute(base.join_subpath(path))
    } else {
        // If no base is given, just use `path`
        impl::absolute(path)
    }
}

A crisp and succinct use case for why someone would want to use this method.

If a user passes a path to a tool then they tend to expect printed paths to be in displayed in a similar manner to the way they were given. The canonicalized form may appear to be very different from the path the user typed.

For Windows paths specifically, they're much easier for the programmer to handle in their normalized form, e.g. without legacy relative paths or other path based weirdness. But canonicalization can fail (e.g. if the path doesn't exist yet).

@BurntSushi
Copy link
Member

So I was thinking that something like the below would make sense, so it's clearer what you mean to happen:

Hmmm. Yeah that signature that takes a base seems non-ideal. And it looks like it's something the caller could do themselves if necessary. Are you okay sticking with the current signature? If so, it looks like this might be ready for FCP.

@ChrisDenton
Copy link
Member Author

Yes, I'm fine with that if other people are.

@BurntSushi
Copy link
Member

@rfcbot fcp merge

std::path::absolute is a useful routine for turning a file path into an absolute path, but without touching the file system. It effectively exposes GetFullPathNameW on Windows, and POSIX path normalization (sans symlinks) on Unix. It is typically useful when you want to transform a file path given by a user into an absolute path, but without substantively changing the path. e.g., given ./foo where foo is a symlink, you'd get back $PWD/foo where as std::fs::canonicalize would try to resolve foo to the target it points to.

@rfcbot
Copy link

rfcbot commented Apr 12, 2024

Team member @BurntSushi has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Apr 12, 2024
@rfcbot
Copy link

rfcbot commented Apr 14, 2024

🔔 This is now entering its final comment period, as per the review above. 🔔

@RalfJung
Copy link
Member

Relative to the root of the current directory.

What is "the root of the current directory"? Does this refer to the drive letter?

@ChrisDenton
Copy link
Member Author

Yes, for drive paths. It could also mean \\servername\sharename for UNC paths.

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Apr 24, 2024
@rfcbot
Copy link

rfcbot commented Apr 24, 2024

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@rfcbot rfcbot added the to-announce Announce this issue on triage meeting label Apr 24, 2024
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Apr 24, 2024
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Apr 24, 2024
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Apr 24, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Apr 25, 2024
Rollup merge of rust-lang#124335 - ChrisDenton:stabilize-absolute, r=dtolnay

Stabilize `std::path::absolute`

FCP complete in rust-lang#92750 (comment)
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Apr 25, 2024
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Apr 29, 2024
@GrigorenkoPV
Copy link
Contributor

Now that #124335 is merged, should this issue be closed?

@dtolnay dtolnay closed this as completed Jul 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants