Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving/copying/removing a symbolic link #6

Closed
ExpHP opened this issue Feb 18, 2018 · 15 comments
Closed

Moving/copying/removing a symbolic link #6

ExpHP opened this issue Feb 18, 2018 · 15 comments

Comments

@ExpHP
Copy link

ExpHP commented Feb 18, 2018

(I wish GitHub provided a better way to have discussion other than opening a bug report)

Let's talk symlinks.

I have an application which has historically taken the easy way out of path management by changing the current directory. In moving away from this setup towards a more principled design, I considered using path_abs as a means of leveraging the type system to ensure that I don't accidentally leave behind any lingering relative paths.

Reviewing the documentation, however, it is evident to me that my mental model of symlinks differs from that of the library. Of course, this is to be expected; I understand that PathAbs is opinionated, and that our ideas will inevitably clash somewhere.

But with regard to symlinks specifically, I find the model presented by this crate to be dangerous, and quite scary! I'd like to hear your point of view on this, and I wonder if there is something I am missing.


I'll describe these clashing models by making analogy to Rust's borrow semantics.

To me:

  • A directory in the filesystem is like some Box<_> type. It has a unique owner (its "true" parent) and that's the only way you can move or delete it.
  • A file in the filesystem is like some Arc<_> type. It may have multiple owners via hard links, and dies when the last link vanishes.
  • A symlink is like some *mut _ type. It can change data belonging to its target, but only if you explicitly dereference it (by joining another path component). The pointer itself does not own its target, and its target might not even exist.

In PathAbs:

  • by immediately resolving symlinks, PathAbs makes symlinks feel more like C++'s mutable references. A symlink is the directory or file it points to. To me, this is terrifying, because if I'm not careful, then code which was intended to delete a symlink could end up accidentally recursively deleting its target!

What do you think of this? Do you disagree with this risk about deleting a symlink? Is there room in PathAbs for symlinks to live as first class citizens? Am I asking too many questions? 😛

@ExpHP
Copy link
Author

ExpHP commented Feb 20, 2018

I tried to put together something resembling what I think an API with support for symlinks might look like, without changing the semantics of anything that already existed.

e14c4d9

The commit shows:

  • The addition of a type PathEntry which is like PathAbs but with a slightly weaker invariant that it doesn't resolve a symlink in the final path component. This makes it not the canonical form of a file or directory, but rather, of a directory entry; hence the name.
  • PathSymlink which holds a PathEntry to a path previously identified as a symlink.
  • PathDir::list_entries, which is capable of losslessly listing all entries of a directory as paths in that directory, in contrast to list

@vitiral
Copy link
Owner

vitiral commented Feb 20, 2018 via email

@ExpHP
Copy link
Author

ExpHP commented Feb 23, 2018

Amusingly enough, I think I may have destroyed my own computer, too, while I was in the middle of performing the migration to path_abs. (no 5mo, just a novice's DIY follies). I'm now trying to rebuild my workflow on my desktop PC, and to hunt down one of the planet's only five drive enclosures for mSATA (a.k.a. why does this standard even exist?).

@vitiral
Copy link
Owner

vitiral commented Mar 22, 2018

Okay, I'm back and ready to look at a design which incorporates symlinks in some way.

Usecase / Problem Statement

First, some definitions:

  • absolute path: an absolute path to is the absolute path to a fs object that exists (file, directory, symlink). Some or even all of the directories in that path may be symlinks, but the beginning directory must always be root (or \\?\ on windows).
  • canonicalized path: is an absolute path with no symlinks, including the destination.

The goal of this issue should be to use absolute paths as opposed to cannonicalized paths as the base type (but cannonicalize() still returns a PathAbs` as well). I think this would solve all of your issues.

Technical Solution

  • If the path does not start with / then prefix it with std::env::current_dir().canonicalize().
  • If on windows and the path does not start with \\?\ then replace the first component with it's canonicalized() form.
  • If the path has . or .. in it then handle as follows:
    • Remove . components
    • Remove .. components that follow components by stripping out the component. I.e. /foo/bar/../baz -> /foo/baz.
    • If left with leading .. then the path is invalid and return an error (we consumed root).

Part 2

I think the above should theoretically work for what we want -- now what to do with the rest of the API. If the following can be zero cost they would be nice:

  • add is_symlink() to all Path* types that are not Arc (definitely zero cost)
  • add is_cannonical() to all Path* types that are not Arc (maybe zero-ish cost). My thought is that we are at least going to call exists() on the path, so we might as well instead call canonicalize(). If the canonicalized version is identical to the path then is_cannonical() == true

Other thoughts are welcome!

@ExpHP
Copy link
Author

ExpHP commented Mar 22, 2018

Sounds like part of what you're describing is Path::clean (stdlib PR).

Absolute paths would solve this problem and some others. What you've got drafted there at the time of me writing this sounds like a bit of a wild mix of canonicalization and cleaning; I'm not sure about Windows, but on Unix I think current_dir().join(path).clean() is usually all I am looking for.

(note: well, okay, that's only true for CLI args. For paths in config files, some form canonicalization is needed so that the current directory does not impact resolution)

@vitiral
Copy link
Owner

vitiral commented Mar 22, 2018

this and this are particularly enlightening. Fully supporting windows is not easy and I think I would want to wait until at least clean() has landed.

Edit: the first link was only interesting because they said that .. was a "symbolic link, this would have to call realpath in order to not get a different result than accessing the original path, and would have to hit the filesystem", but I thought your response was very good. Who cares if it's a symbolic link... let's just clean it.

@vitiral
Copy link
Owner

vitiral commented Mar 22, 2018

@kennytm I followed your comments on the rust issue and I'm wondering what your thoughts are for this issue.

The basic use case: we want "absolute paths for files that exist" -- but they are allowed to have symlinks inside.

In particular, do you think the technical solution I gave will work for windows?

@vitiral
Copy link
Owner

vitiral commented Mar 22, 2018

agh, I hate my life

This is not generally possible in POSIX because of symlinks: if a/b/c is a symlink to a/d, then a/b/c/../foo is actually a/foo, not a/b/foo. Now that modern versions of Windows support symlinks too, I suspect that's also an issue there.

that actually makes sense and makes me very sad.

@vitiral
Copy link
Owner

vitiral commented Mar 22, 2018

So, thinking a little more about this, I don't think there is really any value in allowing .. anywhere but the beginning. When does someone have a path like /a/b/../c/d/../e/f? That is a pretty absurd use-case IMO, and resolving it either correctly or incorrectly could lead to a lot of confusion. Better to just error in my opinion.

So my suggestion: only allow (multiple) .. at the beginning of the the path (so ../../a/b/c/d/e/f would be valid).

The issue of resolving the wrong path doesn't exist when .. is at the beginning since the current_dir is already canonicalized so we can safely walk down the path (we know it is "real").

@ExpHP
Copy link
Author

ExpHP commented Mar 22, 2018

I construct paths with interior .. all the time when joining relative paths from config files...

@vitiral
Copy link
Owner

vitiral commented Mar 22, 2018

good point. I think treating it semantically is probably the best course of action, agreed?

@vitiral
Copy link
Owner

vitiral commented Mar 22, 2018

See #11, would love any feedback!

@ExpHP
Copy link
Author

ExpHP commented Mar 22, 2018

Define semantically? (Lexically?)

I alluded to this in an edit to one of my previous posts, but I'm beginning to think that automatic normalization of any form can be hazardous. There are times when straight up canonicalization is best, and there are other times when cleaning first before canonicalization is best.

That said, I certainly will feel a lot more comfortable using the library when it isn't instantly following every symbolic link; so I do welcome any change to that effect.

bors bot added a commit that referenced this issue Mar 22, 2018
11: WIP #6: add absolute() method on PathArc, still need tests r=vitiral a=vitiral

This is a first attempt at implementing #6.
bors bot added a commit that referenced this issue Mar 22, 2018
11: WIP #6: add absolute() method on PathArc, still need tests r=vitiral a=vitiral

This is a first attempt at implementing #6.
bors bot added a commit that referenced this issue Mar 23, 2018
11: WIP #6: add absolute() method on PathArc, still need tests r=vitiral a=vitiral

This is a first attempt at implementing #6.
@vitiral
Copy link
Owner

vitiral commented Apr 24, 2018

This should be solved as of #11. Please comment!

@vitiral vitiral closed this as completed Apr 24, 2018
@vitiral
Copy link
Owner

vitiral commented Apr 24, 2018

solved > 0.4.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants