-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mingw: treat junction points the same as symlinks #437
Conversation
We forgot to handle them in the same way. Pointed out by Ed Thomson. Signed-off-by: Johannes Schindelin <[email protected]>
This is now a proper Pull Request to be more visible (and have more eyeballs on it). |
Yeah, I think I would like MSys2's behavior to be replicated in Git for Windows. Any idea of the top of your head how they do it? If not, no big deal, I'll try to dig into it. |
@kblees, what makes you think that? I've always thought of junctions as being a restricted form of symlinks and I'm curious about the reasoning behind your opinion. |
Because mount points and junctions are both implemented via Mount points and junctions cannot point to files, cannot point to network locations and cannot be relative. Windows Explorer treats them like normal directories (e.g. if you drag them around, Explorer moves or copies the content, while for symlinks it just moves / copies the link). Thus I think junctions have much more in common with mount points (or bind mounts on Linux) than with symlinks. That Cygwin/MSys2 report some junctions as symlinks, but others as normal directories is quite a strange inconsistency IMO. |
Thanks. I'm pretty convinced by the evidence you presented. (Not that it affects anything. (: ) |
@kblees Do you think that this Pull Request is misguided, based on the evidence you presented? I guess I really should find out how Cygwin/MSys determines whether an IO_REPARSE_MOUNT_POINT indicates a symlink... |
I don't know they do it, but they should do it by checking the ReparseTag member of the reparse point for IO_REPARSE_TAG_SYMLINK (see, for example, https://github.com/dra27/ocaml/blob/559747ad661265244d1c67f20dd342f4a000816d/otherlibs/win32unix/readlink.c#L58) |
@dra27 please note that if that were true, my patch would not even be needed: https://github.com/git-for-windows/git/pull/437/files#diff-94a5535c71aaa5811b4b56854fe25547L12 (note how the current code already tests for |
Sorry, what I meant is that that's the only thing they should be doing. Junctions are a weird - and now deprecated in favour of directory symbolic links - Windows 2000 oddity. IIRC, Cygwin (and therefore MSYS2) tried at some point to do some weird tricks with them to emulate symbolic links which may be why some still show up in directory listings as symbolic links. But it would seem better (to me at least) to treat them as actual directories, not symbolic links. |
Yet it is possible for everybody to create junction points, but symbolic links are only for the Gods. I think we still have to handle them the same as Cygwin does, for the same reason as Cygwin does it: users rely on them. |
My opinion: If we need to implement junctions as symlinks to make git work (e.g. to find the real_path of the work-tree on startup), so be it. However, we shouldn't do it just to follow questionable Cygwin/MSys2 behaviour.
If the SubsituteName starts with https://github.com/Alexpux/Cygwin/blob/msys2-master/winsup/cygwin/fhandler_disk_file.cc#L203 |
@dscho - but it's very flaky support, surely? So the user creates a directory junction in a working directory which git then interprets as a symbolic link. What happens for that same user if they switch between branches where the junction or shouldn't exist? git would delete the junction (correctly) and then try to checkout a symbolic link and so create a SYMLINKD (or fail if the user isn't a god!). So what was a JUNCTION to start with at very best becomes a SYMLINKD or more likely a mess. |
@dra27 look at it from a maintainer's point of view. What is the best solution to prevent reports like "I ran Of course we cannot fix everything. If we switch back to the branch where the path referred to a symlink, with |
@kblees thanks for the analysis! I am mildly in favor of doing the expensive thing and extend What is your opinion about this strategy? |
IIRC neither dirent.c nor fscache.c have the full path when calculating d_type, just the basename, so this would involve some additional path mangling... I just checked The more I think about this, the more I'm convinced that the junction handling in Cygwin/MSys2 is an outright bug, not just a weirdness. As this is Git for Windows, not msysgit (@dscho I know you like that name 😄), we shouldn't use MSys2 as a measure for 'correct' junction handling. Maybe the smartest thing would be to fix this in MSys2? |
@ethomson I am tending to agree, after reflecting about this for a while: we do not create junction points in Git for Windows at all, so maybe we should continue to treat them as if they were real directories? Would you agree? |
I do not. Since Git will not create junction points (only the user will), then the user has explicitly tried to split their working tree across two locations. If Git treats junction points as directories, then this approach seems clever and workable, and it will write through the junction point into the other location. Yay! This works fine until the user checks out a commit that doesn't contain that directory. Git will then remove it from the working tree. Meaning that it will remove the junction, and not what the junction points to. If the user switches back to a commit that does have the directory, it will recreate the directory as a directory and now those contents exist in two places. That's quite surprising. I do believe that you need to treat them specially, and not like a directory. One alternative is to write through them as if they were a directory, so that you can try to give them some deeper meaning, as you might a unix mount point, and refuse to delete them altogether and just leave them empty on-disk. |
@ethomson do I understand correctly that you would prefer it if our |
No - but also, I don't want to get into the behavior of the I think that fundamentally, there are three types of reparse points that we're interested in:
After re-reading @kblees 's opinions, my takeaway is that I do understand the utility of allowing a volume to be mounted in the working tree. So I understand treating 2b as if it were a unix mount point, thus writing to it as if it's a directory (though if you're going to do that, let's please keep it around and not delete it from the working tree; that's particularly cruel.) That said, I think 2a is different scenario. It's basically a symlink into the NT namespace, and acts like a symlink (in that it can be dangling). It's been historically used to enable symlink support from when Windows lacked symlinks. |
@ethomson The problem with your 2a case is that we'd have to sub-divide it into three categories: 2.a.I. Junctions on local disks referring to local paths by drive letter + path. 2.a.II. Junctions on local disks referring to local paths by volume name + path. As MSys2 doesn't recognise volume names as valid path names, these would have to be treated as normal directories for the sake of scripted commands. 2.a.III. Junctions on remote disks referring to remote paths by drive letter or volume name + path. As the remote path may not be accessible over the network other than by letting the server resolve the junction locally, we'd have to treat these as normal directories as well. The only class of junctions that could possibly be treated as symlinks is 2.a.I.. In particular, this means that git on @dscho I just checked Linux's behaviour for bind mounts: Both stat and lstat report the bind-mounted directory as a normal directory. rmdir fails with EBUSY (i.e. checking out a branch that would remove a bind-mounted directory succeeds and leaves the directory intact; checking out a branch that would change a bind-mounted directory into a regular file fails with error message). So handling IO_REPARSE_TAG_MOUNT_POINT specially in mingw_rmdir would IMO be the best solution. |
@kblees hmm. How do we want to deal with the legion of developers who have used junction points as symlink replacements since pre-Vista times and continue now because It Just Works For Them? We cannot practically educate them all, can we? |
Git for Windows has always treated junctions as normal directories, so I don't see why keeping it that way would change anything for these developers... Although in my experience, people used Good Old
AFAIK, JGit / EGit already supports symlinks [1], via the NIO2 API introduced in Java 7 (which also treats junctions as normal directories, not symlinks). I would guess that eclipse folks would be very reluctant to write a Windows-specific JNI DLL, just to follow Git for Windows' "special" junction handling. [1] https://bugs.eclipse.org/bugs/show_bug.cgi?id=354367#c41 |
Because libgit2 has never done this, and the goal here is to get alternate implementations on the same page. Using history as the definition of correctness is not going to succeed here. Similarly, comparing Unix to Windows, and talking about how msys2 or cygwin behaves only defines the functionality for Git for Windows. We should be talking about how all Git implementations behave on Windows, because if we're not consistent with each other, then we're in serious trouble. |
Anyway. You keep saying "normal directories". Is that really what you mean, where we delete junction points and recreate directories in their place? I don't see a good solution that involves the status quo. I think what we can either:
Or some combination of the two, but never should we fall into the "normal directory" behavior. It's quite terrible. I just want to be clear about what we're talking about. |
@kblees there seems to be another voice not quite agreeing with the notion that junction points are a totally wrong choice for emulating symlinks: git-for-windows/msys2-runtime#13 |
God, though, isn't creating symlinks / junction points a whole different kettle of pain? |
No, trying to delete junction points should probably fail, e.g. by
First and foremost, we should be consistent with the behaviour of Windows. Otherwise we could just implement symlinks via plain files, as Cygwin/MSys2 does by default. Additionally, we should consider whether it is feasible or even possible to change junction handling in whatever direction.
|
AFAICT from a first glance, this code doesn't handle the 2.a.III. case correctly either (i.e. it will allow creating junctions to mapped network drives, which doesn't work at all). |
There are no strict equivalence for symlinks on Win32. So we have to use what's possible to use.
* can be ether dir symlink or file symlink, but can't automatically change type depending on which target type is created later (file or directory). Patch is good but incomplete: directory junction on remote systems (network paths/drives) are resolved transparently by remote host and should not be resolved locally as path are specific for drive's host system. So we have some limitation, like unavailability of junction on network paths but all we know that Win32 port of Git has some limitation. |
There's nothing stopping someone from creating a junction point to an absolute path inside the working tree.
They need only point to an absolute path, no? Does that not mean that they can point to volumes that aren't mounted, disks that aren't connected, and if you But the network drive was merely an example. A junction can point to a target that doesn't exist. (Because regardless of what Explorer shows you, they're a soft link.) |
This doesn't answer the question how such links should be useful to anyone. And how would you distinguish the legitimate use of junctions from your case that a user accidentally created a junction when she actually meant to create a symlink?
No. The low level driver responsible for resolving mount points cannot access network drives. |
Well, I can think of myriad ways that a symbolic link or a junction point might be useful. But I'll tell you mine, which is that we have a particular component in our build system that must emit its output in a Obviously, we do not want git to try to do anything with this junction point, so we
I don't understand why there would be a difference based on intent.
Yes, indeed you are correct. If you do create a junction point whose target is a network path, you have a dangling link regardless of whether the network path is available or not. Which was still my greater point: it's a soft link. |
Currently, Git treats junctions like normal directories, and there are people relying on this functionality (I've given two examples above, but there are many more [1]). If we change Git's behaviour to treat all junctions as if they were symlinks, the existing functionality will obviously break. IOW we would need another way to tell Git: this junction should be treated like a directory, and that junction should be treated like a symlink. Currently, this distinction is easy: if you want Git to recurse into a link target's content, use a junction (or bind mount on Unix); if you want Git to record the link itself, use a symlink.
Why does it have to be a junction point? Wouldn't you get the same effect with a symlink (aka
Do you mean that you do not even track the link with Git (i.e. you never |
They should not be, because your implementation is not well defined. Again:
This is absolutely not an implementation that a user could sensibly rely upon.
We do not need two modes. This should be a breaking change. It is already broken and needs to be fixed. |
I'm following this discussion and although I don't have much of an opinion, I do have some questions (possibly useful for others, as well).
Following this concept, would/could Git remove a non-mount junction point during an update?
Can't I do the same with bind mounts? |
Not if you protect them from deletion, as recommended by microsoft [1]. And I already agreed that we should not delete mount points in [1] https://support.microsoft.com/en-us/kb/205524
You can do the same thing with Unix bind mounts. Your alternative of tracking the junction point's absolute Windows path in the Git repository is not useful at all. And you still fail to see the valid use case of junctions pointing outside the working directory.
Let me recap:
Do you realize that this is quite an arrogant position? |
If you're going to recap, please don't put words into my mouth. Indeed I am not interested in this for my particular example. (You asked for an example and I provided one, which happened to be the issue that led me to research this fascinating topic.) I thought it was quite obvious that I am interested in building a solution that both of our users benefit from. Git for Windows - right or wrong - is considered the Gold Standard by which the other implementations are judged. As I said in my first comment:
And subsequently:
And while, yes, my initial suggestion was simply to treat junction points as symbolic links, my very first comment in this thread was:
My position is very simply that the current behavior is not right, and we need a different one. I don't believe that's arrogant. Perhaps I was unclear in my last comment - when I stated that "your implementation is not well defined" I was objecting to the current behavior and how it really must change. I was not objecting some proposed implementation. Whatever we do with junction points, we need to not treat them like normal directories. We need to treat them as junction points, whatever we define that to be. Which I think that you agree with:
But obviously there are other questions like:
(Note the edit - as you point out, if your link target is a network share, the link will be dangling.) The broader point remains: what do you do with dangling links when you're trying to check out into them? |
@ethompson I truly apologize for misunderstanding you so thoroughly. IMO we should treat all mount points the same way, including all junction points, like this:
This would have the following advantages:
|
Apologies for the delay in getting back to this. I've been thinking a bit about the dangling junction case. You mentioned that reading a directory might show this as an empty directory. I wonder if that might be a nice behavior. It's what one would expect on Unix if the drive had not been mounted (it would appear as an empty directory). It may also be less expensive for some implementations to handle things this way - perhaps avoiding a Note that I haven't actually implemented this, so this is all just in my head, but I'm curious about your thoughts. Also - @dscho This differs from your suggestion, above. Are you happy with this? |
Sorry for the long delay. I do agree with what @kblees wrote and will try to come up with the appropriate changes. Some time ;-) |
Symlinks are becoming more interesting in Windows (starting with RS2 update - coming to a PC near you): https://blogs.windows.com/buildingapps/2016/12/02/symlinks-windows-10/#t0ZkyKCrUSyMAVF4.97 Per the Windows Developer Blog, they'll no longer require elevation to create, modify, or destroy so they will become useful to Git on Windows. 😄 |
I just got reminded of this old PR of mine, and was totally surprised that I had failed to close it yet! In the meantime, the wisdom of @kblees and @ethomson managed to convince me that my approach was misguided. To address the remaining issues (such as not traversing junction points or bind mounts in |
…-fsmonitor-take2 Replace pre-V4 of FSMonitor with V4 using GFW experimental commits
We forgot to handle them in the same way. Pointed out by Ed Thomson.
Signed-off-by: Johannes Schindelin [email protected]