Sideband discussion about replacing EvalSymlinks #4
Comments
It doesn't for path manipulations but it does as soon as those paths hit a syscall. If you are referring to pure string manipulation, then no it doesn't. Feel free to open issues about them and tag me (please just
Thanks for these pointers, I'll have a look at Join and Clean.
I know your opinion, I've read your very detailed issue about it. I'm disagreeing with it, but I hope we can still have a constructive discussion about it. Some of the bugs you reported are fixed with my CL, as they relate to long paths. Others are fixed by another CL I have pending (EvalSymlinks is not calling fixLongPath before FindFirstFile). My position you might disagree with is that these are just bugs and as such I think they should be fixed. We might still end up in a situation where EvalSymlinks does not have any good usage on Windows -- I'm not ruling that out, but I can't make myself convinced until we finish fixing the bugs.
Well, if you suggest that we deprecate EvalSymlinks and don't add anything else, then we're giving no options to Go programmers. It might be that you think that the world is a better place without any path canonicalization function -- I sympathize with that, but it contradicts what every single other programming language and framework in the world is doing. We can't let the perfect be the enemy of the good -- we have to be pragmatic and realize that people live in a world where they have to interoperate with other software that does indeed perform path canonicalization. That's a fundamental point where I don't think we can easily agree. So, I would really love to hear your technical opinion on how you would go about solving the issue at hand: either providing a path resolution function, or providing a different set of functions to solve the same problems (like my proposal of filepath.SameFile), without pushing every request back.
In fact I agree.
Sure there needs to be a replacement! It just shouldn't be making the world look like the stage for child play. Canonicalization can't be done on Windows. It will break in various simple situations that I have tested already and those are hardly all cases that need to be considered. And it doesn't definitively tell you whether one path is inside another, so it doesn't solve the author's problem. Path canonicalization is a wrong concept if it is to be ported to Windows. The perfect here is resolving all links! Don't do that and canonicalization is for about all apps good enough - you can do it with There obviously needs to be functions that translate various kinds of indirections, such as drive mappings, volume access paths, junctions and reparse points, but those functions should adhere to all the objections against |
Your proposal for Perhaps it will become slightly more complex than just that single API though - there needs to be a way to represent identifiers, store them, compare them and ask for various kinds of things like whether this path is inside that directory with some identifier. |
Well, there's already os.SameFile which takes two os.FileInfo and is similar to what you are describing. There are reasons I think an API with a pathname is better -- but I haven't had time to write them down yet, so to hear your comments as well |
I can totally imagine Git LFS having to keep track of some context to do these file system checks. That context may be obtained from a path string and the second path may be a string. Keeping a context may improve performance by at least a factor of two, before considering types of caching that it may keep track of, which further improves performance. Those are the fundamentals proper software should use. A more convenient, slow API using strings only is a no-brainer once that is in place.
SameFile uses https://github.com/golang/go/blob/612a363bef9ae29d190f6daa2a5a1623f78c874b/src/os/types_windows.go#L216 It doesn't call fixLongPath though... so it won't work on most peculiar file paths, but I can fix that right away and add tests to make sure it works for any kind of Windows path. We then could have Then we need something like |
BY_HANDLE_FILE_INFORMATION (fileapi.h) - Win32 apps | Microsoft Docs
So the comparison needs to include a unique identifier for the server coughing up the information - which means resolving UNC paths and perhaps even canonicalizing the server name - and to be better safe than sorry, I don't immediately see whether this works for directories, although there is a comment to that effect - won't hurt to test that. The docs stick to calling the operand 'file' everywhere. I really like that with this links can be left unresolved. If indeed links aren't resolved by the However every comparison with just string arguments needs to call this twice, so for high-volume calling, the fundamentals are to use a variant that takes just one string. Worse for the |
It's true that you could cook up a scenario where two disks have the same volume serial number and, on top of that, a clash in the 64-bit file identifier, but I think that situation is rare enough that I wouldn't be happy to complicate the code beyond that. If anything, one might decide to document this.
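To make the comparison concrete, here is a platform-independent sketch of the identity check being discussed. The struct fileID is a hypothetical stand-in that mirrors the three identity fields of BY_HANDLE_FILE_INFORMATION (dwVolumeSerialNumber, nFileIndexHigh, nFileIndexLow); it does not call any Windows API, and the values in main are made up.

```go
package main

import "fmt"

// fileID is a hypothetical stand-in for the identity fields of
// BY_HANDLE_FILE_INFORMATION.
type fileID struct {
	VolumeSerialNumber uint32 // dwVolumeSerialNumber
	FileIndexHigh      uint32 // nFileIndexHigh
	FileIndexLow       uint32 // nFileIndexLow
}

// index combines the two 32-bit halves into the 64-bit file identifier.
func (id fileID) index() uint64 {
	return uint64(id.FileIndexHigh)<<32 | uint64(id.FileIndexLow)
}

// sameID treats two files as identical when both the volume serial
// number and the 64-bit index match. It cannot distinguish two volumes
// that happen to share a serial number.
func sameID(a, b fileID) bool {
	return a.VolumeSerialNumber == b.VolumeSerialNumber && a.index() == b.index()
}

func main() {
	a := fileID{VolumeSerialNumber: 0x1234ABCD, FileIndexHigh: 1, FileIndexLow: 42}
	b := fileID{VolumeSerialNumber: 0x1234ABCD, FileIndexHigh: 1, FileIndexLow: 42}
	fmt.Println(sameID(a, b))
}
```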
No, we actually do need to resolve the links. We need a function that tells us that M:\FOO and \\SERVER\SHARE\FOO are the same file if \\SERVER\SHARE is mounted as M:. That is the whole point of our discussion, isn't it? We're saying that we are not going to provide a canonicalized path because the concept itself is moot, but we need a way to assess that two files are indeed the same.
I was thinking of calling GetFinalPathNameByHandle with VOLUME_NAME_NT or VOLUME_NAME_GUID and compare the paths. Any case where this wouldn't work? |
It is probably OK to be optimistic about this most of the time, but then why need to resolve links? If
I think that is the set of functions that will tell you
I think it is pointless to do that and asking for bugs and edge cases. Why have
Each of the paths I tested in each situation with The situations where this might still not work is perhaps DFS or CSV or similar technologies, where multiple servers are able to cough up the same file from a shared location. You might get different ID's for the same file. That needs to be tested. And I don't think Google can do that during CI today. By the way, these ID's are almost certainly not persistable. Apps that would need to do these operations across reboots would have to obtain NTFS object identifiers, which are persistable, but aren't available for FAT type of volumes. |
@rasky I don't know whether fixLongPath fixes many issues. The prefixes aren't specific to long paths. I don't have a complete overview readily available for the issues with the APIs, but some of those I see with Join are very common across path/filepath.
There is a whole series of issues evidencing that there are string manipulations that are just wrong. Often the first \ is stripped off the prefix, for example. The \\.\ prefix isn't recognized at all and there are lots of cases where \\?\ is being ignored. After all, \\?\ means do not parse, i.e. leave unmodified. I believe Clean is a major source of issues and that function is used in several other APIs.
Fixing any part of EvalSymlinks is not an advantage imho. Better leave it broken and deprecate it.
Then also don't introduce a drop-in replacement that will suffer many of the same issues. There are much better ways of solving the problems at hand. I didn't mention that any use of GetFinalPathNameByHandle will have to iterate flags until there is a result that is not 'Path not found' or another error, and then still the developer might have to hint at how to call that API to get the correct result. On top of that, it is bad performance-wise to have string arguments and do the retries for perhaps thousands of files in succession.
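As an illustration of the prefixes mentioned above, here is a small sketch that classifies a Windows path by its leading prefix using plain string checks. The function classifyPrefix and its labels are hypothetical, it is not how path/filepath works, and it deliberately ignores further forms such as \\?\UNC\ or NT object paths.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyPrefix is a hypothetical helper. It returns a label for the
// leading prefix of a Windows path without touching the file system.
// The order matters: the more specific prefixes are tested first.
func classifyPrefix(p string) string {
	switch {
	case strings.HasPrefix(p, `\\?\`):
		return "verbatim" // do not parse; pass through unmodified
	case strings.HasPrefix(p, `\\.\`):
		return "device"
	case strings.HasPrefix(p, `\\`):
		return "unc"
	case len(p) >= 2 && p[1] == ':' && isASCIILetter(p[0]):
		return "drive"
	default:
		return "other"
	}
}

// isASCIILetter reports whether c is in A-Z or a-z.
func isASCIILetter(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

func main() {
	for _, p := range []string{`\\?\C:\long\path`, `\\.\COM1`, `\\SERVER\SHARE\FOO`, `M:\FOO`, `FOO\BAR`} {
		fmt.Printf("%s -> %s\n", p, classifyPrefix(p))
	}
}
```

A function like Clean or Join would need a check of this kind before normalizing separators, so that a path classified as verbatim is left exactly as given.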