FileInfo/DirectoryInfo: enable transparent handling of symbolic links #52908
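Tagging subscribers to this area: @carlossanlop
Issue Details
Background and Motivation
Users may not be aware that the fileName/path they are providing is a symbolic link. This proposal makes it possible to operate on such a path without the user having to manually handle symbolic links using the APIs added in #24271.
Proposed API
A new constructor on FileInfo and DirectoryInfo allows specifying whether the instance should return information about the target instead of the symbolic link.
class FileInfo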
{
public FileInfo(string fileName, bool followLink) { }
}
class DirectoryInfo
{
public DirectoryInfo(string path, bool followLink) { }
}
The argument causes information for the link target to be returned for the following properties:
class FileInfo
{
public bool IsReadOnly { get { throw null; } set { } }
public long Length { get { throw null; } }
}
class FileSystemInfo
{
public FileAttributes Attributes { get { throw null; } set { } }
public DateTime CreationTime { get { throw null; } set { } }
public DateTime CreationTimeUtc { get { throw null; } set { } }
public abstract bool Exists { get; }
public DateTime LastAccessTime { get { throw null; } set { } }
public DateTime LastAccessTimeUtc { get { throw null; } set { } }
public DateTime LastWriteTime { get { throw null; } set { } }
public DateTime LastWriteTimeUtc { get { throw null; } set { } }
}
It does not affect the following operations, which apply either to the 'path'/'fileName' passed to the constructor or to the 'target', as marked in the comments:
class DirectoryInfo
{
// applies to 'path':
public DirectoryInfo? Parent { get { throw null; } }
public DirectoryInfo Root { get { throw null; } }
// applies to 'target':
public void Create() { }
public DirectoryInfo CreateSubdirectory(string path) { throw null; }
public void Delete(bool recursive) { }
public IEnumerable<DirectoryInfo> EnumerateDirectories() { throw null; }
public IEnumerable<DirectoryInfo> EnumerateDirectories(string searchPattern) { throw null; }
public IEnumerable<DirectoryInfo> EnumerateDirectories(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
public IEnumerable<DirectoryInfo> EnumerateDirectories(string searchPattern, SearchOption searchOption) { throw null; }
public IEnumerable<FileInfo> EnumerateFiles() { throw null; }
public IEnumerable<FileInfo> EnumerateFiles(string searchPattern) { throw null; }
public IEnumerable<FileInfo> EnumerateFiles(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
public IEnumerable<FileInfo> EnumerateFiles(string searchPattern, SearchOption searchOption) { throw null; }
public IEnumerable<FileSystemInfo> EnumerateFileSystemInfos() { throw null; }
public IEnumerable<FileSystemInfo> EnumerateFileSystemInfos(string searchPattern) { throw null; }
public IEnumerable<FileSystemInfo> EnumerateFileSystemInfos(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
public IEnumerable<FileSystemInfo> EnumerateFileSystemInfos(string searchPattern, SearchOption searchOption) { throw null; }
public DirectoryInfo[] GetDirectories() { throw null; }
public DirectoryInfo[] GetDirectories(string searchPattern) { throw null; }
public DirectoryInfo[] GetDirectories(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
public DirectoryInfo[] GetDirectories(string searchPattern, SearchOption searchOption) { throw null; }
public FileInfo[] GetFiles() { throw null; }
public FileInfo[] GetFiles(string searchPattern) { throw null; }
public FileInfo[] GetFiles(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
public FileInfo[] GetFiles(string searchPattern, SearchOption searchOption) { throw null; }
public FileSystemInfo[] GetFileSystemInfos() { throw null; }
public FileSystemInfo[] GetFileSystemInfos(string searchPattern) { throw null; }
public FileSystemInfo[] GetFileSystemInfos(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
public FileSystemInfo[] GetFileSystemInfos(string searchPattern, SearchOption searchOption) { throw null; }
public void MoveTo(string destDirName) { }
// applies to 'path':
public override void Delete() { }
}
class FileInfo
{
// applies to 'fileName'
public DirectoryInfo? Directory { get { throw null; } }
public string? DirectoryName { get { throw null; } }
// applies to 'target':
public StreamWriter AppendText() { throw null; }
public FileInfo CopyTo(string destFileName) { throw null; }
public FileInfo CopyTo(string destFileName, bool overwrite) { throw null; }
public FileStream Create() { throw null; }
public StreamWriter CreateText() { throw null; }
public void Decrypt() { }
public void Encrypt() { }
public void MoveTo(string destFileName) { }
public void MoveTo(string destFileName, bool overwrite) { }
public FileStream Open(FileMode mode) { throw null; }
public FileStream Open(FileMode mode, FileAccess access) { throw null; }
public FileStream Open(FileMode mode, FileAccess access, FileShare share) { throw null; }
public FileStream OpenRead() { throw null; }
public StreamReader OpenText() { throw null; }
public FileStream OpenWrite() { throw null; }
public FileInfo Replace(string destinationFileName, string? destinationBackupFileName) { throw null; }
public FileInfo Replace(string destinationFileName, string? destinationBackupFileName, bool ignoreMetadataErrors) { throw null; }
// applies to 'fileName':
public override void Delete() { }
}
class FileSystemInfo
{
// applies to 'fileName'/'path':
protected string FullPath;
protected string OriginalPath;
public string Extension { get { throw null; } }
public virtual string FullName { get { throw null; } }
public abstract string Name { get; }
}
Usage Examples
FileInfo file = new FileInfo("/tmp/my-file", followLink: true);
DateTime lastWriteTime = file.LastWriteTimeUtc;
while (true)
{
Thread.Sleep(1000);
file.Refresh();
DateTime writeTime = file.LastWriteTimeUtc;
if (writeTime != lastWriteTime)
{
Console.WriteLine("The file has changed");
lastWriteTime = writeTime;
}
}
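For comparison, the path-based versus target-based split described in the proposal can be approximated today without the new constructor. The following is only a sketch: `FollowingFileInfo` is a hypothetical name, and it assumes the link APIs from #24271 in the form they shipped in .NET 6 (`FileSystemInfo.ResolveLinkTarget`). It covers just a handful of members to show which side each one reads from.

```csharp
using System;
using System.IO;

// Hypothetical wrapper, not part of the proposal: emulates "followLink: true"
// for a few members using FileSystemInfo.ResolveLinkTarget (.NET 6+).
public sealed class FollowingFileInfo
{
    // Always describes the path that was passed to the constructor.
    private readonly FileInfo _path;

    public FollowingFileInfo(string fileName) => _path = new FileInfo(fileName);

    // Path-based members: unaffected by link following.
    public string FullName => _path.FullName;
    public void Delete() => _path.Delete();

    // Target-based members: read from the final target when the path is a link.
    public long Length => Target().Length;
    public DateTime LastWriteTimeUtc => Target().LastWriteTimeUtc;

    private FileInfo Target()
    {
        _path.Refresh();
        // ResolveLinkTarget returns null when the path is not a link.
        var resolved = _path.ResolveLinkTarget(returnFinalTarget: true);
        return resolved is null ? _path : new FileInfo(resolved.FullName);
    }
}
```

With this shape, `Length` on a link to a 7-byte file reports 7, while `Delete()` removes only the link. A broken link surfaces as a file-not-found exception from the target-based members, which is in line with the "no such file" behavior the proposal describes.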
|
I don't think this is problematic. I think it is desired to not expose a symlink was involved.
No such file exception.
Return information about the final target.
For Unix, it should be about using |
This seems less flexible and more confusing than #24271. Especially disturbing is that we set the behavior "before", but then we don't even know about it (i.e. if then there are two code paths, we can't choose the right one). In other words it's not so easy to implement the "find all symbolic links" scenario. |
How should these properties behave when the path does not exist, or when the link is broken? I think it is valid for these properties to reflect the path that was passed to the constructor and maintain the behavior that has always existed here. Changing that can also cause problems. If the final target is desired, maybe add a property for that. For example:
They reflect the target independent of whether the path is a link or not. |
So the constructor throws if the path does not exist, or the link is broken?
I don't understand the problem.
Some properties are proposed to keep matching the path that was passed to the constructor. There is no guessing. |
It looks like an insidious trick :-) when some properties and methods return information about the source object and others about the target object. This approach opens up unlimited ways for bugs in applications. (And please add more use-cases in OP - your current demo says nothing.) |
I'd take the path as is, just like it happens now.
@iSazonov you say this causes bugs. @mklement0 you call it problematic. Can you demonstrate issues with some code? I propose to keep invariant that has existed between the constructor argument and properties. Changing those definitely creates opportunity for bugs with existing code. I don't see the value of making these properties reflect the target path.
Except the ones that are part of this proposal.
Other example:
const string filename = "/tmp/filename";
File.WriteAllText(filename, "content");
var fi = new FileInfo(filename, followLink: true);
Console.WriteLine("File length is " + fi.Length);
// Make path a symbolic link to the same content.
const string target = "/tmp/target";
File.Move(filename, target);
File.CreateLink(target, filename); // use API from https://github.com/dotnet/runtime/issues/24271
fi.Refresh();
Console.WriteLine("File length is " + fi.Length); |
The goal of the proposal is the user doesn't need to be aware they are providing a symbolic link or not.
So these expectations are not true.
And non links are definitely meant to be used with this constructor.
With this goal in mind, I think it is desired for these properties to reflect the path that was passed to the constructor. And definitely it's desired for |
This is what the |
It is an interesting proposal, but many APIs already resolve symbolic links transparently and we have no need for the implicit magic in the FileInfo API. What we want is more possibilities to work with symbolic links as end objects. |
I agree. I find it also ambiguous with the default constructor:
File.CreateLink("target", "link");
FileInfo fi = new FileInfo("link");
fi.Create(); // creates "target"
fi.Delete(); // deletes "link"
Maybe something explicit should be added like |
I disagree with a target based Delete (and target based MoveTo). My understanding is that the intent here is to let applications see symlinks as basically just a normal file, and not need to worry about the fact they might be a link. I'd argue the correct approach for an API like this is to make symbolic links act as much like hard-links as feasible. (Note: for this analogy we are assuming all the interesting properties other than path live on the inode or whatever the most equivalent structure in the file system is, and thus are shared by all hard-links. We are also ignoring the fact that nearly all platforms/file systems disallow directory hard-links. [0])
Behaving just like normal files/directories is literally true of hard-links, because by definition they are just normal files/directories. There is no "this one is the link, that one is the target" distinction. They both/all are just links to the underlying inode, just like every file and directory is. (The exceptions are when a VFS layer steps in for things like mount points, or other reparse-point like constructs not known to the underlying file system.)
For hard-links, opening the file opens the "target" (the inode). Editing the dates or permissions edits the "target". Opening the hard-link (in either the access file contents, or traverse directory structure sense) opens the "target". All of those match the proposed behavior. However, moving or deleting hard-links affects exactly the path you supply, and does not affect some arbitrary other path on the filesystem.
This analogy cannot be completely perfect, as broken hard-links are not supposed to be a thing that can exist (they imply some form of file system corruption). But treating broken symlinks symmetrically, with most of the properties, the open methods, etc. acting like any other FileSystemInfo created referencing a non-existent file/directory, feels reasonable.
Something mostly unrelated that is also worth pointing out is that to really work the way one would want, any FileSystemInfo objects returned from one that was created with this flag would probably also want to be created with the same flag, as the user would have no way to specify the constructor parameter otherwise. That is not too big a problem, but it does mean that if any library is using these types as exchange types, they could get different behavior depending on whether they are passed a copy with this flag set or not, and it would not only affect the file passed in, but could also affect other files reached via the one passed in. I know that the framework generally does not use these as exchange types, but I suspect there are user libraries that do use them. While not a library, PowerShell is one place that uses them heavily as an exchange type.
Footnote [0]: As long as they don't form loops, directory hard-links are actually relatively well behaved on most systems. Of course, they can really confuse tools, since they allow things like a file with only one hardlink to have multiple canonical paths, directories no longer have a single unambiguous parent, etc. But despite these issues, they mostly work. One caveat is deletion, due to the typical restriction of only deleting empty directories. A system that allows them should allow deleting populated directories if there are other hardlinks (not counting any of the special "." and ".." hardlinks seen in some Unix filesystems, as the normal exception to the no hardlink rule). But many tools would in practice recursively delete the contents, before even trying to unlink the folder itself. |
Replying to #24271 (comment): (With respect to defaulting to the hybrid version):
The fact that one of the applications it would break is PowerShell itself makes me very skeptical that the team would find it acceptable. When a user runs Get-ChildItem to see the contents of a directory, they expect the values shown to represent the entries in the directory. That is the behavior of "dir" and "ls" in other shells, and both are aliased by default in PowerShell to Get-ChildItem. That means the attributes on a symlink must represent the symlink itself. And guess what datatype is returned there? Yep, FileInfo/DirectoryInfo.
Having this expose a LinkInfo for the links might be acceptable (but I would be worried that it could break scripts). Unfortunately that would require a time machine. Existing PowerShell builds only enumerate Directories or Files, so could never get a LinkInfo. Because framework-dependent versions of PowerShell (like the dotnet global tools version) roll forward to new major releases if their preferred version is not installed, the existing code could end up running on the newer version, and would then be broken.
If PowerShell had used EnumerateFileSystemInfos and that API were to be updated to return LinkInfo objects by default it might not have been a problem. Except, of course, that there are plenty of other existing applications that would likely start breaking, because they assume they will only get back FileInfo or DirectoryInfo. And I'm betting you would not want to be returning LinkInfo objects by default. But in any case, PowerShell is certainly not the only tool out there that wants to produce accurate directory listings, where attributes on a symlink show as the attributes of the link itself, so even if PowerShell had avoided it, others could be impacted. |
Another consideration (that is not necessarily a problem): As mentioned in passing in the first API review for #24271, on Windows fully resolving symlinks can cause network access, with both security and performance implications. This is because they can point at a UNC path. One of the major perf considerations is that the server in question could be one that is basically a black hole. Resolving such a symlink would basically pause the application until a network timeout occurs. (Perhaps you remember how applications used to lock up, if you accidentally selected an empty or even non-existent floppy drive, while it checked for a disk? Or perhaps a similar, but less common, occurrence with optical drives? Now imagine a pause that could be even longer.)
That means that if APIs like DirectoryInfo.EnumerateFiles resolved each file they return, they would need to lazily resolve symlinks in some cases, otherwise a directory full of UNC symlinks to black-hole servers could be an extremely effective denial of service attack vector. This is mostly fine, because these classes always lazily resolve the properties, and there is no reason not to do that in this scenario either. True, but it does mean that accessing the properties might throw, since the symlink in question could be part of a looping chain of symlinks, and thus not able to be fully resolved. I suppose the other option is to treat this case like the target not existing instead of throwing, which might also be reasonable.
The potential problem actually lies with the POSIX platforms, since there it is a requirement to fully resolve a symlink to even know if it is a file or a directory (which very much matters when doing this hybrid approach, since unfortunately |
So do I - As for not defaulting to hybrid representations: good points, I agree. |
@KevinCathcart: With requiring opt-in to the hybrid representation, my original concern returns: if you're handed a preexisting |
To answer my own question, returning to my original idea: If we make However - assuming the path refers to a link whose target exists - you then need to be aware that you're dealing with a non-link representation whose Also, While an existing non-link path could just act as its own target, the question returns as to how to handle Unlike what I proposed earlier, it is probably better to retain the existing behavior of the parameter-less constructor, which quietly accepts a non-existent path. This enables:
Note:
@carlossanlop, as you can see, the above is both cumbersome and obscure:
|
I've changed back from target-based |
I wonder if we could instead add APIs for querying the link's target file/directory information and return that in a minimal wrapper (FileSystemEntry), keeping the APIs focused on doing as few allocations as possible.
public static class File
{
public static FileSystemEntry GetFinalInformation(ReadOnlySpan<char> path);
}
public static class Directory
{
public static FileSystemEntry GetFinalInformation(ReadOnlySpan<char> path);
}
public class FileSystemInfo
{
public FileSystemEntry FinalInformation => default;
}
Notes:
public readonly struct MinimalFileSystemInfo
{
// On Windows this can be obtained with GetFileInformationByHandleEx(lpFileInformation: FileBasicInfo).
// On Unix this can be obtained with stat.
public DateTimeOffset CreationTimeUtc;
public DateTimeOffset LastAccessTimeUtc;
public DateTimeOffset LastWriteTimeUtc;
public FileAttributes Attributes;
} |
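To make the idea of a minimal "final target" snapshot concrete, here is a rough sketch built on the link APIs from #24271 as they shipped in .NET 6. Everything named here is hypothetical: `TargetSnapshot` stands in for the struct sketched above (with read-only properties instead of public fields), and `LinkTargetInfo.GetFinalInformation` takes a `string` rather than a `ReadOnlySpan<char>`. Unlike the allocation-free goal stated in the comment, this version allocates FileSystemInfo objects, so it only illustrates the semantics.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the minimal struct discussed above.
public readonly struct TargetSnapshot
{
    public TargetSnapshot(DateTimeOffset creation, DateTimeOffset lastAccess,
                          DateTimeOffset lastWrite, FileAttributes attributes)
    {
        CreationTimeUtc = creation;
        LastAccessTimeUtc = lastAccess;
        LastWriteTimeUtc = lastWrite;
        Attributes = attributes;
    }

    public DateTimeOffset CreationTimeUtc { get; }
    public DateTimeOffset LastAccessTimeUtc { get; }
    public DateTimeOffset LastWriteTimeUtc { get; }
    public FileAttributes Attributes { get; }
}

// Hypothetical helper; the name mirrors the GetFinalInformation idea above.
public static class LinkTargetInfo
{
    public static TargetSnapshot GetFinalInformation(string path)
    {
        // Directory.Exists follows links, so a link to a directory is
        // treated as a directory here.
        FileSystemInfo info = Directory.Exists(path)
            ? new DirectoryInfo(path)
            : new FileInfo(path);

        // ResolveLinkTarget returns null for non-links; fall back to the path itself.
        FileSystemInfo target = info.ResolveLinkTarget(returnFinalTarget: true) ?? info;

        return new TargetSnapshot(
            new DateTimeOffset(target.CreationTimeUtc),
            new DateTimeOffset(target.LastAccessTimeUtc),
            new DateTimeOffset(target.LastWriteTimeUtc),
            target.Attributes);
    }
}
```

Calling `LinkTargetInfo.GetFinalInformation("/tmp/link")` on a symbolic link would then report the timestamps and attributes of the file at the end of the chain rather than those of the link.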
Would it make sense to consider this scenario as an extension of |
Do you mean if |
I mean why not add link target support to the enumeration APIs. |
We can add support for enumeration scenarios as well. I was thinking of an instance method that returns a new FileSystemEntry:
public ref struct FileSystemEntry
{
public FileSystemEntry GetFinalLinkTarget();
}
I don't know if we should care for the non-final target scenario. |
I think an enhancement of FileSystemEntry looks like the right direction as an implementation detail for both enumeration and explicit (sym)link manipulations. See my old comment #52666 (comment). |
My focus is to eliminate the system calls that hop links in
|
Background and Motivation
Users may not be aware that the fileName/path they are providing is a symbolic link. This proposal makes it possible to operate on such a path without the user having to manually handle symbolic links using the APIs added in #24271.
Proposed API
A new constructor on FileInfo allows specifying whether the instance should return information about the target instead of the symbolic link.
The argument causes information for the link target to be returned for the following properties. When there is no target, they throw a "no such file" exception.
It does not affect the following operations. Which are either:
Usage Examples
Implementation
On Linux, this means the information for the properties is retrieved using stat instead of lstat.
For Delete, the final target is first located, e.g. by calling realpath, and that is then deleted.
edits:
IsSymbolicLink from Proposed API for symbolic links #24271
followLink to followLinks.
Delete target based.
Delete to be path based again.
MoveTo under path-based