Allow opting out of following directory links (reparse points / symlinks) during recursive file-system enumerations #52666
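Tagging subscribers to this area: @carlossanlop
Issue Details
Background and Motivation
Related to #24271.
Especially on Unix-like platforms, use of (directory) symlinks (symbolic links) is common. If, while recursively enumerating a directory (subtree), symlinks to directories are encountered, these symlinks are currently invariably followed, so that the enumeration includes the content of the linked directory as well. However, you may not want this behavior:
For this reason, standard Unix utilities such as find require opt-in in order for directory symlinks to be followed. While an opt-in isn't an option anymore for reasons of backward compatibility, an opt-out should be considered.
Note:
The opt-out should apply to all types of directory links, which additionally includes junctions and volume mount points on Windows.
The opt-out should only apply to directory links encountered during enumeration, not to the input path (that is, a directory link as the starting point of an enumeration should always be followed).
Using AttributesToSkip = FileAttributes.ReparsePoint | FileAttributes.Directory is not a general solution, because you may still want the link itself to be enumerated.
Proposed API
Add a new bool DoNotFollowDirectoryLinks property to the System.IO.EnumerationOptions class. The property name is negotiable. I avoided the term "ReparsePoint" as a generalization (given that it is NTFS-specific) and chose "Link" instead.
Usage Examples
// E.g., on macOS, where /System/Library contains a lot of directory symlinks.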
Directory.EnumerateDirectories(
"/System/Library",
"*",
new EnumerationOptions { RecurseSubdirectories = true, DoNotFollowDirectoryLinks = true, AttributesToSkip = 0 }
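);
Alternative Designs
Perhaps introduce a RecurseSubdirectoriesExceptLinks property instead, mutually exclusive with RecurseSubdirectories, which, however, requires enforcement at runtime.
Risks
None that I'm aware of.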
|
The proposed name conflicts with the naming guidelines: https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/names-of-type-members
Since Boolean properties should be named with an affirmative phrase, I'd suggest: public bool FollowDirectoryLinks { get; set; } = true; |
Thanks, @Joe4evr; I've updated the initial post accordingly. |
Thanks for creating the proposal, @mklement0. These are the enumeration APIs that will finish without problems when encountering cyclic symbolic links:
These are the enumeration APIs that will throw when encountering a cycle:
I like the idea of adding the new property to
Filtering
Also, I think the name should be FollowSymbolicLinks.
BTW, I have a related PR that fixes a small bug which prevented the first two cases from finishing successfully. I took this PR as an opportunity to add unit tests that verify enumerations when cyclic symbolic links are found (we didn't have any). #52746 |
Thank you for the thoughtful and detailed response, @carlossanlop.
|
Just brainstorming: would it be better to have an option that either "ignores" or "doesn't follow" links to directories that were already enumerated, i.e., breaks the cycle? Of course, that would be more expensive, since it implies maintaining a stack throughout enumeration. |
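To make the cost of that bookkeeping concrete, here is a minimal sketch of cycle-safe recursion. It is an illustration only, not how the runtime enumerates. CycleSafeWalker is a hypothetical helper, it tracks a set of visited paths rather than a stack, and it assumes .NET 6 or later for Directory.ResolveLinkTarget. It compares path strings only, so a robust implementation would have to compare file identities (and use a case-insensitive comparer on Windows).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not a BCL API.
public static class CycleSafeWalker
{
    // Yields every directory entry below `root` (links included), following
    // directory links but never descending into the same path twice.
    public static IEnumerable<string> EnumerateDirectories(string root)
    {
        var visited = new HashSet<string>(StringComparer.Ordinal);
        var pending = new Stack<string>();

        string start = Path.GetFullPath(root);
        visited.Add(start);
        pending.Push(start);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            foreach (string child in Directory.EnumerateDirectories(current))
            {
                yield return child;

                string real = child;
                if ((File.GetAttributes(child) & FileAttributes.ReparsePoint) != 0)
                {
                    // Assumption: .NET 6+. Returns null for reparse points that
                    // are not links; such entries are listed but not entered.
                    var target = Directory.ResolveLinkTarget(child, returnFinalTarget: true);
                    if (target is not DirectoryInfo { Exists: true })
                    {
                        continue;
                    }
                    // Descend via the target path so that a cycle maps back to
                    // a path that is already in the visited set.
                    real = target.FullName;
                }

                // Add returns false if the path was already seen.
                if (visited.Add(real))
                {
                    pending.Push(real);
                }
            }
        }
    }
}
```

The visited set is the extra state that costs memory and a lookup per directory, which is the overhead discussed above. Entries found below a followed link are reported under the target's path, not the link's path.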
@jozkee, setting
However, when links are followed (the current default behavior), cycles should always be prevented, which not only avoids re-enumerating already enumerated directories but, more importantly, prevents infinite loops. @carlossanlop, I haven't worked with the latest builds, but as of .NET 6.0.0-preview.3.21201.4, such infinite loops can happen, at least on Windows (on macOS, the enumeration eventually stops, after many levels of nesting, but I don't know what drives that). Has this since been fixed / will it be fixed? There is no functional reason not to perform cycle detection; the only conceivable reason would be to avoid the additional performance cost that it would incur. Truthfully, the default behavior should never have been to follow links.
Then following links as an opt-in could justify the extra cost of cycle detection, as a deliberate choice. In fact, that's how PowerShell's |
@carlossanlop Do you mean only a symbolic link to itself in the post?
FYI: this is implemented in PowerShell 6/7. You could look at the code if it helps you implement the feature in .NET. |
Today we know we want to follow not only symbolic links but also some other reparse points (mount points, OneDrive and others). Perhaps we aren't ready to conclude today which reparse points we should follow. Also, the list could be adjusted in the future, since new reparse points can be introduced. So implementing
Nevertheless, we can make progress here. In FileSystemEnumerable we can already define a ShouldRecursePredicate of type FindPredicate. This allows us to pass a FileSystemEntry (Windows and Unix) to the action. If we expose a ReparsePointTag property in FileSystemEntry, users get the possibility to implement custom behavior for following links based on reparse point tags.
Proposal
namespace System.IO.Enumeration
{
public ref partial struct FileSystemEntry
{
public ReparsePointTag ReparsePointTag { get; } // public if we want full flexibility and delegate all to users
// alternative
private bool IsLikeSymbolicLink() // otherwise, if we don't expect new RPTs will be introduced often
// and we agree to consider only name surrogates and AppExecLink.
}
}
namespace System.IO
{
    public enum ReparsePointTag : uint // not needed if we use the private IsLikeSymbolicLink(); uint because the tag values exceed int.MaxValue
{
NameSurrogateBit = 0x20000000, // See https://docs.microsoft.com/en-us/windows/win32/fileio/reparse-point-tags
MountPoint = 0xA0000003, // Maps to IO_REPARSE_TAG_MOUNT_POINT, describes a Junction
SymbolicLink = 0xA000000C, // Maps to IO_REPARSE_TAG_SYMLINK, the only value used by Unix
ApplicationExecLink = 0x8000001B, // Maps to IO_REPARSE_TAG_APPEXECLINK
}
public class EnumerationOptions
{
        public FollowLinks FollowLinks { get; set; } = FollowLinks.Follow;
}
public enum FollowLinks
{
None,
Follow,
FollowWithLoopTracking,
    }
}
Implementation details
I added I removed
The underlying API already returns 0 for ReparsePointTag if the entity is not a reparse point, so it makes no sense to replace 0 with -1:
So on Unix the FileSystemEntry implementation is (always returns 0): ReparsePointTag ReparsePointTags { get; }
On Windows we currently use NtQueryDirectoryFile with FileInformationClass.FileFullDirectoryInformation and get a FILE_FULL_DIR_INFORMATION structure. That struct does not contain ReparsePointTag information, but we can use FileInformationClass.FileIdExtdDirectoryInfo and get FILE_ID_EXTD_DIR_INFORMATION with ReparsePointTag. However, that value is not documented for NtQueryDirectoryFile, and I don't know whether NtQueryDirectoryFile can serve the request. If not, we could use GetFileInformationByHandleEx, as documented. ReparsePointTag ReparsePointTags => _info->ReparsePointTag; @carlossanlop @jozkee Are we ready for the API review? Update: FILE_ID_EXTD_DIR_INFORMATION contains |
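For reference, the ShouldRecursePredicate hook mentioned above can already approximate "list links but don't follow them" with today's API. The following is a minimal sketch, not a proposed implementation. LinkAwareEnumeration and EnumerateDirectoriesNoFollow are hypothetical names, and the sketch assumes that FileAttributes.ReparsePoint is an acceptable marker for "link", which treats every reparse point alike (exactly the limitation that a ReparsePointTag property would lift).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Enumeration;

// Hypothetical helper, not a BCL API.
public static class LinkAwareEnumeration
{
    // Enumerates the directories below `root`. Entries carrying the
    // ReparsePoint attribute are still reported, but not recursed into.
    public static IEnumerable<string> EnumerateDirectoriesNoFollow(string root)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            AttributesToSkip = 0 // do not skip hidden or system entries
        };

        return new FileSystemEnumerable<string>(
            root,
            (ref FileSystemEntry entry) => entry.ToFullPath(),
            options)
        {
            // Report only entries that the enumerator classifies as directories.
            ShouldIncludePredicate = (ref FileSystemEntry entry) => entry.IsDirectory,

            // Assumption: any reparse point counts as a link here.
            ShouldRecursePredicate = (ref FileSystemEntry entry) =>
                (entry.Attributes & FileAttributes.ReparsePoint) == 0
        };
    }
}
```

Because the predicate only sees the attribute flags, it cannot distinguish a symbolic link from, say, a OneDrive placeholder directory. Making that distinction needs the reparse tag itself.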
I think we need to get clarity on:
Again, because following by default is problematic in terms of performance and required cycle-detection overhead (in the case of true links), we should consider the breaking change of defaulting to not following. |
Yes, it emulates a regular file system. I think we should think about behavior. The behavior can be:
We even renamed PowerShell method to
This forces us to think that if we want any generalization of RPs, like an IsLink() method, we have to add a mask property (LinkReparsePoints?) that collects all RPs whose behavior we want to treat like a symbolic link. Then IsLink() would use the property to work as expected, and users could add new RPs to the property as they are introduced. |
I am not happy with the Future milestone. I'd like to see a solution in 6.0.0 for PowerShell.
namespace System.IO
{
public class EnumerationOptions
{
public FollowLinks FollowLinks { get; set; } = FollowLinks.Follow;
}
public enum FollowLinks
{
None,
Follow,
FollowWithLoopTracking, // implemented on Phase 2
}
}
Internally, we replace FileInformationClass.FileFullDirectoryInformation with FileInformationClass.FileIdExtdDirectoryInfo to obtain ReparsePointTag, and implement IsLikeSymbolicLink() with fixed logic, the way PowerShell's IsReparsePointLikeSymlink() method works.
I am ready to implement Phase 1. |
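As an illustration of what such fixed logic could look like, here is a minimal sketch based only on the rule stated earlier in this thread (treat name surrogates and AppExecLink like symbolic links). ReparseTags is a hypothetical class, and the check is an assumption about the intended rule, not a copy of the PowerShell code. The constants repeat the tag values from the proposal above.

```csharp
using System;

// Hypothetical helper, not a BCL API.
public static class ReparseTags
{
    public const uint NameSurrogateBit = 0x20000000;
    public const uint MountPoint = 0xA0000003;   // IO_REPARSE_TAG_MOUNT_POINT (junction)
    public const uint SymbolicLink = 0xA000000C; // IO_REPARSE_TAG_SYMLINK
    public const uint AppExecLink = 0x8000001B;  // IO_REPARSE_TAG_APPEXECLINK

    // True if the tag has the name-surrogate bit set or is AppExecLink.
    // A tag of 0 (not a reparse point) yields false.
    public static bool IsLikeSymbolicLink(uint reparseTag) =>
        (reparseTag & NameSurrogateBit) != 0 || reparseTag == AppExecLink;
}
```

With this rule, junctions and symbolic links are not followed because their tags carry the name-surrogate bit, while a tag without that bit (other than AppExecLink) is treated as an ordinary directory.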
Background and Motivation
Related to #24271.
Especially on Unix-like platforms, use of (directory) symlinks (symbolic links) is common.
If, while recursively enumerating a directory (subtree), symlinks to directories are encountered, these symlinks are currently invariably followed, so that the enumeration includes the content of the linked directory as well.
However, you may not want this behavior:
For this reason, standard Unix utilities such as find require opt-in in order for directory symlinks to be followed.
While an opt-in isn't an option anymore for reasons of backward compatibility, an opt-out should be considered.
Note:
The opt-out should apply to all types of directory links, which additionally includes junctions and volume mount points on Windows.
The opt-out should only apply to directory links encountered during enumeration, not to the input path (that is, a directory link as the starting point of an enumeration should always be followed).
Using AttributesToSkip = FileAttributes.ReparsePoint | FileAttributes.Directory is not a general solution, because you may still want the link itself to be enumerated.
Proposed API
Add a new bool FollowDirectoryLinks property that defaults to true (renamed from the originally proposed DoNotFollowDirectoryLinks; see @Joe4evr's comment) to the System.IO.EnumerationOptions class.
The property name is negotiable. I avoided the term "ReparsePoint" as a generalization (given that it is NTFS-specific) and chose "Link" instead (without including the word "Symbolic", given that the behavior wouldn't be limited to symbolic links but would cover additional link types as well).
Update: @carlossanlop proposes FollowSymbolicLinks instead.
Usage Examples
Alternative Designs
Perhaps introduce a RecurseSubdirectoriesExceptLinks property instead, mutually exclusive with RecurseSubdirectories, which, however, requires enforcement at runtime.
Risks
None that I'm aware of.