-
-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-77609: Add follow_symlinks argument to pathlib.Path.glob()
#102616
GH-77609: Add follow_symlinks argument to pathlib.Path.glob()
#102616
Conversation
Add a keyword-only *follow_symlinks* parameter to `pathlib.Path.glob()` and `rglob()`, defaulting to false. When set to true, symlinks to directories are followed as if they were directories. Previously these methods followed symlinks except when evaluating "`**`" wildcards; on Windows they returned paths in filesystem casing except when evaluating non-wildcard tokens. Both these problems are solved here. This will allow us to address pythonGH-102613 and pythonGH-81079 in future commits.
Table showing whether symlinks to directories are followed:
Table showing whether filesystem case is returned:
|
I wrote a fairly long post about my concerns on the changed behaviour, but then I realised that my main question is whether the symlink following changes applies to the entire path or only the bit in pattern? If symlinks in the base path are going to be followed regardless of the option, I'm far less concerned. However, I'd like to be able to figure this out from the documentation :) So there's a change required there to clarify (and maybe I'll be more concerned if it clarifies the wrong way) |
Symlinks in the base path (the p in |
Thanks for the feedback btw! I'm also nervous about changing behaviour. I could revise this to make |
I don't want to add I think the behaviour of path parts before any wildcard are easily assumed to be equivalent of being part of the base path. That is:
Everything from the first wildcard becomes a match pattern. We need to recurse deep enough to create paths with the same number of segments, and then match all the candidates against the pattern. Since we're now recursing, choosing not to follow symlinks by default is justifiable. Given the current behaviour is to follow I'm pretty much leaning towards needing a deprecation period though. I think that'll be safest, and probably necessary, so the best way to do that is to support |
Right! I think I agree that leading literals should always be followed. What about trailing literals, e.g. ... but that optimization only works if the |
Also agree on the need to preserve existing behaviour and emit deprecation warnings before we change defaults. I'd argue the default should become true, not false, in future. It's the more useful behaviour. |
I've switched the default to Passing
I mention this only because the names "glob" and "rglob" are a bit opaque unless you've done shell scripting before. The suggested names align with |
Passing
Yeah, and so if we have Maybe we can optimise the case where the |
👍 sounds good. As things stand with this PR, This PR still breaks backwards compat in two (minor?) ways:
|
Matching filesystem case is an improvement, IMHO, even for back-compat. On the I also don't have any real feeling for whether we should |
Pathlib uniformly avoids collapsing I've fixed handling of An updated table showing whether symlinks are followed:
Note that, for the moment, literals are handled exactly the same whether they appear before or after recursive wildcards. |
Looks good. We should still add the deprecation warning for |
Done. But honestly I don't feel great about |
Oh right, I also need to update the docs examples to pass |
Alternative PR that adds support for a |
Closing this PR as I believe #104176 is the right way forward. |
@zooba I'd be much more comfortable with this PR if we delayed deprecating |
We can document it as deprecated with no planned removal date (or planned change of default, in this case). Or just strongly recommend the use of If we think (or people start saying) that the default is harmful/a trap, we can deprecate properly at that point (and wait two releases before changing it). |
I'll pick this up again once #104512 lands. |
In fact, it doesn't really matter whether #104512 lands first. Marking as ready for review! |
On |
Interesting. I think if you can clearly explain the different behaviour ("following symlinks matched by |
I think we're in agreement, then! The docs in this PR say: By default, or when the *follow_symlinks* keyword-only argument is set to
``None``, this method follows symlinks except when expanding "``**``"
wildcards. Set *follow_symlinks* to ``True`` to always follow symlinks, or
``False`` to treat all symlinks as files. Does that sound OK to you? |
Sounds good to me |
Mega, thank you. Would you be up for reviewing the PR as a whole, if you have the time? No worries if not, and no urgency from my side. The patch is somewhat repetitive due to the way globbing is currently implemented. If you find it offensive, we might want to simplify things first via #104512 |
Woot! Thank you Steve! |
Add a keyword-only follow_symlinks parameter to
pathlib.Path.glob()
andrglob()
.When follow_symlinks is
None
(the default), these methods follow symlinks except when evaluating "**
" wildcards. When set to true or false, symlinks are always or never followed, respectively.This allows us to address GH-102613 in future.