-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revisit adding lexical normalization support to pathlib
#124825
Comments
What is the use case for |
If you want to rule out cases like I don't mind leaving out I only included it in my suggested API because Edit: even if we leave out |
Opinions differ on this point! e.g. PEP 428 says:
|
Ah, I knew I had seen a written objection to lexical normalisation somewhere, but didn't think to check the original PEP! The context where this came up for me is symlinked virtual environments: fully resolving I think (but haven't confirmed), there are also some symlink games happening in the python-build-standalone macOS builds. |
I think I'd be OK with adding this as a Users of I don't like a name involving "resolve" because I think it could imply symlink resolution, which isn't what we're doing. Nor are we using the POSIX algorithm for pathname resolution. Adding "lexical" helps, but I think there's still a small potential for confusion. Maybe. Names involving "normalize" are a tiny bit problematic because pathlib already performs path normalization (specifically, removal of redundant slashes and |
Feature or enhancement
Proposal:
I'd like to add a
resolve_lexical
method to concretepathlib
objects:As with
resolve()
, if strict isTrue
and any segment of the given path doesn't exist,FileNotFoundError
is raised (so/some/dir/_nonexistent_/..
would fail). If strict isFalse
(the default), all path segments are processed without checking whether they exist (so/some/dir/_nonexistent_/..
lexically resolves to/some/dir
).While theoretically this could be added to
PurePath
without thestrict
option, I don't see any significant benefit to that (whereas I do see benefits to paralleling thePath.resolve()
API as closely as possible).As a minor note, adding this method would give a more direct way of checking if a path contains any symlinks at any level:
path.resolve() == path.resolve_lexical()
(vs the currentpath.is_symlink() or any(segment.is_symlink() for segment in path.parents)
).Chaining the two resolution methods would also be valid (
path.resolve().resolve_lexical()
), with symlinks then being resolved in the segments that actually exist, and the rest of the path, if any, being resolved lexically)On my current project, I recently ran into a pair of subtle symlink-and-relative-path-handling bugs.
path.resolve()
to fully resolve a path to its actual target. This gave a dynamic loading error on macOS, because one of the dynamic libraries the referenced executable needed was stored relative to the symlink, not relative to the actual binary. While that feels like a bug in the way the offending executable was packaged, handling it meant actively avoiding fully resolving paths, and instead respecting their nominal locations.path.absolute()
not only turned off the symlink resolution, it also turned off the path normalisation that removes\..\
segments. This also resulted in dynamic loading errors on macOS when those segments were present in the executable reference (probably due to an underlying Python or OS API conditionally doing its own equivalent ofpath.resolve()
when/../
was present in the path, since the resulting dynamic loading errors looked very similar to those I saw when hitting the first bug).The resolution that handled both situations ended up being to use
os.path
to perform lexical normalisation (viaos.path.abspath
):Having to drop down to the lower level API to request "resolve
/../
relative to the path as given" instead of the default "resolve/../
relative to symlink targets" feels like an unnecessary gap in the abstraction layer.Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
Previously suggested here: #83105
When it comes to symlink security vulnerabilities arising from parent directory traversal, they mostly relate to using symlink resolution to reach an unexpected location:
/some/dir/symlink/../sibling
will expect it to refer to/some/dir/sibling
(lexical normalization)Path(/some/dir/symlink/../sibling).resolve()
actually refers to/parent_of_symlink_target/sibling
(i.e. you have no idea where it points without access to the filesystem state that specifies the destination of/some/dir/symlink
)As a result, the rationale for rejection doesn't feel strong to me (since the intuitive behaviour is unavailable in the high level API, and instead only the subtle system state dependent behaviour is offered)
Also noting that Java does offer lexical normalisation on its Path abstraction: https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/nio/file/Path.html#normalize()
The text was updated successfully, but these errors were encountered: