Support recursive wildcards in pathlib.PurePath.match() #73435
>>> from pathlib import Path
>>> Path("a/b/c/d/e.txt").match('a/*/**/*')
False
Isn't this intended? According to https://docs.python.org/2/library/glob.html and the wiki, the typical UNIX glob pattern does not have the recursive matching operator (
The ticket is not about glob but about pathlib. Pathlib supports ** directory globbing, but it's only documented as prefix globbing: https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob
** is supported not just as a prefix. Path('./Lib').glob('**/*.py') emits the same paths as Path('.').glob('Lib/**/*.py'). But ** is supported only in glob(), not in match(), and the lack of ** support in match() is not documented. It would be worth documenting explicitly that it is not supported.
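A quick way to check that ** works inside a glob pattern and not only as a prefix: the sketch below builds a throwaway directory tree (the file names are made up for illustration) and globs for *.py files both ways. It only exercises Path.glob(); it says nothing about match().

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical tree: Lib/a.py, Lib/sub/b.py and a non-matching Lib/sub/readme.txt
    (base / "Lib" / "sub").mkdir(parents=True)
    for name in ("Lib/a.py", "Lib/sub/b.py", "Lib/sub/readme.txt"):
        (base / name).touch()

    as_prefix = set((base / "Lib").glob("**/*.py"))  # '**' leads the pattern
    in_middle = set(base.glob("Lib/**/*.py"))         # '**' sits in the middle
    found = sorted(p.relative_to(base).as_posix() for p in in_middle)

print(as_prefix == in_middle)  # True
print(found)                   # ['Lib/a.py', 'Lib/sub/b.py']
```

Because ** matches zero or more directories, Lib/a.py is found by both spellings, not just the files in subdirectories.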
It seems a bit strange for glob() and match() not to work the same way, though. Is there any reason for that?
I just ran into this also. It seems like a very strange omission that match and glob don't support the same patterns (and I'm surprised that they don't share more code).
Because of backwards compatibility (despite a statement saying it's not guaranteed for pathlib), I think the best approach would be to create a 'globmatch' function for PurePath instead of modifying the match function, and to document that the match function does a different kind of matching. This isn't a patch for CPython per se (ironically, I don't have time for that this month...), but here's an MIT-licensed gist that patches pathlib2 and adds a globmatch function to it, plus associated tests extracted from pathlib2 and my own **-related tests. It works for me; feel free to do with it as you wish. https://gist.github.com/virtuald/dd0373bf3f26ec0730adf1da0fb929bb
bpo-34731 was a duplicate of this. pytest was affected: as we port more bits to pathlib, we hit this as well. Bruno kindly implemented a local workaround in https://github.com/pytest-dev/pytest/pull/3980/files#diff-63fc5ed688925b327a5af20405bf4b09R19
I think adding a globmatch function is a decent idea. That is what I did in a library I wrote to get more out of glob than what Python offers out of the box: https://facelessuser.github.io/wcmatch/pathlib/#purepathglobmatch. Specifically, the difference is that globmatch is just a pure match of a path; it doesn't do the implied
>>> pathlib.Path("a/b/c/d/e.txt").match('a/*/**/*', flags=pathlib.GLOBSTAR)
True
This isn't to promote my library, but more to say that, as a user, I found such functionality worth adding. I think it would be generally nice to have such functionality in some form in Python by default. Maybe something called
Today, when porting a random project from os.path to pathlib, I encountered a homemade filename-matching method that I wanted to port to pathlib.Path.match. Unfortunately:
>>> pathlib.Path('x').match('**/x')
False
although if I have a file called
and zsh's
$ ls **/x
can find it. It would be really nice to have analogous .glob and .match methods.
I'm +1 on adding ** to match. My first instinct would be to add it to match itself, rather than adding a new method or a flag, as it should not break compatibility: it would only break if someone has a
Would this break something I did not foresee?
…tch() Add a new *recursive* argument to `pathlib.PurePath.match()`, defaulting to `False`. If set to true, `match()` handles the `**` wildcard as in `Path.glob()`, i.e. it matches any number of path segments. We now compile a `re.Pattern` object for the entire pattern. This is made more difficult by `fnmatch` not treating directory separators as special when evaluating wildcards (`*`, `?`, etc), and so we arrange the path parts onto separate *lines* in a string, and ensure we don't set `re.DOTALL`.
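The line-per-segment trick described in the commit message can be sketched roughly as follows. This is not the code from the PR: it is a simplified, hypothetical translator (compile_glob, _as_lines and glob_full_match are illustrative names) that only understands *, ? and a standalone ** segment, matches [...] literally, assumes POSIX-style separators, matches the whole path rather than from the right, and does not cope with newlines in file names. Each path segment is placed on its own line; because re.DOTALL is not set, . cannot match the \n between segments, so * and ? stay inside one segment while ** consumes whole lines.

```python
import re
from pathlib import PurePosixPath


def _translate_segment(segment: str) -> str:
    """Translate one pattern segment (no separators) into a regex fragment.

    Simplification: only '*' and '?' are wildcards here; '[...]' is matched literally.
    """
    out = []
    for ch in segment:
        if ch == "*":
            out.append(".*")  # without re.DOTALL, '.' never matches the '\n' between segments
        elif ch == "?":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def compile_glob(pattern: str) -> re.Pattern:
    """Compile a whole glob pattern into one regex over newline-terminated segments."""
    pieces = []
    for part in PurePosixPath(pattern).parts:
        if part == "**":
            pieces.append("(?:.+\n)*")  # zero or more complete segments (lines)
        else:
            pieces.append(_translate_segment(part) + "\n")
    return re.compile("".join(pieces))  # note: no re.DOTALL flag


def _as_lines(path: str) -> str:
    """Put each path segment on its own line, e.g. 'a/b' -> 'a\\nb\\n'."""
    return "".join(part + "\n" for part in PurePosixPath(path).parts)


def glob_full_match(path: str, pattern: str) -> bool:
    """Hypothetical helper: True if the whole path matches the pattern."""
    return compile_glob(pattern).fullmatch(_as_lines(path)) is not None


print(glob_full_match("a/b/c/d/e.txt", "a/*/**/*"))  # True
print(glob_full_match("x", "**/x"))                   # True
print(glob_full_match("a/b", "*"))                    # False: '*' cannot span a separator
```

With this sketch, the two cases reported earlier in the thread ('a/*/**/*' against a/b/c/d/e.txt, and '**/x' against x) both match, because ** is allowed to consume zero or more whole segments.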
I have a PR that implements this (pending another fix), but it doesn't support newlines in filenames or patterns. Is that of any use, or should I try to work up a version that supports embedded newlines too? #101398
I've fixed support for newlines in my patch. I believe these two PRs will resolve this issue:
Would a core dev be willing to review, please? Thanks!
The first PR has landed. This one is now ready: It also makes
…#101398) `PurePath.match()` now handles the `**` wildcard as in `Path.glob()`, i.e. it matches any number of path segments. We now compile a `re.Pattern` object for the entire pattern. This is made more difficult by `fnmatch` not treating directory separators as special when evaluating wildcards (`*`, `?`, etc), and so we arrange the path parts onto separate *lines* in a string, and ensure we don't set `re.DOTALL`.
Co-authored-by: Hugo van Kemenade
Co-authored-by: Alex Waygood
Re-opening. I'm beginning to think that the implied
I'm going to put up a PR that reverts
In 49f90ba we added support for the recursive wildcard `**` in `pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a problem: for relative patterns only, `match()` implicitly inserts a `**` token on the left hand side, causing all patterns to match from the right. As a result, it's impossible to match relative patterns from the left: `PurePath('foo/bar').match('bar/**')` is true! This commit reverts the changes to `match()`, and instead adds a new `globmatch()` method that:
- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern
As a result, `globmatch()`'s pattern language exactly matches that of `glob()`.
In 49f90ba we added support for the recursive wildcard `**` in `pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a problem: for relative patterns only, `match()` implicitly inserts a `**` token on the left hand side, causing all patterns to match from the right. As a result, it's impossible to match relative patterns from the left: `PurePath('foo/bar').match('bar/**')` is true! This commit reverts the changes to `match()`, and instead adds a new `full_match()` method that:
- Allows empty patterns
- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern
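To see why the implied leading `**` is a problem, here is a small, self-contained emulation. It is not pathlib's implementation: instead of compiling one regex, it matches segment by segment with `fnmatch.fnmatchcase()` and recursion, which is easier to read but can get slow for patterns with many `**` segments. `sketch_full_match` (hypothetical) matches the entire path; `implied_prefix_match` (also hypothetical) stands in for the right-anchored behaviour by prepending `**/` to relative patterns, as the commit message describes.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def _match_parts(parts, pats):
    """Return True if the path segments `parts` are fully matched by the pattern segments `pats`."""
    if not pats:
        return not parts  # pattern exhausted: match only if the path is exhausted too
    head, rest = pats[0], pats[1:]
    if head == "**":
        # '**' consumes zero or more whole segments; try every split point.
        return any(_match_parts(parts[i:], rest) for i in range(len(parts) + 1))
    # Any other segment must match exactly one path segment (fnmatch handles *, ?, [...]).
    return bool(parts) and fnmatchcase(parts[0], head) and _match_parts(parts[1:], rest)


def sketch_full_match(path: str, pattern: str) -> bool:
    """Hypothetical helper: match the *entire* path, anchored at both ends."""
    return _match_parts(PurePosixPath(path).parts, PurePosixPath(pattern).parts)


def implied_prefix_match(path: str, pattern: str) -> bool:
    """Hypothetical emulation: relative patterns get an implicit leading '**'."""
    if not PurePosixPath(pattern).is_absolute():
        pattern = "**/" + pattern
    return sketch_full_match(path, pattern)


print(implied_prefix_match("foo/bar", "bar/**"))  # True: 'bar' is matched from the right
print(sketch_full_match("foo/bar", "bar/**"))     # False: the pattern must match from the left
print(sketch_full_match("foo/bar", "foo/**"))     # True
```

The first call reproduces the `PurePath('foo/bar').match('bar/**')` surprise: once a `**` is implied on the left, any relative pattern can float to the right end of the path. Anchoring at both ends, as in the second call, is what makes `foo/**` and `bar/**` distinguishable.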
Re-resolving. This is now implemented as
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Linked PRs
pathlib.PurePath.match() #101398
pathlib.PurePath.full_match() #114350