'*' matches entire path in fnmatch #72904
A '*' in fnmatch.translate is converted into '.*', which will greedily match directory separators. This doesn't match shell behavior, which is that * will only match file names:
From a POSIX standpoint, this could easily be fixed by using '[^/]*' instead of '.*'. I'm not sure how to make this work cross-platform, though. It's worth noting that some programs (rsync, git) support **, which would correctly translate to '.*'.
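
To make the mismatch concrete, here is a small sketch comparing the current translation with the POSIX-only '[^/]*' alternative suggested above. The helper `posix_star_translate` is hypothetical (it is not part of the standard library) and only treats '*' as a wildcard, so it illustrates the idea rather than a complete fix.

```python
import fnmatch
import re


def posix_star_translate(pat):
    """Hypothetical helper: like fnmatch.translate(), but '*' stops at '/'.

    Only '*' is treated as a wildcard; every other character is literal.
    """
    body = "".join("[^/]*" if ch == "*" else re.escape(ch) for ch in pat)
    return rf"(?s:{body})\Z"


# Current behaviour: '*' is translated to '.*' and crosses separators.
crosses = fnmatch.fnmatchcase("src/pkg/module.py", "*.py")

# With '[^/]*', the same pattern only matches a bare file name.
strict_nested = re.match(posix_star_translate("*.py"), "src/pkg/module.py")
strict_flat = re.match(posix_star_translate("*.py"), "module.py")
```
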
Presumably something like `r'(?:' + r'|'.join({re.escape(os.path.sep), re.escape(os.path.altsep)}) + r')'` would cover it completely. I switched to a non-capturing group rather than a character class, both because escaping doesn't work the same way inside character classes and to cover the possibility (no idea here) that some terrible OS might have a multicharacter path separator.
Oops, altsep is None, not the empty string, when there is only one separator. And I didn't handle inverting the match. Sigh. You get the idea.
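
A rough sketch of what the corrected idea might look like, handling both points raised above (a `None` altsep and the inverted match). The function names are made up for illustration, and using a negative lookahead for the inverted form is one possible approach, not something proposed in the thread.

```python
import os
import re


def platform_separators():
    """Separators for the current OS, skipping altsep when it is None."""
    return [s for s in (os.path.sep, os.path.altsep) if s]


def separator_regex(seps):
    """Non-capturing alternation matching any one separator.

    An alternation (rather than a character class) also works for
    multi-character separators.
    """
    return "(?:" + "|".join(re.escape(s) for s in seps) + ")"


def non_separator_regex(seps):
    """Inverted form: one character at which no separator starts."""
    return "(?:(?!" + separator_regex(seps) + ").)"


# A '*' wildcard would then become "zero or more non-separator characters".
star = non_separator_regex(["/", "\\"]) + "*"
plain = re.fullmatch(star, "name.txt", re.DOTALL)
nested = re.fullmatch(star, "dir/name.txt", re.DOTALL)
```
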
Note that somebody has forked the standard library to implement this:
It is also worth noting that the glob standard library:
I do not think we can change the default behaviour of fnmatch at this point, but I would like to see this behaviour triggered by an optional argument to the various functions, e.g.:
In each case, if glob_asterisks (or whatever other name we come up with) is true, the behaviour would match the pywildcard behaviour, i.e.:
I look after the glob matching code in duplicity and would like to start using the standard library to do filename matching for us, but we need the above behaviour. I am happy to do the patching if there is a realistic chance of it being accepted.
Posted to the [Python-ideas] mailing list, as it is proposing a change to the standard library: Nobody has responded so far, however. I take this as at least no vehement objection to the idea.
I see that they have commented on the lib that I made a few years ago (python-wildcard). The reason for the creation of that little fork started in this issue:
For consistency with the corresponding feature in the glob function since Python 3.5, I would suggest adding an extra optional argument 'recursive' instead of 'glob_asterisks'. With the default recursive=False, one gets the old behavior; with recursive=True, it can handle '**' and '*' as in pywildcard. I realize that with recursive=False, the behavior is not exactly consistent with glob, but I'd still prefer the same name for the optional argument. It is the common terminology for this type of feature. See https://en.wikipedia.org/wiki/Matching_wildcards
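
As an illustration of the proposed flag, here is a minimal sketch of a translate function with an optional `recursive` argument. The name `translate_sketch` and the `sep` parameter are hypothetical, bracket expressions such as `[abc]` are deliberately not handled, and this is not the code that was eventually merged.

```python
import re


def translate_sketch(pat, recursive=False, sep="/"):
    """Hypothetical sketch of fnmatch.translate() with a 'recursive' flag.

    recursive=False: '*' and '?' match any character (the old behaviour).
    recursive=True:  '*' and '?' stop at the separator, '**' crosses it.
    """
    not_sep = f"[^{re.escape(sep)}]"
    out = []
    i = 0
    while i < len(pat):
        ch = pat[i]
        if ch == "*":
            if recursive and pat[i:i + 2] == "**":
                out.append(".*")  # '**' may span several path segments
                i += 2
                continue
            out.append(f"{not_sep}*" if recursive else ".*")
        elif ch == "?":
            out.append(not_sep if recursive else ".")
        else:
            out.append(re.escape(ch))
        i += 1
    return "(?s:" + "".join(out) + r")\Z"


old_style = re.match(translate_sketch("*.py"), "a/b.py")
new_style = re.match(translate_sketch("*.py", recursive=True), "a/b.py")
double_star = re.match(translate_sketch("**/*.py", recursive=True), "a/b/c.py")
```

With the default `recursive=False` the pattern `*.py` still matches `a/b.py`, so existing callers would see no change in this sketch.
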
Just for reference, here are a few more implementations of the same idea, next to pywildcard, sometimes combined with other useful features:
The last one is rather active, with regular releases, last one on March 24, 2019.
I have an implementation of this for pathlib:
It exploits a simple trick: swapping path separators and newlines, and then matching without setting
If folks thought it was a good idea, we could instead put the implementation in an
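
The separator/newline swap can be sketched roughly as follows. This is an illustration under assumptions, not the pathlib code: the helper names are invented, only `*` and `?` are handled, and the flag left unset is assumed here to be `re.DOTALL`, so that `.` refuses to match the newline that stands in for a separator.

```python
import re

# Swap '/' and '\n' so that a separator looks like a newline to the regex.
_SWAP = str.maketrans({"/": "\n", "\n": "/"})


def compile_swapped(pattern):
    """Hypothetical helper: compile a glob where '*' and '?' stay in one segment."""
    parts = []
    for ch in pattern.translate(_SWAP):
        if ch == "*":
            parts.append(".*")  # no DOTALL, so this cannot cross '\n'
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))  # note: re.DOTALL is deliberately not set


def swapped_match(path, pattern):
    """Apply the same swap to the path, then require a full match."""
    return compile_swapped(pattern).fullmatch(path.translate(_SWAP)) is not None
```

For example, `swapped_match("a/b.py", "*.py")` is false while `swapped_match("a/b.py", "*/*.py")` is true, because each `*` is confined to a single path segment.
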
Some other differences between
If a sequence of path separators is given to the new argument, `translate()` produces a pattern that matches similarly to `pathlib.Path.glob()`. Specifically:

- A `*` pattern segment matches precisely one path segment.
- A `**` pattern segment matches any number of path segments.
- If `**` appears in any other position within the pattern, `ValueError` is raised.
- `*` and `?` wildcards in other positions don't match path separators.

This change allows us to factor out a lot of complex code in pathlib.
PR available: #106703
The PR above adds an argument to

```python
def match(path: os.PathLike, pattern: os.PathLike) -> bool:
    ...
```

This would take care of supplying
Questions:
I've posted about this on discuss.python.org: https://discuss.python.org/t/add-glob-translate-convert-path-with-shell-wildcards-to-regular-expression/31549
Adding to the helpful list from tovrstra, I can recommend pathspec.
Use `re.Scanner` to scan shell-style patterns, rather than parsing them by hand in a fat loop. This makes the code slower (!) but more obvious, and lays some groundwork for a future `glob.translate()` function.
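
For readers unfamiliar with it, `re.Scanner` is an undocumented class in the `re` module that maps a list of (regex, action) pairs over a string. The sketch below shows the general shape of scanning a shell-style pattern this way. It is not the code from the linked PR: `scanner_translate` is a made-up name, and bracket expressions are left out to keep it short.

```python
import re

# Each lexicon entry is (regex, action); the action receives the scanner
# and the matched token and returns the regex fragment to emit.
_SCANNER = re.Scanner([
    (r"\*+", lambda scanner, token: ".*"),               # runs of '*' collapse
    (r"\?", lambda scanner, token: "."),                 # single-character wildcard
    (r"[^*?]+", lambda scanner, token: re.escape(token)),  # literal text
])


def scanner_translate(pat):
    """Hypothetical sketch: fnmatch-style translation driven by re.Scanner."""
    parts, remainder = _SCANNER.scan(pat)
    if remainder:
        raise ValueError(f"could not scan pattern tail: {remainder!r}")
    return "(?s:" + "".join(parts) + r")\Z"
```

Because it follows the existing fnmatch semantics, `*` in this sketch still matches path separators; only the parsing strategy differs.
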
Add `glob.translate()` function that converts a pathname with shell wildcards to a regular expression. The regular expression is used by pathlib to implement `match()` and `glob()`.

This function differs from `fnmatch.translate()` in that wildcards do not match path separators by default, and that a `*` pattern segment matches precisely one path segment. When *recursive* is set to true, `**` pattern segments match any number of path segments, and `**` cannot appear outside its own segment.

In pathlib, this change speeds up directory walking (because `_make_child_relpath()` does less work), makes path objects smaller (they don't need a `_lines` slot), and removes the need for some gnarly code.

Co-authored-by: Jason R. Coombs <[email protected]>
Co-authored-by: Adam Turner <[email protected]>
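
A short usage sketch of the function described in this commit message. It assumes Python 3.13 or later, where `glob.translate()` accepts the keyword-only arguments `recursive`, `include_hidden`, and `seps`; the wrapper name `glob_match` is invented for illustration.

```python
import glob
import re

# glob.translate() only exists on Python 3.13 and later.
HAS_TRANSLATE = hasattr(glob, "translate")


def glob_match(path, pattern, *, recursive=True):
    """Hypothetical wrapper: test a '/'-separated path against a glob pattern."""
    if not HAS_TRANSLATE:
        raise RuntimeError("glob.translate() requires Python 3.13 or later")
    regex = glob.translate(
        pattern,
        recursive=recursive,   # enable '**' segments
        include_hidden=True,   # let wildcards match dot-files too
        seps="/",              # fix the separator so results are portable
    )
    return re.match(regex, path) is not None
```

With this wrapper, `glob_match("foo/bar/baz.txt", "**/*.txt")` is true, while `glob_match("foo/baz.txt", "*.txt")` is false because a lone `*` no longer crosses the separator.
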
Addressed in Python 3.13 / cf67ebf / #106703 -- we've added a new
@barneygale minor thing, I noticed that the
Linked PRs
- `glob.translate()` function #106703
- `fnmatch.translate()` #109879