GH-73435: Implement recursive wildcards in pathlib.PurePath.match() #101398

Merged
merged 40 commits into from
May 30, 2023
Commits
608e917
gh-73435: Implement recursive wildcards in pathlib.PurePath.match()
barneygale Jan 28, 2023
9a43c7f
Simplify code slightly
barneygale Jan 29, 2023
a846279
Fix support for newlines
barneygale Feb 15, 2023
bbd8cd6
Cache translation of individual components
barneygale Feb 15, 2023
b5c002e
Drop 'recursive' argument, make this the only behaviour.
barneygale Feb 15, 2023
0afcd54
Undo modifications to fnmatch.py
barneygale Feb 16, 2023
fe32717
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale Feb 17, 2023
7b6f850
Fix Windows support
barneygale Feb 17, 2023
037488a
Tidy up code.
barneygale Feb 17, 2023
0741950
Add news blurb.
barneygale Feb 17, 2023
e1c9731
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale Feb 20, 2023
db6f0ad
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale Apr 3, 2023
8dff9e2
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale Apr 9, 2023
314679f
Simplify patch; prepare for use in `glob()`
barneygale Apr 9, 2023
90eebcc
Make better use of path object caching.
barneygale Apr 9, 2023
4b5fffd
Add performance tip to docs
barneygale Apr 9, 2023
5e8bc28
Skip re-initialisation of PurePath patterns.
barneygale Apr 20, 2023
e81ab5a
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale Apr 29, 2023
afb8047
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale May 2, 2023
722a1ab
Use `re.IGNORECASE` rather than `os.path.normcase()`
barneygale May 2, 2023
0ccf3df
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale May 6, 2023
ccea5e1
Add whats new entry
barneygale May 11, 2023
dd04294
Update Doc/whatsnew/3.12.rst
barneygale May 11, 2023
b258641
Apply suggestions from code review
barneygale May 14, 2023
ced8998
Explain _FNMATCH_SLICE
barneygale May 14, 2023
a33c7b6
Accidentally a word.
barneygale May 14, 2023
4b3bddb
Cache pattern compilation
barneygale May 14, 2023
6ad30dd
Remove unneeded `from None` suffix, whoops.
barneygale May 14, 2023
052890f
Tiny performance improvement: avoid accessing path.parts
barneygale May 14, 2023
d789b6d
Typo fix
barneygale May 14, 2023
4fe77c6
Avoid hashing path object when compiling pattern.
barneygale May 14, 2023
4770c13
More performance tweaks
barneygale May 14, 2023
559787d
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale May 18, 2023
9c09fc4
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale May 23, 2023
eb35dbc
Re-target to 3.13.
barneygale May 23, 2023
8959dfd
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale May 27, 2023
fec7702
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale May 29, 2023
89bc380
Merge branch 'main' into gh-73435-pathlib-match-recursive
barneygale May 29, 2023
9211297
Add more comments!
barneygale May 30, 2023
73bb309
Update Lib/pathlib.py
barneygale May 30, 2023
11 changes: 11 additions & 0 deletions Doc/library/pathlib.rst
@@ -569,6 +569,13 @@ Pure paths provide the following methods and properties:
>>> PurePath('a/b.py').match('/*.py')
False

The *pattern* may be another path object; this speeds up matching the same
pattern against multiple files::

>>> pattern = PurePath('*.py')
>>> PurePath('a/b.py').match(pattern)
True

As with other methods, case-sensitivity follows platform defaults::

>>> PurePosixPath('b.py').match('*.PY')
@@ -581,6 +588,10 @@ Pure paths provide the following methods and properties:
.. versionadded:: 3.12
The *case_sensitive* argument.

.. versionchanged:: 3.13
Support for the recursive wildcard "``**``" was added. In previous
versions, it acted like the non-recursive wildcard "``*``".


.. method:: PurePath.relative_to(other, walk_up=False)

3 changes: 3 additions & 0 deletions Doc/whatsnew/3.13.rst
@@ -90,6 +90,9 @@ Improved Modules
pathlib
-------

* Add support for recursive wildcards in :meth:`pathlib.PurePath.match`.
(Contributed by Barney Gale in :gh:`73435`.)

* Add *follow_symlinks* keyword-only argument to :meth:`pathlib.Path.glob` and
:meth:`~pathlib.Path.rglob`.
(Contributed by Barney Gale in :gh:`77609`.)
99 changes: 85 additions & 14 deletions Lib/pathlib.py
@@ -54,13 +54,30 @@ def _ignore_error(exception):
getattr(exception, 'winerror', None) in _IGNORED_WINERRORS)


@functools.cache
def _is_case_sensitive(flavour):
    return flavour.normcase('Aa') == 'Aa'

#
# Globbing helpers
#


# fnmatch.translate() returns a regular expression that includes a prefix and
# a suffix, which enable matching newlines and ensure the end of the string is
# matched, respectively. These features are undesirable for our implementation
# of PurePath.match(), which represents path separators as newlines and joins
# pattern segments together. As a workaround, we define a slice object that
# can remove the prefix and suffix from any translate() result. See the
# _compile_pattern_lines() function for more details.
_FNMATCH_PREFIX, _FNMATCH_SUFFIX = fnmatch.translate('_').split('_')
_FNMATCH_SLICE = slice(len(_FNMATCH_PREFIX), -len(_FNMATCH_SUFFIX))
_SWAP_SEP_AND_NEWLINE = {
    '/': str.maketrans({'/': '\n', '\n': '/'}),
    '\\': str.maketrans({'\\': '\n', '\n': '\\'}),
}


@functools.lru_cache()
def _make_selector(pattern_parts, flavour, case_sensitive):
pat = pattern_parts[0]
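
The comment above `_FNMATCH_SLICE` describes two tricks: slicing the wrapper off a `fnmatch.translate()` result, and swapping path separators with newlines. The following is a minimal standalone sketch of both, not code from the patch. The names `prefix`, `suffix`, `body`, `swap` and `to_lines` are made up for illustration, and only the POSIX separator is handled.

```python
import fnmatch
import re

# Split translate()'s output around a placeholder to discover whatever
# prefix and suffix this Python version wraps around the pattern body.
prefix, suffix = fnmatch.translate('_').split('_')
body = slice(len(prefix), -len(suffix))

whole = fnmatch.translate('*.py')  # full regex, wrapper included
fragment = whole[body]             # pattern body only, wrapper sliced off

# Swapping '/' and '\n' puts each path component on its own line.
swap = str.maketrans({'/': '\n', '\n': '/'})


def to_lines(path):
    """Return *path* with separators and newlines swapped (hypothetical helper)."""
    return path.translate(swap)


# With the wrapper intact, '*' crosses the newline that stands in for '/'.
print(bool(re.match(whole, to_lines('a/b.py'))))         # True
# With the wrapper removed, '.' stops at newlines, so '*' stays in one component.
print(bool(re.fullmatch(fragment, to_lines('a/b.py'))))  # False
print(bool(re.fullmatch(fragment, to_lines('b.py'))))    # True
```

Discovering the prefix and suffix at import time, rather than hard-coding them, keeps the slice correct if `fnmatch.translate()` changes its wrapper in another Python version.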
@@ -92,6 +109,51 @@ def _compile_pattern(pat, case_sensitive):
return re.compile(fnmatch.translate(pat), flags).match


@functools.lru_cache()
def _compile_pattern_lines(pattern_lines, case_sensitive):
    """Compile the given pattern lines to an `re.Pattern` object.

    The *pattern_lines* argument is a glob-style pattern (e.g. '**/*.py') with
    its path separators and newlines swapped (e.g. '**\n*.py'). By using
    newlines to separate path components, and not setting `re.DOTALL`, we
    ensure that the `*` wildcard cannot match path separators.

    The returned `re.Pattern` object may have its `match()` method called to
    match a complete pattern, or `search()` to match from the right. The
    argument supplied to these methods must also have its path separators and
    newlines swapped.
    """

    # Match the start of the path, or just after a path separator
    parts = ['^']
    for part in pattern_lines.splitlines(keepends=True):
        if part == '**\n':
            # '**/' component: we use '[\s\S]' rather than '.' so that path
            # separators (i.e. newlines) are matched. The trailing '^' ensures
            # we terminate after a path separator (i.e. on a new line).
            part = r'[\s\S]*^'
        elif part == '**':
            # '**' component.
            part = r'[\s\S]*'
        elif '**' in part:
            raise ValueError("Invalid pattern: '**' can only be an entire path component")
        else:
            # Any other component: pass to fnmatch.translate(). We slice off
            # the common prefix and suffix added by translate() to ensure that
            # re.DOTALL is not set, and the end of the string is not matched,
            # respectively. With DOTALL not set, '*' wildcards will not match
            # path separators, because the '.' characters in the pattern will
            # not match newlines.
            part = fnmatch.translate(part)[_FNMATCH_SLICE]
        parts.append(part)
    # Match the end of the path, always.
    parts.append(r'\Z')
    flags = re.MULTILINE
    if not case_sensitive:
        flags |= re.IGNORECASE
    return re.compile(''.join(parts), flags=flags)


class _Selector:
"""A selector matches a specific glob pattern part against the children
of a given path."""
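
To see why the `'**/'` component is translated to `[\s\S]*^` rather than plain `[\s\S]*`, here is a small sketch that writes out by hand the regex the function would build for the pattern `**/c.py`, next to a variant without the trailing `^`. The helper `to_lines` and both pattern names are illustrative only, and the helper assumes paths contain no newlines.

```python
import re


def to_lines(path):
    """Replace '/' with a newline (simplified: assumes no newline in *path*)."""
    return path.replace('/', '\n')


# '**/c.py' with the trailing '^': the recursive part must stop at a line
# start, which corresponds to a path component boundary.
with_anchor = re.compile(r'^[\s\S]*^c\.py\Z', re.MULTILINE)
# The same pattern without the trailing '^', for comparison only.
without_anchor = re.compile(r'^[\s\S]*c\.py\Z', re.MULTILINE)

for path in ('c.py', 'a/b/c.py', 'a/bc.py'):
    lines = to_lines(path)
    print(path,
          bool(with_anchor.match(lines)),
          bool(without_anchor.match(lines)))
# c.py True True
# a/b/c.py True True
# a/bc.py False True
```

Without the anchor, `**/c.py` would wrongly accept `a/bc.py`, because the recursive part could end in the middle of a component.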
@@ -276,6 +338,10 @@ class PurePath:
# to implement comparison methods like `__lt__()`.
'_parts_normcase_cached',

# The `_lines_cached` slot stores the string path with path separators
# and newlines swapped. This is used to implement `match()`.
'_lines_cached',

# The `_hash` slot stores the hash of the case-normalized string
# path. It's set when `__hash__()` is called for the first time.
'_hash',
@@ -441,6 +507,16 @@ def _parts_normcase(self):
self._parts_normcase_cached = self._str_normcase.split(self._flavour.sep)
return self._parts_normcase_cached

    @property
    def _lines(self):
        # Path with separators and newlines swapped, for pattern matching.
        try:
            return self._lines_cached
        except AttributeError:
            trans = _SWAP_SEP_AND_NEWLINE[self._flavour.sep]
            self._lines_cached = str(self).translate(trans)
            return self._lines_cached

def __eq__(self, other):
if not isinstance(other, PurePath):
return NotImplemented
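
The `_lines` property relies on a slot that stays unset until first use: reading an unset slot raises `AttributeError`, which triggers the one-time computation. A minimal sketch of that caching pattern follows. `SketchPath` is a hypothetical stand-in class, not `PurePath`, and it only handles the `/` separator.

```python
_SWAP = str.maketrans({'/': '\n', '\n': '/'})


class SketchPath:
    """Hypothetical stand-in for a path class that uses __slots__."""

    __slots__ = ('_raw', '_lines_cached')

    def __init__(self, raw):
        self._raw = raw  # '_lines_cached' is deliberately left unset

    def __str__(self):
        return self._raw

    @property
    def _lines(self):
        # Reading an unset slot raises AttributeError, so the translation
        # runs only on first access and is reused afterwards.
        try:
            return self._lines_cached
        except AttributeError:
            self._lines_cached = str(self).translate(_SWAP)
            return self._lines_cached


p = SketchPath('a/b/c.py')
print(hasattr(p, '_lines_cached'))  # False
print(repr(p._lines))               # 'a\nb\nc.py'
print(hasattr(p, '_lines_cached'))  # True
```

Because the cached string lives on the path object, a pattern passed as a path object carries its translated form along with it, which fits the documentation's note that reusing a path object as the pattern speeds up repeated matching.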
@@ -697,23 +773,18 @@ def match(self, path_pattern, *, case_sensitive=None):
         """
         Return True if this path matches the given pattern.
         """
+        if not isinstance(path_pattern, PurePath):
+            path_pattern = self.with_segments(path_pattern)
         if case_sensitive is None:
             case_sensitive = _is_case_sensitive(self._flavour)
-        pat = self.with_segments(path_pattern)
-        if not pat.parts:
+        pattern = _compile_pattern_lines(path_pattern._lines, case_sensitive)
+        if path_pattern.drive or path_pattern.root:
+            return pattern.match(self._lines) is not None
+        elif path_pattern._tail:
+            return pattern.search(self._lines) is not None
+        else:
             raise ValueError("empty pattern")
-        pat_parts = pat.parts
-        parts = self.parts
-        if pat.drive or pat.root:
-            if len(pat_parts) != len(parts):
-                return False
-        elif len(pat_parts) > len(parts):
-            return False
-        for part, pat in zip(reversed(parts), reversed(pat_parts)):
-            match = _compile_pattern(pat, case_sensitive)
-            if not match(part):
-                return False
-        return True


# Subclassing os.PathLike makes isinstance() checks slower,
# which in turn makes Path construction slower. Register instead!
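
The new `match()` picks between `re.Pattern.match()` for anchored patterns and `re.Pattern.search()` for relative ones. The sketch below approximates that flow on plain POSIX-style strings so it can run without this patch applied. `compile_lines` and `sketch_match` are hypothetical names, inputs are assumed to be already normalized (no trailing or doubled slashes), and drives and Windows separators are ignored.

```python
import fnmatch
import functools
import re

_PREFIX, _SUFFIX = fnmatch.translate('_').split('_')
_SLICE = slice(len(_PREFIX), -len(_SUFFIX))
_SWAP = str.maketrans({'/': '\n', '\n': '/'})


@functools.lru_cache()
def compile_lines(pattern_lines, case_sensitive=True):
    """Compile a separator-swapped glob pattern, one component per line."""
    parts = ['^']
    for part in pattern_lines.splitlines(keepends=True):
        if part == '**\n':
            part = r'[\s\S]*^'   # '**/': any components, ending at a boundary
        elif part == '**':
            part = r'[\s\S]*'    # trailing '**': everything that remains
        elif '**' in part:
            raise ValueError("'**' can only be an entire path component")
        else:
            # Ordinary component: reuse fnmatch, minus its wrapper.
            part = fnmatch.translate(part)[_SLICE]
        parts.append(part)
    parts.append(r'\Z')
    flags = re.MULTILINE
    if not case_sensitive:
        flags |= re.IGNORECASE
    return re.compile(''.join(parts), flags)


def sketch_match(path, pattern, case_sensitive=True):
    """Approximate the patched match() for normalized POSIX-style strings."""
    if not pattern:
        raise ValueError("empty pattern")
    regex = compile_lines(pattern.translate(_SWAP), case_sensitive)
    lines = path.translate(_SWAP)
    if pattern.startswith('/'):
        # Anchored pattern: must match the whole path from its start.
        return regex.match(lines) is not None
    # Relative pattern: '^' may match at any line start, so search() finds
    # a match that begins at some component boundary and runs to the end.
    return regex.search(lines) is not None


print(sketch_match('/a/b/c.py', '/**/*.py'))  # True
print(sketch_match('a/b/c.py', 'b/*.py'))     # True
print(sketch_match('a/b/c.py', 'a/*.py'))     # False
```

Since the compiled regex always ends in `\Z`, `search()` effectively matches from the right: the pattern has to cover the tail of the path, while `re.MULTILINE` lets its start fall on any component boundary.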
24 changes: 23 additions & 1 deletion Lib/test/test_pathlib.py
@@ -310,8 +310,30 @@ def test_match_common(self):
self.assertFalse(P('/ab.py').match('/a/*.py'))
self.assertFalse(P('/a/b/c.py').match('/a/*.py'))
# Multi-part glob-style pattern.
self.assertFalse(P('/a/b/c.py').match('/**/*.py'))
self.assertTrue(P('a').match('**'))
self.assertTrue(P('c.py').match('**'))
self.assertTrue(P('a/b/c.py').match('**'))
self.assertTrue(P('/a/b/c.py').match('**'))
self.assertTrue(P('/a/b/c.py').match('/**'))
self.assertTrue(P('/a/b/c.py').match('**/'))
self.assertTrue(P('/a/b/c.py').match('/a/**'))
self.assertTrue(P('/a/b/c.py').match('**/*.py'))
self.assertTrue(P('/a/b/c.py').match('/**/*.py'))
self.assertTrue(P('/a/b/c.py').match('/a/**/*.py'))
self.assertTrue(P('/a/b/c.py').match('/a/b/**/*.py'))
self.assertTrue(P('/a/b/c.py').match('/**/**/**/**/*.py'))
self.assertFalse(P('c.py').match('**/a.py'))
self.assertFalse(P('c.py').match('c/**'))
self.assertFalse(P('a/b/c.py').match('**/a'))
self.assertFalse(P('a/b/c.py').match('**/a/b'))
self.assertFalse(P('a/b/c.py').match('**/a/b/c'))
self.assertFalse(P('a/b/c.py').match('**/a/b/c.'))
self.assertFalse(P('a/b/c.py').match('**/a/b/c./**'))
self.assertFalse(P('a/b/c.py').match('**/a/b/c./**'))
self.assertFalse(P('a/b/c.py').match('/a/b/c.py/**'))
self.assertFalse(P('a/b/c.py').match('/**/a/b/c.py'))
self.assertRaises(ValueError, P('a').match, '**a/b/c')
self.assertRaises(ValueError, P('a').match, 'a/b/c**')
# Case-sensitive flag
self.assertFalse(P('A.py').match('a.PY', case_sensitive=True))
self.assertTrue(P('A.py').match('a.PY', case_sensitive=False))
@@ -0,0 +1 @@
Add support for recursive wildcards in :meth:`pathlib.PurePath.match`.