
GH-115060: Speed up pathlib.Path.glob() by not scanning literal parts #117732

Merged: 4 commits merged into python:main on Apr 12, 2024


@barneygale (Contributor) commented Apr 10, 2024

Don't bother calling os.scandir() to scan for literal pattern segments, like foo in foo/*.py. Instead, append the segment(s) as-is and call through to the next selector with exists=False, which signals that the path might not exist. Subsequent selectors will call os.scandir() or os.lstat() to filter out missing paths as needed.
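The selector chain described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not CPython's implementation: the names `compile_selector`, `literal_selector`, `wildcard_selector`, `select_exists` and `glob` are hypothetical, only `/`-separated patterns made of literal and single-segment wildcard parts are handled (no `**`), and `fnmatch` stands in for pathlib's own pattern matching.

```python
import fnmatch
import os
import re

# Hypothetical sketch of the "don't scan literal parts" idea; not CPython's code.
# Assumption: patterns use "/" separators and contain no "**" segments.

_WILDCARD = re.compile(r"[*?\[]")


def is_literal(part):
    """A part with no glob metacharacters can be appended without scanning."""
    return _WILDCARD.search(part) is None


def compile_selector(parts):
    """Build a selector for the remaining pattern parts.

    Every selector has the signature (path, exists) -> iterator of paths.
    exists=False means "this path was built by joining strings and may not
    exist on disk", so a later selector must check it.
    """
    if not parts:
        return select_exists
    if is_literal(parts[0]):
        # Consume all consecutive literal parts in a single step.
        n = 1
        while n < len(parts) and is_literal(parts[n]):
            n += 1
        return literal_selector(parts[:n], compile_selector(parts[n:]))
    return wildcard_selector(parts[0], compile_selector(parts[1:]),
                             dir_only=len(parts) > 1)


def literal_selector(literals, select_next):
    def select(path, exists=False):
        # No os.scandir() here: append the literal segments as-is and pass
        # exists=False so that a later selector filters out missing paths.
        yield from select_next(os.path.join(path, *literals), exists=False)
    return select


def wildcard_selector(part, select_next, dir_only):
    def select(path, exists=False):
        try:
            with os.scandir(path) as it:
                entries = list(it)
        except OSError:
            # A missing (or non-directory) path is filtered out here.
            return
        for entry in entries:
            if fnmatch.fnmatchcase(entry.name, part):
                if dir_only and not entry.is_dir():
                    continue
                # scandir() has just proved this entry exists.
                yield from select_next(entry.path, exists=True)
    return select


def select_exists(path, exists=False):
    """Final selector: one os.lstat() call only when existence is unknown."""
    if exists:
        yield path
        return
    try:
        os.lstat(path)
    except OSError:
        return
    yield path


def glob(root, pattern):
    """Return paths under `root` matching `pattern` (root assumed to exist)."""
    return list(compile_selector(pattern.split("/"))(root, exists=True))
```

In this sketch a fully literal pattern such as `Lib/pathlib/__init__.py` costs one `os.lstat()` and no `os.scandir()` calls, while `Lib/*/__init__.py` scans only `Lib` itself; that matches the intuition behind the largest speedups in the timings, though the real implementation differs in detail.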

Timings (in each pair, the first result is before this change and the second is after):

$ ./python -m timeit -s "from pathlib import Path; p = Path.cwd()" "list(p.glob('Lib'))"
5000 loops, best of 5: 69.4 usec per loop
20000 loops, best of 5: 13.6 usec per loop
# --> 5.1x faster

$ ./python -m timeit -s "from pathlib import Path; p = Path.cwd()" "list(p.glob('Lib/'))"
5000 loops, best of 5: 73.3 usec per loop
20000 loops, best of 5: 14.2 usec per loop
# --> 5.16x faster

$ ./python -m timeit -s "from pathlib import Path; p = Path.cwd()" "list(p.glob('Lib/*'))"
1000 loops, best of 5: 362 usec per loop
1000 loops, best of 5: 301 usec per loop
# --> 1.2x faster

$ ./python -m timeit -s "from pathlib import Path; p = Path.cwd()" "list(p.glob('Lib/*/__init__.py'))"
200 loops, best of 5: 1.18 msec per loop
1000 loops, best of 5: 273 usec per loop
# --> 4.32x faster

$ ./python -m timeit -s "from pathlib import Path; p = Path.cwd()" "list(p.glob('Lib/**/__init__.py'))"
50 loops, best of 5: 9.46 msec per loop
50 loops, best of 5: 5.72 msec per loop
# --> 1.65x faster

$ ./python -m timeit -s "from pathlib import Path; p = Path.cwd()" "list(p.glob('Lib/pathlib/__init__.py'))"
1000 loops, best of 5: 210 usec per loop
20000 loops, best of 5: 14.9 usec per loop
# --> 14.1x faster
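The numbers above come from `python -m timeit`. A rough way to collect comparable figures from a script is sketched below, assuming the standard `timeit.Timer.autorange()` and `Timer.repeat()` APIs; the helper names `best_usec`, `run_all` and `speedup` are hypothetical, and absolute results depend on the machine and on the directory the interpreter is started from.

```python
import timeit
from pathlib import Path

# Patterns taken from the timings above.
PATTERNS = [
    "Lib",
    "Lib/",
    "Lib/*",
    "Lib/*/__init__.py",
    "Lib/**/__init__.py",
    "Lib/pathlib/__init__.py",
]


def best_usec(pattern, root=None, repeat=5, number=None):
    """Best-of-`repeat` time for one list(root.glob(pattern)) call, in usec.

    With number=None, Timer.autorange() picks the loop count, roughly as
    `python -m timeit` does; pass a small `number` for a quick check.
    """
    root = Path.cwd() if root is None else Path(root)
    timer = timeit.Timer(lambda: list(root.glob(pattern)))
    if number is None:
        number, _ = timer.autorange()
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


def run_all(root=None, **kwargs):
    """Time every pattern in PATTERNS against `root` (default: cwd)."""
    return {p: best_usec(p, root, **kwargs) for p in PATTERNS}


def speedup(before_usec, after_usec):
    """The 'Nx faster' ratio: old time divided by new time."""
    return before_usec / after_usec
```

Running `run_all()` under an interpreter built before and after the change gives the two columns; for example, `speedup(69.4, 13.6)` is about 5.1, matching the `Lib` row.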

@barneygale merged commit 0eb52f5 into python:main on Apr 12, 2024 (33 checks passed)
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
Labels: performance (Performance or resource usage), topic-pathlib