GH-115060: Speed up pathlib.Path.glob() by omitting initial stat() #117831

Merged on Apr 13, 2024 (6 commits)

barneygale (Contributor) commented on Apr 13, 2024

Since 6258844, paths that might not exist can be fed into pathlib's globbing implementation, which will call os.scandir() / os.lstat() only when strictly necessary. This allows us to drop an initial self.is_dir() call, which saves a stat().
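
As a rough illustration of the behaviour this change relies on, the sketch below globs from an existing directory and from a path that does not exist. The scratch directory, the `glob_names` helper, and the `missing` name are assumptions made up for the example; it only checks results and does not count system calls.

```python
import tempfile
from pathlib import Path


def glob_names(base, pattern):
    """Return the sorted names of the entries matched by base.glob(pattern)."""
    return sorted(p.name for p in base.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.py").touch()
    (root / "Lib").mkdir()

    # Globbing from an existing directory returns its matching entries.
    existing = glob_names(root, "*")
    literal = glob_names(root, "Lib")

    # Globbing from a path that does not exist yields nothing and does
    # not raise, so no up-front is_dir() check is needed for correctness.
    missing = glob_names(root / "missing", "*")
```

Here `missing` ends up as an empty list, which matches what the removed `is_dir()` guard used to return early.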

$ ./python -m timeit -s "from pathlib import Path; p = Path.cwd()" "list(p.glob('Lib'))"
20000 loops, best of 5: 13.6 usec per loop  # before
20000 loops, best of 5: 10.4 usec per loop  # after
# --> 1.31x faster

$ ./python -m timeit -s "from pathlib import Path; p = Path.cwd()" "list(p.glob('*.py'))"
5000 loops, best of 5: 88.4 usec per loop  # before
5000 loops, best of 5: 83.8 usec per loop  # after
# --> 1.05x faster

$ ./python -m timeit -s "from pathlib import Path; p = Path.cwd()" "list(p.glob('*'))"
2000 loops, best of 5: 145 usec per loop  # before
2000 loops, best of 5: 139 usec per loop  # after
# --> 1.04x faster
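
For anyone who wants to repeat a measurement like this from a script rather than the command line, here is a minimal sketch using the standard `timeit` module. The scratch directory layout, the `best_usec` helper, and the loop counts are assumptions for illustration; absolute numbers will differ from those above, and a before/after comparison needs the script run under both interpreter builds.

```python
import tempfile
import timeit
from pathlib import Path


def best_usec(base, pattern, number=200, repeat=5):
    """Best-of-`repeat` time per list(base.glob(pattern)) call, in microseconds."""
    timer = timeit.Timer(lambda: list(base.glob(pattern)))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Lib").mkdir()
    for i in range(20):
        (root / f"mod{i}.py").touch()

    # Same three patterns as the command-line runs above.
    results = {pattern: best_usec(root, pattern) for pattern in ("Lib", "*.py", "*")}

for pattern, usec in results.items():
    print(f"{pattern!r}: {usec:.1f} usec per loop")
```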


hauntsaninja (Contributor) left a comment

Since this was previously explicitly documented, should this have a versionchanged in the docs? Oh hmm, I guess this was just documented recently in #114036 by you, so it's probably fine... :-)

I also wonder if we can improve tests, e.g. it looks like the if not self.is_dir(): branch was not covered by tests
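
A test along these lines might cover that case. This is only a sketch: the class name, method names, and file names are hypothetical and are not taken from CPython's test suite.

```python
import tempfile
import unittest
from pathlib import Path


class GlobFromNonDirectoryTest(unittest.TestCase):
    def setUp(self):
        # Scratch directory holding a single regular file.
        self.tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmp.cleanup)
        self.root = Path(self.tmp.name)
        (self.root / "fileA").touch()

    def test_glob_from_regular_file(self):
        # A regular file has no children, so nothing should match.
        self.assertEqual(list((self.root / "fileA").glob("*")), [])

    def test_glob_from_missing_path(self):
        # A path that does not exist should also yield nothing.
        self.assertEqual(list((self.root / "missing").glob("*")), [])
```

Saved to a file, it can be run with `python -m unittest`.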

barneygale (Contributor, Author) commented

Thanks! I think it's probably not important enough for .. versionchanged::, particularly as we don't document the sorts of OSError that are raised or suppressed from is_dir().

barneygale (Contributor, Author) commented on Apr 13, 2024

On reflection, I think this works best as a .. versionchanged:: directive. Thank you for the pointer :)

barneygale merged commit a74f117 into python:main on Apr 13, 2024
33 checks passed
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
Co-authored-by: Shantanu <[email protected]>
Labels: performance, topic-pathlib