Performance fixes #4237
@@ -305,3 +305,8 @@ def fnmatch_ex(pattern, path):
     else:
         name = six.text_type(path)
     return fnmatch.fnmatch(name, pattern)
+
+
+def parts(s):
Please check whether `Path.parents` or `Path.parts` would work here as well; this looks like a nice potential match.
Using `p = Path(s); list(p.parents) + [p]` makes it slower.
I think this would only make sense if it were already a `Path`.
No surprise there really. Path.parents is a pretty complex chain of stuff that is hard to follow. Would be amazing if it wasn't slower than the simple string operations in this function.
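For reference, a minimal sketch of the two approaches compared in this thread. The body of `parts` is not shown in this excerpt, so the string-based version below is an assumption about its behaviour (every ancestor prefix of a path string, root first); `parts_via_str` and `parts_via_pathlib` are hypothetical names, and only POSIX-style paths are considered.

```python
import posixpath
from pathlib import PurePosixPath

SEP = posixpath.sep  # "/"; assumption: POSIX-style paths only


def parts_via_str(s):
    """Return every ancestor prefix of s, root first, using only string ops."""
    pieces = s.split(SEP)
    # A leading slash yields an empty first piece, which joins to "";
    # fall back to SEP so the root is reported as "/".
    return [SEP.join(pieces[: i + 1]) or SEP for i in range(len(pieces))]


def parts_via_pathlib(s):
    """Build the same list from PurePosixPath.parents."""
    p = PurePosixPath(s)
    # parents runs from the nearest ancestor up to the root, so reverse it.
    ancestors = [str(x) for x in list(p.parents)[::-1]]
    return ancestors + [str(p)]


print(parts_via_str("/a/b/c.py"))
print(parts_via_pathlib("/a/b/c.py"))
```

Both functions give the same list for an absolute path; the pathlib variant has to construct a path object for every ancestor, which is consistent with it being the slower of the two when the input starts out as a string.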
I believe a more critical element is actually turning the `paths` variable into a set. At first glance it doesn't seem to be used anywhere else, and using `issubset` instead of `any` should be a massive boost, since it turns a linear search loop at the Python level into a hash membership test loop at the C level.
I don't think a set operator is a big win because we're talking about small numbers here, but I'll certainly give it a try tomorrow. There's another issue with this code in that it searches all the way up to / which makes no sense. It should stop at the base directory of tests (current working directory?). Otherwise pytest is needlessly slower the deeper in the directory tree you have your code.
That would be
Currently it checks if any of the parts are in already handled paths, which appears to be different from using
I'm looking into this now also - @boxed, is it OK to push things here, or should I rather post diffs/patches?
Push away
I tried to find out why for 1000 files there were 2 million calls to stat but I had trouble figuring this out. Maybe a lot of micro-optimizations are what's needed for that issue.
Codecov Report
@@            Coverage Diff             @@
##           features    #4237    +/-   ##
==========================================
+ Coverage     95.75%   95.84%   +0.09%
==========================================
  Files           111      111
  Lines         24794    24804     +10
  Branches       2420     2422      +2
==========================================
+ Hits          23741    23773     +32
+ Misses          751      735     -16
+ Partials        302      296      -6
Some of this might also be due to
Re stopping at the base dir: this appears to be the case already, no?
How much improvement did the set usage give?
It's in the commit message: 4.45s => 3.55s
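To make the set discussion concrete, here is a minimal sketch, not the code from the patch. It assumes the already-handled directories are kept in a collection and that each file contributes a list of candidate prefixes; all function and variable names are hypothetical. It also shows why `issubset` is not a drop-in replacement for the `any(...)` check: `issubset` asks whether all candidates are present, while a negated `isdisjoint` asks whether at least one is.

```python
def any_handled_linear(candidates, handled_list):
    """Original style: each `in` scans the list from the start."""
    return any(c in handled_list for c in candidates)


def any_handled_set(candidates, handled_set):
    """Same question answered with hash lookups inside set.isdisjoint."""
    return not handled_set.isdisjoint(candidates)


def all_handled_set(candidates, handled_set):
    """A different question: are ALL candidates already handled?"""
    return set(candidates).issubset(handled_set)


handled = ["/proj", "/proj/pkg"]
candidates = ["/", "/proj", "/proj/other"]

print(any_handled_linear(candidates, handled))    # at least one match
print(any_handled_set(candidates, set(handled)))  # same answer via a set
print(all_handled_set(candidates, set(handled)))  # stricter check
```

With only a handful of entries the difference between the first two is small per call, so any gain would come from the check being repeated for every collected file.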
Re skipping pycache: we should explicitly skip pyc files for py2.7 too, right? I did that change and it gave a modest speed boost so I skipped it while looking for bigger fish. Should have kept it in hindsight.
Re stopping at the base dir: I mean the paths function should stop at the base path. Currently it always returns "/" as the first element, which is rather silly.
Actually, why aren't we skipping non-py-files explicitly?
My times (using https://github.com/blueyed/dotfiles/blob/abe59a331eb0aeebccba55516ffbf885c577e8c6/usr/bin/timeit-shell),
@RonnyPfannschmidt
Cool! Can't wait to try this when I get in tomorrow at work!
Yes, although IIRC it was easier / better to just check the dir upfront, but could also be done in a special way for py27, of course.
You mean the "parts" function, right?
Re paths function: yep. It could be made fast with just `s[len(basepath):]` before splitting, which shouldn't add too much complexity either. Maybe a bit premature though.
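A rough sketch of that idea, under the assumption that the goal is to list only the prefixes below a known base directory instead of walking all the way up to "/". The helper name `parts_below` and its signature are hypothetical and not part of the patch.

```python
def parts_below(s, basepath, sep="/"):
    """Return the ancestor prefixes of s that lie strictly below basepath."""
    if not s.startswith(basepath + sep):
        raise ValueError("%r is not located under %r" % (s, basepath))
    # Cut off the base before splitting, so the loop only sees the
    # components that actually vary between collected files.
    tail = s[len(basepath):]      # e.g. "/tests/test_x.py"
    pieces = tail.split(sep)[1:]  # drop the empty piece before the leading sep
    return [
        basepath + sep + sep.join(pieces[: i + 1]) for i in range(len(pieces))
    ]


print(parts_below("/home/u/proj/tests/test_x.py", "/home/u/proj"))
```

In this sketch the base directory itself is left out of the result; whether it should be included depends on how the caller uses the list.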
Re ignoring .pyc files:
Yep, this brings time almost back to the 3.4 level on my machine at work for the example script. Oooh, but it brings the time down to 4.5 on the test at work, where 6s is the benchmark (although 3.6.0 was 3s). This is good stuff!
This looks great so far 👍. Would you like to investigate some more of the loose ends, or would you like to get this merged and proceed in a new PR?
The test suite at TriOptima now has 104k calls to posix.stat (compared to millions before). Huge improvement! But it's still 0.8 seconds just for all the posix.stat calls. 2906 files and folders (after I delete all `__pycache__` dirs and not counting things excluded by `norecursedirs`). That's 104273/2906 ~= 35.88 posix.stat calls per file. There are 1767 .py files, so it's 59.0 stat calls per file if you just count those.
@boxed lovely stats. I believe we may need to create a little trace helper to trace the stat calls and assign them context. I believe Python's import system generates dozens of them when trying different locations in sys.path, for example. I believe there is room for improvement there, but it's not yet clear how to realize it; right now we can't even clearly measure ^^
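One possible shape for such a trace helper, as a minimal sketch only. `count_stat_calls` is a hypothetical name, and the approach has a known blind spot: it only sees calls that look up `os.stat` on the `os` module at call time. Code that calls the underlying C function directly (as far as I know the import machinery does this) or that bound `os.stat` to a local name earlier would not be counted.

```python
import os
import sys
from collections import Counter
from contextlib import contextmanager


@contextmanager
def count_stat_calls():
    """Temporarily wrap os.stat and count calls per calling function."""
    counts = Counter()
    original = os.stat

    def traced_stat(*args, **kwargs):
        # Attribute each call to the function that invoked os.stat.
        caller = sys._getframe(1).f_code
        counts[(caller.co_filename, caller.co_name)] += 1
        return original(*args, **kwargs)

    os.stat = traced_stat
    try:
        yield counts
    finally:
        os.stat = original  # always restore the real function


with count_stat_calls() as counts:
    os.stat(".")
    os.stat(".")

for (filename, funcname), n in counts.most_common(5):
    print(n, funcname, filename)
```

Grouping by caller is the "context" part: it would show which code paths account for most of the stat traffic that goes through `os.stat`, rather than just a total.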
I guess some of those stat calls are for python to just check if it needs to rebuild a pyc file. I made a small script that basically just imports the entire app and I get 11586 stat calls for that, so that doesn't change the math much.
Hmm... interestingly we get 284957 calls to posix.stat for my toy example script, which is drastically more than for our production code base where the directory has a huge venv. I don't think this should be too hard to figure out. I'll give it a shot today.
Oh, and to answer the question on merging this.. I am not in a huge hurry to merge no. I'd like to spend today on this at least and if I can't find anything more then I guess it makes sense to merge and then start with a new PR.
Not at this time. But it might be worth a look if someone can get to the bottom of the assert rewrite import hook performance problems.
Also renames `_path2confmods` to `_dirpath2confmods` for clarity (it is expected to be a dirpath in `_importconftest`). Uses an explicit maxsize, since it appears to be only relevant for a short period [1]. 1: pytest-dev#4237 (comment)
Also renames `_path2confmods` to `_dirpath2confmods` for clarity (it is expected to be a dirpath in `_importconftest`). Uses an explicit maxsize, since it appears to be only relevant for a short period [1]. Removes the lru_cache on _getconftest_pathlist, which makes no difference when caching _getconftestmodules, at least with the performance test of 100x10 files (pytest-dev#4237). 1: pytest-dev#4237 (comment)
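As an illustration of caching a per-directory lookup with an explicit `maxsize`, here is a small sketch using `functools.lru_cache`. It is not pytest's implementation: `ancestor_dirs` is a hypothetical stand-in for a lookup keyed by directory path, and the bound of 128 is an arbitrary value chosen for the example.

```python
import functools
import posixpath


@functools.lru_cache(maxsize=128)  # explicit bound; 128 is arbitrary here
def ancestor_dirs(dirpath):
    """Return dirpath and all of its ancestors, outermost first."""
    dirs = []
    current = dirpath
    while True:
        dirs.append(current)
        parent = posixpath.dirname(current)
        if parent == current:  # reached "/" (or "" for relative paths)
            break
        current = parent
    # Return an immutable tuple so callers cannot mutate the cached value.
    return tuple(reversed(dirs))


ancestor_dirs("/proj/tests/unit")  # computed
ancestor_dirs("/proj/tests/unit")  # served from the cache
print(ancestor_dirs.cache_info())
```

Because many collected files share a directory, a cache keyed on the directory path gets hit repeatedly, and a bounded size keeps entries from accumulating once collection has moved on.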
After the PRs related to performance are merged, I think we are in good shape for a 3.10 release. 👍
Rebased (old HEAD: 5850de6). I've run the timing on each commit again. Some data points:
src/_pytest/main.py (outdated)
@@ -469,7 +470,7 @@ def _perform_collect(self, args, genitems):
         return items

     def collect(self):
-        for parts in self._initialparts:
+        for parts in self._initialparts:  # noqa: F402
I do not really like this (it is needed because of the imported path being shadowed).
Suggestions?
for initialpart in self._initialparts?
@blueyed So the performance on features is ok with this patch? I am confused by your last comment :P
Looking a bit at the performance again, using my test script with many dirs and just running the collect, I get ~3.8s (with the profiler on), and 0.1 of that is spent in normpath (22k calls!), which seems weird to me. Looking at posixpath.py, normpath looks like this:

    def normpath(path):
        """Normalize path, eliminating double slashes, etc."""
        path = os.fspath(path)
        if isinstance(path, bytes):
            sep = b'/'
            empty = b''
            dot = b'.'
            dotdot = b'..'
        else:
            sep = '/'
            empty = ''
            dot = '.'
            dotdot = '..'
        if path == empty:
            return dot
        initial_slashes = path.startswith(sep)
        # POSIX allows one or two initial slashes, but treats three or more
        # as single slash.
        if (initial_slashes and
            path.startswith(sep*2) and not path.startswith(sep*3)):
            initial_slashes = 2
        comps = path.split(sep)
        new_comps = []
        for comp in comps:
            if comp in (empty, dot):
                continue
            if (comp != dotdot or (not initial_slashes and not new_comps) or
                 (new_comps and new_comps[-1] == dotdot)):
                new_comps.append(comp)
            elif new_comps:
                new_comps.pop()
        comps = new_comps
        path = sep.join(comps)
        if initial_slashes:
            path = sep*initial_slashes + path
        return path or dot

This seems to be doing an awful lot! Trying it in ipython:
Hmm... I try this:
So this simple optimization seems to make it 83 times faster in the (normal?) case of the paths not needing normalization. It does slow down the normalizing case:
but not by much. Is this stuff worth sending a PR to CPython, you think?
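The snippets tried in that comment did not survive in this excerpt, so the following is only an assumed form of what such a fast path could look like, not the code that was actually timed. `fast_normpath` is a hypothetical name. The check is deliberately conservative: anything that might need normalizing, including dotfiles after a slash and the bare root "/", is handed to the standard function.

```python
import posixpath


def fast_normpath(path):
    """posixpath.normpath with a cheap early exit for already-normal str paths."""
    if (
        isinstance(path, str)
        and path                      # "" must become "."
        and "//" not in path          # no empty components
        and "/." not in path          # no "." / ".." (or dotfile) after a slash
        and not path.startswith(".")  # no leading "." / ".." component
        and not path.endswith("/")    # no trailing slash; also defers "/" itself
    ):
        return path
    # Anything that might need work goes through the full implementation.
    return posixpath.normpath(path)


for candidate in ["/a/b/c", "a/b", "a//b", "/a/../b", "./a", "a/"]:
    print(repr(candidate), "->", repr(fast_normpath(candidate)))
```

The early exit costs a few substring scans implemented in C, and paths that do need normalizing pay for those scans on top of the normal work, which matches the trade-off described above.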
Time: 8.53s => 5.73s
Time: 5.73s => 5.88s/5.82s
Time: 5.73s/5.88s => 5.36s (Before rebase: 4.86s => 4.45s)
Time: 5.36s => 4.85s (before rebase: 4.45s => 3.55s)
Fixed the formatting.
Sure. I've removed the |
Hmm, AppVeyor is having trouble cloning the repository...
Re-started.. likely due to GitHub issues.
What does AppVeyor do that Travis does not?
Nothing likely, but it is usually behind, i.e. falls into other time windows. btw: I still think it would be enough to only test a smaller subset on AppVeyor, given that it often takes hours to get results / finally green PRs.
🎉 |
Discussion here: #2206 (comment)
TODO: