Performance issue on big projects #53
I actually did a lot of optimization for the case of big projects with very few changes. This might be a regression.

Does it also happen if you create and consume .testmondata on the same machine? The read_fs function shouldn't do any source code processing and checksums if the file modification time on the file system matches the modification time stored in .testmondata.

Does it happen on the second run too?

The source code crunching and checksums are a joke in terms of efficiency. They were built in a way that allowed me to learn the AST lib and the problem space in general. I actually find it hard to believe that str.encode is the slow operation among all the string manipulation and all the loops going on there.
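The mtime short-circuit described above can be sketched roughly like this (the function and variable names here are hypothetical illustrations, not testmon's actual read_fs API):

```python
import hashlib
import os


def needs_reparse(path, stored_mtime):
    """Return True only when the file must be re-parsed and re-checksummed.

    If the filesystem mtime matches the mtime stored in .testmondata,
    all source processing is skipped.
    """
    return os.path.getmtime(path) != stored_mtime


def checksum(path):
    # The hashing cost is only paid when the mtime check above fails.
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()
```

If the stored mtimes never match (for example because the database was produced on another machine), every file falls through to the expensive branch, which would explain the slowdown.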
> I actually did a lot of optimization for big projects and very few changes. This might be a regression.
>
> Does it also happen if you create and consume .testmondata on the same machine? The read_fs function shouldn't do any source code processing and checksums if the file modification time on the file system matches the modification time stored in .testmondata.
Oh, that's crucial data. I could probably go into the SQLite DB and set the modification dates so I avoid this problem :P I already run through it to change the absolute paths to correspond to the paths on the developers' machines instead of the CI machine (and I've opened an issue for that too :P), so it should be simple to hack around.
It was two days ago I did it locally so don't remember clearly. Will try tomorrow.
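The path rewriting mentioned above could look something like this (the table and column names below are hypothetical stand-ins, not testmon's actual .testmondata schema):

```python
import sqlite3


def rewrite_path_prefix(db_path, ci_prefix, dev_prefix):
    """Rewrite absolute paths recorded on the CI machine so they match
    the developer's checkout location."""
    con = sqlite3.connect(db_path)
    with con:  # commit on success, roll back on error
        con.execute(
            "UPDATE file_fingerprint SET path = REPLACE(path, ?, ?)",
            (ci_prefix, dev_prefix),
        )
    con.close()
```

A single UPDATE with SQLite's REPLACE() keeps the rewrite in one pass over the table instead of reading and re-inserting every row.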
> Does it happen on second run too?
I refetched the database every time, so that gets screwed up! Will look at that. It's definitely faster the second time, when the file system changes are detected as I run with ptw.
> The source code crunching and checksums are a joke in terms of efficiency. They were built in a way that allowed me to learn the AST lib and the problem space in general.
Heh. I skimmed the code and it seems pretty complex to me. I honestly didn't understand why, what, or how it's doing things, so prepare for some stupid questions:

Couldn't you basically do a git stat if it's a git dir?

I've also recently worked with baron (as part of my mutation testing library mutmut) and it creates ASTs that round-trip cleanly back to code, so you can get exact line numbers for nodes from the AST. Would that help?
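Asking git directly which files changed, as suggested above, could be a cheap alternative to checksumming the whole tree. A minimal sketch, assuming the project is a git checkout (the function name is made up for illustration):

```python
import subprocess


def changed_files(repo_dir="."):
    """List tracked files that differ from HEAD (staged or unstaged)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

The caveat is that this only covers tracked files and requires a git repository, so it could at best complement the mtime/checksum logic rather than replace it.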
Now I noticed #52. I think that's the cause of this issue. testmon_data.mtimes also stores absolute paths. mtimes is the optimization that avoids parsing the whole source tree when the files have barely changed.
Ok. I'll check into handling this too. I should be able to hack around the modification time issue, probably by using some data from git and setting the modification time on the files directly, or by modifying the database based on the git data. Either way it seems fairly simple to handle.
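Setting modification times directly from git, as proposed above, could look like this (a sketch; the function name is hypothetical, and %ct is git's committer date as a Unix timestamp):

```python
import os
import subprocess


def sync_mtime_to_git(relpath, repo_dir="."):
    """Set a file's mtime to its last commit time, so mtimes stored in
    a .testmondata produced on CI can match the developer checkout."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%ct", "--", relpath],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    ts = int(out.stdout.strip())
    os.utime(os.path.join(repo_dir, relpath), (ts, ts))  # (atime, mtime)
```

Run over every tracked file after checkout, this makes mtimes deterministic across machines, provided the CI job does the same before producing .testmondata.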
f533c2a partially addresses this issue
> The overhead of

Something seems weird here :P
…use case where modified times changed but contents of files didn't. re #53
@boxed Could you confirm this works for you and there is no obvious regression? (It's in master.) I think it's working and it's ready for release.
I'm currently away from work sick, and I have only next week left before vacation plus parental leave, so either I can test this next week or in March(!). I'm not feeling good about my chances of getting back to work next week (the kids and wife also need to get well, plus me), so I say release now.
I looked at the change a bit before and it seems pretty straightforward to me.
Great job on all this!
We have a big project with a big test suite. When starting pytest with testmon enabled, it takes something like 8 minutes just to start when running (almost) no tests. A profile dump reveals this:
As you can see, the last line is 80 seconds cumulative, but the two lines above are 360 and 484 seconds respectively.
This hurts our use case a LOT, and since we use a reference .testmondata file that has been produced by a CI job, it seems excessive (and useless) to recalculate this on each machine when it could be calculated once up front.
So, what do you guys think about caching this data in .testmondata?
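The caching idea above could be sketched as a small lookup table inside the SQLite database, keyed by path and mtime (the checksum_cache table and function below are hypothetical additions, not testmon's real schema):

```python
import hashlib
import os
import sqlite3


def cached_checksum(con, path):
    """Return the file's checksum, reusing the cached value when the
    stored mtime still matches; otherwise recompute and update the cache."""
    mtime = os.path.getmtime(path)
    row = con.execute(
        "SELECT mtime, checksum FROM checksum_cache WHERE path = ?",
        (path,),
    ).fetchone()
    if row is not None and row[0] == mtime:
        return row[1]  # cache hit: no file read, no hashing
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    with con:
        con.execute(
            "INSERT OR REPLACE INTO checksum_cache (path, mtime, checksum) "
            "VALUES (?, ?, ?)",
            (path, mtime, digest),
        )
    return digest
```

With the checksums shipped inside .testmondata from CI, developer machines would only re-hash files whose mtime actually differs, rather than recomputing everything on every start.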