Performance issue on big projects #53

Closed
boxed opened this issue Dec 7, 2016 · 9 comments

@boxed
Contributor

boxed commented Dec 7, 2016

We have a big project with a big test suite. When starting pytest with testmon enabled, it takes something like 8 minutes just to start up, even when running (almost) no tests. A profile dump reveals this:

Wed Dec  7 14:37:13 2016    testmon-startup-profile

         353228817 function calls (349177685 primitive calls) in 648.684 seconds

   Ordered by: cumulative time
   List reduced from 15183 to 100 due to restriction <100>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001  648.707  648.707 env/bin/py.test:3(<module>)
 10796/51    0.006    0.000  648.614   12.718 /Users/andersh/triresolve/env/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py:335(_hookexec)
 10796/51    0.017    0.000  648.614   12.718 /Users/andersh/triresolve/env/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py:332(<lambda>)
 11637/51    0.063    0.000  648.614   12.718 /Users/andersh/triresolve/env/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py:586(execute)
        1    0.000    0.000  648.612  648.612 /Users/andersh/triresolve/env/lib/python2.7/site-packages/_pytest/config.py:29(main)
  10596/2    0.016    0.000  648.612  324.306 /Users/andersh/triresolve/env/lib/python2.7/site-packages/_pytest/vendored_packages/pluggy.py:722(__call__)
        1    0.000    0.000  562.338  562.338 /Users/andersh/triresolve/env/lib/python2.7/site-packages/testmon/pytest_testmon.py:80(pytest_cmdline_main)
        1    0.000    0.000  562.338  562.338 /Users/andersh/triresolve/env/lib/python2.7/site-packages/testmon/pytest_testmon.py:70(init_testmon_data)
        1    0.004    0.004  562.338  562.338 /Users/andersh/triresolve/env/lib/python2.7/site-packages/testmon/testmon_core.py:258(read_fs)
     4310    1.385    0.000  545.292    0.127 /Users/andersh/triresolve/env/lib/python2.7/site-packages/testmon/testmon_core.py:224(test_should_run)
     4310    3.995    0.001  542.647    0.126 /Users/andersh/triresolve/env/lib/python2.7/site-packages/testmon/testmon_core.py:229(<dictcomp>)
  4331550   54.292    0.000  538.652    0.000 /Users/andersh/triresolve/env/lib/python2.7/site-packages/testmon/process_code.py:104(checksums)
        1    0.039    0.039  537.138  537.138 /Users/andersh/triresolve/env/lib/python2.7/site-packages/testmon/testmon_core.py:273(compute_unaffected)
 73396811   67.475    0.000  484.571    0.000 /Users/andersh/triresolve/env/lib/python2.7/site-packages/testmon/process_code.py:14(checksum)
 73396871  360.852    0.000  360.852    0.000 {method 'encode' of 'str' objects}
        1    0.000    0.000   83.370   83.370 /Users/andersh/triresolve/env/lib/python2.7/site-packages/_pytest/main.py:118(pytest_cmdline_main)

As you can see, the last line is only about 83 seconds cumulative, while the two lines above it (str.encode and checksum) are 360 and 484 seconds respectively.

This hurts our use case a LOT, and since we use a reference .testmondata file that has been produced by a CI job, it seems excessive (and useless) to recalculate this on each machine when it could be calculated once up front.

So, what do you guys think about caching this data in .testmondata?
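For illustration only, a minimal sketch of the caching idea, assuming .testmondata is a SQLite file opened as conn; the block_cache table and both helpers below are hypothetical, not testmon's actual schema or API:

    import hashlib

    def file_fingerprint(path):
        # Cheap whole-file hash used as the cache key.
        with open(path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()

    def block_checksums(conn, path, compute_blocks):
        fp = file_fingerprint(path)
        row = conn.execute(
            "SELECT checksums FROM block_cache WHERE path = ? AND fingerprint = ?",
            (path, fp),
        ).fetchone()
        if row:
            return row[0].split(",")      # reuse what the CI job already computed
        sums = compute_blocks(path)       # the expensive parsing/checksum path
        conn.execute(
            "INSERT OR REPLACE INTO block_cache (path, fingerprint, checksums)"
            " VALUES (?, ?, ?)",
            (path, fp, ",".join(sums)),   # sums assumed to be strings
        )
        return sums

Keying on a content fingerprint rather than mtime is what would make a CI-produced .testmondata reusable on developer machines, since mtimes never survive a fresh checkout.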

@tarpas
Owner

tarpas commented Dec 7, 2016

I actually did a lot of optimization for the case of big projects with very few changes, so this might be a regression.

Does it also happen if you create and consume .testmondata on the same machine? The read_fs function shouldn't do any source code processing or checksumming if the file's modification time on the filesystem matches the modification time stored in .testmondata.

Does it happen on second run too?

The source code crunching and checksumming is a joke in terms of efficiency. It was built in a way that let me learn the AST library and the problem space in general.

I actually find it hard to believe that str.encode is the slow operation among all the string manipulation and all the loops going on there.
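As a back-of-the-envelope check on that claim, one could time a single encode in isolation and multiply by the call count from the profile (the block text below is made up):

    import timeit

    block = "def foo(x):\n    return x + 1\n" * 20   # a made-up code block

    per_call = timeit.timeit(lambda: block.encode("utf-8"), number=100000) / 100000
    print("one encode: %.2f microseconds" % (per_call * 1e6))
    # The profile shows roughly 73 million encode calls:
    print("73 million encodes: ~%.0f seconds" % (per_call * 73e6))

If a single encode of a typical block costs a few microseconds, 73 million of them would land in the same region as the 360 seconds shown in the profile.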

@boxed
Contributor Author

boxed commented Dec 7, 2016 via email

@tarpas
Owner

tarpas commented Dec 7, 2016

Now I noticed #52. I think that's the cause of this issue. testmon_data.mtimes also stores absolute paths, and mtimes is the optimization that avoids parsing the whole source tree when the files have barely changed.
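For context, a tiny sketch of the kind of normalization #52 asks for, assuming the mtime cache is keyed by file path (the helper and the example paths below are illustrative):

    import os

    def cache_key(path, rootdir):
        # Key the mtime/checksum cache by a path relative to the project root,
        # not by an absolute path, so CI and developer machines agree.
        return os.path.relpath(os.path.abspath(path), rootdir)

    # Example:
    # cache_key("/Users/andersh/triresolve/app/models.py",
    #           "/Users/andersh/triresolve")  ->  "app/models.py"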

@boxed
Contributor Author

boxed commented Dec 8, 2016 via email

@tarpas
Owner

tarpas commented Dec 8, 2016

f533c2a partially addresses this issue

@tarpas tarpas mentioned this issue Dec 8, 2016
@boxed
Contributor Author

boxed commented Dec 8, 2016

The overhead of encode is pretty terrifying too. In Python 2 it's also, as far as I understand, 100% redundant. Does this mean we can optimize the Python 2 path nicely but the Python 3 version must be super slow? Or is UTF-8 encoding of str in Python 3 super fast?

Something seems weird here :P
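One possible way around the per-call cost, sketched here under the assumption that checksum() currently encodes each block's text before hashing (which the matching call counts in the profile suggest), is to hash bytes directly so neither Python 2 nor Python 3 pays for an encode:

    import zlib

    def checksum_bytes(block_bytes):
        # zlib.adler32 accepts bytes on both Python 2 and 3, so computing the
        # checksum from bytes read straight off disk avoids str.encode entirely.
        return zlib.adler32(block_bytes) & 0xffffffff

    with open("some_module.py", "rb") as f:   # hypothetical file
        print(checksum_bytes(f.read()))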

@tarpas
Owner

tarpas commented Dec 9, 2016

@boxed reports that this is not solved by #54. The solution is probably to make the file change detection two-stage (a rough sketch follows the list):

  1. modified time;
     if changed,
  2. checksum of the whole file;
     if changed, then proceed to the expensive parsing and block comparison, etc.

tarpas added a commit that referenced this issue Dec 16, 2016
…use case where modified times changed but contents of files didn't. re #53
@tarpas
Owner

tarpas commented Dec 16, 2016

@boxed Could you confirm this works for you and that there is no obvious regression? (It's in master.) I think it's working and it's ready for release.

@boxed
Contributor Author

boxed commented Dec 17, 2016 via email

@tarpas tarpas closed this as completed Dec 17, 2016