[pex] Memoize calls to Crawler.crawl() for performance win in find-links based resolution. #187

Closed

@kwlzn kwlzn commented Dec 17, 2015

While investigating a user-reported performance issue in the creation of pex files (via pants), profiling revealed that roughly 60% of the total time was spent in pex's Crawler.crawl() function, across more than 47 calls. Each call to Crawler.crawl() uses the re module to match HTML tags in a given index page (in our case, one listing 5000+ files), and heavy use of the re module is a known source of slowness.
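As a rough illustration of how this kind of hotspot shows up in a profile, here is a minimal sketch using cProfile. Everything in it is an assumption made for illustration: `crawl_page`, `resolve`, `make_index_page`, and the href regex are hypothetical stand-ins, not pex's actual code, and the page is synthetic.

```python
import cProfile
import pstats
import re

# Hypothetical href matcher; pex's real patterns may differ.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def make_index_page(n_files):
    """Build a synthetic index page that lists n_files links."""
    rows = ['<a href="pkg-%d.tar.gz">pkg-%d.tar.gz</a>' % (i, i)
            for i in range(n_files)]
    return "<html><body>\n%s\n</body></html>" % "\n".join(rows)


def crawl_page(html):
    """Extract every href target from the page with a regex scan."""
    return HREF_RE.findall(html)


def resolve(html, n_calls):
    """Simulate a resolver that re-scans the same page on every call."""
    return [crawl_page(html) for _ in range(n_calls)]


page = make_index_page(5000)

profiler = cProfile.Profile()
profiler.enable()
results = resolve(page, 47)  # 47 repeated scans of the same page
profiler.disable()

# Print the five most expensive entries by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

In the printed table, the `ncalls` column for `crawl_page` shows how often the same page was re-scanned, which is the kind of signal that pointed at Crawler.crawl() here.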

I was able to reproduce roughly the same scenario directly with pex:

pex --disable-cache --no-pypi -f http://$URL -f http://$URL/dist \
    pex psutil requests ansible jsonschema -o /tmp/throwaway

Further inspection revealed upwards of 24 non-cached Crawler.crawl() calls for just these 5 dependencies.

This PR memoizes calls to Crawler.crawl() so that subsequent calls skip the repeated work, for a roughly 2.4x wall-clock speedup in the above test case:

before

real    0m19.943s
user    0m5.437s
sys     0m2.427s

after

real    0m8.288s
user    0m3.420s
sys     0m1.986s
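For readers unfamiliar with the technique, here is a minimal sketch of memoizing a crawl call. It is not the code from this PR: `MemoizedCrawler`, `fake_fetch`, `PAGES`, the cache key, and the example URL are all hypothetical, chosen only to show the idea of caching results per set of input URLs.

```python
import re

# Hypothetical href matcher; pex's real patterns may differ.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


class MemoizedCrawler(object):
    """Hypothetical crawler that caches results per set of input URLs."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: url -> html text
        self._cache = {}     # sorted tuple of urls -> frozenset of links

    def crawl(self, urls):
        # Normalize the input so equivalent calls share one cache entry.
        key = tuple(sorted(set(urls)))
        if key not in self._cache:
            links = set()
            for url in key:
                links.update(HREF_RE.findall(self._fetch(url)))
            # Store an immutable result so callers cannot corrupt the cache.
            self._cache[key] = frozenset(links)
        return self._cache[key]


# Synthetic page and a fetcher that records how often it is called.
PAGES = {
    "http://example.invalid/dist":
        '<a href="psutil-3.3.0.tar.gz">psutil</a>'
        '<a href="requests-2.9.0.tar.gz">requests</a>',
}
fetch_log = []


def fake_fetch(url):
    fetch_log.append(url)
    return PAGES[url]


crawler = MemoizedCrawler(fake_fetch)
first = crawler.crawl(["http://example.invalid/dist"])
second = crawler.crawl(["http://example.invalid/dist"])  # served from cache
```

The trade-off in a sketch like this is staleness: a cached page is never re-fetched for the lifetime of the cache, which is acceptable for a single short-lived resolve but would need an invalidation story for anything long-running.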

@kwlzn kwlzn force-pushed the kwlzn/pex/memoize_crawl branch 2 times, most recently from 36c361c to 3f3fb4a, December 17, 2015 07:17

kwlzn commented Dec 17, 2015

merged @ fcdee8a

@kwlzn kwlzn closed this Dec 17, 2015