Remove Pip.spawn_install_wheel & optimize. #2305
Now both the build time resolve code and the run time layout code use the same parallelization logic to install wheels using `pex.pep_427` via a new pair of `pex.jobs.{imap,map}_parallel` functions. Previously, both used `pex.jobs.execute_parallel`, which incurs a fork/exec per processed item along with the ensuing overhead of re-importing all the Pex code needed to do a `pex.pep_427` wheel install. Although this makes sense for calling Pip, which shares no code with Pex, it is wasted effort when calling pure Pex code.

Although early experiments with parallelizing `pex.pep_427` wheel installs with a thread pool showed `pex.jobs.execute_parallel` to perform consistently better, I never experimented with multiprocessing process-based pools. These perform better than both, and, in hindsight, for two obvious reasons:

1. A process pool only incurs a fork once per pool slot. Job inputs are then fed by pipe, so no fork per input is required as it is when using `pex.jobs.execute_parallel`. As a result, the import price is paid at most once per slot instead of once per job input.
2. A process pool does not exec, at least on Linux, so all the imports done in the main process live on in the forked pool processes.
I've not spent enough time to actually understand the change, so this'll be an approve to unblock in case you wish to proceed trusting the test results alone.
Perf improvements

Build Time - for traditional installed wheel chroot PEXes
Runtime
# Sorted becomes:
# [10, 6, 4, 3, 1, 1] -> slot1[10, 3] slot2[6, 4, 1, 1]: 13 long pole
#
input_items.sort(key=costing_function, reverse=True)
There is now new binning logging below and it just proves this out. For example, for the pantsbuild.pants==2.17.1 example case:
# Unsorted
pex: Elapsed time per install job:
[657197] 3.75s 5 wheels
[657199] 4.33s 7 wheels
[657198] 4.43s 2 wheels
[657200] 7.39s 3 wheels
[657196] 10.13s 5 wheels
# Sorted biggest 1st
pex: Elapsed time per install job:
[656446] 4.36s 8 wheels
[656447] 4.36s 9 wheels
[656448] 4.49s 3 wheels
[656444] 7.04s 1 wheel
[656445] 7.86s 1 wheel
This ~2.5s improvement in the overall processing time was consistent.
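The scheduling effect of sorting biggest-first can be reproduced with a small greedy model (a sketch, not Pex code): each item goes to the currently least-loaded slot, and the finish time is the most loaded slot, the "long pole". Using the `[10, 6, 4, 3, 1, 1]` example from the code comment above:

```python
import heapq

def long_pole(costs, slots):
    # Greedy model of the parallel install: hand each item to the
    # currently least-loaded slot, then report the makespan.
    heap = [(0, i) for i in range(slots)]  # (load, slot index)
    loads = [0] * slots
    for cost in costs:
        load, i = heapq.heappop(heap)
        loads[i] = load + cost
        heapq.heappush(heap, (loads[i], i))
    return max(loads)

items = [3, 1, 10, 6, 1, 4]
unsorted_pole = long_pole(items, 2)                      # 14
sorted_pole = long_pole(sorted(items, reverse=True), 2)  # 13: the long pole from the comment
```

Sorting biggest-first is the classic longest-processing-time-first heuristic: big items are placed while all slots are still nearly empty, leaving the small items to smooth out the tail.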
To be clear - this was not new behavior, just new experimental confirmation of what was previously only chicken-scratch with pen and paper, proving this to myself in the initial PR.
Looking now
Nice wins!
pex/jobs.py
index=index,
pid=pid,
count=len(elapsed),
wheels=pluralize(elapsed, "wheel"),
Should this be the following?
- wheels=pluralize(elapsed, "wheel"),
+ items=pluralize(elapsed, "item"),
Fixed, though by parametrizing a noun instead.
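For illustration, a minimal sketch of what such a pluralize helper might look like; this is hypothetical, not Pex's actual implementation:

```python
def pluralize(subject, noun):
    # Hypothetical helper: naively append "s" unless the subject
    # (a count, or a sized collection) represents exactly one.
    count = subject if isinstance(subject, int) else len(subject)
    return noun if count == 1 else noun + "s"

pluralize([3.75, 4.33], "wheel")  # -> "wheels"
pluralize([7.04], "item")         # -> "item"
```

Parametrizing the noun lets the same logging code serve both the wheel-install path and generic job inputs.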
"--find-links", | ||
vendored_pip_dists_dir, |
Why does this test change? Is it a behaviour change?
This was a behavior change introduced in #2302, which has vendored Pip lazily installing wheel just in time to support Pip.spawn_build_wheel, instead of using vendored wheel, which was removed.
In that change I neglected to do an exhaustive search for tests that used "--no-pypi --find-links ..." and needed to build wheels. Tests of that style are susceptible to this issue, but in an order-dependent way. If any other test that needed to build a wheel ran before such a "--no-pypi --find-links ..." test, the lazy wheel install would already be satisfied and short-circuit.
During this PR work, I happened to hit this case iterating on individual test groups and `rm -rf ~/.pex`.