
Remove Pip.spawn_install_wheel & optimize. #2305

Merged (3 commits) Dec 16, 2023

Conversation

@jsirois (Member) commented Dec 16, 2023


Now both the build time resolve code and the run time layout code use
the same parallelization logic to install wheels using `pex.pep_427` via
a new pair of `pex.jobs.{imap,map}_parallel` functions.

Previously, both used `pex.jobs.execute_parallel`, which incurs a
fork/exec per processed item along with the ensuing overhead of
re-importing all the Pex code needed to do a `pex.pep_427` wheel
install. Although this makes sense for calling Pip, which shares no code
with Pex, it is wasted effort to call pure Pex code. Although early
experiments with parallelizing `pex.pep_427` wheel installs with a
thread pool showed `pex.jobs.execute_parallel` to perform consistently
better, I never experimented with multiprocessing process-based pools.
These perform better than both; and, in hindsight, for two obvious
reasons:

1. A process pool only incurs a fork once per pool slot. Job inputs are
   then fed by pipe; so no fork per every input is required as it is
   when using `pex.jobs.execute_parallel`. As a result, the import price
   is paid at most once per slot instead of once per job input.
2. A process pool does not exec, at least on Linux; so all the imports
   done in the main process live on in the forked pool processes.
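The PR adds `pex.jobs.{imap,map}_parallel` to exploit exactly this. A minimal sketch of the idea using the stdlib `multiprocessing.Pool` follows; the function names come from the PR, but the signatures and internals here are illustrative assumptions, not the actual Pex implementation:

```python
# Sketch: process-pool parallel map in the spirit of pex.jobs.{imap,map}_parallel.
# Signatures and internals are illustrative assumptions, not the real Pex API.
import multiprocessing
import os


def imap_parallel(inputs, function, max_jobs=None):
    """Yield function(item) for each input, computed by a pool of worker processes."""
    slots = max_jobs or multiprocessing.cpu_count()
    # With the fork start method (the Linux default), each worker inherits the
    # parent's already-imported modules, so the import price is paid at most
    # once per slot rather than once per input item.
    ctx = (
        multiprocessing.get_context("fork")
        if os.name == "posix"
        else multiprocessing.get_context()
    )
    with ctx.Pool(processes=slots) as pool:
        for result in pool.imap(function, inputs):
            yield result


def map_parallel(inputs, function, max_jobs=None):
    """Eagerly collect all results, preserving input order."""
    return list(imap_parallel(inputs, function, max_jobs=max_jobs))
```

Note that `function` and each input must be picklable (e.g. a module-level function), since jobs are fed to the workers over a pipe rather than via a fresh fork/exec per item.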
@kaos (Collaborator) left a comment:

I've not spent enough time to actually understand the change, so this'll be an approve to unblock in case you wish to proceed trusting the test results alone.

@jsirois (Member, Author) commented Dec 16, 2023

Perf improvements

Build Time - for traditional installed wheel chroot PEXes

  1. Small:
    $ hyperfine \
        -w2 \
        -p 'rm -rf ~/.pex' \
        -n 'execute_parallel buildtime wheel chroot install' \
        -n 'imap_parallel buildtime wheel chroot install' \
        'pex --python python3.11 -D src -m main --no-pypi -f find-links cowsay==5.0 ansicolors==1.1.8 -o cowsay.ep.pex' \
        'python3.11 -m pex -D src -m main --no-pypi -f find-links cowsay==5.0 ansicolors==1.1.8 -o cowsay.mp.pex'
    Benchmark 1: execute_parallel buildtime wheel chroot install
      Time (mean ± σ):      3.217 s ±  0.018 s    [User: 3.087 s, System: 0.453 s]
      Range (min … max):    3.183 s …  3.238 s    10 runs
    
    Benchmark 2: imap_parallel buildtime wheel chroot install
      Time (mean ± σ):      1.866 s ±  0.010 s    [User: 1.653 s, System: 0.221 s]
      Range (min … max):    1.848 s …  1.877 s    10 runs
    
    Summary
      imap_parallel buildtime wheel chroot install ran
        1.72 ± 0.01 times faster than execute_parallel buildtime wheel chroot install
    
  2. Medium (compare imap_parallel / execute_parallel pairs):
    $ hyperfine \
        -w2 \
        -p 'rm -rf ~/.pex' \
        -n 'raw .whl build' \
        -n 'execute_parallel buildtime wheel chroot install' \
        -n 'imap_parallel buildtime wheel chroot install' \
        -n 'execute_parallel buildtime wheel chroot install --no-compress' \
        -n 'imap_parallel buildtime wheel chroot install --no-compress' \
        'python3.9 -m pex -c pants --no-pypi -f find-links --no-pre-install-wheels pantsbuild.pants==2.17.1 -o pants.whls.mp.pex' \
        'pex --python python3.9 -c pants --no-pypi -f find-links pantsbuild.pants==2.17.1 -o pants.ep.pex' \
        'python3.9 -m pex -c pants --no-pypi -f find-links pantsbuild.pants==2.17.1 -o pants.mp.pex' \
        'pex --python python3.9 -c pants --no-pypi -f find-links pantsbuild.pants==2.17.1 --no-compress -o pants.ep.nc.pex' \
        'python3.9 -m pex -c pants --no-pypi -f find-links pantsbuild.pants==2.17.1 --no-compress -o pants.mp.nc.pex'
    Benchmark 1: raw .whl build
      Time (mean ± σ):      1.680 s ±  0.011 s    [User: 1.408 s, System: 0.214 s]
      Range (min … max):    1.662 s …  1.693 s    10 runs
    
    Benchmark 2: execute_parallel buildtime wheel chroot install
      Time (mean ± σ):      9.265 s ±  0.048 s    [User: 12.653 s, System: 0.948 s]
      Range (min … max):    9.168 s …  9.339 s    10 runs
    
    Benchmark 3: imap_parallel buildtime wheel chroot install
      Time (mean ± σ):      7.117 s ±  0.032 s    [User: 6.967 s, System: 0.551 s]
      Range (min … max):    7.077 s …  7.183 s    10 runs
    
    Benchmark 4: execute_parallel buildtime wheel chroot install --no-compress
      Time (mean ± σ):      5.135 s ±  0.064 s    [User: 8.411 s, System: 1.015 s]
      Range (min … max):    5.071 s …  5.305 s    10 runs
    
    Benchmark 5: imap_parallel buildtime wheel chroot install --no-compress
      Time (mean ± σ):      3.067 s ±  0.017 s    [User: 2.816 s, System: 0.629 s]
      Range (min … max):    3.042 s …  3.097 s    10 runs
    
    Summary
      raw .whl build ran
        1.82 ± 0.02 times faster than imap_parallel buildtime wheel chroot install --no-compress
        3.06 ± 0.04 times faster than execute_parallel buildtime wheel chroot install --no-compress
        4.24 ± 0.03 times faster than imap_parallel buildtime wheel chroot install
        5.51 ± 0.05 times faster than execute_parallel buildtime wheel chroot install
    
    $ du -sh pants*.pex | sort -n
    52M     pants.whls.mp.pex
    53M     pants.ep.pex
    53M     pants.mp.pex
    239M    pants.ep.nc.pex
    239M    pants.mp.nc.pex
    

Runtime

  1. Small (imap_parallel is better, but parallelization is still a small loss):
    $ pex \
        --python python3.11 \
        -D src -m main \
        --no-pypi -f find-links cowsay==5.0 ansicolors==1.1.8 \
        --no-pre-install-wheels -o cowsay.whls.ep.pex
    $ python3.11 -m pex \
        -D src -m main \
        --no-pypi -f find-links cowsay==5.0 ansicolors==1.1.8 \
        --no-pre-install-wheels -o cowsay.whls.mp.pex
    $ hyperfine \
        -w2 \
        -p 'rm -rf ~/.pex' \
        -n 'serial wheel chroot install' \
        -n 'execute_parallel runtime wheel chroot install' \
        -n 'imap_parallel runtime wheel chroot install' \
        './cowsay.whls.ep.pex' \
        'PEX_MAX_INSTALL_JOBS=0 ./cowsay.whls.ep.pex' \
        'PEX_MAX_INSTALL_JOBS=0 ./cowsay.whls.mp.pex'
    Benchmark 1: serial wheel chroot install
      Time (mean ± σ):     493.0 ms ±   3.4 ms    [User: 449.3 ms, System: 43.5 ms]
      Range (min … max):   488.7 ms … 498.1 ms    10 runs
    
    Benchmark 2: execute_parallel runtime wheel chroot install
      Time (mean ± σ):     574.4 ms ±   9.0 ms    [User: 589.3 ms, System: 69.7 ms]
      Range (min … max):   567.6 ms … 597.8 ms    10 runs
    
    Benchmark 3: imap_parallel runtime wheel chroot install
      Time (mean ± σ):     512.5 ms ±   3.0 ms    [User: 538.3 ms, System: 57.9 ms]
      Range (min … max):   508.9 ms … 518.0 ms    10 runs
    
    Summary
      serial wheel chroot install ran
        1.04 ± 0.01 times faster than imap_parallel runtime wheel chroot install
        1.17 ± 0.02 times faster than execute_parallel runtime wheel chroot install
    
  2. Medium:
    $ pex \
        --python python3.9 \
        -c pants \
        --no-pypi -f find-links pantsbuild.pants==2.17.1 \
        -o pants.ep.pex
    $ python3.9 -m pex \
        -c pants \
        --no-pypi -f find-links pantsbuild.pants==2.17.1 \
        -o pants.mp.pex
    $ pex \
        --python python3.9 \
        -c pants \
        --no-pypi -f find-links pantsbuild.pants==2.17.1 \
        --no-pre-install-wheels -o pants.whls.ep.pex
    $ python3.9 -m pex \
        -c pants \
        --no-pypi -f find-links pantsbuild.pants==2.17.1 \
        --no-pre-install-wheels -o pants.whls.mp.pex
    $ hyperfine \
        -w2 \
        -p 'rm -rf ~/.pex' \
        -n 'serial wheel chroot install' \
        -n 'serial .whl file install' \
        -n 'execute_parallel runtime wheel chroot install' \
        -n 'imap_parallel runtime wheel chroot install' \
        -n 'execute_parallel runtime .whl install' \
        -n 'imap_parallel runtime .whl install' \
        './pants.ep.pex -V' \
        './pants.whls.mp.pex -V' \
        'PEX_MAX_INSTALL_JOBS=0 ./pants.ep.pex -V' \
        'PEX_MAX_INSTALL_JOBS=0 ./pants.mp.pex -V' \
        'PEX_MAX_INSTALL_JOBS=0 ./pants.whls.ep.pex -V' \
        'PEX_MAX_INSTALL_JOBS=0 ./pants.whls.mp.pex -V'
    Benchmark 1: serial wheel chroot install
      Time (mean ± σ):      2.589 s ±  0.026 s    [User: 2.217 s, System: 0.241 s]
      Range (min … max):    2.551 s …  2.639 s    10 runs
    
    Benchmark 2: serial .whl file install
      Time (mean ± σ):      2.861 s ±  0.047 s    [User: 2.451 s, System: 0.274 s]
      Range (min … max):    2.804 s …  2.943 s    10 runs
    
    Benchmark 3: execute_parallel runtime wheel chroot install
      Time (mean ± σ):      2.814 s ±  0.029 s    [User: 5.343 s, System: 0.454 s]
      Range (min … max):    2.782 s …  2.888 s    10 runs
    
    Benchmark 4: imap_parallel runtime wheel chroot install
      Time (mean ± σ):      2.449 s ±  0.030 s    [User: 2.550 s, System: 0.274 s]
      Range (min … max):    2.408 s …  2.515 s    10 runs
    
    Benchmark 5: execute_parallel runtime .whl install
      Time (mean ± σ):      2.904 s ±  0.039 s    [User: 6.545 s, System: 0.618 s]
      Range (min … max):    2.860 s …  2.978 s    10 runs
    
    Benchmark 6: imap_parallel runtime .whl install
      Time (mean ± σ):      2.587 s ±  0.026 s    [User: 2.864 s, System: 0.344 s]
      Range (min … max):    2.555 s …  2.638 s    10 runs
    
    Summary
      imap_parallel runtime wheel chroot install ran
        1.06 ± 0.02 times faster than imap_parallel runtime .whl install
        1.06 ± 0.02 times faster than serial wheel chroot install
        1.15 ± 0.02 times faster than execute_parallel runtime wheel chroot install
        1.17 ± 0.02 times faster than serial .whl file install
        1.19 ± 0.02 times faster than execute_parallel runtime .whl install
    

# Sorted becomes:
# [10, 6, 4, 3, 1, 1] -> slot1[10, 3] slot2[6, 4, 1, 1]: 13 long pole
#
input_items.sort(key=costing_function, reverse=True)
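The effect of that biggest-first sort can be checked with a toy model of the binning: feed each cost to the currently least-loaded slot and compare makespans. This is an illustrative sketch, not the actual Pex scheduling code:

```python
# Toy model of spreading job costs across pool slots: each input goes to the
# currently least-loaded slot; the makespan is the longest slot's total.
import heapq


def makespan(costs, slots=2):
    heap = [(0, slot) for slot in range(slots)]
    heapq.heapify(heap)
    for cost in costs:
        load, slot = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, slot))
    return max(load for load, _ in heap)


costs = [10, 6, 4, 3, 1, 1]        # already sorted biggest first
print(makespan(costs))             # 13: slot1=[10, 3], slot2=[6, 4, 1, 1]
print(makespan(sorted(costs)))     # 15: smallest-first leaves the 10 for last
```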
@jsirois (Member, Author) commented:

There is new binning logging now (below), and it bears this out. For example, for the pantsbuild.pants==2.17.1 example case:

# Unsorted
pex: Elapsed time per install job:
    [657197] 3.75s 5 wheels
    [657199] 4.33s 7 wheels
    [657198] 4.43s 2 wheels
    [657200] 7.39s 3 wheels
    [657196] 10.13s 5 wheels

# Sorted biggest 1st
pex: Elapsed time per install job:
    [656446] 4.36s 8 wheels
    [656447] 4.36s 9 wheels
    [656448] 4.49s 3 wheels
    [656444] 7.04s 1 wheel
    [656445] 7.86s 1 wheel

This ~2.5s improvement in the overall processing time was consistent.

@jsirois (Member, Author) commented:

To be clear, this is not new behavior, just new experimental confirmation of what was previously only pen-and-paper chicken scratch proving this to myself in the initial PR.

@benjyw (Collaborator) commented Dec 16, 2023

Looking now

pex/jobs.py (outdated review thread, resolved)
@huonw (Collaborator) left a comment:

Nice wins!

pex/jobs.py Outdated
index=index,
pid=pid,
count=len(elapsed),
wheels=pluralize(elapsed, "wheel"),

Should this be the following?

Suggested change:
- wheels=pluralize(elapsed, "wheel"),
+ items=pluralize(elapsed, "item"),

@jsirois (Member, Author) replied:

Fixed, though by parametrizing a noun instead.
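For reference, a noun-parametrized `pluralize` might look like the following sketch; the real Pex helper's module location and exact behavior may differ:

```python
# Sketch of a noun-parametrized pluralize helper, an assumption about the
# actual Pex helper, which may differ in location and behavior.
def pluralize(subject, noun):
    """Return the noun, naively pluralized with 's' to match a count or sized collection."""
    count = subject if isinstance(subject, int) else len(subject)
    return noun if count == 1 else noun + "s"


pluralize([3.75, 4.33], "wheel")  # "wheels"
pluralize(1, "item")              # "item"
```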

Comment on lines +253 to +254
"--find-links",
vendored_pip_dists_dir,

Why does this test change? Is it a behaviour change?

@jsirois (Member, Author) replied:

This was a behavior change introduced in #2302, which switched vendored Pip to lazily installing `wheel` just in time to support `Pip.spawn_build_wheel`, instead of relying on vendored `wheel`, which that change removed.

In that change I neglected to do an exhaustive search for tests that used `--no-pypi --find-links ...` and needed to build wheels. Tests of that style are susceptible to this issue, but in an order-dependent way: if any other test that needed to build a wheel ran before such a `--no-pypi --find-links ...` test, the lazy `wheel` install would already be satisfied and short-circuit.

During this PR work, I happened to hit this case while iterating on individual test groups with `rm -rf ~/.pex`.

@jsirois jsirois merged commit 7a2cee3 into pex-tool:main Dec 16, 2023
24 checks passed
@jsirois jsirois deleted the Pip/remove_install branch December 16, 2023 18:38