Add support for `--no-pre-install-wheels` and `--max-install-jobs`. #2298

jsirois · 2023-12-13T03:45:00Z

The --no-pre-install-wheels option causes built PEXes to use raw
.whl files. For --layout zipapp this means a single .whl file is
STORED per dep, and for --layout {packed,loose} this means the loose
.deps/ dir contains raw .whl files. This speeds up all PEX builds by
avoiding pre-installing wheel deps (~unzipping into the PEX_ROOT) and
then, in the case of zipapp and packed layout, re-zipping. For large
dependencies the time savings can be dramatic.

Not pre-installing wheels comes with a PEX boot cold-start performance
tradeoff since installation now needs to be done at runtime. This is
generally a penalty of O(100ms), but that penalty can be erased for some
deployment scenarios with the new --max-install-jobs build option /
PEX_MAX_INSTALL_JOBS runtime env var. By default, runtime installs are
performed serially, but this new option can be set to use multiple
parallel install processes, which can speed up cold boots for large
dependencies.

Fixes #2292

A new `PEX_MAX_INSTALL_JOBS` variable will use this.

This change renames types and variables to take the emphasis off installation and put it on the more generic result of a resolve (which will soon include raw `.whl` files as an option in addition to default installed wheel chroots). Also factor out the machinery needed to be able to ask `BuildAndInstall` for `.whl` file results in addition to the current installed wheel chroot results.

These will be used to select raw wheel dependencies (vs the current pre-installed wheel chroots) and the parallelism used to install dependencies at runtime upon the first PEX boot.

This will be used in subsequent changes.

This support will automatically be used as soon as resolves are wired with an option to return raw `.whl` file results.

This will be needed when building PEXes that use raw `.whl` file dists.

Use the Pip resolver to test the resulting PEX contains .whl files that are not stored in the PEX with compression.

jsirois · 2023-12-13T03:48:22Z

I'll attach some example perf numbers for the torch 2GB case here as soon as CI goes green. Suffice to say, this is both a build time win and run time win for that case.

jsirois · 2023-12-13T03:50:26Z

Reviewers: Thanks in advance - this is large. I did break it up into independent commits though, none of which is larger than 550 delta-lines, with most being ~100 delta lines. So instead of ~12 PRs you get 1 with 12 commits. I needed the 1 personally to work out all the titchy details comprehensively and ensure this all worked / vet it was complicated and actually required the 2 new knobs instead of just being the new default.

The break is only under pypy3.{7,8,9,10} and was caused by the prompt-toolkit 3.0.42 release today.

Working through the perf analysis in pex-tool#2292 brought these to light.

jsirois · 2023-12-13T19:33:32Z

Alright, #2292 now has perf analysis for the OP torch case. I'm still working on the 3 mac-specific IT failures, but this is ready for review.

These were solved by switching away from the `temporary_dir` context manager to pytest's `tmpdir` fixture. Under `temporary_dir` the `~/.pex/unzipped_pexes/*/.bootstrap` symlink was not working. The mechanism by which this wasn't working still needs to be root-caused.

jsirois · 2023-12-13T21:57:32Z

pex/environment.py

+            if not self._pex_info.deps_are_wheel_files:
+                raise ResolveError(
+                    "Cannot resolve .whl files from PEX at {pex}; its dependencies are in the "
+                    "form of pre-installed wheel chroots.".format(pex=self.source_pex)
                )


In general I only like to add orthogonal features that can be combined freely. There are 3 Pex resolver sources: Pip, lock files and PEX repositories, and 2 PEX dependency formats: installed wheel chroots and (now) raw .whl files. In this 6 element matrix, only this case of resolving raw .whl files from a PEX repository containing dependencies in installed wheel chroot form cannot be satisfied currently (this is tested explicitly in the changes to test_pex_repository_resolve.py).

The pep_427.py work that now powers .whl file installs instead of pip install does make it possible to plug this gap in a later release though with some modification of pep_376.py. This will also allow the existing PEX_TOOLS=1 ./a.pex repository extract ... tool to produce fully correct wheels for the installed wheel chroot dependencies in PEXes of that style as well.

Filed: #2299

kaos

Not gone through all commits yet. Only been reading the code, trying to grok the change with limited knowledge of the surroundings.

pex/bin/pex.py

huonw

The numbers from #2292 are awesome, nice!

huonw · 2023-12-13T23:06:53Z

pex/bin/pex.py

+    group.add_argument(
+        "--max-install-jobs",
+        dest="max_install_jobs",
+        default=1,


I'm assuming this default is to preserve behaviour; if a breaking change was possible (e.g. in 3.0.0), would you consider defaulting to -1?

Probably not. Parallel installs slow down most cases and although -1 tries to guess the divide in the cases, I'm sure it does so poorly. My tests were on a 16 core SSD machine and I'm not confident on the heuristic for my machine let alone other disk / CPU combos.

pex/environment.py

huonw · 2023-12-13T23:24:56Z

pex/layout.py

+                        with identify_layout({pex!r}) as layout, atomic_directory(
+                            {spread_dest!r}, source={source!r}
+                        ) as spread_chroot:
+                            if not spread_chroot.is_finalized():
+                                with TRACER.timed({extracting_msg!r}):
+                                    layout.extract_dist(
+                                        dest_dir=spread_chroot.work_dir,
+                                        dist_relpath={dist_relpath!r},
+                                        is_wheel_file={is_wheel_file!r}
+                                    )
+
+                        symlink_dest = {symlink_dest!r}
+                        safe_mkdir(os.path.dirname(symlink_dest))
+                        os.symlink({symlink_src!r}, symlink_dest)


Is it worth exposing the core of this logic as a function that can be used by both both _ensure_distributions_installed_serial directly and by this script?

I'm not sure how you get much better. This code is pretty small and the sticking point is 7 parameters.

huonw · 2023-12-13T23:25:32Z

pex/layout.py

+            pass
+
+
+AVERAGE_DISTRIBUTION_SIZE_PARALLEL_JOB_THRESHOLD = 1 * 1024 * 1024  # ~1MB


Does this number come from experiments?

Yes. As noted elsewhere, on 1 machine and with limited cases. I'm very much not confident in the heuristic.

pex/layout.py

pex/resolver.py

huonw · 2023-12-13T23:50:21Z

pex/resolver.py

+        attr.ib()
+    )  # Mapping[ProjectName, OrderedSet[Requirement]]
+
+    def adjust(self, distributions):


What's this adjusting? (I acknowledge it's just moved code.)

The name of the extracted class is indicative here, it adjusts a distribution to note the direct top level requirement it satisfies, if any. Most dists usually don't get adjusted since they satisfy only transitive requirements. The need for associating dists to the direct requirements they satisfy goes back to handling local project directory dependencies. These have no project name known except after the resolve process which builds them into a wheel with a name and version.

huonw · 2023-12-13T23:52:29Z

tests/integration/test_integration.py

-def test_entry_point_verification_3rdparty(tmpdir):
-    # type: (Any) -> None
+@pytest.mark.parametrize(
+    "layout", [pytest.param(layout, id=layout.value) for layout in Layout.values()]


Totally minor, but are you aware that pytest.mark.parametrize supports an ids parameter that can take a function, so these could potentially be something like:

@pytest.mark.parametrize("layout", Layout.values(), ids=lambda layout: layout.value) ...

I was not aware. I'm not sure I like that though! It's not obvious the two align when you split things up like that. 10 values, 6 ids. Clearly they align in your example, but it takes a second to make sure of that.

Oh, I read poorly. It's 1 function mapped over all the values. Gotcha.

Yeah, in addition to the function, it also supports passing a list with exactly the right number of IDs, but I don't like that possibility either.

huonw · 2023-12-13T23:56:03Z

tests/integration/test_lock_resolver.py

+    pex = os.path.join(str(tmpdir), "pex")
+    exe = os.path.join(str(tmpdir), "exe")
+    with open(exe, "w") as fp:
+        fp.write("import colors; print(colors.blue('Moo?'))")


I assume the import colours is meaning to validate that ansicolors is actually installed in a useful way. Should there be an import cowsay too?

jsirois · 2023-12-14T02:05:49Z

Alright - thanks @huonw and @kaos. I've added the perf analysis for small PEXes to the ticket which backs up what is expected: Builds of .whl file-based PEXes is always at least a little bit faster, Cold 1st runs is always a little bit slower and install parallelization is generally a loss. Although I'd love to not have knobs for these two new features, it seems necessary - there are unavoidable tradeoffs that are hard to justify removing from the user's purview.

jsirois · 2023-12-14T02:08:56Z

@kaos, I'm going to submit - I feel pretty confident I've covered the bases here - but I'll follow up with any feedback you have. There is refactoring and simplification to do as at least partly tracked by #2299 that further feedback will fall naturally into.

jsirois added 12 commits December 12, 2023 11:06

Support rst content in Variables help.

ac3dd66

A new `PEX_MAX_INSTALL_JOBS` variable will use this.

Support individual file compression levels.

a458456

Add knobs to control dependency installs.

e010bab

These will be used to select raw wheel dependencies (vs the current pre-installed wheel chroots) and the parallelism used to install dependencies at runtime upon the first PEX boot.

Make layouts capable of handling raw .whl dists.

93d89e1

This will be used in subsequent changes.

Support raw .whl file dists in PEXBuilder.

5f2cf95

This support will automatically be used as soon as resolves are wired with an option to return raw `.whl` file results.

Wire up raw .whl files through resolves.

94bef0a

Support finding dist scripts in wheel files.

a430ef9

This will be needed when building PEXes that use raw `.whl` file dists.

Parameterize key layout tests with wheel style.

4d42dc1

Test --pex-repository over dep wheel types.

044acd0

Test --lock resolve for raw .whl file deps.

1e89c0a

Test --no-pre-install-wheels.

50ed346

Use the Pip resolver to test the resulting PEX contains .whl files that are not stored in the PEX with compression.

jsirois requested review from kaos, benjyw, cosmicexplorer and tgolsson December 13, 2023 03:45

jsirois requested a review from huonw December 13, 2023 04:12

jsirois mentioned this pull request Dec 13, 2023

consume packed wheel cache in zipapp creation #2175

Closed

Fix test_issue_1667.py break.

645d49f

The break is only under pypy3.{7,8,9,10} and was caused by the prompt-toolkit 3.0.42 release today.

jsirois mentioned this pull request Dec 13, 2023

Pex zip-creation takes a very long time for torch>=2 #2292

Closed

Fixes and extra logging.

af4749d

Working through the perf analysis in pex-tool#2292 brought these to light.

jsirois commented Dec 13, 2023

View reviewed changes

This was referenced Dec 13, 2023

Support turning installed wheel chroots into .whl files with full fidelity. #2299

Open

Prepare the 2.1.154 release. #2300

Merged

kaos reviewed Dec 13, 2023

View reviewed changes

pex/bin/pex.py Outdated Show resolved Hide resolved

huonw approved these changes Dec 13, 2023

View reviewed changes

Review feedback from Huon and Andreas.

d1afe28

jsirois merged commit b3a7767 into pex-tool:main Dec 14, 2023
24 checks passed

jsirois deleted the issues/2292/support-runtime-wheel-install branch December 14, 2023 02:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for `--no-pre-install-wheels` and `--max-install-jobs`. #2298

Add support for `--no-pre-install-wheels` and `--max-install-jobs`. #2298

jsirois commented Dec 13, 2023

jsirois commented Dec 13, 2023

jsirois commented Dec 13, 2023

jsirois commented Dec 13, 2023

jsirois Dec 13, 2023 •

edited

Loading

kaos left a comment

huonw left a comment

huonw Dec 13, 2023

jsirois Dec 14, 2023

huonw Dec 13, 2023

jsirois Dec 14, 2023

huonw Dec 13, 2023

jsirois Dec 14, 2023

jsirois Dec 14, 2023

huonw Dec 13, 2023

jsirois Dec 14, 2023

huonw Dec 13, 2023

jsirois Dec 14, 2023

jsirois Dec 14, 2023

huonw Dec 14, 2023

huonw Dec 13, 2023

jsirois Dec 14, 2023

jsirois commented Dec 14, 2023

jsirois commented Dec 14, 2023

		pass


		AVERAGE_DISTRIBUTION_SIZE_PARALLEL_JOB_THRESHOLD = 1 * 1024 * 1024 # ~1MB

Add support for --no-pre-install-wheels and --max-install-jobs. #2298

Add support for --no-pre-install-wheels and --max-install-jobs. #2298

Conversation

jsirois commented Dec 13, 2023

jsirois commented Dec 13, 2023

jsirois commented Dec 13, 2023

jsirois commented Dec 13, 2023

jsirois Dec 13, 2023 • edited Loading

Choose a reason for hiding this comment

kaos left a comment

Choose a reason for hiding this comment

huonw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jsirois commented Dec 14, 2023

jsirois commented Dec 14, 2023

Add support for `--no-pre-install-wheels` and `--max-install-jobs`. #2298

Add support for `--no-pre-install-wheels` and `--max-install-jobs`. #2298

jsirois Dec 13, 2023 •

edited

Loading