Add support for `--no-pre-install-wheels` and `--max-install-jobs`. #2298
Conversation
A new `PEX_MAX_INSTALL_JOBS` variable will use this.
This change renames types and variables to take the emphasis off installation and put it on the more generic result of a resolve (which will soon include raw `.whl` files as an option in addition to default installed wheel chroots). Also factor out the machinery needed to be able to ask `BuildAndInstall` for `.whl` file results in addition to the current installed wheel chroot results.
These will be used to select raw wheel dependencies (vs the current pre-installed wheel chroots) and the parallelism used to install dependencies at runtime upon the first PEX boot.
This will be used in subsequent changes.
This support will automatically be used as soon as resolves are wired with an option to return raw `.whl` file results.
This will be needed when building PEXes that use raw `.whl` file dists.
Use the Pip resolver to test the resulting PEX contains .whl files that are not stored in the PEX with compression.
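The "not stored with compression" property mentioned above can be verified directly with the standard `zipfile` module. This helper is an illustrative sketch of such a check, not Pex's actual test code:

```python
import zipfile


def whl_entries_are_stored(pex_path):
    # type: (str) -> bool
    """Return True if the zip contains .whl entries and every one of them is
    STORED (i.e. added to the archive without compression)."""
    with zipfile.ZipFile(pex_path) as zf:
        whl_infos = [info for info in zf.infolist() if info.filename.endswith(".whl")]
        return bool(whl_infos) and all(
            info.compress_type == zipfile.ZIP_STORED for info in whl_infos
        )
```

Each `ZipInfo.compress_type` records how that individual entry was written, so a mixed archive (compressed bootstrap code, STORED wheels) can still be validated entry by entry.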
I'll attach some example perf numbers for the torch 2GB case here as soon as CI goes green. Suffice to say, this is both a build time win and a run time win for that case.
Reviewers: Thanks in advance - this is large. I did break it up into independent commits, though, none larger than 550 delta lines, with most around 100. So instead of ~12 PRs you get 1 with 12 commits. I needed the single PR personally to work out all the tricky details comprehensively, ensure this all worked, and vet that it really was complicated enough to require the 2 new knobs instead of just becoming the new default.
The break is only under pypy3.{7,8,9,10} and was caused by the prompt-toolkit 3.0.42 release today.
Working through the perf analysis in pex-tool#2292 brought these to light.
Alright, #2292 now has perf analysis for the OP torch case. I'm still working on the 3 mac-specific IT failures, but this is ready for review.
These were solved by switching away from the `temporary_dir` context manager to pytest's `tmpdir` fixture. Under `temporary_dir` the `~/.pex/unzipped_pexes/*/.bootstrap` symlink was not working. The mechanism by which this wasn't working still needs to be root-caused.
```python
if not self._pex_info.deps_are_wheel_files:
    raise ResolveError(
        "Cannot resolve .whl files from PEX at {pex}; its dependencies are in the "
        "form of pre-installed wheel chroots.".format(pex=self.source_pex)
    )
```
In general I only like to add orthogonal features that can be combined freely. There are 3 Pex resolver sources: Pip, lock files, and PEX repositories, and 2 PEX dependency formats: installed wheel chroots and (now) raw `.whl` files. In this 6-element matrix, only this case of resolving raw `.whl` files from a PEX repository containing dependencies in installed wheel chroot form cannot be satisfied currently (this is tested explicitly in the changes to `test_pex_repository_resolve.py`).
The `pep_427.py` work that now powers `.whl` file installs instead of `pip install` does make it possible to plug this gap in a later release, though, with some modification of `pep_376.py`. This will also allow the existing `PEX_TOOLS=1 ./a.pex repository extract ...` tool to produce fully correct wheels for the installed wheel chroot dependencies in PEXes of that style as well.
Filed: #2299
Not gone through all commits yet. Only been reading the code, trying to grok the change with limited knowledge of the surroundings.
The numbers from #2292 are awesome, nice!
```python
group.add_argument(
    "--max-install-jobs",
    dest="max_install_jobs",
    default=1,
```
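The diff excerpt above is cut off mid-definition. A plausible, self-contained completion might look like the following; the `type` and help text are my assumptions for illustration, not the PR's actual code:

```python
import argparse

parser = argparse.ArgumentParser(prog="pex")
group = parser.add_argument_group("installation")
group.add_argument(
    "--max-install-jobs",
    dest="max_install_jobs",
    default=1,
    type=int,  # assumed semantics: 1 = serial, -1 = auto-scale, N > 1 = N processes
    help="Maximum number of parallel processes used to install wheel deps at first boot.",
)

args = parser.parse_args(["--max-install-jobs", "4"])
```

With `default=1` the flag is opt-in: omitting it preserves the existing serial install behaviour.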
I'm assuming this default is to preserve behaviour; if a breaking change was possible (e.g. in 3.0.0), would you consider defaulting to `-1`?
Probably not. Parallel installs slow down most cases, and although -1 tries to guess the right dividing line between cases, I'm sure it does so poorly. My tests were on a 16-core SSD machine, and I'm not confident in the heuristic even for my machine, let alone other disk/CPU combos.
```python
with identify_layout({pex!r}) as layout, atomic_directory(
    {spread_dest!r}, source={source!r}
) as spread_chroot:
    if not spread_chroot.is_finalized():
        with TRACER.timed({extracting_msg!r}):
            layout.extract_dist(
                dest_dir=spread_chroot.work_dir,
                dist_relpath={dist_relpath!r},
                is_wheel_file={is_wheel_file!r}
            )

symlink_dest = {symlink_dest!r}
safe_mkdir(os.path.dirname(symlink_dest))
os.symlink({symlink_src!r}, symlink_dest)
```
Is it worth exposing the core of this logic as a function that can be used both by `_ensure_distributions_installed_serial` directly and by this script?
I'm not sure how you get much better. This code is pretty small and the sticking point is 7 parameters.
pex/layout.py (outdated diff):

```python
pass


AVERAGE_DISTRIBUTION_SIZE_PARALLEL_JOB_THRESHOLD = 1 * 1024 * 1024  # ~1MB
```
Does this number come from experiments?
Yes. As noted elsewhere, on 1 machine and with limited cases. I'm very much not confident in the heuristic.
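A minimal sketch of how such a size-based guard might work, using the ~1MB constant from the diff above. Only the threshold value appears in this PR; the function name and the exact rule here are assumptions for illustration:

```python
AVERAGE_DISTRIBUTION_SIZE_PARALLEL_JOB_THRESHOLD = 1 * 1024 * 1024  # ~1MB


def should_install_in_parallel(dist_sizes, max_install_jobs):
    # type: (list, int) -> bool
    """Only fan out to parallel install processes when there are multiple
    dists and they are large enough on average to amortize the per-process
    startup cost; otherwise serial installs win."""
    if max_install_jobs <= 1 or len(dist_sizes) < 2:
        return False
    average_size = sum(dist_sizes) / len(dist_sizes)
    return average_size >= AVERAGE_DISTRIBUTION_SIZE_PARALLEL_JOB_THRESHOLD
```

The point of the average-size check is the author's observation above: for small dependencies, process startup overhead makes parallelism a net loss.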
Commented.
```python
    attr.ib()
)  # Mapping[ProjectName, OrderedSet[Requirement]]

def adjust(self, distributions):
```
What's this adjusting? (I acknowledge it's just moved code.)
The name of the extracted class is indicative here: it adjusts a distribution to note the direct top-level requirement it satisfies, if any. Most dists don't get adjusted since they satisfy only transitive requirements. The need for associating dists with the direct requirements they satisfy goes back to handling local project directory dependencies: these have no known project name until the resolve process builds them into a wheel with a name and version.
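The association being described can be sketched roughly like this; the names and shapes are illustrative placeholders, not Pex's actual types:

```python
def associate_direct_requirements(direct_reqs_by_project, resolved_dists):
    # type: (dict, list) -> dict
    """Map each resolved (project_name, version) pair to the direct
    (top-level) requirements it satisfies; dists that only satisfy
    transitive requirements map to an empty list."""
    return {
        (project_name, version): list(direct_reqs_by_project.get(project_name, ()))
        for project_name, version in resolved_dists
    }
```

A local project directory dependency would only gain its `project_name` key after the resolve builds it into a named, versioned wheel, which is why the association happens post-resolve.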
```python
@pytest.mark.parametrize(
    "layout", [pytest.param(layout, id=layout.value) for layout in Layout.values()]
)
def test_entry_point_verification_3rdparty(tmpdir):
    # type: (Any) -> None
```
Totally minor, but are you aware that `pytest.mark.parametrize` supports an `ids` parameter that can take a function, so these could potentially be something like:

```python
@pytest.mark.parametrize("layout", Layout.values(), ids=lambda layout: layout.value)
...
```
I was not aware. I'm not sure I like that though! It's not obvious the two align when you split things up like that. 10 values, 6 ids. Clearly they align in your example, but it takes a second to make sure of that.
Oh, I read poorly. It's 1 function mapped over all the values. Gotcha.
Yeah, in addition to the function, it also supports passing a list with exactly the right number of IDs, but I don't like that possibility either.
```python
pex = os.path.join(str(tmpdir), "pex")
exe = os.path.join(str(tmpdir), "exe")
with open(exe, "w") as fp:
    fp.write("import colors; print(colors.blue('Moo?'))")
```
I assume the `import colors` is meant to validate that `ansicolors` is actually installed in a usable way. Should there be an `import cowsay` too?
Fixed.
Alright - thanks @huonw and @kaos. I've added the perf analysis for small PEXes to the ticket, which backs up what is expected: Builds of
The `--no-pre-install-wheels` option causes built PEXes to use raw `.whl` files. For `--layout zipapp` this means a single `.whl` file is `STORED` per dep, and for `--layout {packed,loose}` this means the loose `.deps/` dir contains raw `.whl` files. This speeds up all PEX builds by avoiding pre-installing wheel deps (~unzipping into the `PEX_ROOT`) and then, in the case of zipapp and packed layouts, re-zipping. For large dependencies the time savings can be dramatic.

Not pre-installing wheels comes with a PEX boot cold-start performance tradeoff, since installation now needs to be done at runtime. This is generally a penalty of O(100ms), but that penalty can be erased for some deployment scenarios with the new `--max-install-jobs` build option / `PEX_MAX_INSTALL_JOBS` runtime env var. By default, runtime installs are performed serially, but this new option can be set to use multiple parallel install processes, which can speed up cold boots for large dependencies.

Fixes #2292
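For illustration, the runtime side might parse the env var along these lines. `PEX_MAX_INSTALL_JOBS` is the name introduced by this PR, but the parsing and validation rules below are my assumptions, not Pex's actual code:

```python
import os


def max_install_jobs_from_env(env=None):
    # type: (dict) -> int
    """Read PEX_MAX_INSTALL_JOBS: 1 (the default) means serial installs,
    -1 means auto-scale to the machine, and N > 1 means exactly N parallel
    install processes (assumed semantics)."""
    env = os.environ if env is None else env
    value = int(env.get("PEX_MAX_INSTALL_JOBS", "1"))
    if value == 0 or value < -1:
        raise ValueError(
            "PEX_MAX_INSTALL_JOBS must be -1 or a positive integer, got {}".format(value)
        )
    return value
```

Defaulting to serial when the variable is unset mirrors the build-time flag's `default=1`, so existing deployments see no behaviour change.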