Unify PEX buildtime and runtime wheel caches. #821

Merged
merged 6 commits into from
Dec 6, 2019

Conversation

@jsirois (Member) commented Dec 6, 2019

Previously these caches were separate. Downloaded wheels and sdists were
cached in `~/.pex/build`, and wheels unzipped from zipped PEXes at
runtime were cached in `~/.pex/install`.

Now the resolver unifies the caches: any wheel it installs is visible to
zipped PEXes on the same machine, so at PEX boot time they can reuse that
install instead of unzipping the wheel distributions stored within them.

N.B.: The cache is not unified in the other direction. If a zipped PEX
is executed on a machine where a PEX build resolve later happens, the
resolve will re-download, build and install any intersecting wheels;
the caches are unified only from that point forward.

Fixes #820
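
To make the one-directional sharing concrete, here is a minimal sketch of a shared install cache keyed by wheel content hash. It is not the code from this PR: the function names, the SHA-1 keying and the directory layout are all assumptions chosen for illustration.

```python
import hashlib
import os
import shutil
import tempfile
import zipfile


def fingerprint(path):
    """Return the SHA-1 hex digest of a file's contents."""
    digest = hashlib.sha1()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_installed(wheel_path, cache_root):
    """Unpack a wheel into a shared cache; return (install_dir, was_cached).

    Hypothetical layout: <cache_root>/<sha1 of wheel>/<wheel file name>/
    """
    parent = os.path.join(cache_root, fingerprint(wheel_path))
    install_dir = os.path.join(parent, os.path.basename(wheel_path))
    if os.path.isdir(install_dir):
        # Someone (a resolve or an earlier boot) already installed this wheel.
        return install_dir, True

    os.makedirs(parent, exist_ok=True)
    # Unpack next to the final location, then rename so that readers never
    # observe a partially populated install directory.
    work_dir = tempfile.mkdtemp(dir=parent)
    with zipfile.ZipFile(wheel_path) as zf:
        zf.extractall(work_dir)
    try:
        os.rename(work_dir, install_dir)
    except OSError:
        # Another process won the race; its install is equivalent to ours.
        shutil.rmtree(work_dir, ignore_errors=True)
    return install_dir, False
```

If both the build-time resolve and the runtime boot go through one function like this with the same `cache_root`, whichever runs second finds the directory and skips the unzip.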

The test demonstrates how to create a dehydrated pex that, upon first
execution, produces a hydrated pex and hands control to it from then
on. This is the motivating use case for the cache unification
change, which prevents dehydrated pexes from unzipping wheels
twice: once during pex hydration (resolve) and once during the hydrated
pex run.
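
The hydrate-then-hand-off flow described here can be sketched as follows. This is an illustration only, not the test added in the commit: `hydrate_once`, `run` and the `build` callback are hypothetical names, and the real hydration step (a resolve plus a pex build) is left to the caller.

```python
import os
import sys


def hydrate_once(hydrated_path, build):
    """Create the hydrated pex on first use and return its path.

    `build` is a caller-supplied callable taking a destination path; it
    stands in for whatever resolves requirements and writes the pex.
    """
    if not os.path.exists(hydrated_path):
        # Build to a per-process temporary name and move it into place so
        # that a half-written file is never mistaken for a finished one.
        tmp_path = "{}.{}.tmp".format(hydrated_path, os.getpid())
        build(tmp_path)
        os.replace(tmp_path, hydrated_path)
    return hydrated_path


def run(hydrated_path, build, args):
    """Hydrate if needed, then replace this process with the hydrated pex."""
    target = hydrate_once(hydrated_path, build)
    os.execv(sys.executable, [sys.executable, target] + list(args))
```

Only the first execution pays for `build`; later executions go straight to `os.execv`.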
@jsirois (Member, Author) commented Dec 6, 2019

Reviewers, feel free to ignore the second commit unless it's of particular interest to you or otherwise request it be reverted. It is fairly unrelated to the main change except that it spells out the motivating case and that may be worth preserving. I'm at your command here.

@cosmicexplorer (Contributor) left a comment

Going to start playing around with replicating test_issues_789_demo() to pants!

Resolved review threads on: pex/environment.py, pex/common.py, pex/pex_builder.py (outdated), pex/resolver.py, tests/test_unified_install_cache.py
@cosmicexplorer (Contributor) commented

> It is fairly unrelated to the main change except that it spells out the motivating case and that may be worth preserving.

I would not have a problem with it being reverted! I think it may be worth preserving if it doesn't seem too brittle, but I also plan to hit the ground running with this concept immediately in pants, so we may not need to keep around a reference for the technique. It may be helpful for other build tools wishing to employ the "dehydrated" pex technique however, at least in giving them a place to start from.

@jsirois jsirois merged commit 66aa5ef into pex-tool:master Dec 6, 2019
@jsirois jsirois deleted the isses/820/unify_wheel_cache branch December 6, 2019 07:52
cosmicexplorer added a commit to pantsbuild/pants that referenced this pull request Mar 12, 2020
### Problem

See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a Google Doc with pros and cons of different approaches.

@jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787.

**Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it.

### Solution

With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains:
- in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and the PyPI indices to resolve requirements from.
- in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False.
- in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed.
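
As a quick sanity check on an archive built this way, the three entries listed above can be looked up with the standard `zipfile` module. This is a sketch under assumptions: the entry names are copied verbatim from the list, their location at the archive root is assumed, and `missing_ipex_members` is a hypothetical helper, not part of pants or pex.

```python
import zipfile

# Entry names copied from the list above; placing them at the archive root
# is an assumption made for this sketch.
EXPECTED_MEMBERS = ("IPEX-INFO", "BOOSTRAP-PEX-INFO", "ipex.py")


def missing_ipex_members(path):
    """Return the expected entries that are absent from the zip at `path`."""
    with zipfile.ZipFile(path) as zf:
        present = set(zf.namelist())
    return [name for name in EXPECTED_MEMBERS if name not in present]
```

An empty list means all three entries were found.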

### Result

For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size:
```bash
X> ls dist
total 145M
-rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex*
-rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex*
```
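
The ">100x" figure is conservative; working the two sizes from the listing gives a ratio of roughly 500x. The snippet below is a small worked check. Whether the listing's `k` and `M` suffixes are powers of 1000 or 1024 is not stated, so both are computed, and `parse_size` and `size_ratio` are hypothetical helpers written for this example.

```python
def parse_size(text, base=1024):
    """Convert an `ls -h` style size such as "134M" into a byte count."""
    exponents = {"k": 1, "K": 1, "M": 2, "G": 3}
    suffix = text[-1]
    if suffix in exponents:
        return float(text[:-1]) * base ** exponents[suffix]
    return float(text)


def size_ratio(larger, smaller, base=1024):
    """Return how many times bigger `larger` is than `smaller`."""
    return parse_size(larger, base) / parse_size(smaller, base)


print(round(size_ratio("134M", "267k", base=1024)))  # 514
print(round(size_ratio("134M", "267k", base=1000)))  # 502
```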
cosmicexplorer added a commit to cosmicexplorer/pants that referenced this pull request Mar 12, 2020
cosmicexplorer added a commit to cosmicexplorer/pants that referenced this pull request Mar 31, 2020
stuhood pushed a commit to pantsbuild/pants that referenced this pull request Mar 31, 2020
stuhood pushed a commit to pantsbuild/pants that referenced this pull request Mar 31, 2020
stuhood pushed a commit to pantsbuild/pants that referenced this pull request Apr 1, 2020
cosmicexplorer added a commit to pantsbuild/pants that referenced this pull request May 4, 2020
Linked issue: Unify resolve and runtime wheel caches.