Parallelize resolve. #819

Merged
merged 10 commits into from
Dec 6, 2019

jsirois (Member) commented Dec 4, 2019

The three major phases of a pex resolve (download, build and install)
are delegated to pip subprocesses. As such, we can easily parallelize
these operations; we only need to take care with the shared portions of
the filesystem that the processes might mutate. In the end this is only
relevant to the build and install phases, which are natural points to
cache results in a shared filesystem cache. The install phase is already
cached in this way (#815) and we add caching for the build phase as
well, both relying on the (ultimately POSIX) atomicity guarantees of
os.rename.

The atomic shared directory updates are achieved with AtomicDirectory
and the subprocess parallelization is achieved using the new jobs
module and request / response data object pairs to coordinate parallel
pip jobs.
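
As a rough illustration of the atomic shared directory idea (a minimal sketch, not the actual AtomicDirectory code from this PR; the function name `atomic_directory_update` and the `populate` callback are hypothetical): each process does its work in a private, uniquely named directory next to the target and then publishes it with a single os.rename, so readers only ever see a missing directory or a complete one.

```python
import errno
import os
import shutil
import uuid


def atomic_directory_update(target_dir, populate):
    """Populate target_dir at most once, even when processes race.

    Hypothetical helper for illustration only; not the pex API.
    Returns True if this call published target_dir, False if the
    directory already existed or another process won the race.
    """
    if os.path.isdir(target_dir):
        return False  # Cache hit: an earlier process finished the work.

    # The work dir is a sibling of the target so that both live on the
    # same filesystem, which os.rename needs in order to be atomic.
    work_dir = "{}.{}.work".format(target_dir, uuid.uuid4().hex)
    os.makedirs(work_dir)
    try:
        populate(work_dir)
        try:
            os.rename(work_dir, target_dir)
        except OSError as e:
            # Renaming onto an existing, non-empty directory fails: a
            # concurrent process already published its result.
            if e.errno not in (errno.EEXIST, errno.ENOTEMPTY):
                raise
            return False
        return True
    finally:
        # A no-op after a successful rename; cleans up if we lost the race.
        shutil.rmtree(work_dir, ignore_errors=True)
```

The loser of a race simply discards its work directory, which is acceptable when both processes would have produced equivalent content (for example, the same built wheel).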

Example speedups:

  • PEX build using prebuilt wheels (the #811 case, "Slow builds in 2.0 - how does --python interact with --platform?"):

    pex \
      --platform=manylinux1-x86_64-cp-35-m \
      --platform=manylinux1-x86_64-cp-36-m \
      --platform=manylinux1-x86_64-cp-37-m \
      --platform=macosx-10.9-x86_64-cp-35-m \
      --platform=macosx-10.9-x86_64-cp-36-m \
      --platform=macosx-10.9-x86_64-cp-37-m \
      numpy==1.17.4 \
      --python-shebang "/usr/bin/env python3" \
      -o numpy-pex1.6.2.pex
    
    pex version  cold       warm
    1.6.12       0m41.313s  0m20.759s
    2.0.2        0m53.236s  0m27.217s
    HEAD         0m32.336s  0m17.596s
  • PEX build including sdists (wheel builds):

    pex \
      pantsbuild.pants==1.22.0 \
      -o pantsbuild.pants.pex
    
    pex version  cold       warm
    1.6.12       0m49.101s  0m11.788s
    2.0.2        1m15.602s  1m2.476s
    HEAD         0m47.174s  0m15.309s
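
To put a number on the improvement over 2.0.2, the speedups can be computed from the reported wall-clock times. The snippet below is only arithmetic over the figures in this description; the helper name `seconds` and the case labels are made up for illustration.

```python
import re


def seconds(timing):
    """Convert a `time`-style string such as '1m15.602s' to seconds."""
    minutes, secs = re.match(r"(\d+)m([\d.]+)s$", timing).groups()
    return int(minutes) * 60 + float(secs)


# (2.0.2, HEAD) timings copied from the two benchmark tables.
TIMINGS = {
    "numpy wheels, cold": ("0m53.236s", "0m32.336s"),
    "numpy wheels, warm": ("0m27.217s", "0m17.596s"),
    "pants sdists, cold": ("1m15.602s", "0m47.174s"),
    "pants sdists, warm": ("1m2.476s", "0m15.309s"),
}

SPEEDUPS = {
    case: seconds(before) / seconds(after)
    for case, (before, after) in TIMINGS.items()
}

for case, speedup in SPEEDUPS.items():
    print("{}: {:.2f}x faster than 2.0.2".format(case, speedup))
```

By these figures HEAD is roughly 1.5x to 1.65x faster than 2.0.2 in three of the four cases and about 4x faster for the warm sdist build, although that warm sdist build is still slower than it was under 1.6.12 (15.3s versus 11.8s).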

Fixes #811
Fixes #817
Fixes #818

jsirois (Member, Author) commented Dec 4, 2019

I'd recommend reviewing by reading the diffs in order and not expanding the auto-collapsed resolver.py diff (it's large). This will let you familiarize yourself with DistributionTarget and Jobs, which are two of the main players in the resolver.py diff. Once you're ready to look at the resolver.py diff, I'd focus on the AtomicDirectory class first, then skip to the ResolveRequest class and focus on its resolve_distributions method, which is the old resolve function made parallel using jobs, AtomicDirectory and request / response object pairs for each resolve stage.
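
For readers who want a mental model before opening the diff, here is a minimal sketch of the general pattern of pairing request and response objects around parallel subprocesses. It is not the jobs module from this PR: `BuildRequest`, `BuildResult`, `run_job` and `run_parallel` are hypothetical names, a thread pool is used purely to wait on subprocesses, and the spawned commands are stand-ins for pip invocations.

```python
import subprocess
import sys
from collections import namedtuple
from concurrent.futures import ThreadPoolExecutor

# Hypothetical request / response pair for one stage; not the pex classes.
BuildRequest = namedtuple("BuildRequest", ["source", "args"])
BuildResult = namedtuple("BuildResult", ["request", "returncode", "output"])


def run_job(request):
    """Run one subprocess to completion and pair its outcome with its request."""
    process = subprocess.Popen(
        request.args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    output, _ = process.communicate()
    return BuildResult(request, process.returncode, output.decode("utf-8"))


def run_parallel(requests, max_jobs=4):
    """Run at most max_jobs subprocesses at once; results keep request order."""
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run_job, requests))


if __name__ == "__main__":
    # Stand-in commands; a real resolve would spawn pip here instead.
    requests = [
        BuildRequest(name, [sys.executable, "-c", "print('built {}')".format(name)])
        for name in ("a.tar.gz", "b.tar.gz", "c.tar.gz")
    ]
    for result in run_parallel(requests):
        print(result.request.source, result.returncode, result.output.strip())
```

Keeping the request attached to each result means a later stage (say, install after build) can be fed directly from the results of the previous one without any shared mutable state between jobs.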

This extracts some calculations out of the main `resolve_distributions`
method and streamlines install request categorization to use a similar
form to build request categorization.

Also clean up marker calculation by consolidating some code and
elevating that tail portion of the resolve to an officially
commented resolve stage.

Eric-Arellano (Contributor) left a comment

(Still reviewing - mostly followup to your responses)

Eric-Arellano (Contributor) left a comment

Thank you! Very impressive performance results. The abstractions you introduce make this complex functionality much easier to understand.

cosmicexplorer (Contributor) left a comment

Thank you for showing the table of performance comparisons! That is a super useful basis for me to understand the performance implications and how to test them!

Co-Authored-By: Danny McClanahan <[email protected]>
@jsirois jsirois merged commit f796e00 into pex-tool:master Dec 6, 2019
cosmicexplorer added a commit to pantsbuild/pants that referenced this pull request Mar 12, 2020
### Problem

See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a Google doc with pros and cons of different approaches.

@jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787.

**Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it.

### Solution

With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains:
- in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and the PyPI indices to resolve requirements from.
- in `BOOTSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False.
- in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed.

### Result

For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size:
```bash
X> ls dist
total 145M
-rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex*
-rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex*
```
cosmicexplorer added a commit to cosmicexplorer/pants that referenced this pull request Mar 12, 2020
cosmicexplorer added a commit to cosmicexplorer/pants that referenced this pull request Mar 31, 2020
stuhood pushed a commit to pantsbuild/pants that referenced this pull request Mar 31, 2020
stuhood pushed a commit to pantsbuild/pants that referenced this pull request Mar 31, 2020
stuhood pushed a commit to pantsbuild/pants that referenced this pull request Apr 1, 2020
cosmicexplorer added a commit to pantsbuild/pants that referenced this pull request May 4, 2020
@jsirois jsirois deleted the issues/811/finish branch September 5, 2024 03:08