Parallelize resolve. #819
I'd recommend reviewing by reading the diffs in-order and not expanding the auto-collapsed |
This extracts some calculations out of the main `resolve_distributions` method and streamlines install request categorization to use a form similar to build request categorization. It also cleans up marker calculation by consolidating some code, and elevates that tail portion of the resolve to an officially commented resolve stage.
(Still reviewing - mostly followup to your responses)
Thank you! Very impressive performance results. The abstractions you introduce make this complex functionality much easier to understand.
Thank you for showing the table of performance comparisons! That is a super useful basis for me to understand the performance implications and how to test them!
Co-Authored-By: Danny McClanahan <[email protected]>
### Problem

See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches. @jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787.

**Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it.

### Solution

With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains:

- in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from.
- in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False.
- in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed.

### Result

For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size:

```bash
X> ls dist
total 145M
-rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex*
-rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex*
```
The three major phases of a pex resolve (download, build and install) are delegated to pip subprocesses. As such, we can easily parallelize these operations, needing only to take care with the shared portions of the filesystem the processes might mutate. In the end this is only relevant to the build and install phases, which are natural points to cache results in a shared filesystem cache. The install phase is already cached in this way (#815) and we add caching for the build phase as well, both relying on the (ultimately POSIX) guarantees around `os.rename`.

The atomic shared directory updates are achieved with `AtomicDirectory` and the subprocess parallelization is achieved using the new `jobs` module and request / response data object pairs to coordinate parallel pip jobs.
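
As a rough illustration of the idea (not the actual `AtomicDirectory` or `jobs` code from this PR), the sketch below starts several child processes at once, has each write into a private work directory, and then publishes each result into the shared cache with a single `os.rename`. The names `build_all`, `publish` and `CHILD_SCRIPT` are hypothetical, and a tiny Python child process stands in for a pip invocation.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a pip build: the child writes one file into its work dir.
CHILD_SCRIPT = (
    "import os, sys\n"
    "with open(os.path.join(sys.argv[1], 'artifact.txt'), 'w') as f:\n"
    "    f.write(sys.argv[2])\n"
)


def publish(work_dir, target_dir):
    """Move a fully populated work dir into place with one rename.

    Returns True if this call published target_dir, False if another
    writer had already published it (the losing work dir is discarded).
    """
    try:
        os.rename(work_dir, target_dir)
        return True
    except OSError:
        if not os.path.isdir(target_dir):
            raise  # A real failure, not a lost race.
        shutil.rmtree(work_dir, ignore_errors=True)
        return False


def build_all(cache_root, names):
    """Run one child process per uncached name in parallel, then publish results."""
    os.makedirs(cache_root, exist_ok=True)
    pending = []
    for name in names:
        target_dir = os.path.join(cache_root, name)
        if os.path.isdir(target_dir):
            continue  # Cache hit: nothing to build.
        # The work dir lives next to the target so the rename stays on one filesystem.
        work_dir = tempfile.mkdtemp(prefix=name + ".", dir=cache_root)
        proc = subprocess.Popen([sys.executable, "-c", CHILD_SCRIPT, work_dir, name])
        pending.append((target_dir, work_dir, proc))

    published = []
    for target_dir, work_dir, proc in pending:
        if proc.wait() != 0:
            shutil.rmtree(work_dir, ignore_errors=True)
            raise RuntimeError("job for {} failed".format(target_dir))
        if publish(work_dir, target_dir):
            published.append(target_dir)
    return published
```

Because readers only ever see the target directory after the rename, they observe either no cache entry or a complete one. This sketch starts every job at once; a real implementation would likely cap the number of concurrent subprocesses.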
Example speedups:
PEX build using prebuilt wheels (the #811 case, "Slow builds in 2.0 - how does --python interact with --platform?"):
PEX build including sdists (wheel builds):
Fixes #811
Fixes #817
Fixes #818