-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Byte-compile files after installation #8541
Byte-compile files after installation #8541
Conversation
e1e941d
to
4180d19
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two suggestions, both of which can be deferred to follow-up PRs if you prefer.
Also, at some point someone should really move all of those inlined functions out of install_unpacked_wheel
- it's a monster. I imagine that doing so would involve a chunk of work to decouple them, so that's for the future though.
continue | ||
yield full_installed_path | ||
|
||
def pyc_output_path(path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be better moved to utils, as it's basically a compatibility wrapper around importlib.util.cache_from_source
. Once we drop Python 2 support, we should remove this in favour of calling importlib directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If/when we switch to py_compile.compile
I expect this to go away (or only be used inside it) since we'd need a compatibility wrapper for that as well.
# unicode input, but the declared type doesn't indicate | ||
# it. | ||
path_arg = cast('str', path) | ||
success = compileall.compile_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using py_compile.compile
would be better, as it returns the path of the compiled file, so we don't have to rely on recalculating it ourselves. The force
and quiet
arguments don't exist for that function, though, so we would need to change the logic somehow. I'm fine with doing that in a follow-up PR (and it's not a priority).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We'd still need to manually get the path on Python 2, unfortunately, but it should be about the same amount of code. I had originally gone that route, but figured the jump from compileall.compile_dir
to compileall.compile_file
was big enough, and would at least help get us closer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 more months. Just 3 more.
else: | ||
return importlib.util.cache_from_source(path) | ||
|
||
# Compile all of the pyc files for the installed files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been moved to after some of the other code - is there a reason for this, or was it just for code organisation? I don't think it makes any difference, so I'm not looking for you to change it, I'm just curious as to whether there is a benefit that I missed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Byte-compilation has to occur after we have copied the files to their destination. In the current code that is after the calls to clobber
with the root scheme files and the files in *.data
.
Please don't. This is the right size for a PR in my opinion.
Please make those a separate PR. I'm not convinced that we should make deterministic builds a guaranteed property of pip (I'm not even sure what precisely "deterministic" means) so I'd like to see that debated as an independent item. |
Thank you for reviewing this!
Here I was just referring to the commits themselves. I normally try not to move and change code in a single commit, but in this case it was difficult not to. If it was easy enough to review, then that's usually good enough for me. :)
You're right to call this out. I'm specifically referring to the minor regression related to embedding the temp wheel extraction path into pyc files (#7808). I think this PR technically fixes it, but I don't want to consider that "done" until we have tests for it. That's lower priority for me than direct-from-wheel installs, so I don't mind leaving tests for a different PR at all. |
We want to move towards having more control over the generation of pyc files, which will allow us to provide deterministic installs and generate pyc files without relying on an already-extracted wheel. To that end, here we are stripping away one layer of abstraction, `compileall.compile_dir`. `compileall.compile_dir` essentially recurses through the provided directories and passes the files and args verbatim to `compileall.compile_file`, so removing that layer means that we directly call `compileall.compile_file`. We make the assumption that we can successfully walk over the source file tree, since we just wrote it, and omit the per-directory traversal error handling done by `compileall.compile_dir`.
`compileall.compile_file` returns a success parameter, but can return "successful" without actually generating a pyc file if the input file was filtered out and compilation was not attempted. In our file processing we mirror that logic, to ensure that a truthy success returned by `compileall.compile_file` actually indicates a file was written.
In order to add generated pyc files to the RECORD file for our package, we need to know their path! To raise confidence that we're doing this correctly, we assert the existence of the expected 'pyc' files while still using the old installation logic. Some valid reasons why pyc files may not be generated: 1. Syntax error in the installed Python files 2. There is already a pyc file in-place that isn't writable by the current user We don't fail installation in those cases today, and we wouldn't want to change our behavior here, so we only assert that the pyc file was created if `compileall.compile_file` indicates success.
In our next commit we will use the scheme path to locate files to byte-compile. If the scheme path is a `Path`, then that causes `compileall.compile_file` (via `py_compile.compile`) to fail with: ``` .tox/py38/lib/python3.8/site-packages/pip/_internal/operations/install/wheel.py:615: in install_unpacked_wheel success = compileall.compile_file( ../../../.pyenv/versions/3.8.0/lib/python3.8/compileall.py:157: in compile_file ok = py_compile.compile(fullname, cfile, dfile, True, ../../../.pyenv/versions/3.8.0/lib/python3.8/py_compile.py:162: in compile bytecode = importlib._bootstrap_external._code_to_timestamp_pyc( _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ code = <code object <module> at 0x7fa7e274f500, file "/tmp/user/1000/pytest-of-chris/pytest-37/test_std_install_with_direct_u0/dest/lib/sample/__init__.py", line 1>, mtime = 1593910285.2200587, source_size = 134 > ??? E ValueError: unmarshallable object ``` Debugging in gdb shows that the error is set due to the `Path` object being present in the code object, which `marshal.dumps` can't handle (frame 1): ``` 0 w_complex_object (v=<optimized out>, flag=<optimized out>, p=0x7fffffff7160) at Python/marshal.c:564 1 w_object (v=<Path at remote 0x7fffee51f120>, p=0x7fffffff7160) at Python/marshal.c:370 2 w_complex_object (v=<code at remote 0x7fffee591710>, flag=<optimized out>, p=0x7fffffff7160) at Python/marshal.c:544 3 w_object (v=<code at remote 0x7fffee591710>, p=0x7fffffff7160) at Python/marshal.c:370 4 w_complex_object (v=('1.2.0', <code at remote 0x7fffee591710>, 'main', None), flag=<optimized out>, p=0x7fffffff7160) at Python/marshal.c:475 5 w_object (v=('1.2.0', <code at remote 0x7fffee591710>, 'main', None), p=0x7fffffff7160) at Python/marshal.c:370 6 w_complex_object (v=<code at remote 0x7fffee591ea0>, flag=<optimized out>, p=0x7fffffff7160) at Python/marshal.c:539 7 w_object (p=0x7fffffff7160, v=<code at remote 0x7fffee591ea0>) at Python/marshal.c:370 8 PyMarshal_WriteObjectToString (version=<optimized out>, x=<code at remote 0x7fffee591ea0>) at Python/marshal.c:1598 9 marshal_dumps_impl (module=<optimized out>, version=<optimized out>, value=<code at remote 0x7fffee591ea0>) at Python/marshal.c:1739 10 marshal_dumps (module=<optimized out>, args=<optimized out>, nargs=<optimized out>) at Python/clinic/marshal.c.h:124 ``` In the interest of easy git bisects, we commit this fix before the code that would expose the bug.
There are a few changes here: 1. The byte-compilation now occurs after we copy the root-scheme files and files from any wheel data dirs 1. Instead of iterating over the files in the unpacked wheel directory, we iterate over the installed files as they exist in the installation path 2. In addition to asserting that pyc files were created, we also add them to the list of installed files, so they will be included in RECORD By compiling after installation, we no longer depend on a separate temporary directory - this brings us closer to installing directly from wheel files. By compiling with source files as they exist in the installation output directory, we no longer generate pyc files with an embedded randomized temp directory - this means that wheel installs can be deterministic.
4180d19
to
452e683
Compare
Thanks again for the review. :) |
One critical place we depend on extracted wheels is during generation of pyc files.
All high-level interfaces in the standard library expect file paths in their interface. A straightforward way to make that work with direct-from-wheel installs is to byte-compile files after they have been written to disk. In order to avoid re-compiling other files unnecessarily, we switch from
compileall.compile_dir
tocompileall.compile_file
and target the individual files we know we installed for the current wheel.As a side-effect, this should also address #7808, since generating the pyc files in-place means the temporary wheel extraction directory should no longer be present in the generated pyc files.
Progresses #6030 and #7808.