Freeze/link all stdlib modules imported during startup. #82
Comments
It would be nice to be able to put numbers on how substantial the I/O overhead really is. |
I've found the following .py modules get imported during startup (some only without -S) and frozen them: without site (python -S):
with site:
See my branch: https://github.com/ericsnowcurrently/cpython/tree/frozen-modules-cleanup before adding stdlib modules
after adding stdlib modules
(Note that I ran this comparison in a VM, so the raw numbers aren't very reliable.) Overall, it seems like there is a real, positive impact on performance (roughly 10%?), though the difference isn't huge (and the VM makes it a little cloudy). |
I re-ran the comparison 100 times and averaged the results: before freezing those stdlib modules:
after freezing those stdlib modules:
So I'm seeing a 15% improvement, even after factoring in disk caching. (again, running in a VM) I'm still going to check those other "todo" items above. |
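For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a "run it N times and average" comparison. It is not the script used above; the helper name `time_startup` is made up for illustration, and it simply times a child interpreter that starts up and exits.

```python
import statistics
import subprocess
import sys
import time


def time_startup(args, runs=100):
    """Return the mean wall-clock seconds to start the interpreter and exit.

    `args` are extra interpreter options, e.g. ["-S"] to skip the site module.
    Hypothetical helper; the numbers include process-spawn overhead.
    """
    cmd = [sys.executable, *args, "-c", "pass"]
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)
```

Calling `time_startup(["-S"])` and `time_startup([])` against a build from before and after the change gives the four numbers to compare. Because every sample includes the cost of spawning a process, only the relative difference between builds is meaningful.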
@gvanrossum, if I exclude |
I've verified that all the startup modules are either builtin or frozen:
|
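A rough way to run this kind of check yourself, sketched under some assumptions: it starts a fresh interpreter, dumps `sys.modules`, and looks at each module's `__spec__.origin`, which is `"built-in"` for builtin modules and `"frozen"` for frozen ones. The helper names are hypothetical, and `__main__` is skipped because it has no spec under `-c`.

```python
import subprocess
import sys

# Executed in a child interpreter so that only startup imports are listed.
SNIPPET = (
    "import sys\n"
    "for name, mod in sorted(sys.modules.items()):\n"
    "    spec = getattr(mod, '__spec__', None)\n"
    "    origin = getattr(spec, 'origin', None)\n"
    "    print(name, origin, sep='\\t')\n"
)


def startup_modules(args=()):
    """Map each module present at startup to its spec origin (as text)."""
    out = subprocess.run(
        [sys.executable, *args, "-c", SNIPPET],
        check=True, capture_output=True, text=True,
    ).stdout
    return dict(line.split("\t", 1) for line in out.splitlines())


def not_frozen_or_builtin(modules):
    """Return the names of startup modules loaded from somewhere else."""
    return sorted(
        name for name, origin in modules.items()
        if name != "__main__" and origin not in ("built-in", "frozen")
    )
```

`not_frozen_or_builtin(startup_modules(["-S"]))` covers the no-site case and `not_frozen_or_builtin(startup_modules())` the default one. Which names show up depends on the Python version and on what the environment pulls in through site (for example `.pth` files).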
This is awesome. Maybe move the generated files to a subdir like clinic? |
One way to remove the regenerated file problem in PRs is to only generate one file. Another thing we can do is to put only two bytes per line, not 16, in the output of the bytecode. Since we often add or remove a few instructions here and there, the diffs can be very large as subsequent lines get shifted. E.g.
100,19,100,20,147,1,100,21,100,22,147,1,100,23,100,24,
would become
100,19,
100,20,
147,1,
100,21,
100,22,
147,1,
100,23,
100,24,
which is only 3 bytes larger, but produces much smaller diffs should one or two instructions be changed. Forget readability of generated code. Minimizing the number of files and keeping diffs down are much more important to ease of development. |
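To put a number on the diff-size argument, here is a small sketch that renders the same bytes at different widths and counts the changed lines after an edit. It is an illustration only, not the actual generator; `format_bytes` and `changed_lines` are made-up names.

```python
import difflib


def format_bytes(data, per_line):
    """Render bytes as comma-separated decimals, `per_line` values per row."""
    return [
        "".join(f"{b}," for b in data[i:i + per_line])
        for i in range(0, len(data), per_line)
    ]


def changed_lines(old, new, per_line):
    """Count added or removed lines in a unified diff of the two renderings."""
    diff = difflib.unified_diff(
        format_bytes(old, per_line), format_bytes(new, per_line),
        lineterm="", n=0,
    )
    return sum(
        1 for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


old = bytes(range(100, 200))
# Insert one two-byte "instruction" at an even offset.
new = old[:10] + bytes([100, 25]) + old[10:]
print(changed_lines(old, new, 16), changed_lines(old, new, 2))
```

With 16 values per line, every line after the insertion point shifts and shows up in the diff; with two per line, only the inserted line does. That holds only when the edit keeps the pairing intact, so an insertion of odd length or at an odd offset still reshuffles every later line.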
But it’s marshal data, and the constants, names etc. are not aligned on even bytes. So maybe one byte per line? |
Note that the generated .h files are no longer in the repo, so they don't show up in diffs or PRs any more. Thus we don't need to worry about strategies to mitigate the earlier annoyances.
Good point. Having a single .h file would also mean a single include in frozen.c (and we wouldn't have to generate the includes any more). So this may be worth doing even though .h files aren't in the repo.
Good point, though there isn't much value to changing this now. |
@markshannon, I've created an "always default to on" branch: https://github.com/ericsnowcurrently/cpython/tree/frozen-modules-always-default-on. |
Probably too late now, but you could have got around the PR/diff problem with a setting in a |
Thanks for the tip. We tried doing that but it only helped with some of the problems with having the frozen .h files in the repo. |
FYI, @FFY00 is helping out with some of the remaining work related to frozen modules. |
Maybe we should put this epic to bed? We've done what we could, and at least on systems using GCC or clang the results are looking good. |
Yeah. There are lingering odds and ends but nothing essential. |
During runtime startup we import a bunch of modules, particularly if the site module is used (without -S). Any of these that aren't frozen/linked incur a substantial IO overhead. We already freeze importlib and link a number of builtin modules. The idea here is that we freeze/link the rest.
To do:
re-run the comparison on the benchmarking machine (I'm comfortable with the results I'm seeing locally.)
-X frozen-modules=[on|off] (defaulting to "off")
Exception in HTMLParser for special JavaScript code python/cpython#45188
bpo-45020: Drop the frozen .h files from the repo. python/cpython#28375
os, site, and codecs
blocked by bpo-45186 (or maybe just by removing the .h files)
__file__ (and __path__) on frozen [stdlib] modules
FrozenImporter
importlib._bootstrap._install() is called
co_filename during unmarshal of frozen modules (for tracebacks, etc.) (bpo-45652)
encodings package (bpo-45653)
blocked by noise added to "make" output (see comment)
__path__ (bpo-21736)
blocked by bpo-45211
"off" if Py_DEBUG
always "on" if not a PGO build, even if running out of the source tree
-X frozen_modules to -X frozen_stdlib and "use_frozen" in import.c (see gh-28633)
os.path (bpo-45272)
after:
./python -m ... (bpo-45654)
find_frozen() (see bpo-45213)
PyConfig
Makefile.pre.in and simplifies Tools/scripts/freeze_modules.py a little
_imp.frozen_module_names() in Tools/scripts/generate_stdlib_module_names.py (bpo-45189)
FrozenImporter.get_filename() (bpo-45659)
FrozenImporter.get_source() (bpo-45658)
investigate:
related:
ModuleSpec.loader_state (https://bugs.python.org/issue45364)
Candidate solutions to the editing-stdlib-.py-files problem, from bpo-45020: