-
-
Notifications
You must be signed in to change notification settings - Fork 31k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-45020: Freeze the modules imported during startup. #28107
bpo-45020: Freeze the modules imported during startup. #28107
Conversation
Another reason not to check the generated .h files into git is that when the marshal format changes in any way, each of these files is included in the diff as changed, which makes the diff look overwhelming (this one, for example, has 151 files!). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly nits; the most disturbing part is that it doesn't seem to work on Windows at all (it always believes it's installed).
Doc/using/cmdline.rst
Outdated
@@ -480,6 +480,14 @@ Miscellaneous options | |||
objects and pyc files are desired as well as supressing the extra visual | |||
location indicators when the interpreter displays tracebacks. See also | |||
:envvar:`PYTHONNODEBUGRANGES`. | |||
* ``-X frozen_modules`` determines whether or not frozen modules are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little vague on the exact syntax to use on the command line. Do I write
python -X frozen_modules=on
and
python -X frozen_modules=off
???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fixed this.
Include/cpython/initconfig.h
Outdated
int user_site_directory; | ||
wchar_t *check_hash_pycs_mode; | ||
bool use_frozen_modules; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like an unrelated refactoring snuck in? I had expected just one line adding int use_frozen_modules;
Same for the following few files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be fixed.
Makefile.pre.in
Outdated
@@ -765,6 +765,671 @@ Python/frozen_modules/zipimport.h: $(srcdir)/Programs/_freeze_module $(srcdir)/L | |||
$(srcdir)/Lib/zipimport.py \ | |||
$(srcdir)/Python/frozen_modules/zipimport.h | |||
|
|||
Python/frozen_modules/abc.h: $(srcdir)/Programs/_freeze_module $(srcdir)/Lib/abc.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you understand when to write $(srcdir)/ before a path and when not? It's definitely needed in shell commands, but it seems it's not used for dependencies. For example I see:
Modules/_math.o: Modules/_math.c Modules/_math.h
and none of the variables defining lists of .o files use it (e.g. MODULE_OBJS).
But usually when .h files are mentioned they do add $(srcdir), e.g. PEGEN_HEADERS or PYTHON_HEADERS.
Is this just not used any more, or does "make" only need it in certain cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll clean this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fixed this.
FWIW, I'm not sure what "make" requires so I'll follow the existing patterns. There are cases where we do use $(srcdir)
with dependencies but only where the rule is also defined with $(srcdir)
or there is no corresponding rule. Neither applies so I'll drop the $srcdir)/
prefix.
Python/pathconfig.c
Outdated
} | ||
} | ||
/* We fall back to assuming this is an installed Python. */ | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using some printf() calls it looks like on Windows stdlib is NULL when run out of the development tree and we always return true here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the Windows builds aren't right yet.
a381ccf
to
cdd9784
Compare
…Py_IsInstalled()).
5c669d8
to
a333fa3
Compare
Lib/idlelib/idle_test/test_query.py
Outdated
@@ -137,8 +137,11 @@ def test_good_module_name(self): | |||
self.assertTrue(dialog.entry_ok().endswith('__init__.py')) | |||
self.assertEqual(dialog.entry_error['text'], '') | |||
dialog = self.Dummy_ModuleName('os.path') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.path is an arbitrary dotted module name. I would rather switch.
Change is in two pieces because github would not do it in one.
dialog = self.Dummy_ModuleName('os.path') | |
dialog = self.Dummy_ModuleName('idlelib.idle') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made this change. Thanks!
Lib/idlelib/idle_test/test_query.py
Outdated
with import_helper.CleanImport('os', 'os.path'): | ||
import os.path | ||
filename = dialog.entry_ok() | ||
self.assertEqual(dialog.entry_error['text'], '') | ||
self.assertTrue(filename.endswith('path.py')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with import_helper.CleanImport('os', 'os.path'): | |
import os.path | |
filename = dialog.entry_ok() | |
self.assertEqual(dialog.entry_error['text'], '') | |
self.assertTrue(filename.endswith('path.py')) | |
self.assertTrue(dialog.entry_ok().endswith('idle.py')) | |
self.assertEqual(dialog.entry_error['text'], '') |
When you're done making the requested changes, leave the comment: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed up to the latest commit, except I was daunted by the many changes in marshal.c and I will review those later.
with import_helper.frozen_modules(), captured_stdout(): | ||
spec = importlib.util.find_spec(modname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What stdout content is being captured?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The frozen test module (hello.py) prints out Hello world!
.
@@ -185,7 +185,7 @@ class CleanImport(object): | |||
importlib.import_module("foo") # new reference | |||
""" | |||
|
|||
def __init__(self, *module_names): | |||
def __init__(self, *module_names, usefrozen=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the class docstring to mention that it also disabled frozen modules unless usefrozen=True is passed?
Misc/NEWS.d/next/Core and Builtins/2021-08-31-17-44-51.bpo-45020.ZPI_3L.rst
Show resolved
Hide resolved
// To avoid unnecessary duplication during marshaling, all "complex" | ||
// objects get stored in a lookup table that allows them to be | ||
// referenced by later entries using an integer ID. However, this | ||
// optimization is not applied if the refcount is exactly 1. | ||
// | ||
// The problem is that any objects cached during runtime | ||
// initialization will get re-used in the code object during | ||
// compilation. This means some objects will have a refcount > 1, | ||
// even though there is only one instance within the code object. | ||
// This is problematic it this happens inconsistently, such as only | ||
// in non-debug builds (e.g. init_filters() in Python/_warnings.c); | ||
// the generated marshal data we are freezing will be different even | ||
// though the Python code hasn't changed. | ||
// | ||
// We address this by giving marshal the set of objects with | ||
// refcount > 1 that actually only get used once in the code object. | ||
// This mostly consists of interned strings and small integers. | ||
PyObject *refs_before = get_ref_counts(); | ||
if (refs_before == NULL) { | ||
return NULL; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we fix this problem in marshal itself? Then everyone (notably all regular, non-frozen PYC files) would benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened https://bugs.python.org/issue45186 to cover this separately.
Yeah, sorry about that. I can roll things back if you think it's too much. FWIW, it might be easier to review the 3 marshal-related changes separately: The first one re-arranges code to make the second and third commits cleaner. None of the changes in marshal.c should have any effect on the behavior of the marshal module and only change the behavior of _freeze_module.c. |
I would much rather review separate PRs than separate commits on a single PR. Review and iteration can be much more focused that way. |
Regarding the "address sanitizer" CI failure, this is similar to the earlier failure with the macOS job. When running with frozen modules, [UPDATE] Never mind. The test relies on |
I'm splitting this PR up into 3:
There will be other PRs but these three cover the fundamental changes. |
Pretty much everything from this PR has been merged through those other PRs. |
[I'm going to split this PR up.]
Doing this provides some significant performance gains for runtime startup. This change also adds a command-line flag (
-X frozen_modules=[on|off]
) that allows users to opt out of (or into) using frozen modules. When run in a development environment it defaults to opting out, to avoid contributors wasting time trying to figure out why their changes aren't getting used.Note that I plan on updating the tests for the frozen modules so they run against the frozen and unfrozen modules, like we do with importlib. We'll also be looking at making frozen modules more like source modules in a separate PR.
There are only a handful of things left to do here before this could be merged:
get the "check_generated_files" job passingsort out PyConfig.stdlib_dir on Windowshttps://bugs.python.org/issue45020