bpo-40255: Implement Immortal Instances - Optimizations Combined #31491
Conversation
You can use this in deep-frozen modules to make this even faster; see cpython/Tools/scripts/deepfreeze.py, line 140 in 74127b8.
That's great news! I'm going to update PEP 683 with the outcome and some of the details.
Nice work! Here are some random review comments. Hopefully they're helpful. I decided to only review the final PR (with all optimizations). I skipped the .py files and a few others for now.
immortalize_object(PyObject *obj, PyObject *Py_UNUSED(ignored))
{
    _Py_SetImmortal(obj);
    /* Special case for PyCodeObjects since they don't have a tp_traverse */
Various fields below are tuples and the individual items in the tuples should also become immortal, and for co_consts
this should recurse down. Maybe whenever we make a tuple immortal we should immortalize all its items?
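A minimal sketch of what that recursion could look like, assuming the PR's _Py_SetImmortal() helper from its internal headers; the function name here is made up for illustration and is not part of the PR:

/* Illustrative sketch only: immortalize a tuple and, recursively, every
 * item it contains, so co_consts and nested constant tuples end up
 * immortal as well. Assumes _Py_SetImmortal() from this PR. */
static void
immortalize_recursively(PyObject *op)
{
    _Py_SetImmortal(op);
    if (PyTuple_CheckExact(op)) {
        Py_ssize_t n = PyTuple_GET_SIZE(op);
        for (Py_ssize_t i = 0; i < n; i++) {
            /* Borrowed references; immortalizing only freezes the
             * refcount, it does not transfer ownership. */
            immortalize_recursively(PyTuple_GET_ITEM(op, i));
        }
    }
}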
Py_TYPE(FROM_GC(gc))->tp_traverse(
    FROM_GC(gc), (visitproc)immortalize_object, NULL);
Will this loop find code objects contained inside other code objects? (I don't know what exactly is contained in permanent_generation.head.)
Code objects are not tracked by the GC, and most tuples are not tracked either.
So we need to find code objects and tuples via function objects, the module global dict, class namespace dicts, etc.
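As a rough illustration of that idea (a sketch, not code from this PR; it assumes _Py_SetImmortal() and omits error handling and recursion into co_consts), one could walk a module's __dict__ and follow function objects to their code:

/* Sketch only: reach GC-untracked objects (code objects, constant tuples)
 * through a module's __dict__ instead of through the GC heap. */
static void
immortalize_module_code(PyObject *module_dict)
{
    PyObject *key, *value;
    Py_ssize_t pos = 0;
    while (PyDict_Next(module_dict, &pos, &key, &value)) {
        if (PyFunction_Check(value)) {
            PyObject *code = PyFunction_GET_CODE(value);  /* borrowed */
            _Py_SetImmortal(code);
            /* co_consts can hold nested code objects and tuples; a full
             * version would recurse into it (and into classes, too). */
            _Py_SetImmortal(((PyCodeObject *)code)->co_consts);
        }
    }
}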
@@ -1829,6 +1829,10 @@ PyImport_ImportModuleLevelObject(PyObject *name, PyObject *globals,
    if (mod == NULL) {
        goto error;
    }
    // Immortalize top level modules
    if (tstate->recursion_limit - tstate->recursion_remaining == 1) {
Does this work? I put a printf here and it doesn't seem to be immortalizing most of the frozen modules:
Immortalizing <module 'winreg' (built-in)>
Immortalizing <module '_frozen_importlib_external' (frozen)>
Immortalizing <module 'zipimport' (frozen)>
Immortalizing <module 'encodings' from 'C:\\Users\\gvanrossum\\cpython\\Lib\\encodings\\__init__.py'>
Immortalizing <module '_winapi' (built-in)>
Immortalizing <module 'encodings.mbcs' from 'C:\\Users\\gvanrossum\\cpython\\Lib\\encodings\\mbcs.py'>
Immortalizing <module '_signal' (built-in)>
Immortalizing <module 'io' (frozen)>
Immortalizing <module 'site' (frozen)>
(Most frozen modules are imported at a much higher recursion level, either 7, 13 or 18.)
@@ -145,6 +167,20 @@ static inline Py_ssize_t Py_SIZE(const PyVarObject *ob) {
}
#define Py_SIZE(ob) Py_SIZE(_PyVarObject_CAST_CONST(ob))

PyAPI_FUNC(PyObject *) _PyGC_ImmortalizeHeap(void);
PyAPI_FUNC(PyObject *) _PyGC_TransitiveImmortalize(PyObject *obj);
(This name is awkward, I'd expect some variation on "immortalize transitively".)
if (weaklist != NULL) { \
    if (stdlib_list != NULL) { \
        PyObject *list = user_weaklist; \
        if (PySequence_Contains(stdlib_list, name)) { \
            list = stdlib_weaklist; \
        } \
To be (hyper-)correct you probably need to check for list != NULL after this, since it might just be possible that stdlib_list is not NULL but stdlib_weaklist or user_weaklist is NULL.
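Roughly what that extra guard might look like (a simplified sketch of the macro body, with the line-continuation backslashes and the actual append elided; the error return of PySequence_Contains is ignored for brevity):

/* Sketch of the suggested extra check. */
if (weaklist != NULL) {
    PyObject *list = user_weaklist;
    if (stdlib_list != NULL && PySequence_Contains(stdlib_list, name) == 1) {
        list = stdlib_weaklist;
    }
    if (list != NULL) {
        /* ... append the (name, module) pair to list ... */
    }
}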
finalize_modules_clear_weaklist(PyThreadState *tstate,
                                PyInterpreterState *interp,
Isn't the interpreter state easily found from the thread state? So you'd only need a tstate arg.
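In other words, something like the following sketch (the trailing parameters are placeholders, not the PR's exact signature):

/* Sketch: derive the interpreter state from the thread state instead of
 * passing it as a separate argument. */
static void
finalize_modules_clear_weaklist(PyThreadState *tstate, PyObject *weaklist)
{
    PyInterpreterState *interp = tstate->interp;
    /* ... same body as before, using interp ... */
    (void)interp;
    (void)weaklist;
}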
    // detect those modules which have been held alive.
    PyObject *weaklist = finalize_remove_modules(modules, verbose);
    // This prepares two lists, the user defined list of modules as well
    // as stdlib list of modules. The user modules will be destroyed first in
Suggested change:
// as stdlib list of modules. The user modules will be destroyed first in
// as the stdlib list of modules. The stdlib modules will be destroyed after all user modules in
(And probably reflow.)
    PyObject *key, *value;

    /* First, clear only names starting with a single underscore */
Stray comment?
Also, no phase deletes __builtins__, right?
        }
    }
}

void
_PyModule_Clear(PyObject *m)
IIUC this is no longer used. Is that right?
@@ -1994,7 +1995,9 @@ _Py_NewReference(PyObject *op)
#ifdef Py_REF_DEBUG
    _Py_RefTotal++;
#endif
    Py_SET_REFCNT(op, 1);
    /* Do not use Py_SET_REFCNT to skip the Immortal Instance check. This
Suggested change:
/* Do not use Py_SET_REFCNT to skip the Immortal Instance check. This
/* Do not use Py_SET_REFCNT -- it skips the Immortal Instance check. This
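For context, a hedged sketch of why the direct field write matters here: with this PR, refcount-setting helpers first check for immortality, so a brand-new object must bypass that check to start at refcount 1. The helper name below is illustrative, not the PR's literal code.

/* Illustrative only: what an immortality-aware refcount setter looks like
 * conceptually. _Py_IsImmortal() stands in for whatever check this PR
 * actually uses. */
static inline void
sketch_set_refcnt(PyObject *ob, Py_ssize_t refcnt)
{
    if (_Py_IsImmortal(ob)) {
        return;                 /* immortal objects keep their refcount */
    }
    ob->ob_refcnt = refcnt;     /* _Py_NewReference writes this directly */
}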
    }
    if (from_prev) {
        _PyGCHead_SET_PREV(from_next, from_prev);
    }
Why is this change needed at all?
And please use 4-space indent.
}

PyObject *
_PyGC_ImmortalizeHeap(void) {
I don't like this name. This function doesn't relate to "Heap" at all.
And please move the { to the next line.
This is an optimization on top of PR19474.
It combines PR31488, PR31489, and PR31490 into a single change to measure the combined performance benefits.
These results do not change much from what was already achieved independently by these optimizations (since some of the immortalized instances overlap with each other). That said, performance should keep scaling as the application scales. The current microbenchmarks do not measure applications with hundreds of imports or thousands of interned strings; nonetheless, it is still worthwhile to consider all of these improvements together when thinking about larger-scale applications.
Benchmark Results
Overall: on par with the main branch (0% faster), with the highest number of non-statistically-significant benchmarks (18)
pyperformance results
https://bugs.python.org/issue40255