# Reference count contention with nested functions #124218
Comments
2a seems like the best solution. The dict version tag is being removed in 3.14, so most of that can be used for the ID. Just to be clear, the per-thread reference count would be in addition to the normal reference count, and primarily to avoid contention when creating closures?

Yes, that's right.
Currently, we only use per-thread reference counting for heap type objects and the naming reflects that. In an upcoming change we will extend it to code objects and certain dictionaries used in `PyFunctionObject` to avoid scaling bottlenecks when creating nested functions. Rename some of the files and functions in preparation for this change.
Currently, we only use per-thread reference counting for heap type objects and the naming reflects that. We will extend it to a few additional types in an upcoming change to avoid scaling bottlenecks when creating nested functions. Rename some of the files and functions in preparation for this change.
Use per-thread refcounting for the reference from function objects to their corresponding code object. This can be a source of contention when frequently creating nested functions. Deferred refcounting alone isn't a great fit here because these references are on the heap and may be modified by other libraries.
…#125713) Use per-thread refcounting for the reference from function objects to the globals and builtins dictionaries.
This replaces `_PyEval_BuiltinsFromGlobals` with `_PyDict_LoadBuiltinsFromGlobals`, which returns a new reference instead of a borrowed reference. Internally, the new function uses per-thread reference counting when possible to avoid contention on the refcount fields on the builtins module.
…GH-125847) This replaces `_PyEval_BuiltinsFromGlobals` with `_PyDict_LoadBuiltinsFromGlobals`, which returns a new reference instead of a borrowed reference. Internally, the new function uses per-thread reference counting when possible to avoid contention on the refcount fields on the builtins module.
Creating nested functions can be a source of reference count contention in the free-threaded build. Consider the following (contrived) example:
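The example code block did not survive this copy of the issue; based on the surrounding description (many `square` closures created concurrently), a minimal reconstruction might look like the following. The `make_square` and `worker` names are my own, not from the original.

```python
import threading

def make_square():
    # Creating this nested function binds func_code, func_globals, and
    # func_builtins, incrementing the refcounts of objects shared by
    # every thread.
    def square(x):
        return x * x
    return square

def worker(n):
    for _ in range(n):
        make_square()

threads = [threading.Thread(target=worker, args=(100_000,))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```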
Creating many `square` functions concurrently causes reference count contention on the `func_code`, `func_globals`, and `func_builtins` fields (`cpython/Include/cpython/funcobject.h`, lines 11 to 16 at commit 21d2a9a).
The code object and the builtins and globals dictionaries are already configured to support deferred reference counting, but the references in `PyFunctionObject` are not `_PyStackRef`s -- they are normal "counted" references.

Note that this is an issue for nested functions (closures), but not for module-level functions or methods, because those are typically created infrequently.

I outline a few possible ways to address this below. My preference is for option 2a.
### Option 1: Use deferred reference counting in `PyFunctionObject`

#### Variant 1a: Use `_PyStackRef` in `PyFunctionObject`
Instead of `PyObject *func_code` we have `_PyStackRef func_code`. We use this strategy effectively in a number of other places, including the frame object and generators.

The downside of this approach is that the fields of `PyFunctionObject` are exposed in public headers (`cpython/funcobject.h`), even though they are not documented. Changing the type of `func_code`, `func_globals`, and `func_builtins` risks breaking backwards compatibility with some C API extensions.

#### Variant 1b: Use `PyObject*` and a new bitfield

Instead of using `_PyStackRef`, we can keep the fields as `PyObject *` and store whether each field uses a deferred reference in a separate field. This was the approach I took in the `nogil-3.9` fork.

This has fewer compatibility issues than using `_PyStackRef`, but there are still compatibility hazards: it would not be safe for extensions to change `func_code`/`func_globals`/`func_builtins` with something like `Py_SETREF(func->func_globals, new_globals)`, because the reference counting semantics are different.

### Option 2: Use per-thread reference counting
We already use per-thread reference counting for the references from instances to their types (i.e., `ob_type`) if the type is a heap type. Storing the reference counts per-thread avoids most of the reference count contention on the object. This also avoids the compatibility issues of option 1, because a per-thread incref can be paired with a normal `Py_DECREF` -- the only risk is performance, not correctness.

The challenge with this approach is that we need some quick and reliable way to index the per-thread reference count array. For heap types, we added a new `unique_id` field in the free-threaded build. We can do something similar for code objects, but the globals and builtins are just "normal" dictionaries, and I don't think we want to add a new field to every dictionary.

#### Variant 2a: Allocate space for an identifier when creating the globals and builtins dictionaries
When we create the globals and builtins dictionaries, we allocate space for an extra `Py_ssize_t` unique id at the end, after the `PyDictObject`. The type would still just be `PyDict_Type`, so Python code and the rest of the C API would see a normal dictionary. We can identify these special dictionaries using a bit in `ob_gc_bits` or by stealing another bit from `ma_version_tag`.

If the globals or builtins dictionaries are replaced by user-defined dictionaries, things would still work; they just might have scaling bottlenecks.
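To make the bookkeeping concrete, here is a toy Python model of this scheme (illustrative only, not CPython's implementation; all names in it are invented). Each thread owns an array of refcount deltas indexed by the object's unique id, so an incref is a plain store into thread-local memory rather than a contended atomic update, and a slow path merges the per-thread counts:

```python
import threading

CAPACITY = 64  # toy fixed capacity for the per-thread refcount arrays

class PerThreadRefcounts:
    """Toy model of variant 2a: objects that opt in get a unique index,
    and each thread increfs into its own array at that index."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 0
        self._local = threading.local()
        self._arrays = []  # strong refs so dead threads' counts survive

    def assign_unique_id(self):
        # Models the extra Py_ssize_t stored after the PyDictObject.
        with self._lock:
            uid = self._next_id
            self._next_id += 1
            return uid

    def incref(self, uid):
        arr = getattr(self._local, "arr", None)
        if arr is None:
            arr = self._local.arr = [0] * CAPACITY
            with self._lock:
                self._arrays.append(arr)
        arr[uid] += 1  # indexed store into thread-local memory

    def total(self, uid):
        # Slow path (e.g. at deallocation): merge all per-thread counts.
        with self._lock:
            return sum(arr[uid] for arr in self._arrays)

counts = PerThreadRefcounts()
builtins_id = counts.assign_unique_id()

def hammer():
    for _ in range(1000):
        counts.incref(builtins_id)

threads = [threading.Thread(target=hammer) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Under the default GIL build this only models the data structures; the scaling benefit would appear in the free-threaded build, where the contended atomic increment is what is being avoided.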
#### Variant 2b: Use a hash table for per-thread references
We can use a hash table to map a `PyObject*` to its per-thread reference counts. This is less efficient than having a unique index into the per-thread reference count array, but it avoids the need for an extra field.
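For contrast, here is a toy Python model of the hash-table variant (again illustrative only, with invented names): the per-thread tables are keyed by the object's address (`id()` here), so the object needs no extra field, but every incref pays a hash lookup instead of an indexed store:

```python
import threading

class RefcountHashTable:
    """Toy model of variant 2b: per-thread hash tables keyed by the
    object's address, so no unique-id field is needed on the object."""

    def __init__(self):
        self._lock = threading.Lock()
        self._local = threading.local()
        self._tables = []  # strong refs so dead threads' counts survive

    def incref(self, obj):
        tbl = getattr(self._local, "tbl", None)
        if tbl is None:
            tbl = self._local.tbl = {}
            with self._lock:
                self._tables.append(tbl)
        key = id(obj)
        tbl[key] = tbl.get(key, 0) + 1  # hash lookup on every incref

    def total(self, obj):
        # Slow path: merge counts from every thread's table.
        with self._lock:
            return sum(t.get(id(obj), 0) for t in self._tables)

table = RefcountHashTable()
target = object()  # stands in for a code object or globals dict

def hammer():
    for _ in range(500):
        table.incref(target)

threads = [threading.Thread(target=hammer) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```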