Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference count contention with nested functions #124218

Closed
colesbury opened this issue Sep 18, 2024 · 2 comments
Closed

Reference count contention with nested functions #124218

colesbury opened this issue Sep 18, 2024 · 2 comments
Labels
performance Performance or resource usage topic-free-threading

Comments

@colesbury
Copy link
Contributor

colesbury commented Sep 18, 2024

Creating nested functions can be a source of reference count contention in the free-threaded build. Consider the following (contrived) example:

from concurrent.futures import ThreadPoolExecutor

def func(n):
    sum = 0
    for i in range(n):
        def square(x):
            return x ** 2
        sum += square(i)

with ThreadPoolExecutor(max_workers=8) as executor:
    future = executor.submit(func, 100000)
    print(future.result())

Creating many square functions concurrently causes reference count contention on the func_code, func_globals, and func_builtins fields:

#define _Py_COMMON_FIELDS(PREFIX) \
PyObject *PREFIX ## globals; \
PyObject *PREFIX ## builtins; \
PyObject *PREFIX ## name; \
PyObject *PREFIX ## qualname; \
PyObject *PREFIX ## code; /* A code object, the __code__ attribute */ \

The code object and builtins and globals dictionaries are already configured to support deferred reference counting, but the references in PyFunctionObject are not _PyStackRefs -- they are normal "counted" references.

Note that this is an issue for nested functions (closures), but not module-level functions or methods because those are typically created infrequently.

I outline a few possibly ways to address this below. My preference is for 2a below.

Option 1: Use deferred reference counting in PyFunctionObject

Variant 1a: Use _PyStackRef in PyFunctionObject

Instead of PyObject *func_code we have _PyStackRef func_code. We use this strategy effectively in a number of other places, including the frame object and generators.

The downside of this approach is that the fields of PyFunctionObject are exposed in public headers (cpython/funcobject.h), even though they are not documented. Changing the type of func_code, func_globals, and func_builtins risks breaking backwards compatibility with some C API extensions.

Variant 1b: Use PyObject* and new bitfield

Instead of using _PyStackRef, we can keep the fields as PyObject * and store whether the field uses a deferred reference in a separate field. This was the approach I took by the nogil-3.9 fork.

This has fewer compatibility issues than using _PyStackRef, but there are still compatibility hazards. It would not be safe for extensions to change func_code/func_globals/func_builtins with something like Py_SETREF(func->func_globals, new_globals) because the reference counting semantics are different.

Option 2: Use per-thread reference counting

We already use per-thread reference counting for the references from instances to their types (i.e., ob_type), if the type is a heap type. Storing the reference counts per-thread avoids most of the reference count contention on the object. This also avoids the compatibility issues with option 1 because you can use a per-thread incref with a normal Py_DECREF -- the only risk is performance, not correctness.

The challenge with this approach is that we need some quick and reliable way to index the per-thread reference count array. For heap types, we added a new field unique_id in the free-threaded build. We can do something similar for code objects, but the globals and builtins are just "normal" dictionaries and I don't think we want to add a new field for every dictionary.

Variant 2a: Allocate space for identifier when creating globals and builtins.

When we create globals and builtins dictionaries we allocate space for an extra Py_ssize_t unique id at the end after the PyDictObject. The type would still just PyDict_Type, so Python and the rest of the C API would just see a normal dictionary. We can identify these special dictionaries using a bit in ob_gc_bits or by stealing another bit from ma_version_tag.

If the globals or builtins dictionaries are replaced by user defined dictionaries, than things would still work, they'd just might have scaling bottlenecks.

Variant 2b: Use a hash table for per-thread references

We can use a hash table to map PyObject* to their per-thread reference counts. This is less efficient than having a unique index into the per-thread reference count array, but avoids the need for an extra field.

Linked PRs

@colesbury colesbury added performance Performance or resource usage topic-free-threading labels Sep 18, 2024
@colesbury colesbury changed the title Address reference count contention with nested functions Reference count contention with nested functions Sep 18, 2024
@markshannon
Copy link
Member

__name__ and __qualname__ can be dealt with by faster-cpython/ideas#674
That doesn't help with the other three, though.

2a seems like the best solution. The dict version tag is being removed in 3.14, so most of that can be used for the ID.

Just to be clear, the per-thread reference count would be in addition to the normal reference count and primarily to avoid contention when creating closures?

@colesbury
Copy link
Contributor Author

Just to be clear, the per-thread reference count would be in addition to the normal reference count and primarily to avoid contention when creating closures?

Yes, that's right.

colesbury added a commit to colesbury/cpython that referenced this issue Oct 1, 2024
Currently, we only use per-thread reference counting for heap type
objects and the naming reflects that. In an upcoming change we will
extend it to code objects and certain dictionaries used in
`PyFunctionObject` to avoid scaling bottlenecks when creating nested
functions.

Rename some of the files and functions in preparation for this change.
colesbury added a commit to colesbury/cpython that referenced this issue Oct 1, 2024
Currently, we only use per-thread reference counting for heap type
objects and the naming reflects that. In an upcoming change we will
extend it to code objects and certain dictionaries used in
`PyFunctionObject` to avoid scaling bottlenecks when creating nested
functions.

Rename some of the files and functions in preparation for this change.
colesbury added a commit to colesbury/cpython that referenced this issue Oct 1, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Oct 1, 2024
colesbury added a commit that referenced this issue Oct 1, 2024
Currently, we only use per-thread reference counting for heap type objects and
the naming reflects that. We will extend it to a few additional types in an
upcoming change to avoid scaling bottlenecks when creating nested functions.

Rename some of the files and functions in preparation for this change.
colesbury added a commit to colesbury/cpython that referenced this issue Oct 8, 2024
Use per-thread refcounting for the reference from function objects to
their corresponding code object. This can be a source of contention when
frequently creating nested functions. Deferred refcounting alone isn't a
great fit here because these references are on the heap and may be
modified by other libraries.
colesbury added a commit to colesbury/cpython that referenced this issue Oct 8, 2024
Use per-thread refcounting for the reference from function objects to
their corresponding code object. This can be a source of contention when
frequently creating nested functions. Deferred refcounting alone isn't a
great fit here because these references are on the heap and may be
modified by other libraries.
colesbury added a commit to colesbury/cpython that referenced this issue Oct 9, 2024
Use per-thread refcounting for the reference from function objects to
their corresponding code object. This can be a source of contention when
frequently creating nested functions. Deferred refcounting alone isn't a
great fit here because these references are on the heap and may be
modified by other libraries.
colesbury added a commit to colesbury/cpython that referenced this issue Oct 9, 2024
Use per-thread refcounting for the reference from function objects to
their corresponding code object. This can be a source of contention when
frequently creating nested functions. Deferred refcounting alone isn't a
great fit here because these references are on the heap and may be
modified by other libraries.
colesbury added a commit that referenced this issue Oct 15, 2024
Use per-thread refcounting for the reference from function objects to
their corresponding code object. This can be a source of contention when
frequently creating nested functions. Deferred refcounting alone isn't a
great fit here because these references are on the heap and may be
modified by other libraries.
colesbury added a commit to colesbury/cpython that referenced this issue Oct 18, 2024
…iltins

Use per-thread refcounting for the reference from function objects to
the globals and builtins dictionaries.
colesbury added a commit to colesbury/cpython that referenced this issue Oct 18, 2024
…iltins

Use per-thread refcounting for the reference from function objects to
the globals and builtins dictionaries.
colesbury added a commit to colesbury/cpython that referenced this issue Oct 18, 2024
…iltins

Use per-thread refcounting for the reference from function objects to
the globals and builtins dictionaries.
colesbury added a commit that referenced this issue Oct 21, 2024
…#125713)

Use per-thread refcounting for the reference from function objects to
the globals and builtins dictionaries.
colesbury added a commit to colesbury/cpython that referenced this issue Oct 22, 2024
This replaces `_PyEval_BuiltinsFromGlobals` with
`_PyDict_LoadBuiltinsFromGlobals`, which returns a new reference
instead of a borrowed reference. Internally, the new function uses
per-thread reference counting when possible to avoid contention on the
refcount fields on the builtins module.
colesbury added a commit that referenced this issue Oct 24, 2024
This replaces `_PyEval_BuiltinsFromGlobals` with
`_PyDict_LoadBuiltinsFromGlobals`, which returns a new reference
instead of a borrowed reference. Internally, the new function uses
per-thread reference counting when possible to avoid contention on the
refcount fields on the builtins module.
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
…iltins (python#125713)

Use per-thread refcounting for the reference from function objects to
the globals and builtins dictionaries.
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
…GH-125847)

This replaces `_PyEval_BuiltinsFromGlobals` with
`_PyDict_LoadBuiltinsFromGlobals`, which returns a new reference
instead of a borrowed reference. Internally, the new function uses
per-thread reference counting when possible to avoid contention on the
refcount fields on the builtins module.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage topic-free-threading
Projects
None yet
Development

No branches or pull requests

2 participants