Add unstable C API function for enabling deferred reference counting (PyUnstable_Object_EnableDeferredRefcount) #42

Open
colesbury opened this issue Sep 18, 2024 · 36 comments

@colesbury

In the free-threaded build, marking an object as supporting deferred reference counting allows the interpreter to skip reference count modifications for values pushed and popped from the interpreter's evaluation stack and local variables. This is important for avoiding contention on reference count modifications that inhibit scaling. See also PEP 703: "Deferred Reference Counting". The tradeoff is that these objects are only collected when the tracing GC runs.

In the free-threaded build, we currently have an internal-only function _PyObject_SetDeferredRefcount that allows an object to have deferred references to it. We use this on methods, top-level functions, modules, heap types, and a few other types of objects.

I think we should expose this as an unstable API for C extensions that "hints" the runtime to make this tradeoff:

By "hint", I mean that the runtime is free to ignore this suggestion. In particular, the the function is a no-op in the default build.

Motivation

Some C API extensions like nanobind create their own function- and method-like objects. The nanobind.nb_func and nanobind.nb_method types behave like functions and methods, but they are not subclasses of PyFunction_Type, PyCFunction_Type, or PyMethod_Type.

Without deferred reference counting (or immortalization) enabled, calling nanobind functions from multiple threads doesn't scale at all -- nanobind would not be very useful in the free-threaded build.

I think this is likely to come up in other extensions as well -- Peter's PR was created in response to a separate issue by @Yiling-J, for example.

Alternatives

Nanobind's support for free-threading currently relies on making nb_func and nb_method objects immortal in the free-threaded build by basically copy-pasting _Py_SetImmortal. That's obviously not great - I'd much rather that nanobind is able to use an official, unstable API than to modify internals in a way that's likely to break in new CPython versions.

Note that in 3.13 deferred reference counting and immortalization are entangled due to time limitations, but this is no longer the case in main/3.14.

Python API?

I don't think we should expose this as a Python API, at least for now.

cc @wjakob, @ngoldbaum, @ZeroIntensity

@Yiling-J

My motivation is to make my cache package scalable, but it's still unclear to me how methods are expected to perform under free threading. Does this mean cache.get(key) is contended by default, and that I need to use the new C API to ensure scalability? I've already shared my concerns in the issue; let me paste them here:

Python users might not realize the potential bottleneck when using free-threading Python. If they don't consult the C API documentation, they may be unaware that methods could become a scalability issue and need to be manually optimized using the C API.

@ZeroIntensity

FWIW, your case is somewhat specific. A bound method object (client.get) is not something that's generally accessed across threads, so there's no DRC on them by default.

@encukou
Collaborator

encukou commented Sep 19, 2024

I suggest PyUnstable_Object_EnableDeferredRefcount; it's not setting the refcount number.
(For a more stable function I'd suggest naming it after the intent rather than implementation, like _HintShared, but that's not too important at this point.)


I think it should be easier to add/remove reasons why deferred refcounting can't be enabled.
Even for PyUnstable_ functions, it's nice if they survive intact for a few releases :)

Suggestion:

  • return 0 if successful (breaking convention; usually this is 1)
  • return a positive value if nothing happened (GIL was enabled, object not owned by this thread, etc.). The specific reasons & values can change even in patch releases, and are only documented in the source.
  • reserve returning -1 with exception set for other errors

Keep in mind PyUnstable_ functions need docs and tests :)

@vstinner

I think that it's a reasonable addition, but I'm not sure about the API. I'm waiting to see a concrete API :-) I don't yet understand when such a call can fail.

@colesbury
Author

@encukou:

  • Yeah, I agree that PyUnstable_Object_EnableDeferredRefcount is a better name
  • I agree about different return codes for success vs. no-op, but why use 0 for success instead of 1? I'd expect 0 if nothing happened, such as in the default build, and 1 for success. I don't have a super strong preference here, but I also don't see a reason for breaking with convention.

@vstinner: the linked PR (python/cpython#123635) has a concrete API. The API only fails if the pre-conditions are not met:

  1. Is the object tracked by the GC? The deferred reference counting implementation relies on the GC being able to find all deferred references, so it's important for correctness that the object is tracked by the GC.
  2. Is it thread-safe to modify the reference count? In the PR, this amounts to a check that the object is owned by the current thread and that the shared reference count is zero. That's basically a heuristic. As I'm writing this, I think that maybe we should instead try to make the API thread-safe in more cases by using atomic operations instead of trying to enforce this precondition.
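As a rough sketch of how those two preconditions combine, the logic might look like the following. This is a schematic mock, not CPython source: every name in it is invented for illustration, and it mirrors the 1/0/-1 convention under discussion at this point in the thread.

```c
#include <stdbool.h>

/* Schematic mock of the precondition checks described above.
 * All names here are invented; the real checks live in CPython's
 * object implementation and may differ in detail. */
typedef struct {
    bool gc_tracked;           /* precondition 1: visible to the tracing GC */
    bool owned_by_this_thread; /* precondition 2a: thread owns the object */
    long shared_refcount;      /* precondition 2b: no shared references yet */
    bool deferred;
} MockObject;

/* Returns 1 if deferred refcounting was enabled, 0 if the hint was
 * ignored, -1 on misuse (where the real API would set an exception). */
static int
mock_enable_deferred(MockObject *op)
{
    if (!op->gc_tracked) {
        /* Deferred references must be discoverable by the GC. */
        return -1;
    }
    if (!op->owned_by_this_thread || op->shared_refcount != 0) {
        /* Not provably safe to flip the bit from this thread. */
        return 0;
    }
    op->deferred = true;
    return 1;
}
```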

@encukou:

  • I think "object not tracked by the GC" is better treated as an error, rather than as "nothing happened" because it's an indication of incorrect usage.

@encukou
Collaborator

encukou commented Sep 23, 2024

I agree about different return codes for success vs. no-op, but why use 0 for success instead of 1?

Because there's only one zero, but a lot of positive values :)

You could also do it with a pointer-to-result argument: int PyUnstable_Object_EnableDeferredRefcount(PyObject *obj, int *failure_reason) would return 0 if nothing happened, and store the reason in that case.

I think "object not tracked by the GC" is better treated as an error, rather than as "nothing happened" because it's an indication of incorrect usage.

IMO, that's a detail handled by whoever defines the type. For whoever uses the object, it should be OK to ignore the detail -- the indication that nothing happened should be enough.

@vstinner

Why not set an exception with an error message, rather than return an error code?

@ZeroIntensity

Bikeshedding: I like HintDeferredRefcount more if we're going to emphasize "nothing happened" return values.

failure_reason is interesting, but how will the user convert that into something human readable? Something along the lines of PyUnstable_Object_EnableDeferredRefcountStrError?

For whoever uses the object, it should be OK to ignore the detail -- the indication that nothing happened should be enough.

I think we're almost certainly going to get users that don't understand how DRC works, and only know that it helps speed up threads, so they'll just apply it to every object they see. If trying to enable it on a very clearly incorrect object doesn't raise an exception, we're bound to have users putting no-ops everywhere -- wouldn't it be better to prevent that?

@encukou
Collaborator

encukou commented Sep 23, 2024

Why not setting an exception with an error message, rather than an error code?

Because

  • allocating an exception is a rather heavy operation
  • the failure reason is only really useful for debugging; to handle it in production you should always either show a warning or do nothing
  • since you shouldn't fail, the failure definitely shouldn't bubble up to the caller

I think we're almost certainly going to get users that don't understand how DRC works, and only know that it helps speed up threads, so they'll just apply it to every object they see. If trying to enable it on a very clearly incorrect object doesn't raise an exception, we're bound to have users putting no-ops everywhere -- wouldn't it be better to prevent that?

I wouldn't worry about users doing misguided PyUnstable calls, especially if there's a clear “this was a no-op” warning indicator to check.
I'm more worried about some extension starting to fail because some value that used to be GC-tracked changed to un-tracked -- either because the type itself changed, or a new type shows up in some use cases.

@ZeroIntensity

At this point, do we even need three different return codes? If we go with failure_reason, then there's nothing that can raise an exception. We could just use -1 for "nothing happened" and then 0 for success.

@vstinner

I see 3 cases:

  • Return 1: the object now uses deferred ref counting
  • Return 0: the object was already using deferred ref counting
  • Return -1: error, the object cannot use deferred ref counting -- see failure_reason for details

@ZeroIntensity

Return 0: the object was already using deferred ref counting

This could be a failure_reason as well, but while we're here: do we need a PyUnstable equivalent of _PyObject_HasDeferredRefcount? (@colesbury)

@colesbury colesbury changed the title Add unstable C API function for enabling deferred reference counting (PyUnstable_Object_SetDeferredRefcount) Add unstable C API function for enabling deferred reference counting (PyUnstable_Object_EnableDeferredRefcount) Sep 24, 2024
@colesbury
Author

The int *failure_reason is not a common pattern in CPython. It's far more common to set an exception and return -1. I don't see a strong enough reason to deviate from that pattern.

I expect extensions to use this on instances of their own types during object construction. That's how we use it internally too and I think we should be explicit about that in the documentation. For what it's worth, I do not expect this to be widely used, but it is important for the few extensions that will make use of it.

I see the three cases as:

  • Return 1: the object now uses deferred ref counting
  • Return 0: the interpreter ignored the hint (e.g., we are in a GIL-enabled build, the object already uses deferred ref counting, or other unspecified cases)
  • Return -1: error, the call was made at the wrong place or on the wrong type of object. For example, the object is not GC tracked.

I think the -1 cases are analogous to PyErr_BadInternalCall(), but we can do a bit better with the error messages than that. Bubbling the error up to Python isn't perfect -- I agree that you should never see these in production -- but this seems to be the most common way to signify misuse of a Python C API.

I don't think we should expose _PyObject_HasDeferredRefcount yet.

@vstinner

vstinner commented Oct 7, 2024

It seems like we have an agreement on the API, so let's vote.

Vote:

API: int PyUnstable_Object_EnableDeferredRefcount(PyObject *obj)

  • Return 1: the object now uses deferred ref counting
  • Return 0: the interpreter ignored the hint (e.g., we are in a GIL-enabled build, the object already uses deferred ref counting, or other unspecified cases)
  • Return -1: error, set an exception, the call was made at the wrong place or on the wrong type of object. For example, the object is not GC tracked.

@encukou
Collaborator

encukou commented Oct 7, 2024

+0 for this as unstable API, with the mentioned docs note:

I expect extensions to use this on instances of their own types during object construction. That's how we use it internally too and I think we should be explicit about that in the documentation.

(Without this, an extension that would change one of its types from GC-tracked to untracked would be making a backwards incompatible change.)

@markshannon

Return 1: the object now uses deferred ref counting
Return 0: the interpreter ignored the hint (e.g., we are in a GIL-enabled build, the object already uses deferred ref counting, or other unspecified cases)

Does this mean it is permitted for an implementation to always return 0, or is it obliged to check if the object was already using deferred reference counting and return 1 if it wasn't?

(We plan to use deferred reference counting for the default build)

@markshannon

markshannon commented Oct 8, 2024

Is an implementation allowed to return 1 if an object was already using deferred reference counting?

The specification is vague.

@markshannon

I think this would be better:

  • Return 1: the object now uses deferred ref counting
  • Return 0: the object does not use deferred ref counting (e.g. that object cannot support deferred ref counting, or the implementation does not support deferred ref counting)

I don't see a need for this function to return -1 and raise an exception.

If calling PyUnstable_Object_EnableDeferredRefcount is supposed to be just a hint, then we can ignore the hint.
There is no need to raise an exception.

@vstinner

@erlend-aasland @mdboom @serhiy-storchaka: You didn't vote on this API yet. What's your call on it?

@vstinner

@markshannon:

I don't see a need for this function to return -1 and raise an exception.

The current implementation fails if PyType_IS_GC() is false: if the object is not tracked by the GC.

In general, it's better to have an error path.

@markshannon

The current implementation fails if PyType_IS_GC() is false: if the object is not tracked by the GC.

So? _PyObject_SetDeferredRefcount can return 0 if it can't defer an object.

In general, it's better to have an error path.

Why?

The supposed point of this function is to hint that reclamation of an object can be deferred. There is no reason to set an exception; it just makes the function harder to use.

@erlend-aasland

@erlend-aasland @mdboom @serhiy-storchaka: You didn't vote on this API yet. What's your call on it?

I removed my vote since there is ongoing discussion.

@vstinner

@colesbury @ZeroIntensity: What's your opinion on the "error case"? Should the function return -1 and set an exception, or return 0 and "ignore the error"?

@ZeroIntensity

As a user, exception messages are helpful when debugging--I'm all too familiar with digging through source to find why something is failing. No-ops, to me, are only useful if there's absolutely nothing I can do about it, such as on a GIL-enabled build (rather than the other case of trying to use it on an untracked type, in which case I should just remove the call entirely).

If calling PyUnstable_Object_EnableDeferredRefcount is supposed to be just a hint, then we can ignore the hint.

PyUnstable_Object_EnableDeferredRefcount is a hint, and may no-op depending on the state of the interpreter, but it should validate its arguments. (It's also good to have an error path in case there's a need for allocations in the future, so the MemoryError can propagate.)

@colesbury
Author

I had a slight preference for -1 for certain unsupported cases, like calling it on a non-GC type. If you're doing that, you are almost certainly doing something wrong, but I don't think it matters very much. I am also okay with @markshannon's suggestion.

@markshannon

Why is calling PyUnstable_Object_EnableDeferredRefcount on a non-GC type doing something wrong?

PyUnstable_Object_EnableDeferredRefcount(obj) is a hint to the interpreter that a particular object is likely to be long-lived.
It has nothing to do with the internals of the current implementation of the cycle GC.

@colesbury
Author

It's a hint to use deferred reference counting and we don't support that on non-GC types.

@markshannon

It's a hint to use deferred reference counting and we don't support that on non-GC types.

There are a couple of problems with that:

  • Unless you know the internals of the cycle GC, you can't tell what that does in practical terms
  • The user has to know which objects are "GC types"

What a user should know, and doesn't depend on the implementation, is whether an object is likely to be long-lived and whether it is OK if its reclamation is deferred.
How about changing the name to PyUnstable_Object_AllowedDeferredReclamation?

Or, if we don't want that yet, keep _PyObject_SetDeferredRefcount private?

@ZeroIntensity

The user has to know which objects are "GC types"

This still applies to the no-op case. Arguably, that makes it more difficult for users to figure out why the function isn't working. Is it possible for this function to work on non-GC types in the future?

@markshannon

Arguably, that makes it more difficult for users to figure out why the function isn't working.

Just because the VM doesn't do anything with that hint doesn't mean the function has failed. It always works.

@colesbury
Author

In the interest of moving this along, let's remove the -1 return code from the proposal.

How about changing the name to PyUnstable_Object_AllowedDeferredReclamation

No - we have multiple mechanisms that defer reclamation and this is one specific mechanism.

Or, if we don't want that yet, keep _PyObject_SetDeferredRefcount private?

This is addressed in the "Motivation" and "Alternatives" sections above. I don't think that's a good idea.

@ZeroIntensity

Bikeshedding: HintDeferredRefcount makes it more clear that it's a hint--calling something named Enable that possibly doesn't enable anything seems counterintuitive.

@markshannon

No - we have multiple mechanisms that defer reclamation and this is one specific mechanism.

Would it be better to have an API that says you are OK with deferred reclamation and let the VM decide which mechanism to use?

@colesbury
Author

Some of the deferred reclamation mechanisms are primarily about thread-safety [1] as opposed to avoiding reference count contention. I don't think the VM has enough information to decide. I would rather have this API concretely do one thing.

Footnotes

  1. For example, at one point we had _PyGC_BITS_SHARED_INLINE, which delayed tp_dealloc invocation so that we could make some racing reads and writes safe, but it didn't do anything for addressing reference count contention. It also turned out to have problems and got removed, but we may end up using something similar in the future.

@ZeroIntensity

I've given in and updated the PR to no-op on non-GC types. Are there any other concerns about the API?

@encukou
Collaborator

encukou commented Nov 4, 2024

With Mark being strongly for the no-fail API, and Sam not opposed to it, I think it's good to go.

(FWIW, I think the discussion was a bit too thorough, and unstable API shouldn't need C API WG approval at all. But, you do get better API by asking -- or at least better-considered API.)
