-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C API] Avoid accessing PyObject and PyVarObject members directly: add Py_SET_TYPE() and Py_IS_TYPE(), disallow Py_TYPE(obj)=type #83754
Comments
Today, CPython is leaking too many implementation through its public C API. We cannot easily change the "default" C API, but we can enhance the "limited" C API (when Py_LIMITED_API macro is defined). Example of leaking implementation details: memory allocator, garbage collector, structure layouts, etc. Making PyObject an opaque structure would allow in the long term of modify structures to implement more efficient types (ex: list specialized for small integers), and it can prepare CPython to experiment tagged pointers. Longer rationale:
I propose to incremental evolve the existing limited C API towards opaque PyObject, by trying to reduce the risk of breakage. We may test changes on PyQt which uses the limited C API. Another idea would be to convert some C extensions of the standard library to the limited C API. It would ensure that the limited C API contains enough functions to be useful, but would also notify us directly if the API is broken. |
First issues that I met when I tried that:
|
In September 2018, Neil Schemenauer did an experiment:
More recent discussion on the capi-sig list: https://mail.python.org/archives/list/[email protected]/thread/JPUNPN3AILGXOA3C2TTSLMOFNSWJE3QX/ See also my notes: Wikipedia article: https://en.wikipedia.org/wiki/Tagged_pointer |
In the limited C API, Py_REFCNT() should be converted to: static inline Py_ssize_t _Py_REFCNT(const PyObject *ob)
{ return ob->ob_refcnt; }
#define Py_REFCNT(ob) _Py_REFCNT(_PyObject_CAST(ob)) It would enforce the usage of newly added Py_SET_REFCNT() (PR 18389) and advertise that the object is not modified (const). That would only be the first step towards a really opaque Py_REFCNT() function. |
TODO: Add Py_IS_TYPE() macro: #define Py_IS_TYPE(ob, tp) (Py_TYPE(ob) == (tp)) For example, replace: #define PyBool_Check(x) (Py_TYPE(x) == &PyBool_Type) with: #define PyBool_Check(x) Py_IS_TYPE(x, &PyBool_Type) IMHO it makes the code more readable. |
Py_TYPE() is commonly used to render the type name in an error message. Example: PyErr_Format(PyExc_TypeError,
"cannot convert '%.200s' object to bytearray",
Py_TYPE(arg)->tp_name); This code has multiple issues:
In September 2018, I created bpo-34595: "PyUnicode_FromFormat(): add %T format for an object type name". But there was disagreement, so I rejected my change. I started "bpo-34595: How to format a type name?" thread on python-dev: I didn't continue this work (until now), since it wasn't my priority. |
Make PyObject an opaque structure is also a first step towards the more ambitious project "HPy" project which is fully opaque: This API is written from scratch and currently implemented on top on the existing C API. The following article is a nice introduction to the overall idea: From my point of view, the long term goal would be to get better performance on PyPy and having a single API for C extension which would be efficient on all Python implementations (not only CPython). Currently, the C API is not only a performance issue to run C extensions on PyPy. It's also an issue in CPython. Because the C API leaks too many implementation details, we cannot experiment optimizations. |
To make PyObject opaque, we would have to convert Py_INCREF() and Py_DECREF() to opaque function calls. Example: #define Py_XINCREF(op) Py_IncRef(op)
#define Py_XDECREF(op) Py_DecRef(op) Benchmarks should be run to measure to overhead and balance the advantages and drawbacks. |
Would a Py_TYPE_IS() macro help code readability? For example:
#define Future_CheckExact(obj) (Py_TYPE(obj) == &FutureType)
would become:
#define Future_CheckExact(obj) (Py_TYPE_IS(obj, &FutureType)) Py_TYPE_IS() would be more efficient for tagged pointers. I'm not sure about the macro name. Neil used Py_IS_TYPE(obj, type). Note: Py_TYPE_EQ(obj, type) name sounds confusing since the first parameter is an object, whereas the second one is a type. |
You have merged so much PRs today. What they do? PyObject cannot just be made an opaque structure. The user code reads and writes its fields directly and via macros. This change would break working code. We can encourage the user code to prepare to making PyObject an opaque structure. We need to provide a stable C API for access of PyObject fields for this. Note that there is a performance penalty of using functions instead of direct access, so you should have very good reasons to do this. |
I merged changes which prepares CPython code base to make PyObject opaque. I only merged changes which should have no impact on performance, but prepare the API to make the structure opaque. Right now, Py_SET_REFNCT() stills access directly to PyObject.ob_refcnt. But it becomes possible to make Py_SET_REFNCT() an opaque function call. Do you see any issue with the changes that I already merged? Using PGO+LTO, static inline functions should be as efficient as the previous code using Py_REFCNT() & cie macros.
I'm trying to modifying the limited C API to make it possible: all access to PyObject fields should go through macros or function calls. The question is now how which fields are accessed and how.
For the short term, I don't plan to make PyObject opaque, so I don't plan to enforce usage of Py_TYPE(), Py_SET_REFCNT(), etc.
Yeah, replacing Py_REFCNT() macro with an opaque function call is likely to have an impact on performance. It should be properly measure, I'm well aware of that, I already wrote it in a previous comment ;-) I don't plan to push change such right now. And I will wait for the review of my peers (like you) for such change ;-) |
"static inline" functions are not opaque - as they get inlined into 3rd-party compiled code, we can't change anything they reference, and so the structure layout is still fixed and has to be visible to the user's compiler. I'm not totally against the changes, but it's worth pointing out that you aren't achieving what the issue title claims, so it's really just code cleanliness (and/or introducing macro-users to static inline functions ;) ). |
I'm well aware of that :-) But once the CPython code base will stop accessing directly PyObject fields directly, it would become possible to experiment changing PyObject layout, at least testing it in CPython. First changes are just to prepare the code base to experiment the real change. But as Serhiy pointed out, the second part will have an impact on performance and so should be carefully benchmarked to balance advantages and drawbacks, even if it's only done in the limited C API. |
FYI, I am working on to add Py_IS_TYPE macro. :) |
Hi, guys. Is there value in adding |
"obj == Py_None" is a very common pattern. You have check how it is done in HPy: https://github.com/pyhandle/hpy See also bpo-39511: "[subinterpreters] Per-interpreter singletons (None, True, False, etc.)". |
Thanks, I will check it. |
See also bpo-44378: "Py_IS_TYPE(): cast discards ‘const’ qualifier from pointer target type". If Py_TYPE() is converted again to a static inline function which takes a "const PyObject*" type, Py_IS_TYPE() can be modified again at the same time to use Py_TYPE(). |
I checked again the list of broken projects listed previously. Fixed:
Fix proposed: Broken:
|
At commit cb15afc, I am able to rename PyObject members (to make sure that the structure is not accessed directly), I only had to modify header files:
And just two more C files which corner cases:
-- I did the same with PyVarObject, rename the ob_size member. I had to modify header files:
But I had to modify the following function of the array module: static int
array_buffer_getbuf(arrayobject *self, Py_buffer *view, int flags)
{
...
if ((flags & PyBUF_ND)==PyBUF_ND) {
view->shape = &((PyVarObject*)self)->ob_size;
}
...
return 0;
} I'm not sure how to patch this function. -- This experience doesn't check usage of sizeof(PyObject) and sizeof(PyVarObject) which would break if these structures become opaque. sizeof() issues are listed in previous comments. |
Oh and obviously, it's not possible possible to define structures which *include* PyObject or PyVarObject if PyObject and PyVarObject become opaque. Example: typedef struct {
PyObject ob_base;
Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject; This C code requires the PyObject structure to be fully defined (not being opaque). A new C API and ABI where structures *don't* include PyObject or PyVarObject should be designed to allocate their members "before" the PyObject* pointer value. Something like the current PyGC_Head structure which is excluded from PyObject and stored *before* the "PyObject*" pointer. Simplified code which allocates memory for an object implementin the GC protocol: static PyObject *
_PyObject_GC_Malloc(size_t basicsize)
{
...
size_t size = sizeof(PyGC_Head) + basicsize;
...
PyGC_Head *g = (PyGC_Head *)PyObject_Malloc(size);
...
PyObject *op = (PyObject *)(g + 1);
return op;
} |
I changed the issue title to restrict its scope: "[C API] Avoid accessing PyObject and PyVarObject members directly: add Py_SET_TYPE() and Py_IS_TYPE(), disallow Py_TYPE(obj)=type". Making PyObject and PyVarObject structures opaque is a broader topic which should be splited into sub-issues. "Py_TYPE(obj)=type;" is now disallowed. I consider that the work of this issue is now completed and I close the issue. Thanks everyone who help to fix these tedious issues! You can continue to use this issue if you need my help to adapt your C extensions to Py_SET_TYPE()/Py_SET_SIZE(). See also the upgrade_pythoncapi.py script of the pythoncapi_compat project which helps to port your C extensions without losing support for old Python versions: See also the Py_TYPE() change announcement on the capi-sig list: |
I wrote an article about these changes: It elaborates the rationale for making these changes. |
FYI this regression was handled last year in bpo-44348 "test_exceptions.ExceptionTests.test_recursion_in_except_handler stack overflow on Windows debug builds" and fixed at 2021-09-07 by using the trashcan mecanism in the BaseException deallocator function: New changeset fb30509 by Victor Stinner in branch 'main': |
Since Py_TYPE() was changed to an inline static function in PEP670, `Py_TYPE(obj) = new_type` must be replaced with the new function `Py_SET_TYPE(obj, new_type)`, available since Python 3.9. For backward compatibility, this macro can be used: ````c++ #if PY_VERSION_HEX < 0x030900A4 && !defined(Py_SET_TYPE) static inline void _Py_SET_TYPE(PyObject *ob, PyTypeObject *type) { ob->ob_type = type; } #define Py_SET_TYPE(ob, type) _Py_SET_TYPE((PyObject*)(ob), type) #endif ```` See python/cpython#83754
Don't access directly PyObject.ob_type, but use the Py_TYPE() macro instead.
Don't access directly PyObject.ob_type, but use the Py_TYPE() macro instead.
Don't access directly PyObject.ob_type, but use the Py_TYPE() macro instead.
Don't access directly PyObject.ob_type, but use the Py_TYPE() macro instead.
Don't access directly PyObject.ob_type, but use the Py_TYPE() macro instead.
Don't access directly PyObject.ob_type, but use the Py_TYPE() macro instead.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
Linked PRs
The text was updated successfully, but these errors were encountered: