Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyCode_Type.tp_basictype change in Python 3.11 broke Cython #93585

Closed
vstinner opened this issue Jun 7, 2022 · 9 comments
Closed

PyCode_Type.tp_basictype change in Python 3.11 broke Cython #93585

vstinner opened this issue Jun 7, 2022 · 9 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Copy link
Member

vstinner commented Jun 7, 2022

PyCode_Type.tp_basicsize was changed in Python 3.11 by the commit 2bde682 (by @brandtbucher):

 PyTypeObject PyCode_Type = {
     PyVarObject_HEAD_INIT(&PyType_Type, 0)
     "code",
-    sizeof(PyCodeObject),
-    0,
+    offsetof(PyCodeObject, co_code_adaptive),
+    sizeof(_Py_CODEUNIT),
     (destructor)code_dealloc,           /* tp_dealloc */
     0,                                  /* tp_vectorcall_offset */
     0,                                  /* tp_getattr */

This change broke the gevent project which uses Cython to access the PyCodeObject structure: cython/cython#4827

Cython expects that PyCode_Type.tp_basicsize = sizeof(PyCodeObject) which was true until Python 3.10, but is false in Python 3.11.

Python defines 8 built-in types which use a PyVarObject. Python is inconsistent on tp_basicsize on these types (current values in the main branch):

  • PyByteArray_Type.tp_basicsize = sizeof(PyByteArrayObject). PyByteArrayObject has no flexible array but a char *ob_bytes; member.
  • PyBytes_Type.tp_basicsize = PyBytesObject_SIZE = (offsetof(PyBytesObject, ob_sval) + 1). PyBytesObject structure ends with the flexible array char ob_sval[1].
  • PyCode_Type.tp_basicsize = offsetof(PyCodeObject, co_code_adaptive). PyCodeObject structure ends with a flexible array: char co_code_adaptive[1].
  • PyLong_Type.tp_basicsize = offsetof(PyLongObject, ob_digit). PyLongObject structure ends with a flexible array: digit ob_digit[1].
  • PyType_Type.tp_basicsize = sizeof(PyHeapTypeObject). PyHeapTypeObject structure has no flexible array.
  • PyList_Type.tp_basicsize = sizeof(PyListObject). PyListObject structure has no flexible array but PyObject **ob_item.
  • PyTuple_Type.tp_basicsize = sizeof(PyTupleObject) - sizeof(PyObject *). PyTupleObject structure ends with a flexible array: PyObject *ob_item[1].
  • PyMemoryView_Type.tp_basicsize = offsetof(PyMemoryViewObject, ob_array). PyMemoryViewObject structure ends with a flexible array: Py_ssize_t ob_array[1].

See also the discussion about the undefined behavior related to flexible arrays like this one: #84301

@vstinner
Copy link
Member Author

vstinner commented Jun 7, 2022

The Python test suite pass with this PyCode_Type.tp_basicsize change:

diff --git a/Objects/codeobject.c b/Objects/codeobject.c
index c2b29be1fe..18d25a5cba 100644
--- a/Objects/codeobject.c
+++ b/Objects/codeobject.c
@@ -1903,7 +1903,7 @@ static struct PyMethodDef code_methods[] = {
 PyTypeObject PyCode_Type = {
     PyVarObject_HEAD_INIT(&PyType_Type, 0)
     "code",
-    offsetof(PyCodeObject, co_code_adaptive),
+    sizeof(PyCodeObject),
     sizeof(_Py_CODEUNIT),
     (destructor)code_dealloc,           /* tp_dealloc */
     0,                                  /* tp_vectorcall_offset */

@encukou
Copy link
Member

encukou commented Jun 7, 2022

Cython expects that PyCode_Type.tp_basicsize = sizeof(PyCodeObject) which was true until Python 3.10, but is false in Python 3.11.

As far as I can see, this is a wrong assumption to make.

@vstinner
Copy link
Member Author

vstinner commented Jun 7, 2022

As far as I can see, this is a wrong assumption to make.

Well, I would like to say that yes, Cython should not make this assumption.

For now, I propose to restore Python 3.10 behavior so gevent and Cython just works on Python 3.11 (with PyCode_Type).

Once Cython will be updated to no longer make this assumption or once PyCodeObject structure is modified to use a C99 flexible array of size zero with the syntax char co_code_adaptive[], it will be safe to change PyCode_Type.tp_basicsize again to offsetof(PyCodeObject, co_code_adaptive) (to reduce the memory consumption).

My concern is to get a working gevent (and Cython) on Python 3.11 right now (before Python 3.11 final release). We can revisit PyCode_Type.tp_basicsize in Python 3.12. See also issue #84301 about C99 flexible arrays.

Updating Cython is slow. Getting new releases of projects using Cython, with C code re-generated by updated Cython, is even longer. So I propose #93587 as a practical fix for now.

@encukou
Copy link
Member

encukou commented Jun 7, 2022

Updating Cython is slow. Getting new releases of projects using Cython, with C code re-generated by updated Cython, is even longer.

So slow it can't be done in 4 months?
AFAIK, not all projects that use Cython need to be updated -- only those that use PyCodeObject, and those should be prepared to update.

@vstinner
Copy link
Member Author

vstinner commented Jun 7, 2022

So slow it can't be done in 4 months?

Right. Getting greenlet + gevent fixed before Python 3.11 final release looks complicated. There are multiple issues: see below. Fixing all of them will take time, whereas #93587 is a simple onliner fix which would make my life easier for now :-) As I wrote, we can revisit the code in Python 3.12.

gevent is a common dependency and so we should fix in Fedora to discover next issues with Python 3.11. For now, there are likely more issues which are not know yet, simply because we cannot build Python packages depending on gevent. So it would be better to get a working gevent before Python 3.11 final release.

Right, wasting 2 bytes per code object is not perfect, but 2 bytes of memory is small, compared to the number of bugs related to greenlet, gevent, Cython, C API changes, etc. (I'm not sure if it's 2 bytes or 8 bytes, it may depend on the alignment.)

greenlet

I added Python 3.11 support to greenlet 2.0 alpha (commit). I asked if it would be possible to switch the 2.0 version to the beta phase: issue. So far (for 2 weeks), there is no reply.

@hroncok is worried by the alpha status and would prefer to use a stable greenlet version. So I adopted a new approach: I proposed to add a 1.1.3 release and I proposed a PR for that (mostly written by @hroncok).

Cython issue with PyCode_Type

See cython/cython#4827

gevent uses PyCode_Type.

Add Python 3.11 support to gevent

I wrote gevent/gevent#1872 to add Python 3.11 support to gevent. But I'm not a Cython expert and I don't know how to implement set_f_code() on Python 3.11. This function needs to set _PyInterpreterFrame.f_code, but I don't know how to get the _PyInterpreterFrame structure in Cython.

@hroncok
Copy link
Contributor

hroncok commented Jun 7, 2022

As far as I am concerned, this can be fixed in Cython. There is no rush to revert this in CPython. The problem of getting gevent to work is not blocked by this particular problem, but on other problems.

@mdboom
Copy link
Contributor

mdboom commented Jun 7, 2022

IIUC, we are already in a world where updating Cython and regenerating will be required for Python 3.11 (cython/cython#4584), so expected Cython to update for this doesn't make things any worse in that regard.

@da-woods
Copy link
Contributor

da-woods commented Jun 7, 2022

Cython expects that PyCode_Type.tp_basicsize = sizeof(PyCodeObject) which was true until Python 3.10, but is false in Python 3.11.

As far as I can see, this is a wrong assumption to make.

As far as I am concerned, this can be fixed in Cython. There is no rush to revert this in CPython. The problem of getting gevent to work is not blocked by this particular problem, but on other problems.

IIUC, we are already in a world where updating Cython and regenerating will be required for Python 3.11 (cython/cython#4584), so expected Cython to update for this doesn't make things any worse in that regard.

Agree with all of the above.

This is mainly a sanity check for "are we running with the same version that we're compiled with" that isn't quite right for variable sized objects, so it's a pretty solvable problem.

@encukou
Copy link
Member

encukou commented Jun 8, 2022

My concern is to get a working gevent (and Cython) on Python 3.11 right now (before Python 3.11 final release).

It turns out that there is a less invasive fix we can do in gevent instead. No need to change CPython for this. I'll close the issue & PR.
Thank you for helping, everyone!

@encukou encukou closed this as completed Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants