Type information for faster Python C extensions
Max Bernstein
PyPy is an alternative implementation of the Python language. PyPy’s C API
compatibility layer has some performance issues. Carl Friedrich
Bolz-Tereick and I are working on a way to make PyPy’s C
API interactions much faster. It’s looking very promising. Here’s a sketch of
how it works.
The C API as lingua franca
Python is pretty widely-used. For years, CPython was the only implementation,
and CPython was not designed to be fast. The Python community needed some
programs to go faster and determined that the best path forward was to write
some modules in C and interact with them from Python. Worked just fine.
Then other Python runtimes like PyPy came along. PyPy includes a JIT compiler
and its execution of normal Python code is very fast, at least until it hits
a call from Python to a C extension function. Then things go a little bit
sideways. First and foremost because the PyPy JIT can’t “see into” the native
code; it’s generated outside of the JIT by some other compiler and is therefore
opaque. And second of all because the binding API for the aforementioned C
modules (“The C API”) requires a totally different object and memory
representation than PyPy has internally.
PyPy has its own object model, runtime, and moving garbage collector, all to
get better performance. Unfortunately, this means that whenever you call a C
API function from PyPy, it has to stop what it’s doing, set up some C API
scaffolding, do the C API call, and then take down the scaffolding.
For example, the C API is centered around PyObject pointers. PyPy does not
normally use PyObjects. It has to allocate a PyObject, make it point into
the PyPy heap, call the C API function, and then (potentially) free the
PyObject. (This ignores GIL stuff and exception checking, which is also an
issue.)
Fig. 1 - C extensions deal in PyObject* so any runtime that wants to
interface with them has to also deal in PyObject*.
That’s a lot of overhead. (And there’s more, too. See Antonio Cuni’s excellent
blog
post.)
And it’s a hard problem that has bitten multiple alternative Python
runtimes.
In addition to the overhead of boxing into a PyObject, the underlying C
function that the C extension calls may not even need the PyObject to exist
in the first place. For example, a lot of C API functions are structured like
this:
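Here is a minimal sketch of that shape, using the inc example discussed below:

long inc_impl(long arg) {
  // The underlying C function: works directly on C integers and
  // knows nothing about Python.
  return arg + 1;
}

PyObject* inc(PyObject* module, PyObject* obj) {
  (void)module;
  // Unbox the PyObject* into a C long...
  long obj_int = PyLong_AsLong(obj);
  if (obj_int == -1 && PyErr_Occurred()) {
    return NULL;
  }
  // ...do the actual work...
  long result = inc_impl(obj_int);
  // ...and box the result back up into a PyObject*.
  return PyLong_FromLong(result);
}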
In this example, the PyObject* function inc is only a wrapper around another
function, inc_impl, that works directly on C integers.
Fig. 2 - The runtimes still have to manufacture PyObject* even if the
underlying C code doesn’t know anything about Python. The unboxing is
another source of overhead, unfortunately.
All of the bits in the middle between the JIT and the C implementation (the
entire inc function, really) are “wasted work” because the work is not needed
for the actual execution of the user’s program.
So even if the PyPy JIT is doing great work and has eliminated memory
allocation in Python code—PyPy could have unboxed some heap allocated Python
integer object into a C long, for example—it still has to heap allocate a
PyObject for the C API… only to throw it away soon after.
If there were a way to communicate to PyPy that inc expects a long and is
going to unbox it into a C long (and will also return a C long), it
wouldn’t need to do any of these shenanigans.
And yes, ideally there wouldn’t be a C API call at all. But sometimes you have
to (perhaps because you have no control over the code), and you might as well
speed up that call. Let’s see if it can be done.
Potsdam
This is where I come in. I was at the ECOOP conference in 2022, where
Carl Friedrich introduced me to some other implementors
of alternative Python runtimes. I got to talk to the authors of PyPy and ZipPy
and GraalPython over some coffee and beer. They’re really nice.
They’ve been collectively working on a project called HPy. HPy is a new design
for a Python C API that takes alternative runtimes into account. As part of
this design, they were investigating a way to pipe type
information from C extension
modules through the C API and into a place where the host runtime can read it.
It’s a tricky problem because not only is there a C API, but also a C ABI (note
the “B” for “binary”). While an API is an abstract contract between caller and
callee for how to call a function, an ABI is more concrete. In the case of the
C ABI, it means not changing struct layouts or sizes, adding function
parameters, things like that. This is kind of a tight constraint and it wasn’t
clear what the best backward-compatible way to add type information was.
Sometime either in this meeting or shortly after, I had an idea for how to do
it without changing the API or ABI and I decided to take a stab at implementing
it for Cinder (the Python
runtime I was working on at the time).
The solution: sketchy C things?
In order to better understand the problem, let’s take a look at the kind of
type metadata we want to attach to each typed method, represented as a C
struct.
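A minimal sketch (the exact encoding of the type codes is left open here):

struct PyPyTypedMethodMetadata {
  int arg_type;           // type code for the (single) argument
  int ret_type;           // type code for the return value
  void* underlying_func;  // the non-PyObject* C function, e.g. inc_impl
};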
In this artificially limited example, we store the type information for one
argument (but more could be added in the future), the type information for the
return value, and the underlying (non-PyObject*) C function pointer.
But it’s not clear where to put that in a PyMethodDef. The existing
PyMethodDef struct looks like this. It contains a little bit of metadata and
a C function pointer (the PyObject* one). In an ideal world, we would “just”
add the type metadata to this struct and be done with it. But we can’t change
its size for ABI reasons.
struct PyMethodDef {
    const char  *ml_name;   /* The name of the built-in function/method */
    PyCFunction  ml_meth;   /* The C function that implements it */
    int          ml_flags;  /* Combination of METH_xxx flags, which mostly
                               describe the args expected by the C func */
    const char  *ml_doc;    /* The __doc__ attribute, or NULL */
};
typedef struct PyMethodDef PyMethodDef;
What to do? Well, I decided to get a little weird with it and see if we could
sneak in a pointer to the metadata somehow. My original idea was to put the
entire PyPyTypedMethodMetadata struct behind the PyMethodDef struct (kind
of like how malloc works), but that wouldn’t work so well: PyMethodDefs are
commonly statically allocated in arrays, and we can’t change the layout of
those arrays.
But what we can do is point the ml_name field to a buffer inside another
struct.
Then, when we notice that a method is marked as typed (with a new METH_TYPED
flag we can add to the ml_flags bitset), we can read backwards to find the
PyPyTypedMethodMetadata struct. Here’s how you might do that in C:
struct PyPyTypedMethodMetadata {
  int arg_type;
  int ret_type;
  void* underlying_func;
  const char ml_name[100];  // New field!
};
typedef struct PyPyTypedMethodMetadata PyPyTypedMethodMetadata;

PyPyTypedMethodMetadata*
GetTypedSignature(PyMethodDef* def)
{
  assert(def->ml_flags & METH_TYPED);  // A new type of flag!
  return (PyPyTypedMethodMetadata*)(def->ml_name - offsetof(PyPyTypedMethodMetadata, ml_name));
}
And here’s a diagram to illustrate this because it’s really weird and
confusing.
I started off with a mock
implementation
of this in C (no Python C API, just fake structures to sketch it out) and it
worked. So I implemented a hacky version of it in Cinder, but never shipped it
because my integration with Cinder was a little too hacky. I wrote up the
ideas for posterity in
case someone wanted to take up the project.
A year later, nobody else had, so I decided to poke Carl Friedrich and see if
we could implement it in PyPy. We’ll see how that implementation looks in a
minute. But first, an aside on where C extensions come from.
Where do all the C extensions come from?
Well, in PyPy, there are none in the standard library. PyPy has been almost
entirely written in Python so that the code is visible to the JIT. But people
like using Python packages, and some Python packages contain C extensions.
There are a couple of different ways to write a C extension. The “simplest” (as
in, all the components are visible and there is no magic and there are no
external dependencies) is to hand-write it. If you don’t want to do that, you
can also use a binding generator to write the glue code for you. I have the
most experience with Cython, but other binding generators like nanobind,
pybind11, and even CPython’s own Argument Clinic exist too!
Hand-written
Let’s recall the inc/inc_impl function from earlier. That’s a reasonable
example of a function that could be integrated as a hand-written C extension to
Python. In order to make it callable from Python, we have to make a full C
extension module. In this case, that’s just a list of function pointers and how
to call them.
static PyMethodDef mytypedmod_methods[] = {
    {"inc", inc, METH_O, "Add one to an int"},
    {NULL, NULL, 0, NULL}};

static struct PyModuleDef mytypedmod_definition = {
    PyModuleDef_HEAD_INIT,
    "mytypedmod",
    "A C extension module with type information exposed.",
    -1,
    mytypedmod_methods,
    NULL,
    NULL,
    NULL,
    NULL};
We have an array of PyMethodDef structs, one for each method we want to wrap.
Then we have a PyModuleDef to define the module, which can also contain
attributes and some other stuff. Then we provide a sort of __new__ function
for the module, in the form of a PyInit_ function. This is found by dlopen
in the C extension loader built into Python.
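A minimal sketch of what that looks like for mytypedmod:

PyMODINIT_FUNC
PyInit_mytypedmod(void) {
  // The extension loader dlopen()s the shared object and calls this
  // function to create the module object from the definition above.
  return PyModule_Create(&mytypedmod_definition);
}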
It’s possible to manually augment this module by adding a
PyPyTypedMethodMetadata struct and a METH_TYPED flag. It’s a little
cumbersome, but if it speeds up interactions with the module… well, extension
authors might be cajoled into adding the type information or at least accepting
a pull request.
But not all extensions are hand-written. Many are generated by binding
generators like Cython. And Cython is interesting because it can generate the
type signatures automatically…
Cython
Unlike many other binding generators for Python, Cython provides a
fully-featured Python-like programming language that compiles to C. The types
obey different rules than in normal Python code and can be used for
optimization. Cython also has primitive types. Let’s see an example.
In this snippet of Cython code, we make a function that adds two machine
integers and returns a machine integer.
cpdef int add(int a, int b):
    return a + b
Cython will generate a very fast C function that adds two machine integers.
Calls to this from Cython are type checked at compile time and will be as fast
as your C compiler allows:
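Roughly, the generated C core looks like this (simplified and de-mangled;
real Cython output name-mangles the identifiers, and the extra parameter is
Cython’s cpdef dispatch flag):

static int add(int a, int b, CYTHON_UNUSED int __pyx_skip_dispatch) {
  return a + b;
}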
Since we used cpdef instead of cdef, Cython will also generate a wrapper C
extension function so that this function can be called from Python.
This means that the generated Cython wrapper code looks like (a much uglier
version of) below. You don’t need to understand or really even read the big
blob of cleaned-up and annotated generated code below. You just need to say
“ooh” and “aah” and “wow, so many if-statements and so much allocation and so
many function calls.”
And it’s also a little worse than the METH_O example above since it has to
unwrap an array of fastcall args and do some argument processing.
static PyObject* add_and_box(CYTHON_UNUSED PyObject* __pyx_self,
                             int __pyx_v_a, int __pyx_v_b) {
  int result = add(__pyx_v_a, __pyx_v_b, 0);
  // Check if an error occurred (unnecessary in this case)
  if (result == ((int)-1) && PyErr_Occurred()) {
    return NULL;
  }
  // Wrap result in PyObject*
  return PyLong_FromLong(result);
}
static PyObject* add_python(PyObject* __pyx_self,
                            PyObject* const* __pyx_args,
                            Py_ssize_t __pyx_nargs,
                            PyObject* __pyx_kwds) {
  // Check how many arguments were given
  PyObject* values[2] = {0, 0};
  if (__pyx_nargs == 2) {
    values[1] = __pyx_args[1];
    values[0] = __pyx_args[0];
  } else if (__pyx_nargs == 1) {
    values[0] = __pyx_args[0];
  }
  // Check if any keyword arguments were given
  Py_ssize_t kw_args = __Pyx_NumKwargs_FASTCALL(__pyx_kwds);
  // Match up mix of positional/keyword args to parameters
  if (__pyx_nargs == 0) {
    // ...
  } else if (__pyx_nargs == 1) {
    // ...
  } else if (__pyx_nargs == 2) {
    // ...
  } else {
    // ...
  }
  // Unwrap PyObject* args into C int
  int __pyx_v_a = PyLong_AsLong(values[0]);
  // Check for error (unnecessary if we know it's an int)
  if ((__pyx_v_a == (int)-1) && PyErr_Occurred()) {
    return NULL;
  }
  int __pyx_v_b = PyLong_AsLong(values[1]);
  // Check for error (unnecessary if we know it's an int)
  if ((__pyx_v_b == (int)-1) && PyErr_Occurred()) {
    return NULL;
  }
  // Call generated C implementation of add
  return add_and_box(__pyx_self, __pyx_v_a, __pyx_v_b);
}
Now, to be clear: this is probably the fastest thing possible for interfacing
with CPython. Cython has been worked on for years and years and it’s very
fast. But CPython isn’t the only runtime in town and the other runtimes have
different performance characteristics, as we explored above.
Since so many C extensions are generated with Cython, there’s a big opportunity:
if we manage to get the Cython compiler to emit typed metadata for the
functions it compiles, those functions could become much faster under
runtimes such as PyPy.
In order to justify such a code change, we have to see how much faster the
typed metadata makes things. So let’s benchmark.
A small, useless benchmark
Let’s try benchmarking the interpreter interaction with the native module with
a silly benchmark. It’s a little silly because it’s not super common (in use
cases I am familiar with anyway) to call C code in a hot loop like this without
writing the loop in C as well. But it’ll be a good reference for the maximum
amount of performance we can win back.
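Such a benchmark might look like this (a sketch, assuming the mytypedmod
module from earlier):

# bench.py -- cross the Python<->C boundary in a hot loop.
import mytypedmod

i = 0
while i < 10_000_000:
    i = mytypedmod.inc(i)
print(i)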
We’ll try running it with CPython first because CPython doesn’t have this
problem making PyObjects—that is just the default object representation in
the runtime.
Okay so the text output is a little fudged since I actually measured this with
hyperfine, but you get the idea. CPython takes a very respectable 850ms to go
back and forth with C 10 million times.
Now let’s see how PyPy does on time, since it’s doing a lot more work at the
boundary.
Yeah, okay, so all that extra unnecessary work that PyPy does (before our
changes) ends up really adding up. Our benchmark of inc takes three times as
long as CPython. Oof. But this post is all about adding types. What if we add
types to the C module and measure with our changes to PyPy? Here are the
changes to the C module:
 static PyMethodDef mytypedmod_methods[] = {
-    {"inc", inc, METH_O, "Add one to an int"},
+    {inc_sig.ml_name, inc, METH_O | METH_TYPED, "Add one to an int"},
     {NULL, NULL, 0, NULL}};
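The inc_sig in the diff is the typed metadata for inc. It might be declared
like this, where T_C_LONG is a hypothetical constant encoding the C long type:

PyPyTypedMethodMetadata inc_sig = {
  .arg_type = T_C_LONG,        // hypothetical type code for C long
  .ret_type = T_C_LONG,
  .underlying_func = (void*)inc_impl,
  .ml_name = "inc",
};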
And now let’s run it with our new patched PyPy: 168ms! To refresh your memory,
that’s 5x faster than CPython and 13x
faster than baseline PyPy. I honestly did not believe my eyes when I saw this
number. And Carl Friedrich and I think there is still room for more
improvements like doing the signature/metadata finding inside the JIT instead
of calling that C function.
This is extraordinarily promising.
But as I said before, most applications don’t consist of a Python program
calling a C function and only a C function in a tight loop. It would be
important to profile how this change affects a representative workload. That
would help motivate the inclusion of these type signatures in a binding
generator such as Cython.
In the meantime, let’s take a look at how the changes look in the PyPy
codebase.
Implementing in PyPy: PyPy internals
PyPy consists of two main parts:
A Python interpreter
A tool to transform interpreters into JIT compilers
This means that instead of writing fancy JIT compiler changes to get this to
work, we wrote an interpreter change. Their cpyext (C API) handling code
already contains a little “interpreter” of sorts to make calls to C extensions.
It looks at ml_flags to distinguish between METH_O and METH_FASTCALL, for
example.
So we added a new case that looks like this pseudocode:
diff --git a/tmp/interp.py b/tmp/typed-interp.py
index 900fa9c..b973f13 100644
--- a/tmp/interp.py
+++ b/tmp/typed-interp.py
@@ -1,7 +1,17 @@
def make_c_call(meth, args):
if meth.ml_flags & METH_O:
assert len(args) == 1
+ if meth.ml_flags & METH_TYPED:
+ return handle_meth_typed(meth, args[0])
return handle_meth_o(meth, args[0])
if meth.ml_flags & METH_FASTCALL:
return handle_meth_fastcall(meth, args)
# ...
+
+def handle_meth_typed(meth, arg):
+ sig = call_scary_c_function_to_find_sig(meth)
+ if isinstance(arg, int) and sig.arg_type == int and sig.ret_type == int:
+ unboxed_arg = convert_to_unboxed(arg)
+ unboxed_result = call_f_func(sig.underlying_func, unboxed_arg)
+ return convert_to_boxed(unboxed_result)
+ # ...
Since the JIT probably already knows about the types of the arguments to the C
extension function (and probably has also unboxed them), all of the
intermediate checks and allocation can be elided. This makes for much less
work!
To check out the actual changes to PyPy, look at this stack of commits.
Next steps
This project isn’t merged or finished. While we have a nice little test suite
and a microbenchmark, ideally we would do some more, like profiling a
representative workload and getting a binding generator such as Cython to emit
the type metadata automatically.
Let us know if you have any ideas!