-
-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use PyObject_Vectorcall in rect #3048
Use PyObject_Vectorcall in rect #3048
Conversation
This speeds up Rect.collideobjects and Rect.collideobjectsall. See https://docs.python.org/3/c-api/call.html#c.PyObject_Vectorcall This function is new in 3.9, but we can support it on lower versions using the recently vendored pythoncapi-compat header.
I believe #3023 already covers the performance gain that can be achieved with this PR, while also using nicer API (imo). Even if we need to support limited API in the future, it is trivial to slap in something like |
So I'm annoyed that the Python docs didn't mention that. Yeah I check the source and yeah CallOneArg uses vectorcall. You mentioned performance as an afterthought in that PR but didn't put much weight on it. This PR at least gives you a benchmark of performance gain from your PR in one area. CallOneArg: PyObject *
PyObject_CallOneArg(PyObject *func, PyObject *arg)
{
EVAL_CALL_STAT_INC_IF_FUNCTION(EVAL_CALL_API, func);
assert(arg != NULL);
PyObject *_args[2];
PyObject **args = _args + 1; // For PY_VECTORCALL_ARGUMENTS_OFFSET
args[0] = arg;
PyThreadState *tstate = _PyThreadState_GET();
size_t nargsf = 1 | PY_VECTORCALL_ARGUMENTS_OFFSET;
return _PyObject_VectorcallTstate(tstate, func, args, nargsf, NULL);
} Vectorcall: PyObject *
PyObject_Vectorcall(PyObject *callable, PyObject *const *args,
size_t nargsf, PyObject *kwnames)
{
PyThreadState *tstate = _PyThreadState_GET();
return _PyObject_VectorcallTstate(tstate, callable,
args, nargsf, kwnames);
} Vectorcall is a tiny bit less indirect and so would likely be faster, but probably not measurably. (I don't think PY_VECTORCALL_ARGUMENTS_OFFSET will help our use case here, we call full functions and not bound methods, which was the example given in the pep about this optimization). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am approving this PR because I don't have any problem with it per se.
However I still think that my PR is also mergeable and can supersede this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I guess it's the same if we merge this or the other, as any change needed will be minimal either way. I got these results on my laptop (about 20% faster).
CURRENT MAIN
0.33377814292907715
0.32143235206604004
NEW
0.2770962715148926
0.26911282539367676
This speeds up Rect.collideobjects and Rect.collideobjectsall.
See https://docs.python.org/3/c-api/call.html#c.PyObject_Vectorcall
This function is new in Python 3.9, but we get support for it on lower versions automatically using the recently vendored pythoncapi-compat header.
Performance testing script:
I see collideobjectsall going from 0.32 seconds to 0.25 seconds (22% improvement), and collideobjects going from 0.26 seconds to 0.22 seconds (15% improvement).
That's a pretty good improvement given this PR only changes one C-API function call!