Use PyObject_Vectorcall in rect #3048

Starbuck5 · 2024-08-11T07:46:06Z

This speeds up Rect.collideobjects and Rect.collideobjectsall.

See https://docs.python.org/3/c-api/call.html#c.PyObject_Vectorcall

This function is new in Python 3.9, but we get support for it on lower versions automatically using the recently vendored pythoncapi-compat header.

Performance testing script:

from pygame import Rect
import random
import time

random.seed(36)


class Obj:
    def __init__(self, x, y, w, h):
        self.xa = Rect(x, y, w, h)


r = Rect(-20, -20, 100, 100)
objs = [
    Obj(
        random.randint(-100, -100),
        random.randint(-100, 100),
        random.randint(-100, 100),
        random.randint(-100, 100),
    )
    for _ in range(5000)
]

start = time.time()

for _ in range(1000):
    colliding_objs = r.collideobjectsall(objs, key=lambda e: e.xa)

print(time.time() - start)
print(len(colliding_objs))

# Sort list so rects that actually collide are at the end of the list,
# so collideobjects below takes significant time.
objs.sort(key=lambda e: int(r.colliderect(e.xa)))

start = time.time()

for _ in range(1000):
    colliding_obj = r.collideobjects(objs, key=lambda e: e.xa)

print(time.time() - start)
print(colliding_obj.xa)

I see collideobjectsall going from 0.32 seconds to 0.25 seconds (22% improvement), and collideobjects going from 0.26 seconds to 0.22 seconds (15% improvement).

That's a pretty good improvement given this PR only changes one C-API function call!

This speeds up Rect.collideobjects and Rect.collideobjectsall. See https://docs.python.org/3/c-api/call.html#c.PyObject_Vectorcall This function is new in 3.9, but we can support it on lower versions using the recently vendored pythoncapi-compat header.

ankith26 · 2024-08-12T13:41:05Z

I believe #3023 already covers the performance gain that can be achieved with this PR, while also using nicer API (imo).

Even if we need to support limited API in the future, it is trivial to slap in something like #define PyObject_CallOneArg(self, obj) PyObject_Vectorcall((self), &(obj), 1, NULL) in one of the headers

Starbuck5 · 2024-08-13T06:18:09Z

So I'm annoyed that the Python docs didn't mention that. Yeah I check the source and yeah CallOneArg uses vectorcall.

You mentioned performance as an afterthought in that PR but didn't put much weight on it. This PR at least gives you a benchmark of performance gain from your PR in one area.

CallOneArg:

PyObject *
PyObject_CallOneArg(PyObject *func, PyObject *arg)
{
    EVAL_CALL_STAT_INC_IF_FUNCTION(EVAL_CALL_API, func);
    assert(arg != NULL);
    PyObject *_args[2];
    PyObject **args = _args + 1;  // For PY_VECTORCALL_ARGUMENTS_OFFSET
    args[0] = arg;
    PyThreadState *tstate = _PyThreadState_GET();
    size_t nargsf = 1 | PY_VECTORCALL_ARGUMENTS_OFFSET;
    return _PyObject_VectorcallTstate(tstate, func, args, nargsf, NULL);
}

Vectorcall:

PyObject *
PyObject_Vectorcall(PyObject *callable, PyObject *const *args,
                     size_t nargsf, PyObject *kwnames)
{
    PyThreadState *tstate = _PyThreadState_GET();
    return _PyObject_VectorcallTstate(tstate, callable,
                                      args, nargsf, kwnames);
}

Vectorcall is a tiny bit less indirect and so would likely be faster, but probably not measurably. (I don't think PY_VECTORCALL_ARGUMENTS_OFFSET will help our use case here, we call full functions and not bound methods, which was the example given in the pep about this optimization).

ankith26

I am approving this PR because I don't have any problem with it per se.

However I still think that my PR is also mergeable and can supersede this PR.

itzpr3d4t0r

LGTM, I guess it's the same if we merge this or the other, as any change needed will be minimal either way. I got these results on my laptop (about 20% faster).

CURRENT MAIN
0.33377814292907715
0.32143235206604004

NEW
0.2770962715148926
0.26911282539367676

Use PyObject_Vectorcall in rect

a761a11

This speeds up Rect.collideobjects and Rect.collideobjectsall. See https://docs.python.org/3/c-api/call.html#c.PyObject_Vectorcall This function is new in 3.9, but we can support it on lower versions using the recently vendored pythoncapi-compat header.

Starbuck5 requested a review from a team as a code owner August 11, 2024 07:46

Starbuck5 added rect pygame.rect Performance Related to the speed or resource usage of the project labels Aug 11, 2024

Starbuck5 mentioned this pull request Aug 11, 2024

Change/Expand Rect.collideobjectsall for optimization #2787

Closed

ankith26 approved these changes Aug 14, 2024

View reviewed changes

itzpr3d4t0r approved these changes Aug 18, 2024

View reviewed changes

itzpr3d4t0r added this to the 2.5.2 milestone Aug 18, 2024

itzpr3d4t0r merged commit 201f934 into pygame-community:main Aug 18, 2024
25 checks passed

Starbuck5 deleted the use-vectorcall-rect branch August 29, 2024 06:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use PyObject_Vectorcall in rect #3048

Use PyObject_Vectorcall in rect #3048

Starbuck5 commented Aug 11, 2024 •

edited

Loading

ankith26 commented Aug 12, 2024

Starbuck5 commented Aug 13, 2024

ankith26 left a comment

itzpr3d4t0r left a comment

Use PyObject_Vectorcall in rect #3048

Use PyObject_Vectorcall in rect #3048

Conversation

Starbuck5 commented Aug 11, 2024 • edited Loading

ankith26 commented Aug 12, 2024

Starbuck5 commented Aug 13, 2024

ankith26 left a comment

Choose a reason for hiding this comment

itzpr3d4t0r left a comment

Choose a reason for hiding this comment

Starbuck5 commented Aug 11, 2024 •

edited

Loading