Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PyObject_Vectorcall in rect #3048

Merged
merged 1 commit into from
Aug 18, 2024

Conversation

Starbuck5
Copy link
Member

@Starbuck5 Starbuck5 commented Aug 11, 2024

This speeds up Rect.collideobjects and Rect.collideobjectsall.

See https://docs.python.org/3/c-api/call.html#c.PyObject_Vectorcall

This function is new in Python 3.9, but we get support for it on lower versions automatically using the recently vendored pythoncapi-compat header.

Performance testing script:

from pygame import Rect
import random
import time

random.seed(36)


class Obj:
    def __init__(self, x, y, w, h):
        self.xa = Rect(x, y, w, h)


r = Rect(-20, -20, 100, 100)
objs = [
    Obj(
        random.randint(-100, -100),
        random.randint(-100, 100),
        random.randint(-100, 100),
        random.randint(-100, 100),
    )
    for _ in range(5000)
]

start = time.time()

for _ in range(1000):
    colliding_objs = r.collideobjectsall(objs, key=lambda e: e.xa)

print(time.time() - start)
print(len(colliding_objs))

# Sort list so rects that actually collide are at the end of the list,
# so collideobjects below takes significant time.
objs.sort(key=lambda e: int(r.colliderect(e.xa)))

start = time.time()

for _ in range(1000):
    colliding_obj = r.collideobjects(objs, key=lambda e: e.xa)

print(time.time() - start)
print(colliding_obj.xa)

I see collideobjectsall going from 0.32 seconds to 0.25 seconds (22% improvement), and collideobjects going from 0.26 seconds to 0.22 seconds (15% improvement).

That's a pretty good improvement given this PR only changes one C-API function call!

This speeds up Rect.collideobjects and Rect.collideobjectsall.

See https://docs.python.org/3/c-api/call.html#c.PyObject_Vectorcall

This function is new in 3.9, but we can support it on lower versions using the recently vendored pythoncapi-compat header.
@Starbuck5 Starbuck5 requested a review from a team as a code owner August 11, 2024 07:46
@Starbuck5 Starbuck5 added rect pygame.rect Performance Related to the speed or resource usage of the project labels Aug 11, 2024
@ankith26
Copy link
Member

I believe #3023 already covers the performance gain that can be achieved with this PR, while also using nicer API (imo).

Even if we need to support limited API in the future, it is trivial to slap in something like #define PyObject_CallOneArg(self, obj) PyObject_Vectorcall((self), &(obj), 1, NULL) in one of the headers

@Starbuck5
Copy link
Member Author

So I'm annoyed that the Python docs didn't mention that. Yeah I check the source and yeah CallOneArg uses vectorcall.

You mentioned performance as an afterthought in that PR but didn't put much weight on it. This PR at least gives you a benchmark of performance gain from your PR in one area.

CallOneArg:

PyObject *
PyObject_CallOneArg(PyObject *func, PyObject *arg)
{
    EVAL_CALL_STAT_INC_IF_FUNCTION(EVAL_CALL_API, func);
    assert(arg != NULL);
    PyObject *_args[2];
    PyObject **args = _args + 1;  // For PY_VECTORCALL_ARGUMENTS_OFFSET
    args[0] = arg;
    PyThreadState *tstate = _PyThreadState_GET();
    size_t nargsf = 1 | PY_VECTORCALL_ARGUMENTS_OFFSET;
    return _PyObject_VectorcallTstate(tstate, func, args, nargsf, NULL);
}

Vectorcall:

PyObject *
PyObject_Vectorcall(PyObject *callable, PyObject *const *args,
                     size_t nargsf, PyObject *kwnames)
{
    PyThreadState *tstate = _PyThreadState_GET();
    return _PyObject_VectorcallTstate(tstate, callable,
                                      args, nargsf, kwnames);
}

Vectorcall is a tiny bit less indirect and so would likely be faster, but probably not measurably. (I don't think PY_VECTORCALL_ARGUMENTS_OFFSET will help our use case here, we call full functions and not bound methods, which was the example given in the pep about this optimization).

Copy link
Member

@ankith26 ankith26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am approving this PR because I don't have any problem with it per se.

However I still think that my PR is also mergeable and can supersede this PR.

Copy link
Member

@itzpr3d4t0r itzpr3d4t0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I guess it's the same if we merge this or the other, as any change needed will be minimal either way. I got these results on my laptop (about 20% faster).

CURRENT MAIN
0.33377814292907715
0.32143235206604004

NEW
0.2770962715148926
0.26911282539367676

@itzpr3d4t0r itzpr3d4t0r added this to the 2.5.2 milestone Aug 18, 2024
@itzpr3d4t0r itzpr3d4t0r merged commit 201f934 into pygame-community:main Aug 18, 2024
25 checks passed
@Starbuck5 Starbuck5 deleted the use-vectorcall-rect branch August 29, 2024 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Related to the speed or resource usage of the project rect pygame.rect
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants