Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change/Expand Rect.collideobjectsall for optimization #2787

Closed
itzpr3d4t0r opened this issue Apr 2, 2024 · 9 comments
Closed

Change/Expand Rect.collideobjectsall for optimization #2787

itzpr3d4t0r opened this issue Apr 2, 2024 · 9 comments
Labels
Performance Related to the speed or resource usage of the project rect pygame.rect

Comments

@itzpr3d4t0r
Copy link
Member

itzpr3d4t0r commented Apr 2, 2024

The current Rect.collideobjectsall() method is slow, like really slow, especially when compared with all the other multicollision methods:
image

After analysis, it became evident that the bulk of the runtime is attributed to parsing and invoking the key argument, which only accepts a callable. Strikingly, this is also the slowest scenario:

from pygame import Rect

class Obj:
    def __init__(self, x, y, w, h):
        self.xa = Rect(x, y, w, h)

r = Rect(0, 0, 100, 100)
objs = [
   Obj(
            randint(-100, -100),
            randint(-100, 100),
            randint(-100, 100),
            randint(-100, 100),
        )
        for _ in range(100)
]

colliding_objs = r.collideobjectsall(objs, key=lambda e: e.xa)

Surprisingly, invoking a lambda function to access an object's attribute in C is markedly slower than passing the attribute's name as a string directly and retrieving the attribute.

Hence, I propose either of the following solutions:

  • Behaviour Change: Expand the key keyword argument to accept strings as well, enabling direct attribute retrieval in such instances.
  • Additional Behaviour: Introduce an extra keyword argument, from_attr, which exclusively accepts strings for direct attribute retrieval.

In practice, this would allow the following alternatives:

colliding_objs = r.collideobjectsall(objs, key='xa')
# OR
colliding_objs = r.collideobjectsall(objs, from_attr='xa')

Either of these would work and make it possible to massively optimize the algorithm. From my testing this change alone brings a 190% improvement on average:
image

PS. this is also true for .collideobjects() since it shares most of this function's behaviour.

@itzpr3d4t0r itzpr3d4t0r added Performance Related to the speed or resource usage of the project rect pygame.rect labels Apr 2, 2024
@ankith26
Copy link
Member

ankith26 commented Apr 2, 2024

Behaviour Change: Expand the key keyword argument to accept strings as well, enabling direct attribute retrieval in such instances.

I would be okay with this, as it wouldn't break existing code. Callables should continue to be handled the way they are, we are just adding an extra string handle path.

@Starbuck5
Copy link
Member

I'm skeptical of adding API surface like this for optimization. It's more branches, more code to test, more to maintain, and it provides no benefit to existing users. Especially since this is a niche function.

Plus, I've found a far easier solution--

Rewrite it in Python.

from pygame import Rect
import random
import time

random.seed(36)

use_python = False

class Obj:
    def __init__(self, x, y, w, h):
        self.xa = Rect(x, y, w, h)

r = Rect(-20, -20, 100, 100)
objs = [
   Obj(
            random.randint(-100, -100),
            random.randint(-100, 100),
            random.randint(-100, 100),
            random.randint(-100, 100),
        )
        for _ in range(5000)
]

start = time.time()

if use_python:
    for _ in range(1000):
        colliding_indices = r.collidelistall([obj.xa for obj in objs])
        colliding_objs = [objs[colliding_index] for colliding_index in colliding_indices]

else:
    for _ in range(1000):
        colliding_objs = r.collideobjectsall(objs, key=lambda e: e.xa)

print(time.time() - start)

use_python = True does the same thing as collideobjectsall but is more than twice as fast. 0.14 seconds versus 0.34 seconds, Python 3.12.

@itzpr3d4t0r
Copy link
Member Author

itzpr3d4t0r commented Apr 3, 2024

Some thoughts:

  • Now that we added it in the C api why leave it a half baked and honestly bad implementation
  • People like fast stuff
  • People shouldn't have to circumvent our C limitations (that we put in place by implementing it) with a version dependant python function
  • It would have benefit with existing users since I would also implement faster looping for both cases
  • There is almost no cost test wise, just a handful of more lines.
  • Python 3.12 is cool but we can't expect people to both use it and implement our functions in python just because they're bad, since it would also imply people benchmarking code that 99% of people don't do at this scale.
  • I've seen projects like Isometria use this function in many places so we can't say it's that nieche... at least one big project uses it.

@itzpr3d4t0r
Copy link
Member Author

use_python = True does the same thing as collideobjectsall but is more than twice as fast. 0.14 seconds versus 0.34 seconds, Python 3.12.

Your program on my machine (Python 3.12):

  • collideobjectsall with lambda: 0.3343813419342041 s
  • python way: 0.1553936004638672 s
  • collideobjectsall with string: 0.0875248908996582 s

So my proposed change is still 77.5% faster than the python alternative while not requiring more python.

@Starbuck5
Copy link
Member

You don’t get it? We should replace our implementation of this function with a Python one. There’s no reason our implementation has to be written in C. Might be a little tricky, but we’ve done a couple similar things before.

This improves our implementation’s speed for existing code by taking a complex C function and rewriting it in 2 lines of Python. This would be a huge win.

It’s faster on 3.9 too, but it’s more faster on 3.12. And maybe even better with future versions!

@itzpr3d4t0r
Copy link
Member Author

Alright, Hopefully someone will do this eventually. Actually your idea of using collidelistall will get even faster with my PR: #2786 so the future is bright for a python implementation. Now that we're on this, should i remove the optimization for collideobjects/collideobjectsall in this PR or no? I guess a ltl speedup with no drawbacks isn't really an issue but i'm not rly attacched to it or anything so it's whatever.

@Starbuck5
Copy link
Member

Hmm, my example was unrealistic because I didn't show how it could be made compatible with the existing arguments. It didn't use the lambda function.

This is more realistic, but doesn't show nearly as good of a speedup:

from pygame import Rect
import random
import time

random.seed(36)

use_python = True

class Obj:
    def __init__(self, x, y, w, h):
        self.xa = Rect(x, y, w, h)

r = Rect(-20, -20, 100, 100)
objs = [
   Obj(
            random.randint(-100, -100),
            random.randint(-100, 100),
            random.randint(-100, 100),
            random.randint(-100, 100),
        )
        for _ in range(5000)
]

start = time.time()

def collideobjectsall(objects, key):
    colliding_indices = r.collidelistall([key(obj) for obj in objects])
    return [objects[colliding_index] for colliding_index in colliding_indices]

if use_python:
    for _ in range(1000):
        collideobjectsall(objs, key=lambda e: e.xa)

else:
    for _ in range(1000):
        colliding_objs = r.collideobjectsall(objs, key=lambda e: e.xa)

print(time.time() - start)

It is still faster to do it in Python in Python 3.12, but not in Python 3.9.

Python 3.9 "use_python"            0.36 s
Python 3.9 C implementation        0.31 s
Python 3.12 "use_python"           0.26 s
Python 3.12 C implementation       0.34 s

Also good point these will all be sped up with your optimization PR, even the Python version.

I'm bummed I didn't test it correctly and it's not as good as I thought, but I still think it's a good opportunity to pursue.

@itzpr3d4t0r
Copy link
Member Author

I mean I guess we could drop the python 3.12 version down to about 0.15 with the optimization but if optimizing C still gives us 0.08s of runtime AND it's python version independent then i think it's better to bite the bullet and implement the C version. I still think you're making it look worse than it actually is on the C side but these are my 2 cents anyway.

@Starbuck5
Copy link
Member

I've managed to get a good improvement inside the current status quo here by using vectorcall: #3048

@itzpr3d4t0r itzpr3d4t0r closed this as not planned Won't fix, can't repro, duplicate, stale Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Related to the speed or resource usage of the project rect pygame.rect
Projects
None yet
Development

No branches or pull requests

3 participants