Optimize vector parsing in math.c #2443

Merged


@Starbuck5 (Member) commented Sep 10, 2023

Strategy: pgVectorCompatible_Check and PySequence_AsVectorCoords duplicated some checks, so a unified function that does both jobs in one pass can reduce the overall runtime.
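As a rough illustration of the strategy, here is a pure-Python analogy of the check-then-convert pattern versus a unified check-and-convert. The names `is_vector_compatible`, `as_vector_coords`, and `get_vector_coords` are hypothetical stand-ins for pgVectorCompatible_Check, PySequence_AsVectorCoords, and the new unified function, and `collections.abc.Sequence` / `numbers.Real` stand in for the C-level type checks. This is a sketch under those assumptions, not the code in src_c/math.c, which may differ (for example, in how it handles actual Vector objects).

```python
# Hypothetical pure-Python analogy; NOT the pygame C implementation.
from collections.abc import Sequence
from numbers import Real


def is_vector_compatible(obj, dim):
    # Stand-in for the separate compatibility check: validates, returns a bool.
    return (
        isinstance(obj, Sequence)
        and len(obj) == dim
        and all(isinstance(x, Real) for x in obj)
    )


def as_vector_coords(obj, dim):
    # Stand-in for the separate conversion step. It re-validates the same
    # object before converting, which is the duplicated work.
    if not is_vector_compatible(obj, dim):
        raise TypeError(f"expected a sequence of {dim} numbers")
    return tuple(float(x) for x in obj)


def get_vector_coords(obj, dim):
    # Unified check-and-convert: a single pass that validates each element
    # while converting it. Returns the coords, or None if obj is unsuitable.
    if not isinstance(obj, Sequence) or len(obj) != dim:
        return None
    coords = []
    for x in obj:
        if not isinstance(x, Real):
            return None
        coords.append(float(x))
    return tuple(coords)


seq = (50, 60)

# Before: check first, then convert (the sequence is validated twice).
if is_vector_compatible(seq, 2):
    before = as_vector_coords(seq, 2)

# After: one call validates and extracts.
after = get_vector_coords(seq, 2)
print(before == after)  # True
```

Returning None on failure lets the caller decide whether to raise or fall back (e.g. return NotImplemented from an operator), which the separate bool-returning check also allowed, but without walking the sequence twice.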

Conclusions: I had hoped to optimize vector_generic_math, but the results are mixed, as you can see below. The runs are too variable for me to say confidently whether it's a speedup or a slowdown in that area. However, this is definitely a speedup for Vector methods that take in and process vectors, especially when called with sequences instead of other vectors. Vector methods are now 5.5% faster on average when taking vector arguments, and 20% faster when taking sequence arguments.

There are more places in the code where pgVectorCompatible_Check and PySequence_AsVectorCoords are used together, but they are in harder-to-test areas, like the elementwise proxy, so I'm leaving those be for now.

Benchmarking results (several runs before/after averaged):

vec2_1.move_towards(vec2_2, 4): 0.8714175000000001 (3.349% faster)
vec2_1.move_towards(vec2like, 4): 1.06382 (12.487% faster)
vec2_1.move_towards_ip(vec2_2, 4): 0.7681425000000001 (3.353% faster)
vec2_1.move_towards_ip(vec2like, 4): 0.89942 (17.049% faster)
vec3_1.move_towards(vec3_2, 4): 0.9133575 (0.082% faster)
vec3_1.move_towards(vec3like, 4): 1.0817775 (16.579% faster)
vec3_1.move_towards_ip(vec3_2, 4): 0.7777825 (4.29% faster)
vec3_1.move_towards_ip(vec3like, 4): 0.9343374999999999 (20.735% faster)
vec2_1.cross(vec2_2): 0.4231025 (9.083% faster)
vec2_1.cross(vec2like): 0.6134875 (21.03% faster)
vec3_1.cross(vec3_2): 0.488765 (6.727% faster)
vec3_1.cross(vec3like): 0.7261725 (23.767% faster)
vec2_1.angle_to(vec2_2): 0.5835575 (8.587% faster)
vec2_1.angle_to(vec2like): 0.7513475 (22.943% faster)
vec3_1.angle_to(vec3_2): 0.5254425 (14.304% faster)
vec3_1.angle_to(vec3like): 0.7385475 (23.836% faster)
vec3_1.rotate(34, vec3_2): 0.9605474999999999 (1.951% faster)
vec3_1.rotate_ip(34, vec3_2): 0.85734 (3.164% faster)
vec2_1 + vec2_2: 0.27509249999999996 (3.111% faster)
vec2_1 + vec2like: 0.477885 (25.886% faster)
vec2_1 * vec2_2: 0.2107525 (2.365% faster)
vec2_1 * vec2like: 0.4136725 (26.987% faster)
vec2_1 * 2.3: 0.356165 (-9.052% faster)
vec3_1 + vec3_2: 0.3149825 (2.445% faster)
vec3_1 + vec3like: 0.52228 (30.443% faster)
vec3_1 * vec3_2: 0.245065 (8.35% faster)
vec3_1 * vec3like: 0.4576 (31.846% faster)
vec3_1 * 2.3: 0.37668 (-7.955% faster)

Average change = 11.705 %
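The per-argument-type averages quoted in the conclusions can be recomputed from the "% faster" values above. One assumption here: "Vector methods" is taken to mean move_towards(_ip), cross, angle_to, and rotate(_ip), with the operator rows counted only in the overall average.

```python
from statistics import mean

# "% faster" values copied from the results above.
# Assumption: "Vector methods" = move_towards(_ip), cross, angle_to, rotate(_ip).
vector_args = [3.349, 3.353, 0.082, 4.29, 9.083, 6.727, 8.587, 14.304, 1.951, 3.164]
sequence_args = [12.487, 17.049, 16.579, 20.735, 21.03, 23.767, 22.943, 23.836]
# Operator rows (+ and *), including the two scalar-multiply regressions.
operators = [3.111, 25.886, 2.365, 26.987, -9.052, 2.445, 30.443, 8.35, 31.846, -7.955]

print(round(mean(vector_args), 1))    # 5.5
print(round(mean(sequence_args), 1))  # 19.8
print(round(mean(vector_args + sequence_args + operators), 3))  # 11.705
```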
Benchmarking script
import timeit

import pygame

vec2_1 = pygame.Vector2(36,6.4)
vec2_2 = pygame.Vector2(4,5)
vec2like = (50, 60)

vec3_1 = pygame.Vector3(36,6.4,-99)
vec3_2 = pygame.Vector3(4,5,56)
vec3like = (50, 60, 20.4)

def bench(statement, globals=globals(), number=5000000):
    return round(timeit.timeit(statement, globals=globals, number=number), 5)

# Put previous output results here to automatically generate comparison
prevresults = []

thisresults = []

thischanges = []

def printbench(statement):
    v2 = bench(statement)
    thisresults.append(v2)

    if prevresults:
        v1 = prevresults[len(thisresults)-1]
        change = -round((v2 - v1) / abs(v1) * 100, 3)
        thischanges.append(change)
        print(f"{statement}: {v2} ({change}% faster)")
    else:
        print(f"{statement}: {v2}")

printbench("vec2_1.move_towards(vec2_2, 4)")
printbench("vec2_1.move_towards(vec2like, 4)")
printbench("vec2_1.move_towards_ip(vec2_2, 4)")
printbench("vec2_1.move_towards_ip(vec2like, 4)")

printbench("vec3_1.move_towards(vec3_2, 4)")
printbench("vec3_1.move_towards(vec3like, 4)")
printbench("vec3_1.move_towards_ip(vec3_2, 4)")
printbench("vec3_1.move_towards_ip(vec3like, 4)")

printbench("vec2_1.cross(vec2_2)")
printbench("vec2_1.cross(vec2like)")

printbench("vec3_1.cross(vec3_2)")
printbench("vec3_1.cross(vec3like)")

printbench("vec2_1.angle_to(vec2_2)")
printbench("vec2_1.angle_to(vec2like)")

printbench("vec3_1.angle_to(vec3_2)")
printbench("vec3_1.angle_to(vec3like)")

printbench("vec3_1.rotate(34, vec3_2)")
printbench("vec3_1.rotate_ip(34, vec3_2)")

printbench("vec2_1 + vec2_2")
printbench("vec2_1 + vec2like")
printbench("vec2_1 * vec2_2")
printbench("vec2_1 * vec2like")
printbench("vec2_1 * 2.3")

printbench("vec3_1 + vec3_2")
printbench("vec3_1 + vec3like")
printbench("vec3_1 * vec3_2")
printbench("vec3_1 * vec3like")
printbench("vec3_1 * 2.3")

if thischanges:
    print()
    print("Average change =", round(sum(thischanges)/len(thischanges), 3), "%")
    print()

print(thisresults)

@Starbuck5 Starbuck5 requested a review from a team as a code owner September 10, 2023 02:25
@Starbuck5 Starbuck5 added Performance Related to the speed or resource usage of the project math pygame.math labels Sep 10, 2023
It combines the functionality of pgVectorCompatible_Check and PySequence_AsVectorCoords, which reduces overall runtime because those two functions performed duplicate checks.
@Starbuck5 Starbuck5 force-pushed the Add-unified-vector-check-and-get branch from 9ab246c to f6e5d4c Compare September 11, 2023 03:27
@itzpr3d4t0r (Member) left a comment

Nice PR. I'm sure we can build on it further in the future; the whole module is full of places we can work on.
I'd suggest using mean(timeit.repeat(...)) with a high repeat count for benchmarking, instead of a single timeit call that accumulates runtimes. It yields smoother results overall, because individual outliers don't skew the measured times as much.

A side note:
In general, having two pointers (a stack-allocated array that gets filled when a sequence is passed, plus a pointer that points either at that array or directly at an actual vector's coords) is a valid optimization across the whole module. It avoids the memcpy in the vector case, which is probably what matters most.

@Starbuck5 (Member Author)

I'd like to suggest you use mean(timeit.repeat(...)) with a high repeat number for benchmarking instead of a single timeit that accumulates runtimes. This is because it yields smoother and better results overall since outliers won't count towards increasing the times in a suboptimal way.

I replaced my timeit.timeit call with min(timeit.repeat(statement, globals=globals, repeat=1000, number=1000)) * 1000 and got the following results:

Results
vec2_1.move_towards(vec2_2, 4): 0.16205 (6.774% faster)
vec2_1.move_towards(vec2like, 4): 0.1976 (14.809% faster)
vec2_1.move_towards_ip(vec2_2, 4): 0.14465 (6.269% faster)
vec2_1.move_towards_ip(vec2like, 4): 0.1703 (19.088% faster)
vec3_1.move_towards(vec3_2, 4): 0.165325 (7.212% faster)
vec3_1.move_towards(vec3like, 4): 0.20437500000000003 (18.987% faster)
vec3_1.move_towards_ip(vec3_2, 4): 0.145225 (9.658% faster)
vec3_1.move_towards_ip(vec3like, 4): 0.18039999999999998 (22.124% faster)
vec2_1.cross(vec2_2): 0.080625 (12.029% faster)
vec2_1.cross(vec2like): 0.1172 (22.653% faster)
vec3_1.cross(vec3_2): 0.09242500000000001 (11.555% faster)
vec3_1.cross(vec3like): 0.1359 (27.034% faster)
vec2_1.angle_to(vec2_2): 0.1089 (13.037% faster)
vec2_1.angle_to(vec2like): 0.144625 (23.872% faster)
vec3_1.angle_to(vec3_2): 0.10072500000000001 (16.428% faster)
vec3_1.angle_to(vec3like): 0.141625 (24.285% faster)
vec3_1.rotate(34, vec3_2): 0.1841 (5.674% faster)
vec3_1.rotate_ip(34, vec3_2): 0.16610000000000003 (5.504% faster)
vec2_1 + vec2_2: 0.05325 (2.473% faster)
vec2_1 + vec2like: 0.093575 (21.349% faster)
vec2_1 * vec2_2: 0.041475 (4.983% faster)
vec2_1 * vec2like: 0.08097499999999999 (24.797% faster)
vec2_1 * 2.3: 0.07072500000000001 (-12.934% faster)
vec3_1 + vec3_2: 0.053875 (18.556% faster)
vec3_1 + vec3like: 0.097925 (31.521% faster)
vec3_1 * vec3_2: 0.042475 (24.455% faster)
vec3_1 * vec3like: 0.08814999999999999 (32.94% faster)
vec3_1 * 2.3: 0.06772500000000001 (1.24% faster)

Average change = 14.87 %

I'm happy to see it shows a higher speedup, with a +14.87% average now.
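For reference, the two measurement styles discussed above can be wrapped as small helpers. This is a sketch: `bench_total` and `bench_repeat` are hypothetical names, and the demo times a plain tuple expression so it runs without pygame. `timeit.timeit` and `timeit.repeat` (with the `globals` argument) are standard-library calls.

```python
import timeit
from statistics import mean


def bench_total(statement, env, number=5_000_000):
    # Original approach: one long run; any slow outlier iterations
    # are folded into a single accumulated total.
    return timeit.timeit(statement, globals=env, number=number)


def bench_repeat(statement, env, repeat=1000, number=1000, reduce=min):
    # Many short runs, reduced to one number. reduce=mean follows the
    # review suggestion; reduce=min matches the call used for the results
    # above. Multiplying the time for `number` (=1000) loops by 1000 gives
    # an estimate of the time for one million loops.
    times = timeit.repeat(statement, globals=env, repeat=repeat, number=number)
    return reduce(times) * 1000


if __name__ == "__main__":
    env = {"seq": (50, 60)}
    print(bench_repeat("seq[0] + seq[1]", env, repeat=20))
    print(bench_repeat("seq[0] + seq[1]", env, repeat=20, reduce=mean))
```

Taking the min discards runs slowed by unrelated system activity, which is why it tends to report the lowest and most repeatable figure; the mean still averages those slow runs in, just over many shorter samples.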

@Starbuck5 (Member Author)

@itzpr3d4t0r

And a side note.
Generally speaking, the idea of having 2 pointers, one with a stack allocated array in the case of a sequence being passed and a single pointer that switches between the array and an actual vector's coords memory address is a valid optimization across the whole module. It avoids having to use memcpy in the vector case, which is probably what matters most.

You've suggested this in 4 separate comments on this PR; I replied when you brought it up in Ankith's thread: #2443 (comment). My initial testing actually found the code was slower without the memcpy, somehow. Maybe the compiler is optimizing a fixed-size memcpy away? Assuming this PR gets merged, you can try this yourself, and if it shows a speedup, make a follow-up PR.
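The two-pointer idea from the review can be illustrated with a Python analogy, where returning an existing list plays the role of pointing at a vector's coords and a caller-provided scratch list plays the role of the stack array. `Vec` and `coords_view` are hypothetical names; Python cannot show the memcpy cost itself, and as noted above, whether skipping the copy is actually faster in the C code was not confirmed in this thread.

```python
class Vec:
    # Hypothetical minimal vector type that owns its coordinate storage.
    def __init__(self, *coords):
        self.coords = [float(c) for c in coords]


def coords_view(obj, scratch):
    # Return storage holding obj's coordinates.
    # Vector case: hand back the vector's own storage (the C version would
    # point at the vector's coords, so no memcpy is needed).
    if isinstance(obj, Vec):
        return obj.coords
    # Sequence case: fill the caller-provided scratch buffer (the
    # stack-allocated array in C) and hand that back instead.
    if len(obj) != len(scratch):
        raise ValueError("dimension mismatch")
    for i, x in enumerate(obj):
        scratch[i] = float(x)
    return scratch


v = Vec(4, 5)
buf = [0.0, 0.0]
print(coords_view(v, buf) is v.coords)    # True: no copy was made
print(coords_view((50, 60), buf) is buf)  # True: the scratch buffer was used
```

The trade-off is that in the vector case the returned storage aliases the argument, so callers must treat it as read-only (or take care when the argument is also the object being modified in place).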

@ankith26 (Member) left a comment

LGTM, thanks for the PR 🎉

Left a review for your consideration; resolve at will.

@Starbuck5 Starbuck5 merged commit c69d9a6 into pygame-community:main Sep 14, 2023
@Starbuck5 Starbuck5 deleted the Add-unified-vector-check-and-get branch September 14, 2023 04:58
@Starbuck5 (Member Author)

Alright @itzpr3d4t0r now it's on you: further optimizations 😄 📈

@Starbuck5 Starbuck5 added this to the 2.4.0 milestone Sep 16, 2023
Labels
math pygame.math Performance Related to the speed or resource usage of the project
4 participants