Add SIMD versions of the invert transform #2534

Merged: 13 commits, Apr 12, 2024

MyreMylar
Member

Related to #2476.

This adds SSE2/Neon and AVX2 versions of the invert transform.

In quick performance testing, the SSE2 version is about 12 times faster at inverting the Walt test image, and the AVX2 version is about 13 times faster. I'd expect AVX2 to pull further ahead at larger image sizes. That said, the invert transform consists of a single operation and SSE2 already handles four pixels at a time, so there is less separating the two than in some other operations.
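To make the "four pixels at a time" point concrete, here is a minimal pure-Python sketch of the idea. It is not the C code from this PR. It assumes a 32-bit `0xAARRGGBB` pixel layout and that invert flips only the RGB channels while leaving alpha untouched; `RGB_MASK`, `invert_pixel` and `invert_four_packed` are names made up for illustration. A 128-bit Python integer stands in for an SSE2 register.

```python
# Assumed pixel layout: 0xAARRGGBB. Only the RGB bits are flipped.
RGB_MASK = 0x00FFFFFF
LANES = 4  # four 32-bit pixels fit in one 128-bit SSE2 register

# The per-pixel mask repeated in each of the four 32-bit lanes.
WIDE_MASK = sum(RGB_MASK << (32 * lane) for lane in range(LANES))


def invert_pixel(pixel):
    """Scalar path: one XOR per pixel."""
    return pixel ^ RGB_MASK


def invert_four_packed(pixels):
    """'SIMD-style' path: pack four pixels, XOR once, unpack."""
    if len(pixels) != LANES:
        raise ValueError("expected exactly four pixels")
    packed = 0
    for lane, pixel in enumerate(pixels):
        packed |= (pixel & 0xFFFFFFFF) << (32 * lane)
    packed ^= WIDE_MASK  # a single operation covers all four pixels
    return [(packed >> (32 * lane)) & 0xFFFFFFFF for lane in range(LANES)]
```

Because the whole transform is one XOR, widening the register from four pixels to eight saves relatively little extra work per pixel, which is consistent with the small gap between the SSE2 and AVX2 timings.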

@MyreMylar MyreMylar requested a review from a team as a code owner October 27, 2023 15:38
@MyreMylar MyreMylar added the transform (pygame.transform) label Oct 27, 2023
@MyreMylar MyreMylar self-assigned this Oct 27, 2023
@MyreMylar
Member Author

I've been doing quick visual and performance testing with this test program:

import pygame
import timeit


def invert_walt(walt_image):
    return pygame.transform.invert(walt_image)


pygame.init()


display_surf = pygame.display.set_mode((800, 350))
walt = pygame.image.load("images/walt.png").convert_alpha()


walt_opaque = walt.convert()
walt_inverse = invert_walt(walt)


running = True

while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    display_surf.fill((255, 150, 255))
    display_surf.blit(walt, (0, 0))
    display_surf.blit(walt_inverse, (400, 0))

    pygame.display.flip()

time_taken = timeit.timeit(
    "invert_walt(walt)",
    setup="import pygame\n\n"
    "pygame.init()\n\n"
    "display_surf = pygame.display.set_mode((800, 350))\n\n"
    "walt = pygame.image.load('images/walt.png').convert_alpha()\n\n"
    "def invert_walt(walt_image):\n"
    "    return pygame.transform.invert(walt_image)",
    number=1000,
)
print("time taken to do 1000 walt inverts:", time_taken)

It is not compulsory to test with Walter - but it is traditional.
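For comparing builds, a slightly more noise-resistant variant of the timing above could look like the sketch below. It passes a callable to `timeit.repeat` and keeps the fastest run. The helper names `best_time` and `speedup` are hypothetical, and treating the "times faster" figure as the ratio of two such timings is an assumption; the PR does not say how the figures were computed.

```python
import timeit


def best_time(func, number=1000, repeat=5):
    """Return the fastest total time in seconds for `number` calls of func.

    Taking the minimum over several repeats reduces the influence of
    background noise on the measurement.
    """
    return min(timeit.repeat(func, number=number, repeat=repeat))


def speedup(baseline_seconds, candidate_seconds):
    """Ratio of two timings, e.g. old build time divided by new build time."""
    return baseline_seconds / candidate_seconds


# Usage with the test program above (needs pygame and the walt image):
#   new_time = best_time(lambda: pygame.transform.invert(walt))
# Run the same line on a build without the SIMD path to get the baseline,
# then call speedup(old_time, new_time).
```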

For better CPI and throughput
# Conflicts:
#	src_c/simd_transform.h
#	src_c/simd_transform_sse2.c
@itzpr3d4t0r itzpr3d4t0r added the SIMD and Performance labels Nov 12, 2023
# Conflicts:
#	src_c/simd_transform.h
#	src_c/simd_transform_avx2.c
#	src_c/simd_transform_sse2.c
Member

@itzpr3d4t0r itzpr3d4t0r left a comment

LGTM

@MyreMylar MyreMylar added this to the 2.4.0 milestone Nov 26, 2023
@Starbuck5 Starbuck5 removed this from the 2.4.0 milestone Dec 25, 2023
@Starbuck5 Starbuck5 added this to the 2.5.0 milestone Dec 25, 2023
@itzpr3d4t0r itzpr3d4t0r requested a review from Starbuck5 February 4, 2024 16:55
Member

@Starbuck5 Starbuck5 left a comment

I tested this locally: the output is pixel-identical before and after this PR for all my test cases.

import random
import hashlib

import pygame

random.seed(36)


expected_hashes = [
    "82c38da441d3325c0ae6fa3fffa74b8b760a26d346a9fc7441c4433efc1210c7",
    "3b05aaa0469f40bfef54207d34f48a868c3a6fa665a76f8a6a8e7f34feb47935",
    "0192eeaf93728e40ab4e11bb5c9d85302ff927b48832fb95fbaceb69e9108813",
    "e8e41e3b0c08e81f29062520b0abee1f3104fc8a90966fbd29a2b46acd097791",
    "340cf63be38ad92cb7a4d836f82d2dcedc550c14ec68c783f203fd127184f6cc",
    "35f3d8ed7268d3f3fb54ebed760e08c85b09ae98960107e3c26f559ba4cba80f",
    "3f4dc52b63a567fd77b431c67be01a8cd0a2e5c5d18961a3f8d7d26faaf46976",
    "266377ffd231505911509d03a75f6ee757375faf5e965da3aa020322209a4486",
    "6d3ec34adca4c1337e000ec339901aef45039009c3355328eca28a5ba5df55e1",
    "edc66664ce5448c742e5848cce2dc2cbaed0383a16a46093551c40c5df37e384",
    "0fecf8c40a3ba2c67c382b2900e29cea152ecd1f74d212464d0f8b883131c57e",
    "f4ad7e15493f8c3862a5aa74ba093c4ab2d6b715939c286c52d61c636d03c1e1",
]

surf_height = 5
offset = (3, 7)


def populate_surf(surf):
    for y in range(surf.get_height()):
        for x in range(surf.get_width()):
            surf.set_at(
                (x, y),
                (
                    random.randint(0, 255),
                    random.randint(0, 255),
                    random.randint(0, 255),
                    random.randint(0, 255),
                ),
            )


hashes = []

for pixel_width in range(4, 16):

    dest = pygame.Surface((pixel_width, surf_height), pygame.SRCALPHA)
    populate_surf(dest)

    print(dest, dest.get_pitch())

    dest = dest.premul_alpha()

    dest = pygame.transform.invert(dest)

    sha256 = hashlib.sha256()
    sha256.update(pygame.image.tobytes(dest, "RGBA"))
    digest = sha256.hexdigest()

    assert digest == expected_hashes.pop(0)

    hashes.append(digest)
    # print(digest)

print(hashes)
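A complementary check that does not rely on stored hashes is to compare the surface bytes against a plain scalar reference. The sketch below is pure Python and assumes tightly packed RGBA bytes (as returned by `pygame.image.tobytes(surf, "RGBA")`) and that invert leaves the alpha channel unchanged; `reference_invert_rgba` and `digest` are hypothetical helper names. The width range of 4 to 15 in the script above presumably exercises row lengths that are not a multiple of the SIMD width, though the review does not say so explicitly.

```python
import hashlib


def reference_invert_rgba(data):
    """Invert R, G and B of tightly packed RGBA bytes; alpha is kept as-is."""
    if len(data) % 4 != 0:
        raise ValueError("expected a whole number of RGBA pixels")
    out = bytearray(data)
    for i in range(0, len(out), 4):
        out[i] = 255 - out[i]          # red
        out[i + 1] = 255 - out[i + 1]  # green
        out[i + 2] = 255 - out[i + 2]  # blue
        # out[i + 3] is alpha and is left untouched (assumption)
    return bytes(out)


def digest(data):
    """SHA-256 hex digest, matching the hashing used in the script above."""
    return hashlib.sha256(data).hexdigest()


# Usage with pygame (not run here):
#   before = pygame.image.tobytes(surf, "RGBA")
#   after = pygame.image.tobytes(pygame.transform.invert(surf), "RGBA")
#   assert after == reference_invert_rgba(before)
```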

@MyreMylar MyreMylar merged commit 9a5ff18 into main Apr 12, 2024
30 checks passed
@MyreMylar MyreMylar deleted the simd-transform-invert branch April 12, 2024 08:13
Labels
Performance, SIMD, transform

3 participants