Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVX2 premul_alpha() #2615

Merged
merged 7 commits into from
Mar 17, 2024
Merged

Conversation

itzpr3d4t0r
Copy link
Member

@itzpr3d4t0r itzpr3d4t0r commented Dec 18, 2023

I’ve added an AVX2 implementation to Surface.premul_alpha(). While I’m not completely satisfied with the results, it’s a useful addition for those using this feature. I welcome any suggestions for a more efficient solution.

This new implementation shows a 40/42% improvement over the SSE2 version.
Figure_1

===| premul_alpha old |===
total: 4294.049899792299
mean: 0.002147024949896149
median: 0.0014896499997121282
stdev: 0.0020567606671152102

===| premul_alpha AVX2 |===
total: 3017.063199866243
mean: 0.0015085315999331214
median: 0.0008754000009503216
stdev: 0.0016191627889381673

using this test program (and a separate one for plotting and spitting the results):

import pygame
from timeit import repeat
import json

pygame.init()

MAX_SIZE = 2000
REP = 10
NUM = 1

FILENAME = "premul_alpha AVX2 new"
TITLE = "premul_alpha AVX2 new"
stmt = "surf.premul_alpha()"


screen = pygame.display.set_mode((500, 500))

img = pygame.image.load("equilibrium_core.png").convert_alpha()

data = {
    "title": TITLE,
    "data": [],
}

for size in range(1, MAX_SIZE + 1):

    surf = pygame.transform.scale(img, (size, size))

    if size % 10 == 0:
        print(f"Testing size: {size}")

    G = globals()

    time = min(repeat(stmt, number=NUM, repeat=REP, globals=G))
    data["data"].append(time)

with open(FILENAME + ".json", "w") as f:
    json.dump(data, f)

pygame.quit()

@itzpr3d4t0r itzpr3d4t0r added Performance Related to the speed or resource usage of the project SIMD Surface pygame.Surface labels Dec 18, 2023
@itzpr3d4t0r itzpr3d4t0r requested a review from a team as a code owner December 18, 2023 16:44
@MyreMylar
Copy link
Member

Good work on this, but I would put the bias back in favour of more accuracy for this particular function - it is not generally useful in a tight loop like blit - as you should be loading & preparing all your surfaces outside of your once-per-frame code paths, and then using them inside them repeatedly. premul_alpha() is definitely a prep function. It shouldn't be slow but I'd prefer accurate and swift over, foot-to-the-metal-who-cares.

For example, it bothers me to see results like:

import pygame

surf = pygame.Surface((100, 100), flags=pygame.SRCALPHA)
surf.fill((255, 255, 255, 128))
surf = surf.premul_alpha()
print(surf.get_at((0, 0)))

Result:

Color(127, 127, 127, 128)  # R, G & B always 1 less, whatever alpha is set to.

...where it is always consistently out by 1 on all channels when alpha multiplying with white. +5% performance on a non-loop targeted function doesn't seem worth it to me and will probably just lead to us getting bugs logged saying 'why are the colours slightly wrong?'. Why not just have it as accurate as we can make it, keep the tests the same - and still take the +35% win?

Anyway, that is my feeling.

Copy link
Member

@MyreMylar MyreMylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, LGTM now. Thanks for this, a good speedup!

@Starbuck5
Copy link
Member

Reading through this, I found a bug in the Surface handling, but the bug is shared with all other premultiply algorithms in my testing, so it doesn't block this being merged I suppose. #2750

Copy link
Member

@Starbuck5 Starbuck5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite the bug I found (shared with all the other implementations too), this looks good to me to press forward with. Cool work!

I'm really digging my random-pixel-data-correctness scripts these days for testing out these kind of PRs.

Test script if anyone is curious
import random
import hashlib

import pygame

random.seed(36)


expected_hashes = [
    "6f91733c0d654da4a4b71daef558760a5757f9e177b113196a84ca12771294b2",
    "d7ca61fa3bcc5078003c6d69042e6d33efd6a3c6e1500e2e1fd6bcec1ab5ef3d",
    "9d24d81a9a388b512b87801837ff69b0ee1ffd9d0c6f81c1daaf0c447cf4305b",
    "f18db382f3f31e4548d8de50d9ff277a1985a97fc043ecca7508b5ccd4e395d8",
    "c35a4490d953ecf1bd96f8350431aded9eb9550e5397ef18d337b776dc740e3d",
    "80a1a01f6a736f8a573b12ceb207a3b794ea219e836a8737ea93908188bee216",
    "6328d82098089db56e8edc4fd807e5e0ebc66a132eee56ed51b91ae17959c39f",
    "52413343fdacf7ffb9989c6be97e1a7f10d5355cbc3a7a6c89db41631cfdd4c2",
    "1600f4fc020bd4a8e881e2d9a82356611ebbef0e05954762222e4a26abcd93cf",
    "e08f168aaad5ba2be4d13600b951ee42daf96746c9b92d702b83d4a7bcc018d3",
    "64117f5bfd443c079f50a70b004a18f871565c9b14b40e4a74d2141d6f4e5ef5",
    "9329cb0bcf338574be7d1e18684844bfd9b885720516da92a3762cc6a3714e3d",
]

surf_height = 5
offset = (3, 7)


def populate_surf(surf):
    for y in range(surf.get_height()):
        for x in range(surf.get_width()):
            surf.set_at(
                (x, y),
                (
                    random.randint(0, 255),
                    random.randint(0, 255),
                    random.randint(0, 255),
                    random.randint(0, 255),
                ),
            )


hashes = []

for pixel_width in range(4, 16):

    dest = pygame.Surface((pixel_width, surf_height), pygame.SRCALPHA)
    populate_surf(dest)

    print(dest, dest.get_pitch())

    dest = dest.premul_alpha()

    sha256 = hashlib.sha256()
    sha256.update(pygame.image.tobytes(dest, "RGBA"))
    digest = sha256.hexdigest()

    assert digest == expected_hashes.pop(0)

    hashes.append(digest)
    # print(digest)

print(hashes)

@itzpr3d4t0r itzpr3d4t0r added this to the 2.5.0 milestone Mar 17, 2024
@itzpr3d4t0r itzpr3d4t0r merged commit d732a44 into pygame-community:main Mar 17, 2024
30 checks passed
@itzpr3d4t0r itzpr3d4t0r deleted the avx2-premul-alpha branch March 17, 2024 10:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Related to the speed or resource usage of the project SIMD Surface pygame.Surface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants