-
-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AVX2 premul_alpha() #2615
AVX2 premul_alpha() #2615
Conversation
Good work on this, but I would put the bias back in favour of more accuracy for this particular function - it is not generally useful in a tight loop like blit - as you should be loading & preparing all your surfaces outside of your once-per-frame code paths, and then using them inside them repeatedly. For example, it bothers me to see results like: import pygame
surf = pygame.Surface((100, 100), flags=pygame.SRCALPHA)
surf.fill((255, 255, 255, 128))
surf = surf.premul_alpha()
print(surf.get_at((0, 0))) Result:
...where it is always consistently out by 1 on all channels when alpha multiplying with white. +5% performance on a non-loop targeted function doesn't seem worth it to me and will probably just lead to us getting bugs logged saying 'why are the colours slightly wrong?'. Why not just have it as accurate as we can make it, keep the tests the same - and still take the +35% win? Anyway, that is my feeling. |
634aef9
to
c2f18a9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, LGTM now. Thanks for this, a good speedup!
Reading through this, I found a bug in the Surface handling, but the bug is shared with all other premultiply algorithms in my testing, so it doesn't block this being merged I suppose. #2750 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite the bug I found (shared with all the other implementations too), this looks good to me to press forward with. Cool work!
I'm really digging my random-pixel-data-correctness scripts these days for testing out these kind of PRs.
Test script if anyone is curious
import random
import hashlib
import pygame
random.seed(36)
expected_hashes = [
"6f91733c0d654da4a4b71daef558760a5757f9e177b113196a84ca12771294b2",
"d7ca61fa3bcc5078003c6d69042e6d33efd6a3c6e1500e2e1fd6bcec1ab5ef3d",
"9d24d81a9a388b512b87801837ff69b0ee1ffd9d0c6f81c1daaf0c447cf4305b",
"f18db382f3f31e4548d8de50d9ff277a1985a97fc043ecca7508b5ccd4e395d8",
"c35a4490d953ecf1bd96f8350431aded9eb9550e5397ef18d337b776dc740e3d",
"80a1a01f6a736f8a573b12ceb207a3b794ea219e836a8737ea93908188bee216",
"6328d82098089db56e8edc4fd807e5e0ebc66a132eee56ed51b91ae17959c39f",
"52413343fdacf7ffb9989c6be97e1a7f10d5355cbc3a7a6c89db41631cfdd4c2",
"1600f4fc020bd4a8e881e2d9a82356611ebbef0e05954762222e4a26abcd93cf",
"e08f168aaad5ba2be4d13600b951ee42daf96746c9b92d702b83d4a7bcc018d3",
"64117f5bfd443c079f50a70b004a18f871565c9b14b40e4a74d2141d6f4e5ef5",
"9329cb0bcf338574be7d1e18684844bfd9b885720516da92a3762cc6a3714e3d",
]
surf_height = 5
offset = (3, 7)
def populate_surf(surf):
for y in range(surf.get_height()):
for x in range(surf.get_width()):
surf.set_at(
(x, y),
(
random.randint(0, 255),
random.randint(0, 255),
random.randint(0, 255),
random.randint(0, 255),
),
)
hashes = []
for pixel_width in range(4, 16):
dest = pygame.Surface((pixel_width, surf_height), pygame.SRCALPHA)
populate_surf(dest)
print(dest, dest.get_pitch())
dest = dest.premul_alpha()
sha256 = hashlib.sha256()
sha256.update(pygame.image.tobytes(dest, "RGBA"))
digest = sha256.hexdigest()
assert digest == expected_hashes.pop(0)
hashes.append(digest)
# print(digest)
print(hashes)
I’ve added an AVX2 implementation to Surface.premul_alpha(). While I’m not completely satisfied with the results, it’s a useful addition for those using this feature. I welcome any suggestions for a more efficient solution.
This new implementation shows a 40/42% improvement over the SSE2 version.
using this test program (and a separate one for plotting and spitting the results):