Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pygame.Surface.fblits slower in 2.5.0.dev2 vs 2.4.1 #2821

Closed
oddbookworm opened this issue Apr 26, 2024 · 13 comments · Fixed by #2844
Closed

pygame.Surface.fblits slower in 2.5.0.dev2 vs 2.4.1 #2821

oddbookworm opened this issue Apr 26, 2024 · 13 comments · Fixed by #2844
Labels
needs testing Performance Related to the speed or resource usage of the project Surface pygame.Surface
Milestone

Comments

@oddbookworm
Copy link
Member

@bigwhoopgames mentioned on discord that he's getting a 25%-33% fps drop when using the 2.5.0.dev2 prerelease vs the 2.4.1 release. I did some testing by giving him special wheels with specific changes to see if that fixed it, but didn't really see much difference. List of things I tried:

Image provided by Big Whoop showing some comparisons from a performance profile
image

@oddbookworm oddbookworm added Performance Related to the speed or resource usage of the project Surface pygame.Surface needs testing labels Apr 26, 2024
@oddbookworm oddbookworm added this to the 2.5.0 milestone Apr 26, 2024
@bigwhoopgames
Copy link

Tested with my youtube particles tutorial. Running on 2.4.1 ~20K particles at 60FPS and on 2.5.0 getting about 6800.

https://github.com/bigwhoopgames/youtube-particles/blob/master/particles.py

Line 88 needs changing to work with python 3.12, just wrap values passed to random as ints.

@itzpr3d4t0r
Copy link
Member

itzpr3d4t0r commented Apr 26, 2024

Could it be SDL's fault? We got SDL 2.28.5 on 2.4.1 and SDL 2.30.2 on 2.5.0 -dev2. It's basically the only thing that changed between our releases with fblits. Can anyone build 2.5.0 with SDL 2.28.5 to check it?

EDIT:
Confirmed, i tested installing the latest pygame-ce version with SDL 2.28.5 and the fps in my test program are back to pygame-ce 2.4.1 territory. (got 400 fps in 2.4.1 and 260-270 in 2.5.0 dev2 and 350-400 with 2.5.0 with the old SDL version).

Unfortunately there's no reference in SDL's releases page about any change regarding blitting: https://github.com/libsdl-org/SDL/releases.

@oddbookworm
Copy link
Member Author

How hard would making an SDL reproducer be so I can report it upstream to them?

@robertpfeiffer
Copy link
Contributor

Do you even need to recompile?
Just compile with SDL SDL 2.28.5 and substitute with 2.30.2 through the dynamic API replacer

SDL3_DYNAMIC_API=/my/actual/libSDL2.so.30.2 python3 benchmark.py

@Starbuck5
Copy link
Member

Information from discord-

itzpr said

To be specific I restricted the issue between 2.28.5 and 2.29.3
Couldn't really install 2.29.1 but i guess 2.29.0 introduced the issue

oddbookworm posted a C reproducer attempt

#include <stdio.h>
#include <sys/time.h>

#define SDL_MAIN_HANDLED
#include "SDL2/SDL.h"

long long timeInMilliseconds(void) {
    struct timeval tv;

    gettimeofday(&tv,NULL);
    return (((long long)tv.tv_sec)*1000)+(tv.tv_usec/1000);
}

Uint32 ColorToUint(int R, int G, int B)
{
    return (Uint32)((R << 16) + (G << 8) + (B << 0));
}

void blitManyTimes(SDL_Surface* target, SDL_Surface* surf, SDL_Rect* rect, size_t iterations)
{
    for (size_t i = 0U; i < iterations; ++i)
    {
        if (SDL_BlitSurface(surf, rect, target, rect) < 0)
        {
            printf("Failed to perform blit\n");
            return;
        }
    }
}

int main()
{
    printf("SDL %d.%d.%d\n", SDL_MAJOR_VERSION, SDL_MINOR_VERSION, SDL_PATCHLEVEL);

    if (SDL_Init(SDL_INIT_VIDEO) < 0)
    {
        printf("Failed to initialize SDL2\n");
        return -1;
    }

    SDL_Window* window = SDL_CreateWindow("SDL2 Window", SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED, 680, 480, 0);

    if (!window)
    {
        printf("Failed to create window\n");
        return -1;
    }

    SDL_Surface* windowSurface = SDL_GetWindowSurface(window);

    if (!windowSurface)
    {
        printf("Failed to get surface from window");
        return -1;
    }

    SDL_Surface* surf = SDL_CreateRGBSurface(0, 100, 100, 32, 0, 0, 0, 0);
    if (!surf)
    {
        printf("Failed to create surface");
        return -1;
    }

    SDL_Rect rect = { .x = 0, .y = 0, .w = 100, .h = 100 };
    SDL_Color color = { .r = 255, .g = 0, .b = 0, .a = 255 };

    if (SDL_FillRect(surf, &rect, ColorToUint(color.r, color.g, color.b)) < 0)
    {
        printf("Failed to fill surface");
        return -1;
    }

    long long start = timeInMilliseconds();
    blitManyTimes(windowSurface, surf, &rect, 1000000);
    long long end = timeInMilliseconds();

    printf("Total time elapsed: %ld milliseconds\n", end - start);

    return 0;
}

He said he didn't see any difference until he switched 100x100 surfs to 10x20.

@Starbuck5
Copy link
Member

Starbuck5 commented Apr 29, 2024

Here is a more isolated Python reproducer of the issue. Still using fblits, but I expect this would have the same effect if running a for loop of blits in Python, just harder to spot since that loop would have more overhead.

import random
import time

import pygame

random.seed(36)

width = 600
height = 400

screen = pygame.Surface((width, height))

def make_particle():
    p_surf = pygame.Surface((5,5))
    p_surf.fill((random.randint(0,255), random.randint(0,255), random.randint(0,255)))

    return (p_surf, (random.randint(0, width), random.randint(0, height)))

particles = [make_particle() for _ in range(1000000)]

start = time.time()
screen.fblits(particles)
print(time.time() - start)

Previously this runs in like a quarter second, now it runs in about a second.

I tracked it down to between SDL 2.29.2 and 2.29.3

I really don't see what it would be in there: libsdl-org/SDL@prerelease-2.29.2...prerelease-2.29.3

@itzpr3d4t0r
Copy link
Member

Ran a test that runs fblits on 100 positions on increasingly bigger surfaces and confirmed my suspect that this fblits/blitting got slower only with "smaller" surfaces, kinda confirming my idea that it has to do with rect clipping or something rather than the actual blit routine:
image

@Starbuck5
Copy link
Member

I bisected the performance regression to libsdl-org/SDL@a6d5c1f (surprisingly, nothing about blitting in there... I guess this makes a logging function more expensive and that something in blit calls it)

We still need a clear C reproducer and to report this to SDL, I'd appreciate someone doing.

@itzpr3d4t0r
Copy link
Member

itzpr3d4t0r commented May 2, 2024

Ran some tests by removing parts of pgSurface_Blit's if statements

pygame-ce/src_c/surface.c

Lines 3816 to 3953 in 7d73fc8

pgSurface_Blit(pgSurfaceObject *dstobj, pgSurfaceObject *srcobj,
SDL_Rect *dstrect, SDL_Rect *srcrect, int blend_flags)
{
SDL_Surface *src = pgSurface_AsSurface(srcobj);
SDL_Surface *dst = pgSurface_AsSurface(dstobj);
SDL_Surface *subsurface = NULL;
int result, suboffsetx = 0, suboffsety = 0;
SDL_Rect orig_clip, sub_clip;
Uint8 alpha;
Uint32 key;
/* passthrough blits to the real surface */
if (((pgSurfaceObject *)dstobj)->subsurface) {
PyObject *owner;
struct pgSubSurface_Data *subdata;
subdata = ((pgSurfaceObject *)dstobj)->subsurface;
owner = subdata->owner;
subsurface = pgSurface_AsSurface(owner);
suboffsetx = subdata->offsetx;
suboffsety = subdata->offsety;
while (((pgSurfaceObject *)owner)->subsurface) {
subdata = ((pgSurfaceObject *)owner)->subsurface;
owner = subdata->owner;
subsurface = pgSurface_AsSurface(owner);
suboffsetx += subdata->offsetx;
suboffsety += subdata->offsety;
}
SDL_GetClipRect(subsurface, &orig_clip);
SDL_GetClipRect(dst, &sub_clip);
sub_clip.x += suboffsetx;
sub_clip.y += suboffsety;
SDL_SetClipRect(subsurface, &sub_clip);
dstrect->x += suboffsetx;
dstrect->y += suboffsety;
dst = subsurface;
}
else {
pgSurface_Prep(dstobj);
subsurface = NULL;
}
pgSurface_Prep(srcobj);
if ((blend_flags != 0 && blend_flags != PYGAME_BLEND_ALPHA_SDL2) ||
((SDL_GetColorKey(src, &key) == 0 || _PgSurface_SrcAlpha(src) == 1) &&
/* This simplification is possible because a source subsurface
is converted to its owner with a clip rect and a dst
subsurface cannot be blitted to its owner because the
owner is locked.
*/
dst->pixels == src->pixels && srcrect != NULL &&
surface_do_overlap(src, srcrect, dst, dstrect))) {
/* Py_BEGIN_ALLOW_THREADS */
result = pygame_Blit(src, srcrect, dst, dstrect, blend_flags);
/* Py_END_ALLOW_THREADS */
}
/* can't blit alpha to 8bit, crashes SDL */
else if (PG_SURF_BytesPerPixel(dst) == 1 &&
(SDL_ISPIXELFORMAT_ALPHA(src->format->format) ||
((SDL_GetSurfaceAlphaMod(src, &alpha) == 0 && alpha != 255)))) {
/* Py_BEGIN_ALLOW_THREADS */
if (PG_SURF_BytesPerPixel(src) == 1) {
result = pygame_Blit(src, srcrect, dst, dstrect, 0);
}
else {
SDL_PixelFormat *fmt = src->format;
SDL_PixelFormat newfmt;
newfmt.palette = 0; /* Set NULL (or SDL gets confused) */
#if SDL_VERSION_ATLEAST(3, 0, 0)
newfmt.bits_per_pixel = fmt->bits_per_pixel;
newfmt.bytes_per_pixel = fmt->bytes_per_pixel;
#else
newfmt.BitsPerPixel = fmt->BitsPerPixel;
newfmt.BytesPerPixel = fmt->BytesPerPixel;
#endif
newfmt.Amask = 0;
newfmt.Rmask = fmt->Rmask;
newfmt.Gmask = fmt->Gmask;
newfmt.Bmask = fmt->Bmask;
newfmt.Ashift = 0;
newfmt.Rshift = fmt->Rshift;
newfmt.Gshift = fmt->Gshift;
newfmt.Bshift = fmt->Bshift;
newfmt.Aloss = 0;
newfmt.Rloss = fmt->Rloss;
newfmt.Gloss = fmt->Gloss;
newfmt.Bloss = fmt->Bloss;
src = PG_ConvertSurface(src, &newfmt);
if (src) {
result = SDL_BlitSurface(src, srcrect, dst, dstrect);
SDL_FreeSurface(src);
}
else {
result = -1;
}
}
/* Py_END_ALLOW_THREADS */
}
else if (blend_flags != PYGAME_BLEND_ALPHA_SDL2 &&
!(pg_EnvShouldBlendAlphaSDL2()) &&
SDL_GetColorKey(src, &key) != 0 &&
(PG_SURF_BytesPerPixel(dst) == 4 ||
PG_SURF_BytesPerPixel(dst) == 2) &&
_PgSurface_SrcAlpha(src) &&
(SDL_ISPIXELFORMAT_ALPHA(src->format->format)) &&
!pg_HasSurfaceRLE(src) && !pg_HasSurfaceRLE(dst) &&
!(src->flags & SDL_RLEACCEL) && !(dst->flags & SDL_RLEACCEL)) {
/* If we have a 32bit source surface with per pixel alpha
and no RLE we'll use pygame_Blit so we can mimic how SDL1
behaved */
result = pygame_Blit(src, srcrect, dst, dstrect, blend_flags);
}
else {
/* Py_BEGIN_ALLOW_THREADS */
result = SDL_BlitSurface(src, srcrect, dst, dstrect);
/* Py_END_ALLOW_THREADS */
}
if (subsurface) {
SDL_SetClipRect(subsurface, &orig_clip);
dstrect->x -= suboffsetx;
dstrect->y -= suboffsety;
}
else
pgSurface_Unprep(dstobj);
pgSurface_Unprep(srcobj);
if (result == -1)
PyErr_SetString(pgExc_SDLError, SDL_GetError());
if (result == -2)
PyErr_SetString(pgExc_SDLError, "Surface was lost");
return result != 0;
}
and I'm pretty sure SDL_GetColorKey is the issue, don't know why but removing both SDL_GetColorKey calls there brings back 2.4.1 performance.

@Starbuck5
Copy link
Member

Nice find!

Okay, I think I have the full picture now.

We call SDL_GetColorKey(src, &key), but only to check whether the surface has a colorkey or not.

int SDL_GetColorKey(SDL_Surface *surface, Uint32 *key)
{
    if (!surface) {
        return SDL_InvalidParamError("surface");
    }

    if (!(surface->map->info.flags & SDL_COPY_COLORKEY)) {
        return SDL_SetError("Surface doesn't have a colorkey");
    }

    if (key) {
        *key = surface->map->info.colorkey;
    }
    return 0;
}

Inside SDL, it calls SDL_SetError if the surface doesn't have a colorkey. But libsdl-org/SDL@a6d5c1f makes that way more expensive because it needs to check the environment for that variable now?

I confirmed the slowdown is coming from GetColorKey rather than rearranging the blit code paths by adding a bunch of them into the function like so:

int res = SDL_GetColorKey(src, &key);
int res2 = SDL_GetColorKey(src, &key);
int res3 = SDL_GetColorKey(src, &key);
int res4 = SDL_GetColorKey(src, &key);
int res5 = SDL_GetColorKey(src, &key);

Luckily, there is a newer function called SDL_HasColorKey, which is available in SDL 2.0.9 and later. We support SDL 2.0.10 and above right now, so that works. There are several places in our code where we use GetColorKey == 0 that we could easily replace with HasColorKey. However, there are also places where GetColorKey == 0 is tested and the returned color key is actually used, so those blocks will be slightly less efficient than before if replaced with HasColorKey + GetColorKey (two functions instead of one)

@robertpfeiffer
Copy link
Contributor

That reminds me: I should post the result of my investigation into palettes and the bug on Android.

A palette can have the same colour twice, and if you load a paletted PNG where one colour has alpha transparency and the others do not, then SDL_image will just use the index of that colour as the colour key. This way, the loaded palette can have the same colour twice, with the same RGB values, and both time with an alpha of 255, even though the saved PNG had unique colours in the palette.

We should probably expost SDL_GetColorKey (to get the mapped value of the colour key) and SDL_HasColorKey to Python.

@Starbuck5
Copy link
Member

I just opened #2835 that starts to deal with this. There are other uses of SDL_GetColorKey in our code that need to be retrofitted, but the change made in 2835 is the most critical one and also the simplest, so I hope it's easy to review.

@ankith26
Copy link
Member

ankith26 commented May 7, 2024

If this is not fixed on the SDL side, we could just add something like the following and use it across our codebase when we don't care about the SDL error

int PG_GetColorKey(SDL_Surface *surface, Uint32 *key) {
    if (!SDL_HasColorKey(surface)) {
        return -1;
    }
    return SDL_GetColorKey(surface, key);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs testing Performance Related to the speed or resource usage of the project Surface pygame.Surface
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants