Core (LV::Video) Fix alpha blending of 32-bit videos (#230) #244

kaixiong · 2023-02-02T21:51:50Z

This is a rewrite of the buggy 32-bit LV::Video alpha blending code to deal with arithmetic underflows/overflows and the use of an uninitialized register in the MMX implementation (#230).

Note that GCC and Clang on x86-64 (recent versions only?) produce SSE instructions instead of MMX. GCC 12.2 uses the XMM registers, while Clang sticks to the MM registers but throws in pshuflw.

Perhaps it's time to move on and use SSE2 (introduced in 2000 with the Pentium 4) to work on 2 pixels at once, or maybe even 4. This will require larger memory alignments and complicate the code a bit more, since it has to handle row widths that are not a multiple of the number of pixels processed per step.
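To make the SSE2 idea concrete, here is a rough sketch of blending 2 pixels per step. It is not the code in this pull request: the function name blend_rows_sse2 is hypothetical, pixels are assumed to be 4 bytes with the alpha in byte 3, and the destination alpha is left untouched. It uses unaligned 8-byte loads, so it sidesteps the alignment question, and it falls back to scalar code for an odd trailing pixel. It only builds on x86 targets with SSE2.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: blend `count` 4-byte pixels from src into dst using
// (src * a + dst * (255 - a)) >> 8, where a is the source alpha (byte 3).
inline void blend_rows_sse2(std::uint8_t* dst, std::uint8_t const* src, std::size_t count)
{
    __m128i const zero      = _mm_setzero_si128();
    __m128i const max_alpha = _mm_set1_epi16(255);
    // 16-bit lanes 3 and 7 hold the alpha channels of the two pixels.
    __m128i const alpha_lanes = _mm_set_epi16(-1, 0, 0, 0, -1, 0, 0, 0);

    std::size_t i = 0;
    for (; i + 2 <= count; i += 2) {
        // Load 2 pixels (8 bytes) and widen every byte to a 16-bit lane.
        __m128i const s = _mm_unpacklo_epi8(
            _mm_loadl_epi64(reinterpret_cast<__m128i const*>(src + 4 * i)), zero);
        __m128i const d = _mm_unpacklo_epi8(
            _mm_loadl_epi64(reinterpret_cast<__m128i const*>(dst + 4 * i)), zero);

        // Broadcast each pixel's source alpha across its own four lanes.
        __m128i a = _mm_shufflelo_epi16(s, _MM_SHUFFLE(3, 3, 3, 3));
        a = _mm_shufflehi_epi16(a, _MM_SHUFFLE(3, 3, 3, 3));

        // The sum is at most 255 * 255 = 65025, so it fits in a 16-bit lane.
        __m128i const sum = _mm_add_epi16(
            _mm_mullo_epi16(s, a),
            _mm_mullo_epi16(d, _mm_sub_epi16(max_alpha, a)));
        __m128i const blended = _mm_srli_epi16(sum, 8);

        // Keep the destination alpha, take the blended colour channels.
        __m128i const result = _mm_or_si128(
            _mm_and_si128(alpha_lanes, d),
            _mm_andnot_si128(alpha_lanes, blended));

        // Narrow back to bytes and store the 2 pixels.
        _mm_storel_epi64(reinterpret_cast<__m128i*>(dst + 4 * i),
                         _mm_packus_epi16(result, zero));
    }

    // Scalar tail for a row width that is not a multiple of 2.
    for (; i < count; ++i) {
        int const a = src[4 * i + 3];
        for (int c = 0; c < 3; ++c) {
            dst[4 * i + c] = static_cast<std::uint8_t>(
                (src[4 * i + c] * a + dst[4 * i + c] * (255 - a)) >> 8);
        }
    }
}
```

Going to 4 pixels per step would follow the same pattern with full 16-byte loads and both the low and high unpack, at the cost of a longer tail to handle.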

Here is a link to the plain C and SIMD code in Godbolt.

hartwork

@kaixiong I think despite the CI being green I found two bugs in the new version of blit_overlay_alphasrc. Happy to learn I overlooked or misunderstood something, let's see. Interesting stuff!

hartwork · 2023-02-07T01:09:06Z

libvisual/libvisual/private/lv_video_blit.cpp

+              uint16_t const c0 = static_cast<uint16_t> (src_pixel[0]) * src_alpha + static_cast<uint16_t> (dst_pixel[0]) * (255 - src_alpha);
+              uint16_t const c1 = static_cast<uint16_t> (src_pixel[1]) * src_alpha + static_cast<uint16_t> (dst_pixel[1]) * (255 - src_alpha);
+              uint16_t const c2 = static_cast<uint16_t> (src_pixel[2]) * src_alpha + static_cast<uint16_t> (dst_pixel[2]) * (255 - src_alpha);

-              destbuf[0] = (alpha * (srcbuf[0] - destbuf[0]) >> 8) + destbuf[0];
-              destbuf[1] = (alpha * (srcbuf[1] - destbuf[1]) >> 8) + destbuf[1];
-              destbuf[2] = (alpha * (srcbuf[2] - destbuf[2]) >> 8) + destbuf[2];
+              dst_pixel[0] = c0 >> 8;
+              dst_pixel[1] = c1 >> 8;
+              dst_pixel[2] = c2 >> 8;


Hi @kaixiong ,

the TL;DR version would be that I think that:

The >> 8 in dst_pixel[0] = c0 >> 8; and siblings is too much

~~The cast to 16 bits needs to happen elsewhere, i.e. not cast(a * b + cast(c * d)) but cast(a * b) + cast(c * d).~~ EDIT: mis-read the code, nevermind

Let me go into more detail on the former:

My impression is that this tries to implement approach "A over B" at https://en.wikipedia.org/wiki/Alpha_compositing#Description , formalized there as:

A over B
a_0 := a_a + a_b * (1.0 - a_a)
C_0 := (C_a * a_a + C_b * a_b * (1.0 - a_a)) / a_0

Whereas for us:

A := src
B := dst

With a_a and a_b ranging from 0 to 255 rather than 0.0 to 1.0 I end up with these formulas for us, note the output range annotation.

a_0 := a_a/255 + a_b/255 * (1.0 - a_a/255)         # range 0..1
    == a_a/255 + a_b/255 * (255/255 - a_a/255)
    == (a_a + a_b * (255/255 - a_a/255)) / 255
    == (a_a + a_b * (255 - a_a) / 255) / 255       # range 0..1
    => (a_a + a_b * (255 - a_a) / 255)             # range 0..255

C_0 := (C_a * a_a/255 + C_b * a_b/255 * (1.0 - a_a/255)) / (a_0/255)          # range 0..1
    == (C_a * a_a/255 + C_b * a_b/255 * (1.0 - a_a/255)) / a_0 * 255
    == (C_a * a_a/255 + C_b * a_b/255 * (255/255 - a_a/255)) / a_0 * 255
    == (C_a * a_a/255 + C_b * a_b/255 * ((255 - a_a) / 255)) / a_0 * 255      # range 0..1
    => (C_a * a_a + C_b * a_b * ((255 - a_a) / 255)) / a_0 * 255              # range 0..255

Now the current code in the pull request seems to assume that the dst image has an alpha channel with full 255 opacity a_b for all pixels. I'd like to understand why and would like to suggest addition of a comment to the code, but I'll take it for granted below for a moment.

So inserting a_b := 255 I get …

a_0 := (a_a + a_b * (255 - a_a) / 255)
    == (a_a + 255 * (255 - a_a) / 255)
    == (a_a + (255 - a_a))
    == 255

C_0 := (C_a * a_a + C_b * a_b * ((255 - a_a) / 255)) / a_0 * 255
    == (C_a * a_a + C_b * 255 * ((255 - a_a) / 255)) / 255 * 255
    == (C_a * a_a + C_b * (255 - a_a))
    == (src_pixel[0] * src_alpha + dst_pixel[0] * (255 - src_alpha))

Since a_0 == 255, dst_pixel[3] does not need to be written, confirmed.
However, C_0 is already in range 0..255, so the additional division in dst_pixel[0] = c0 >> 8; seems to be too much and should result in an "almost black" picture, if I am not mistaken.

I'm happy to learn what I missed or to take this to a call, e.g. if my side here was hard to understand.

Best, Sebastian

@hartwork, just to be clear this is only meant to be a fix of the original code. The operation here doesn't actually implement 'over' (or 'atop'), it's a simple linear interpolation between the source and target colours, using the source alpha as parameter. The target alpha is unchanged.

At some point, I want to have alpha compositing with the complete set of operators but that'll have to come later. There are additional API design considerations in there due to the necessity of premultiplied alpha, which the Wikipedia article mentions.

Regarding the actual calculation itself, the right-shift by 8 bits is necessary. Alpha is a fraction, i.e. a value in [0.0, 1.0], that's been mapped to [0, 255]. So what we're calculating here is really:

c = c1 * alpha/255 + c2 * (1 - alpha/255)

Multiplying throughout by 255 to work with only integers, we have:

c * 255 = c1 * alpha + c2 * (255 - alpha)

To get back c, we divide by 255 in the final step. Here >> 8 (a division by 256, so a slight approximation) was originally chosen for performance reasons (as the test mentions), so I kept it as it is.

Since c1, c2 and alpha each have a maximum value of 255 and their products are involved, we need to work in at least uint16_t.
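The arithmetic above can be sketched in plain C++. This is a minimal illustration, not the code in lv_video_blit.cpp; the helper names blend_shift and blend_exact are hypothetical. It shows that the scaled sum stays within 16 bits and that the >> 8 result stays close to a true division by 255 instead of collapsing towards black.

```cpp
#include <cstdint>

// Hypothetical helper: blend one colour channel as in the diff above,
// approximating the division by 255 with a right shift by 8 (division by 256).
constexpr std::uint8_t blend_shift(std::uint8_t src, std::uint8_t dst, std::uint8_t alpha)
{
    // The uint8_t operands are promoted to int. The sum is at most
    // 255 * alpha + 255 * (255 - alpha) = 65025, which fits in 16 bits.
    return static_cast<std::uint8_t>((src * alpha + dst * (255 - alpha)) >> 8);
}

// Hypothetical reference: the same interpolation with a rounded division by 255.
constexpr std::uint8_t blend_exact(std::uint8_t src, std::uint8_t dst, std::uint8_t alpha)
{
    return static_cast<std::uint8_t>((src * alpha + dst * (255 - alpha) + 127) / 255);
}

// Fully opaque white over white: the shift version loses one level.
static_assert(blend_shift(255, 255, 255) == 254, "shift rounds down");
static_assert(blend_exact(255, 255, 255) == 255, "rounded division keeps white");
```

For example, blend_shift(200, 100, 128) gives 149 where blend_exact gives 150.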

@kaixiong thanks for elaborating, let me digest that more.

@kaixiong makes sense to me now! 🙏

kaixiong · 2024-02-08T13:34:14Z

@hartwork, any chance you could look at this again?

hartwork · 2024-02-14T18:43:25Z

> @hartwork, any chance you could look at this again?

@kaixiong I hope to find time to, in the coming days

hartwork

> @hartwork, any chance you could look at this again?

@kaixiong I now understand commit "Core (LV::Video): Account for underflow/overflow in the alpha blending of 32-bit videos." and it looks good! 👍

For some of the other parts, in particular MMX, I'll just have to trust you and the test suite.

3 of the 6 commits could be worth squashing into their respective fix target if you like (I consider that optional and just an idea):

Core (LV::Video): Use the correct source alpha.
Core (Tests): Fix wrong argument order in calls to Video::get_pixel_ptr().
Core (Tests): Fix building of tests.

A rebase onto latest master may also be nice (but optional), I don't expect any new conflicts.

What do you think?

hartwork · 2024-12-23T16:57:29Z

@kaixiong makes sense to me now! 🙏

kaixiong · 2024-12-25T22:06:52Z

@hartwork

Thank you! I have squashed the commits down to 3.

However, I have also reverted the formulas in blit_overlay_alphasrc():

There is no risk of the underflows or overflows claimed in my commit message, due to the C/C++ integral promotion rules. Operands of type uint8_t and uint16_t are always implicitly promoted to int before any arithmetic is carried out (on any platform where int is wider than 16 bits). The added static_casts obscure this fact.
Worse, the casts achieve nothing. They don't even work as optimization or verification hints. MSVC, Clang and GCC all load and zero-extend the 8-bit colour and alpha values into 32-bit registers before carrying out any computations.
I used a Python script to compare the original and updated formulas for every combination of pixel values and alphas. The unfortunate fact is that the updated code has a worse mean error of -1 (versus 0 in the original), with only a marginal improvement in spread. Adding a correction factor of 255 into the numerator works, but I can't explain why it's the optimal choice yet 😄. In any case, this makes the formulas a bit more complicated than what I had intended.
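The promotion rule in the first point can be checked at compile time. The sketch below assumes a platform where int is wider than 16 bits and uses a hypothetical helper, blend_original, that mirrors the original formula from the diff. Note that right-shifting a negative int is implementation-defined before C++20; the compilers named above use an arithmetic shift, which the example relies on.

```cpp
#include <cstdint>
#include <type_traits>

// Arithmetic on narrow unsigned types yields int, so src - dst cannot wrap
// around even when dst is larger than src.
static_assert(std::is_same<decltype(std::uint8_t{} - std::uint8_t{}), int>::value,
              "uint8_t operands are promoted to int");
static_assert(std::is_same<decltype(std::uint16_t{} * std::uint16_t{}), int>::value,
              "uint16_t operands are promoted to int");

// Hypothetical helper mirroring the original formula:
//   dest = (alpha * (src - dest) >> 8) + dest
constexpr std::uint8_t blend_original(std::uint8_t src, std::uint8_t dst, std::uint8_t alpha)
{
    // (src - dst) is a signed int and may be negative; the result of the
    // whole expression still lands in 0..255 with an arithmetic shift.
    return static_cast<std::uint8_t>(((alpha * (src - dst)) >> 8) + dst);
}
```

When source and destination are equal, blend_original returns the value unchanged for any alpha, whereas (src * alpha + dst * (255 - alpha)) >> 8 can drop it by one level (100 becomes 99, for example). That is consistent with the downward bias reported for the updated formula.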

So to keep things simple for everyone, I'm sticking to the original formulas and calling the changes a clean-up instead of a bug-fix. What do you think?

hartwork

@kaixiong good point about integer promotion! If we stick to the original formula, let's add a comment with the intended math, as it takes too long to understand without something like it. What do you think?

libvisual/libvisual/private/lv_video_blit.cpp


hartwork

@kaixiong thanks for adding the comment. Approving to the extent that I can comprehend, e.g. not the parts with MMX. Let's get this merged 👍

kaixiong self-assigned this Feb 2, 2023

kaixiong added bug critical labels Feb 2, 2023

kaixiong added this to the 0.5.0_alpha1 milestone Feb 2, 2023

kaixiong force-pushed the alpha-blend-fixes branch from 1ec968b to 61bdede February 5, 2023 02:14

kaixiong marked this pull request as ready for review February 5, 2023 02:26

kaixiong requested a review from hartwork February 5, 2023 02:42

hartwork reviewed Feb 7, 2023


kaixiong force-pushed the alpha-blend-fixes branch from 61bdede to 45bae1d February 11, 2023 11:57

kaixiong force-pushed the alpha-blend-fixes branch from 45bae1d to 156039a March 31, 2023 23:36

kaixiong force-pushed the alpha-blend-fixes branch from 156039a to 35ba3fb February 8, 2024 13:25

hartwork approved these changes Dec 23, 2024


kaixiong force-pushed the alpha-blend-fixes branch 2 times, most recently from 3f21b65 to 41c799d December 25, 2024 09:03

kaixiong added 3 commits December 26, 2024 05:20

Core (LV::Video): Clean up blit_overlay_alphasrc().

e5f954a

Core (LV::Video): Rewrite MMX alpha blending of 32-bit videos using intrinsics (#230).

3a7d770

Core (Tests): Add test for LV::VideoBlit::blit_overlay_alphasrc().

8cbfb5c

kaixiong force-pushed the alpha-blend-fixes branch from 41c799d to 8cbfb5c December 25, 2024 21:21

hartwork requested changes Dec 26, 2024


Core (LV::Video): Add explanatory note on calculation in blit_overlay_alphasrc() per Sebastian (hartwork)'s suggestion.

0e95069

hartwork approved these changes Dec 26, 2024


kaixiong merged commit c3e5d7b into master Dec 26, 2024
6 checks passed

kaixiong deleted the alpha-blend-fixes branch December 26, 2024 17:20
