-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core (LV::Video) Fix alpha blending of 32-bit videos (#230) #244
Conversation
1ec968b
to
61bdede
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaixiong I think despite the CI being green I found two bugs in the new version of blit_overlay_alphasrc
. Happy to learn I overlooked or misunderstood something, let's see. Interesting stuff!
uint16_t const c0 = static_cast<uint16_t> (src_pixel[0]) * src_alpha + static_cast<uint16_t> (dst_pixel[0]) * (255 - src_alpha); | ||
uint16_t const c1 = static_cast<uint16_t> (src_pixel[1]) * src_alpha + static_cast<uint16_t> (dst_pixel[1]) * (255 - src_alpha); | ||
uint16_t const c2 = static_cast<uint16_t> (src_pixel[2]) * src_alpha + static_cast<uint16_t> (dst_pixel[2]) * (255 - src_alpha); | ||
|
||
destbuf[0] = (alpha * (srcbuf[0] - destbuf[0]) >> 8) + destbuf[0]; | ||
destbuf[1] = (alpha * (srcbuf[1] - destbuf[1]) >> 8) + destbuf[1]; | ||
destbuf[2] = (alpha * (srcbuf[2] - destbuf[2]) >> 8) + destbuf[2]; | ||
dst_pixel[0] = c0 >> 8; | ||
dst_pixel[1] = c1 >> 8; | ||
dst_pixel[2] = c2 >> 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kaixiong ,
the TL;DR version would be that I think that:
- The
>> 8
indst_pixel[0] = c0 >> 8;
and siblings is too much The cast to 16 bits needs to happen elsewhere, i.e. notEDIT: mis-read the code, nevermindcast(a * b + cast(c * d))
butcast(a * b) + cast(c * d)
.
Let me go into more detail on former:
My impression is that this tries to implement approach "A over B" at https://en.wikipedia.org/wiki/Alpha_compositing#Description , formalized there as:
A over B
a_0 := a_a + a_b * (1.0 - a_a)
C_0 := (C_a * a_a + C_b * a_b * (1.0 - a_a)) / a_0
Whereas for us:
A := src
B := dst
With a_a
and a_b
ranging from 0 to 255 rather than 0.0 to 1.0 I end up with these formulas for us, note the output range annotation.
a_0 := a_a/255 + a_b/255 * (1.0 - a_a/255)) # range 0..1
== a_a/255 + a_b/255 * (255/255 - a_a/255))
== (a_a + a_b * (255/255 - a_a/255)))/ 255
== (a_a + a_b * (255 - a_a) / 255) / 255 # range 0..1
=> (a_a + a_b * (255 - a_a) / 255) # range 0..255
C_0 := (C_a * a_a/255 + C_b * a_b/255 * (1.0 - a_a/255)) / (a_0/255) # range 0..1
== (C_a * a_a/255 + C_b * a_b/255 * (1.0 - a_a/255)) / a_0 * 255
== (C_a * a_a/255 + C_b * a_b/255 * (255/255 - a_a/255)) / a_0 * 255
== (C_a * a_a/255 + C_b * a_b/255 * ((255 - a_a) / 255)) / a_0 * 255 # range 0..1
=> (C_a * a_a + C_b * a_b * ((255 - a_a) / 255)) / a_0 * 255 # range 0..255
Now the current code in the pull request seems to assume that the dst
image has an alpha channel with full 255 opacity a_b
for all pixels. I'd like to understand why and would like to suggest addition of a comment to the code, but I'll take it for granted below for a moment.
So Inserting a_b := 255
I get …
a_0 := (a_a + a_b * (255 - a_a) / 255)
== (a_a + 255 * (255 - a_a) / 255)
== (a_a + (255 - a_a))
== 255
C_0 := (C_a * a_a + C_b * a_b * ((255 - a_a) / 255)) / a_0 * 255
== (C_a * a_a + C_b * 255 * ((255 - a_a) / 255)) / 255 * 255
== (C_a * a_a + C_b * ((255 - a_a) ))
== (src_pixel[0] * src_alpha + dst_pixel[0] * ((255 - src_alpha)))
Due to a_0 == 255
then dst_pixel[3]
does not need to be written, confirmed.
However, C_0
is already in range 0..255 so the additional division in dst_pixel[0] = c0 >> 8;
seems to be to much and should yield in an "almost black" picture, if I am not mistaken.
I'm happy to learn what I missed or to take this to a call, e.g. if my side here was hard to understand.
Best, Sebastian
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hartwork, just to be clear this is only meant to be a fix of the original code. The operation here doesn't actually implement 'over' (or 'atop'), it's a simple linear interpolation between the source and target colours, using the source alpha as parameter. The target alpha is unchanged.
At some point, I want to have alpha compositing with the complete set of operators but that'll have to come later. There are additional API design considerations in there due to the necessity of premultiplied alpha, which the Wikipedia article mentions.
Regarding the actual calculation itself, the right-shift by 8 bits is necessary. Alpha is a percentage i.e. value in [0.0, 1.0] that's been mapped to [0, 255]. So what we're calculating here is really:
c = c1 * alpha/255 + c2 * (1 - alpha/255)
Multiplying throughout by 255 to work with only integers, we have:
c * 255 = c1 * alpha + c2 * (255 - alpha)
To get back c
, we divide by 255 in the final step. Here >> 8
was originally chosen for performance reasons (as the test mentions), so I kept it as it is.
Since c1
, c2
and alpha
each have a maximum value of 255 and their products are involved, we need to work in at least uint16_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaixiong thanks for elaborating, let me digest that more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaixiong make sense to me now! 🙏
61bdede
to
45bae1d
Compare
45bae1d
to
156039a
Compare
156039a
to
35ba3fb
Compare
@hartwork, any chance you could look at this again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hartwork, any chance you could look at this again?
@kaixiong I understood commit Core (LV::Video): Account for underflow/overflow in the alpha blending of 32-bit videos.
now — looks good! 👍
Some of the other parts — in particular MMX — I'll have to just trust you and the test suite with.
3 of the 6 commits could be worth squashing into their respective fix target if you like (I consider that optional and just an idea):
Core (LV::Video): Use the correct source alpha.
Core (Tests): Fix wrong argument order in calls to Video::get_pixel_ptr().
Core (Tests): Fix building of tests.
A rebase onto latest master
may also be nice (but optional), I don't expect any new conflicts.
What do you think?
uint16_t const c0 = static_cast<uint16_t> (src_pixel[0]) * src_alpha + static_cast<uint16_t> (dst_pixel[0]) * (255 - src_alpha); | ||
uint16_t const c1 = static_cast<uint16_t> (src_pixel[1]) * src_alpha + static_cast<uint16_t> (dst_pixel[1]) * (255 - src_alpha); | ||
uint16_t const c2 = static_cast<uint16_t> (src_pixel[2]) * src_alpha + static_cast<uint16_t> (dst_pixel[2]) * (255 - src_alpha); | ||
|
||
destbuf[0] = (alpha * (srcbuf[0] - destbuf[0]) >> 8) + destbuf[0]; | ||
destbuf[1] = (alpha * (srcbuf[1] - destbuf[1]) >> 8) + destbuf[1]; | ||
destbuf[2] = (alpha * (srcbuf[2] - destbuf[2]) >> 8) + destbuf[2]; | ||
dst_pixel[0] = c0 >> 8; | ||
dst_pixel[1] = c1 >> 8; | ||
dst_pixel[2] = c2 >> 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaixiong make sense to me now! 🙏
3f21b65
to
41c799d
Compare
41c799d
to
8cbfb5c
Compare
Thank you! I have squashed the commits down to 3. However.. I have also reverted the formulas in
So to keep things simple for everyone, I'm sticking to the original formulas and calling the changes a clean-up instead of a bug-fix. What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaixiong good point about integer promotion! If we stick to the original formula, let's add a comment with the intended math, as it takes too long to understand without something like it. What do you think?
…_alphasrc() per Sebastian (hartwork)'s suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaixiong thank's for adding the comment. Approving to the extent that I can comprehend, e.g. not the parts with MMX. Let's get this merged 👍
This is a rewrite of the buggy 32-bit LV::Video alpha blending code to deal with arithmetic underflows/overflows and the use of an uninitialized register in the MMX implementation (#230).
Take note that GCC/Clang x86-64 (recent only?) produces SSE instructions instead of MMX. GCC 12.2 uses the XMM registers while Clang sticks to MM but throwing in the use of
pshuflw
.Perhaps it's time to move on and use SSE2 (introduced in 2000 to Pentium 4s) to work with 2 pixels at once. Or maybe even 4 pixels at once. This will require larger memory alignments and complicate the code a bit more to work with non-divisible row widths.
Here is a link to the plain C and SIMD code in Godbolt.