-
-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvements to pinwheel algorithm, should work on any size and should also be faster #4185
base: 0_15
Are you sure you want to change the base?
Conversation
DedeHai
commented
Oct 7, 2024
- should now work on larger setups, tested at vaious sizes from 8x8 - 32x32 and on 128x16
- Increased fixed point shift to 15bit
- fixed 'pixel holes' on large sizes
- speed should be improved too
- consolidated code to save flash (about -300 bytes)
- reduced 'pixel stepping' to 50% steps to fix pixel holes (increases number of loops but it has no measurable impact on speed)
- should now work on larger setups, tested at vaious sizes from 8x8 - 32x32 and on 128x16 - Increased fixed point shift to 15bit - fixed 'pixel holes' on large sizes - speed should be improved too - consolidated code to save flash (about -300 bytes) - reduced 'pixel stepping' to ~60% steps to fix pixel holes (increases number of loops but it has no measurable impact on speed)
- tuned parameters for 'no holes' there may be room for improvement, I did not find any.
@Brandon502 please check if this is better or worse |
@DedeHai With the latest commit I am see way more holes compared to your first version. 24x24 has holes and large matrices have quite a few. I struggled finding holes on your first version. |
wled00/FX_fcn.cpp
Outdated
int y = posy >> Fixed_Shift; | ||
if (x != lastX || y != lastY) { // only paint if pixel position is different | ||
currentR++; | ||
int requiredsteps = currentR * 8; // empirically found value (R is a rough estimation, always >= actual R) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Brandon502 can you try with requiredsteps
set to 9 instead of 8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried 9, 10, and 12. I haven't seen any holes using 12 up to 64x64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the larger the number, the more overdraws and lower FPS, there are two more parameters to tweak, I tried a few options comparing holes vs FPS and current state was my best finding on 32x32 but did not check many other sizes
wled00/FX_fcn.cpp
Outdated
// set pixel | ||
if (x != lastX || y != lastY) setPixelColorXY(x, y, col); // only paint if pixel position is different | ||
while ((posx < maxX) && (posy < maxY)) { | ||
int x = posx >> Fixed_Shift; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you check that shift works properly on signed integer?
The reason I initially used multiplication and division is: shifting a signed integer is undefined (or compiler defined) behaviour - depending on architecture you might end with either an arithmetic shift (preserving sign) or with a logical shift (deleting sign).
gcc is able to detect that mult/div by a constant with power of two can be done with a shift. But the compiler will pick the proper (arithmetic) shift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is good info. I did notice a difference in negative and positive values (negative having less pixel holes) I will try with multiplications again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if I sounded a bit like "lecturing". It's a lesson I've learned some time ago, after spending days to understand why negative values suddenly changed sign.....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
. I did notice a difference in negative and positive values (negative having less pixel holes)
This could also be an issue with rounding - negative values "jump" to a different side when cutting away the fraction part. So possibly there are more "off by 1" cases for negative values.
wled00/FX_fcn.cpp
Outdated
posx = (vW - 1) << (Fixed_Shift - 1); // X starting position = center (in fixed point) | ||
posy = (vH - 1) << (Fixed_Shift - 1); // Y starting position = center (in fixed point) | ||
inc_x = cosVal * (1 << Fixed_Shift); // X increment per step (fixed point) | ||
inc_y = sinVal * (1 << Fixed_Shift); // Y increment per step (fixed point) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as below - generally it's better to avoid shifting signed integers.
And for me, "1 << bits" is harder to read than "32768" which is a well-known power of two.
I would propose to stay with my original implementation, using the scaling factor instead of shifting values around.
You could even translate one into the other: constexpr int fixed_scale = 1U << fixed_shift;
. The "U" is important here to stay within "well-defined behaviour".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: this page from Microsoft explains the problem: https://learn.microsoft.com/en-us/cpp/cpp/left-shift-and-right-shift-operators-input-and-output?view=msvc-170
I think that c++20 (the c++ language definition from 2020) finally ended this by stating "Right-shift on signed integral types is an arithmetic right shift, which performs sign-extension.". But for older compilers, I would still not rely on "what happens when I shift a negative value".
Some more explanation is here: https://stackoverflow.com/questions/7522346/right-shift-and-signed-integer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read shifts easier than large numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then maybe constexpr int fixed_scale = 1U << fixed_shift;.
is a compromise that keeps both of us happy 😉 sorry again for me going into "lecture mode".
wled00/FX_fcn.cpp
Outdated
if (maxXY <= Pinwheel_Size_Big) return Pinwheel_Steps_Big; | ||
// else | ||
return Pinwheel_Steps_XL; | ||
return maxXY * 5; // theoretical value sqrt(2) * pi (1 pixel step in the corner) Note: (maxXY * 9)/2 works on most sizes but not all, there is room for speed optimization here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Brandon502 you can tweak the 5 here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we learned when we were optimizing before was that Esp32 and S3 performed differently with rounding. I was able to have lower number of rays on my setup with a esp32 but @softhack007 had holes in his S3 setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I remember.
Actually I could test with my 128x64 (hub75) -S3 setup once you have good parameters - just tell me where to tweak, and give me a few days (a lot of work@job coming this week).
how do you get the full preview? peek is always reduced for me. |
You need to change the "max preview pixels" in the code: Line 172 in 7deea9e
|
well that was easy. |
I spent way too much time checking different options ;)
So there always will be a trade-off between looks and performance. |
If you'd be using Bresenham's algorithm your gaps would've been known in advance. |
I don't think Bresenhams algorithm can solve the issue. I briefly looked at it in the beginning and while it is really good at efficiently drawing a line, it does not solve the over-drawing issue, the prevention of which is the actual cause for the gaps: it is easy to just fill the whole matrix, with lines that doe not leave gaps, but that leads to something like a 3.5x overdraw. Current algorithm brings it down to about 2.2x. If my latest Ideas work, it will bring it down to 1.9x. |
As we already know, painting a pixel is expensive. Iterating through an array of "painted" pixels is not. |
if having an array as a "buffer", it would even be easier to pre-calculate a map. This was suggested earlier by @Brandon502 but it eats quite a lot of ram, especially if doing a large matrix, like 32x32 and larger. If ram can be expended, this would by far be the best and fastest solution. If matrix size is restricted to 256x256, a byte array would be sufficient. |
Why not just use "mean" value of the surrounding pixels? The visual "error" would be the least. Which is more important than "actual" color rendered at the missing pixel. Getting "mean" value may be expensive (9x call to getPixelColorXY) if no other measures are taken. EDIT: Memory requirements are 4kB for 64x64 matrix if using unpacked |
using packed bits to keep track of painted pixels had passed my mind at one point. thanks for the ideas, that may get the best of both worlds: live painting with no over-painting with minimal use of ram. |
4k might be a lot for stack but 512 bytes is viable as it adds no overhead. If you are talking about 128x128 matrix (largest possible ATM without code change) that's still 2k which is somewhat safe to put on stack. But I would not go beyond 2k. |
I think it might become tricky to get this buffer into the stack - the stack is cleared when you leave the function - so it might be needed to already allocate the buffer in strip.service() and find a way to pass down the pointer until we arrive in segment::setPixelColor. The other problem is you might need "alloca". Because a definition like |
@Brandon502 The MM version of drawLine has an additional option to skip parts of the line (added for the new GEQ3D effect). Maybe this could be useful to skip inner parts of a line (like the previous odd-rays approach). Should be easy to port to upstream |
I took Bresenham's algorithm from drawLine. Right now a single ray does not overwrite any pixels. But each ray slightly overlaps the next. Since rays are much thicker a lot of the effects look much better in my opinion. And getPixelColor is likely much more accurate. |
BTW & FYI if you'd fork AC and add MM as a remote, you'd have no problem of pulling and pushing into both (provided you have write access) or submitting PR into both without merge issues. As for the development (and thinking) I am happy that it has turned this way. BTW the banding above is from a "still shot" it may be invisible when moving. As an optimisation for drawing: if you can determine the length of a ray you can draw shorter rays for those that overlap. |
@softhack007 et. al. I will steal this post to urge you to test 0_15 ABL so we can release b6 and soon after RC or 0.15 |
Looks really good and may likelly to be the better approach - got any code to share? I would like to compare it my current state |
I pushed it on this branch. Brandon502@761e2d7 still really rough, getPixelColor isn't working correctly on all sizes. |
- parameters (constexpr) are a compromise between "no zero pixels on all sizes" and speed - for 8x8, 16x16, 32x32 there are parameters with better optimization, improving speed above current 0.15 state - there is more room to tweak the 'center skipping' on larger sizes
- using Bresenham's line algorithm and block filling - on-the-fly angle calculation - bit map to avoid overdraw
change in mode_running_random() (FX Stream) may actually be a bugfix, still removing it again
I think I got it now, may need some fin-tuning. Only tested on S3. |
@DedeHai I may have fixed overdraw issue and the bit array may not be needed. 64x64 I had about 100 overdraws. Rectangles will likely have more than squares but need to test it more. Will also use your optimized calculations when I'm able to test later today. If I'm lucky gPC may work correctly but I'm not hopeful. |
Hi @Brandon502, in case you can't get gPC to work in all cases, I'd say don't spend too much time with it. To me it still looks like adding a dedicated pixel buffer for pinwheel (and maybe other mapping modes) is the way to go. We can address that topic in a new PR. |
I merged @DedeHai's optimizations in my branch and tweaked a few things. Removed bitmap it should no longer be needed. Added redraw avoidance which reduces redrawing a ton. Only 32 extra draws on 64x64 rainbow. This varies slightly on effects and panel shape. Here's latest loop/s on esp32. Not the biggest improvement from a few days ago, but no more floats or overdrawing may help other chips.
I changed dynamic ray numbers to static for testing on an actual panel. Latest changes here: https://github.com/Brandon502/WLED/tree/pinwheel2 @softhack007 this should be good to test on your setup when you have time. Hopefully no holes appear due to rounding differences. If 72 rays is too few for 64x64 you can increase to 90 or swap to Dedhai's dynamic function which I left in. |
@Brandon502 the calculation improvements I made (and the avoiding floats) will mostly affect ESP32C3 and ESP8266 as they do floats in software. |
Well heap fragmentation is generally problematic, but it really depends on the size of blocks and on the frequency of malloc/free. We have several features that contribute to fragmentation. My esp32 often becomes unstable (random crash) when max used head goes above 80%. So I'd recommend if you can do something against fragmentation (like holding a block as long as possible, or creating memory pools like effect data) then it's worth a try. |
I mainly tested on 64x64 but overdraw seems pretty dependent on how many rays there are. 72 seems perfect, hopefully it doesn't have banding on actual display. But once you put 100+ rays overdraw is minimized. I tried 720 as an extreme test and it didn't overdraw much.
I tried second + infill and it was worse. I think two dots doesn't necessarily over draw since the rays are apart there is no overlapping. They just might be slightly wider if there are no surrounding rays if you consider that overdraw. I tested on a lot of effects and haven't see any crazy draw numbers, although bitmap would obviously always be perfect.
I compared a few effects between our different versions and I can't really say one looks better or worse. They're probably not identical but they both look good to me. Every size seems to work on mine on esp32, can't speak for others chips.
When I tried your version DJ Lights doesn't work on 32x32 and 64x64 only sizes I tried. I lowered your step factor to 1 which works on square matrices, but rectangles easily breaks it. I don't think gPC is ever going to work on rectangles unless you have giant bands. |
Went through almost all the effects and there were quite a few that did draw multiple times in the same frame which breaks the bitmap approach. Also did weird things on my version before fixing. @blazoncek Not sure if this is a known bug, but effects like scanner and lighthouse don't always hit the edge of the strip. This affects all expand options but is really noticeable on pinwheel since the edge in going across the center. |
@Brandon502 did you note which FX draw multiple times? |
Some effects are weird! Android is one of them. 😄 |
This was one. Effects that draw the background as a palette then draw over that, Chase and Sparkle I believe, there were a few others I'm forgetting.
Pretty sure it is a flaw. Some loops are completely fine, then one will skip the last/first pixel. |
Tested on a 128x64 HUB75 display and can confirm no holes and 30+ fps on rainbow. Low ray count looks similar to high ray count on gradient type effects. The main down side of lower ray count is movement is not quite as smooth, but the upside is effects like bouncing balls look much better. 20241016_113837.mp4Left super high ray count (720) Right lower ray count (72) |
@Brandon502 can a ray count be user selectable? |
I don't know how to add to the UI, but that would definitely work. Just need to have a minimum of 4. All values above that work, multiples of 8 are the most efficient in terms of drawing. |
@Brandon502 looks to me like your latest approach is the best compromise between overdraw and FX compatibility. Having overdraw for some FX is acceptable as its still better than the current 0.15 variant. Is your repo up to date? |
Just pushed one last change. It seems your setPinwheelParameters function doesn't always distribute angles between each step to completely wrap around the matrix in certain cases. Likely due to rounding errors. This didn't pop up until I made ray count dynamic to vW/vH, capped at 90 there was never a gap. Here's an example of 128x64 with 136 rays. The final ray doesn't quite wrap back around to 0 leaving certain pixels off. This can get better/worse with different ray counts. I made a change to ensure the final ray always wraps back to 0, you may know a better way to make sure all angles are used. This method can create a noticeable wider band at the end if using a super high ray count (user adjustable if implemented). With that change I'm not seeing any holes on weird matrix sizes like 127x57 / 128x5. Edit: Found a better fix. Just needed to round base angle and everything is working better. |
Thx @Brandon502, this looks really good. I did notice that gap in one of my versions and solved it by simply increasing the angle a little, your fix is a better approach. |
@DedeHai That's actually a tricky question. Basicially you could use On esp32 there is |
@DedeHai @Brandon502 another option - at least for esp32 - could be a static array of bytes with a fixed size (say 2Kb) as "scratch area". These 2K would come from the global ".BSS" section. The memory could be used locally by any function, provided that the function does not call other functions that need the same "scratch area". |
I like the idea of a "reserved" scratch area in ram, this would certainly help with fragmentation and it could also be used elsewhere. Spinning this a little further: why not use the buffer that is already reserved for FX? i.e. extend the ram management of that with the limitation that ram can only be used during "FX rendering" so if FX extend the buffer, nothing clashes and pointer management can be simple. I did not think this through all the way, just from the top of my head. But in general: the FX have more than enough memory available, especially in 1D with a short virtual width, but management of said memory needs to be implemented. |
@DedeHai How would your color buffer work for effects that only draw a few rays per frame? |
good point. I guess it has to be a persistent buffer per segment so my initial idea would not work but a buffer pointer per segment stored in segment data would work. |
I haven't looked at the new code yet ... could this be the same buffer as i suggested for getPixelColor ? I.e. all pixels needed for a "full turn" of the wheel? Or something separate from gPC? |
I think they would have to be separate. The buffer would need to be reset each frame and gPC would have to persist. Would it be a bad idea to just modify the 1D effects that use gPC and have them store colors in effect data? I think there are only around 7 effects, 5 of which are audio. Could store colors if SEGLEN is say <360 then you wouldn't need gPC for pinwheel/arc. |