Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to pinwheel algorithm, should work on any size and should also be faster #4185

Open
wants to merge 5 commits into
base: 0_15
Choose a base branch
from

Conversation

DedeHai
Copy link
Collaborator

@DedeHai DedeHai commented Oct 7, 2024

  • should now work on larger setups, tested at vaious sizes from 8x8 - 32x32 and on 128x16
  • Increased fixed point shift to 15bit
  • fixed 'pixel holes' on large sizes
  • speed should be improved too
  • consolidated code to save flash (about -300 bytes)
  • reduced 'pixel stepping' to 50% steps to fix pixel holes (increases number of loops but it has no measurable impact on speed)

- should now work on larger setups, tested at vaious sizes from 8x8 - 32x32 and on 128x16
- Increased fixed point shift to 15bit
- fixed 'pixel holes' on large sizes
- speed should be improved too
- consolidated code to save flash (about -300 bytes)
- reduced 'pixel stepping' to ~60% steps to fix pixel holes (increases number of loops but it has no measurable impact on speed)
- tuned parameters for 'no holes'

there may be room for improvement, I did not find any.
@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 7, 2024

@Brandon502 please check if this is better or worse

@Brandon502
Copy link

@DedeHai With the latest commit I am see way more holes compared to your first version. 24x24 has holes and large matrices have quite a few. I struggled finding holes on your first version.

int y = posy >> Fixed_Shift;
if (x != lastX || y != lastY) { // only paint if pixel position is different
currentR++;
int requiredsteps = currentR * 8; // empirically found value (R is a rough estimation, always >= actual R)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Brandon502 can you try with requiredsteps set to 9 instead of 8?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried 9, 10, and 12. I haven't seen any holes using 12 up to 64x64.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the larger the number, the more overdraws and lower FPS, there are two more parameters to tweak, I tried a few options comparing holes vs FPS and current state was my best finding on 32x32 but did not check many other sizes

@Brandon502
Copy link

Brandon502 commented Oct 7, 2024

Wavesin pattern looks pretty nice with your method of reducing center writes instead of jumping.

// set pixel
if (x != lastX || y != lastY) setPixelColorXY(x, y, col); // only paint if pixel position is different
while ((posx < maxX) && (posy < maxY)) {
int x = posx >> Fixed_Shift;
Copy link
Collaborator

@softhack007 softhack007 Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check that shift works properly on signed integer?

The reason I initially used multiplication and division is: shifting a signed integer is undefined (or compiler defined) behaviour - depending on architecture you might end with either an arithmetic shift (preserving sign) or with a logical shift (deleting sign).
gcc is able to detect that mult/div by a constant with power of two can be done with a shift. But the compiler will pick the proper (arithmetic) shift.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is good info. I did notice a difference in negative and positive values (negative having less pixel holes) I will try with multiplications again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I sounded a bit like "lecturing". It's a lesson I've learned some time ago, after spending days to understand why negative values suddenly changed sign.....

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. I did notice a difference in negative and positive values (negative having less pixel holes)

This could also be an issue with rounding - negative values "jump" to a different side when cutting away the fraction part. So possibly there are more "off by 1" cases for negative values.

posx = (vW - 1) << (Fixed_Shift - 1); // X starting position = center (in fixed point)
posy = (vH - 1) << (Fixed_Shift - 1); // Y starting position = center (in fixed point)
inc_x = cosVal * (1 << Fixed_Shift); // X increment per step (fixed point)
inc_y = sinVal * (1 << Fixed_Shift); // Y increment per step (fixed point)
Copy link
Collaborator

@softhack007 softhack007 Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as below - generally it's better to avoid shifting signed integers.
And for me, "1 << bits" is harder to read than "32768" which is a well-known power of two.

I would propose to stay with my original implementation, using the scaling factor instead of shifting values around.

You could even translate one into the other: constexpr int fixed_scale = 1U << fixed_shift;. The "U" is important here to stay within "well-defined behaviour".

Copy link
Collaborator

@softhack007 softhack007 Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: this page from Microsoft explains the problem: https://learn.microsoft.com/en-us/cpp/cpp/left-shift-and-right-shift-operators-input-and-output?view=msvc-170

I think that c++20 (the c++ language definition from 2020) finally ended this by stating "Right-shift on signed integral types is an arithmetic right shift, which performs sign-extension.". But for older compilers, I would still not rely on "what happens when I shift a negative value".

Some more explanation is here: https://stackoverflow.com/questions/7522346/right-shift-and-signed-integer

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read shifts easier than large numbers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then maybe constexpr int fixed_scale = 1U << fixed_shift;.  is a compromise that keeps both of us happy 😉 sorry again for me going into "lecture mode".

if (maxXY <= Pinwheel_Size_Big) return Pinwheel_Steps_Big;
// else
return Pinwheel_Steps_XL;
return maxXY * 5; // theoretical value sqrt(2) * pi (1 pixel step in the corner) Note: (maxXY * 9)/2 works on most sizes but not all, there is room for speed optimization here
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Brandon502 you can tweak the 5 here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we learned when we were optimizing before was that Esp32 and S3 performed differently with rounding. I was able to have lower number of rays on my setup with a esp32 but @softhack007 had holes in his S3 setup.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I remember.

Actually I could test with my 128x64 (hub75) -S3 setup once you have good parameters - just tell me where to tweak, and give me a few days (a lot of work@job coming this week).

@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 8, 2024

Wavesin pattern looks pretty nice with your method of reducing center writes instead of jumping.

how do you get the full preview? peek is always reduced for me.

@softhack007
Copy link
Collaborator

how do you get the full preview? peek is always reduced for me.

You need to change the "max preview pixels" in the code:

const size_t MAX_LIVE_LEDS_WS = 1024U;

@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 8, 2024

well that was easy.
I have a few more ideas I want to test, so this will come in handy. I have it working at small sizes but need to test larger ones. Changes should fix the missing pixel issues hopefully on all resolutions while still reducing pixels drawn.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 9, 2024

I spent way too much time checking different options ;)
Bottom line is:

  • radius based algorithms give a better look in the center but add calculation overhead for each step, making it slow
  • I was able to get close to the performance of the current version (skipping odd center rays) but tuning it such that it works on all sizes failed so far. there is always a compromise to increase the number of angular steps, slowing it down so in the end it was about 20% slower
  • Currently looking at a version that keeps the odd ray skipping, adding some more center skips and calculating steps on the fly (instead of using the "magic table")

So there always will be a trade-off between looks and performance.
btw: I wrote a code snippet (thx ChatGPT) to evaluate different algorithms over all sizes, printing out number of pixels drawn, number of zero pixels and number of 'dead loops' (no pixels drawn) for any square size. Current work in progress: optimize for 8x8, 16x16, 32x32 as these are most common and use more generic parameters that work on any other size. If all goes well, there should be up to 20% speed boost.
Let me know if you want me to put that test-code here.

@blazoncek
Copy link
Collaborator

If you'd be using Bresenham's algorithm your gaps would've been known in advance.
Just a thought.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 9, 2024

I don't think Bresenhams algorithm can solve the issue. I briefly looked at it in the beginning and while it is really good at efficiently drawing a line, it does not solve the over-drawing issue, the prevention of which is the actual cause for the gaps: it is easy to just fill the whole matrix, with lines that doe not leave gaps, but that leads to something like a 3.5x overdraw. Current algorithm brings it down to about 2.2x. If my latest Ideas work, it will bring it down to 1.9x.

@blazoncek
Copy link
Collaborator

As we already know, painting a pixel is expensive. Iterating through an array of "painted" pixels is not.
Deducting coordinates/angles from unpainted pixels (usually not that many of them) is straightforward.
Why paint 1.9x when you can do it only once? The expense would be RAM for an array of "painted" pixels and a few "reverse" calculations of "unpainted" pixels.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 9, 2024

if having an array as a "buffer", it would even be easier to pre-calculate a map. This was suggested earlier by @Brandon502 but it eats quite a lot of ram, especially if doing a large matrix, like 32x32 and larger. If ram can be expended, this would by far be the best and fastest solution. If matrix size is restricted to 256x256, a byte array would be sufficient.
Edit:
thinking about it, a map may also not be so easy to define.
maybe need to explain, why it is a bit complicated to not overpaint:
the function gets handed a single color value, not knowing anything about the future and the past it needs to convert a single point in a 1D array into a polar coordinate line. so where to start painting is the question that is unsolved. With just a 'stupid' line drawing algorithm for example, the center pixel gets painted many many times (I counted over 80 times at 32x32). Now if you skip painting it, which 1D position actually is priviledged to paint it? and what if that pixel is not set by the FX, will it stay black or does another one 'inherit' ? and how to know all that just by getting passed a color and a 1D position...
Edit2:
to efficiently translate, it would actually need to be an array containing variable length arrays as for each 1D coordinate, there are a certain number of X/Y coordinates to be painted and that number is not constant. But I am sure, someone already thought of a good solution as mapping to cartesian coordinates is an every-day kind of problem.

@blazoncek
Copy link
Collaborator

blazoncek commented Oct 9, 2024

Why not just use "mean" value of the surrounding pixels? The visual "error" would be the least. Which is more important than "actual" color rendered at the missing pixel. Getting "mean" value may be expensive (9x call to getPixelColorXY) if no other measures are taken.
I am just tossing ideas. I have not tried anything mentioned above.

EDIT: Memory requirements are 4kB for 64x64 matrix if using unpacked bool values. If using packed bits (at the expense of extracting each bit of information) that's only 512 bytes.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 9, 2024

using packed bits to keep track of painted pixels had passed my mind at one point. thanks for the ideas, that may get the best of both worlds: live painting with no over-painting with minimal use of ram.
I am just asking myself, how to handle such a buffer. It should be dynamic so heap or stack? and how to prevent fragmentation with many segments using this?

@blazoncek
Copy link
Collaborator

4k might be a lot for stack but 512 bytes is viable as it adds no overhead. If you are talking about 128x128 matrix (largest possible ATM without code change) that's still 2k which is somewhat safe to put on stack. But I would not go beyond 2k.

@softhack007
Copy link
Collaborator

softhack007 commented Oct 9, 2024

It should be dynamic so heap or stack?

I think it might become tricky to get this buffer into the stack - the stack is cleared when you leave the function - so it might be needed to already allocate the buffer in strip.service() and find a way to pass down the pointer until we arrive in segment::setPixelColor.

The other problem is you might need "alloca". Because a definition like bytes drawn[seg.width * seg.height / 8] will go to the heap (dynamically sized array) while bytes drawn[2048] will go into the stack.

@Brandon502
Copy link

Brandon502 commented Oct 9, 2024

I reworked it with a different idea. Uses Bersenham's algorithm and doesn't use memory. Haven't redid getPixelColor or found the perfect amount of rays per size yet. Can likely optimize quite a bit more since my C++ isn't great.

36 rays
64x64 using 36 rays. Can see a bit of banding.
72 rays
64x64 using 72 rays. Seems decent, needs to be testing on physical setup.

My fork is from MM so these numbers are based off of that. But original pinwheel on MM build esp32_4MB_S
64x64 8 pins 512 ws2812b ran at 20 loops/s. 72 rays with new method is at 25 loops/s. Edit: Pixels for comparison is 26 loops/s. Bar is 30 loops/s

Haven't found any holes. Can likely improve this a bit further.

@softhack007
Copy link
Collaborator

Haven't found any holes. Can likely improve this a bit further.

@Brandon502 The MM version of drawLine has an additional option to skip parts of the line (added for the new GEQ3D effect). Maybe this could be useful to skip inner parts of a line (like the previous odd-rays approach).

Should be easy to port to upstream
https://github.com/MoonModules/WLED/blob/98bdbd1eb2e3c6a07a17a7f966cf1ec0cb83804b/wled00/FX_2Dfcn.cpp#L783-L795

@Brandon502
Copy link

Brandon502 commented Oct 9, 2024

Haven't found any holes. Can likely improve this a bit further.

@Brandon502 The MM version of drawLine has an additional option to skip parts of the line (added for the new GEQ3D effect). Maybe this could be useful to skip inner parts of a line (like the previous odd-rays approach).

Should be easy to port to upstream https://github.com/MoonModules/WLED/blob/98bdbd1eb2e3c6a07a17a7f966cf1ec0cb83804b/wled00/FX_2Dfcn.cpp#L783-L795

I took Bresenham's algorithm from drawLine. Right now a single ray does not overwrite any pixels. But each ray slightly overlaps the next. Since rays are much thicker a lot of the effects look much better in my opinion. And getPixelColor is likely much more accurate.
Prev Noise4:
noise4orig
New Noise4:
noise4rays
Wavesins no longer has the moirè pattern though.
Screenshot 2024-10-09 144310

@blazoncek
Copy link
Collaborator

BTW & FYI if you'd fork AC and add MM as a remote, you'd have no problem of pulling and pushing into both (provided you have write access) or submitting PR into both without merge issues.

As for the development (and thinking) I am happy that it has turned this way. BTW the banding above is from a "still shot" it may be invisible when moving.

As an optimisation for drawing: if you can determine the length of a ray you can draw shorter rays for those that overlap.

@blazoncek
Copy link
Collaborator

@softhack007 et. al. I will steal this post to urge you to test 0_15 ABL so we can release b6 and soon after RC or 0.15
Then we can start merging these improvements into beta of 0.15.1

@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 10, 2024

I took Bresenham's algorithm from drawLine. Right now a single ray does not overwrite any pixels. But each ray slightly overlaps the next. Since rays are much thicker a lot of the effects look much better in my opinion. And getPixelColor is likely much more accurate.

Looks really good and may likelly to be the better approach - got any code to share? I would like to compare it my current state

@Brandon502
Copy link

I took Bresenham's algorithm from drawLine. Right now a single ray does not overwrite any pixels. But each ray slightly overlaps the next. Since rays are much thicker a lot of the effects look much better in my opinion. And getPixelColor is likely much more accurate.

Looks really good and may likelly to be the better approach - got any code to share? I would like to compare it my current state

I pushed it on this branch. Brandon502@761e2d7 still really rough, getPixelColor isn't working correctly on all sizes.

- parameters (constexpr) are a compromise between "no zero pixels on all sizes" and speed
- for 8x8, 16x16, 32x32 there are parameters with better optimization, improving speed above current 0.15 state
- there is more room to tweak the 'center skipping' on larger sizes
@softhack007 softhack007 added the optimization re-working an existing feature to be faster, or use less memory label Oct 11, 2024
- using Bresenham's line algorithm and block filling
- on-the-fly angle calculation
- bit map to avoid overdraw
change in mode_running_random() (FX Stream) may actually be a bugfix, still removing it again
@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 11, 2024

I think I got it now, may need some fin-tuning. Only tested on S3.
No more floats (sin16/cos16 can later be replaced by my new version)
Optimized for speed as much as possible.
Putting the bit mapping to fix overdrawing on the stack leads to crashes on larger sizes, maybe someone knows why (write access fault) on heap it works.

@Brandon502
Copy link

@DedeHai I may have fixed overdraw issue and the bit array may not be needed. 64x64 I had about 100 overdraws. Rectangles will likely have more than squares but need to test it more. Will also use your optimized calculations when I'm able to test later today. If I'm lucky gPC may work correctly but I'm not hopeful.

@softhack007
Copy link
Collaborator

If I'm lucky gPC may work correctly but I'm not hopeful.

Hi @Brandon502, in case you can't get gPC to work in all cases, I'd say don't spend too much time with it. To me it still looks like adding a dedicated pixel buffer for pinwheel (and maybe other mapping modes) is the way to go. We can address that topic in a new PR.

@Brandon502
Copy link

Brandon502 commented Oct 12, 2024

I merged @DedeHai's optimizations in my branch and tweaked a few things. Removed bitmap it should no longer be needed. Added redraw avoidance which reduces redrawing a ton. Only 32 extra draws on 64x64 rainbow. This varies slightly on effects and panel shape. Here's latest loop/s on esp32. Not the biggest improvement from a few days ago, but no more floats or overdrawing may help other chips.

Size MM Latest
64x64 20 28
32x32 83 113

I changed dynamic ray numbers to static for testing on an actual panel. Latest changes here: https://github.com/Brandon502/WLED/tree/pinwheel2 @softhack007 this should be good to test on your setup when you have time. Hopefully no holes appear due to rounding differences. If 72 rays is too few for 64x64 you can increase to 90 or swap to Dedhai's dynamic function which I left in.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 12, 2024

@Brandon502 the calculation improvements I made (and the avoiding floats) will mostly affect ESP32C3 and ESP8266 as they do floats in software.
In your latest approach: couldchanging the ray width (or delta angle between the lines) from currently one full step to something a little smaller achieve something similar?
What I do not like about the approach tracking rays left and right: it only works, if the rays are drawn in consecutive order (which a lot but not all FX do). FX "Two Dots" for example jumps back and forth when drawing. Would it work, if you always skip drawing pixels of the first line, just drawing the second + infill?
On the other hand: the bit-mapping has the disadvantage, that each ray can only be drawn once. if there is FX (not sure there are any) that rely on overdrawing, they will not work.
Did you compare the looks of my latest version and your approach? Does your latest version also work well on smaller sizes and odd numbered width/height?
What I like about the variant using a bit-map: it is very insensitive to changing the number of rays and in general "always works". If ray width is increased from what I currently have, I think also the gPC should work. In my gPC test, the ray width was too small, the edge colors would sometimes overlap. The drawback of having to add a buffer is mostly to heap fragmentation, not sure how problematic that is (a small buffer is created before a segment is drawn, deleted afterwards).

@softhack007
Copy link
Collaborator

softhack007 commented Oct 12, 2024

The drawback of having to add a buffer is mostly to heap fragmentation, not sure how problematic that is (a small buffer is created before a segment is drawn, deleted afterwards).

Well heap fragmentation is generally problematic, but it really depends on the size of blocks and on the frequency of malloc/free. We have several features that contribute to fragmentation. My esp32 often becomes unstable (random crash) when max used head goes above 80%. So I'd recommend if you can do something against fragmentation (like holding a block as long as possible, or creating memory pools like effect data) then it's worth a try.

@Brandon502
Copy link

In your latest approach: couldchanging the ray width (or delta angle between the lines) from currently one full step to something a little smaller achieve something similar?

I mainly tested on 64x64 but overdraw seems pretty dependent on how many rays there are. 72 seems perfect, hopefully it doesn't have banding on actual display. But once you put 100+ rays overdraw is minimized. I tried 720 as an extreme test and it didn't overdraw much.

What I do not like about the approach tracking rays left and right: it only works, if the rays are drawn in consecutive order (which a lot but not all FX do). FX "Two Dots" for example jumps back and forth when drawing. Would it work, if you always skip drawing pixels of the first line, just drawing the second + infill?

I tried second + infill and it was worse. I think two dots doesn't necessarily over draw since the rays are apart there is no overlapping. They just might be slightly wider if there are no surrounding rays if you consider that overdraw. I tested on a lot of effects and haven't see any crazy draw numbers, although bitmap would obviously always be perfect.

Did you compare the looks of my latest version and your approach? Does your latest version also work well on smaller sizes and odd numbered width/height? What I like about the variant using a bit-map: it is very insensitive to changing the number of rays and in general "always works".

I compared a few effects between our different versions and I can't really say one looks better or worse. They're probably not identical but they both look good to me. Every size seems to work on mine on esp32, can't speak for others chips.

If ray width is increased from what I currently have, I think also the gPC should work. In my gPC test, the ray width was too small, the edge colors would sometimes overlap.

When I tried your version DJ Lights doesn't work on 32x32 and 64x64 only sizes I tried. I lowered your step factor to 1 which works on square matrices, but rectangles easily breaks it. I don't think gPC is ever going to work on rectangles unless you have giant bands.

@Brandon502
Copy link

Brandon502 commented Oct 12, 2024

Here is probably the best way to visualize the differences. Main differences is with bitmap there are thicker lines on axis and the center isn't symmetrical.
72 rays no bitmap:
Screenshot 2024-10-12 121827
64*1.6 102 rays? with bitmap:
Screenshot 2024-10-12 121619
72 rays with bitmap:
image

Edit: Found the best ray options for no bitmap method and decreased over drawing further. On square matrices only one extra draw for effects that draw 0-n or reverse.

@Brandon502
Copy link

On the other hand: the bit-mapping has the disadvantage, that each ray can only be drawn once. if there is FX (not sure there are any) that rely on overdrawing, they will not work.

Went through almost all the effects and there were quite a few that did draw multiple times in the same frame which breaks the bitmap approach. Also did weird things on my version before fixing.

@blazoncek Not sure if this is a known bug, but effects like scanner and lighthouse don't always hit the edge of the strip. This affects all expand options but is really noticeable on pinwheel since the edge in going across the center.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 14, 2024

@Brandon502 did you note which FX draw multiple times?

@blazoncek
Copy link
Collaborator

Some effects are weird! Android is one of them. 😄
I did work on Scanner but I do not remember Lighthouse. If they do not hit the edge always then that's a flaw, otherwise it is a feature. IMO not all effects will look good in every expansion.

@Brandon502
Copy link

Some effects are weird! Android is one of them.

This was one. Effects that draw the background as a palette then draw over that, Chase and Sparkle I believe, there were a few others I'm forgetting.

I did work on Scanner but I do not remember Lighthouse. If they do not hit the edge always then that's a flaw, otherwise it is a feature.

Pretty sure it is a flaw. Some loops are completely fine, then one will skip the last/first pixel.

@Brandon502
Copy link

Brandon502 commented Oct 16, 2024

Tested on a 128x64 HUB75 display and can confirm no holes and 30+ fps on rainbow. Low ray count looks similar to high ray count on gradient type effects. The main down side of lower ray count is movement is not quite as smooth, but the upside is effects like bouncing balls look much better.

20241016_113837.mp4

Left super high ray count (720) Right lower ray count (72)

@blazoncek
Copy link
Collaborator

@Brandon502 can a ray count be user selectable?

@Brandon502
Copy link

@Brandon502 can a ray count be user selectable?

I don't know how to add to the UI, but that would definitely work. Just need to have a minimum of 4. All values above that work, multiples of 8 are the most efficient in terms of drawing.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 20, 2024

@Brandon502 looks to me like your latest approach is the best compromise between overdraw and FX compatibility. Having overdraw for some FX is acceptable as its still better than the current 0.15 variant. Is your repo up to date?
How can we make sure there are no holes on any selected ray number on any size? Users also use things like 8x256 or 7x33 for example. Manually checking all combos is not feasible.

@Brandon502
Copy link

Brandon502 commented Oct 20, 2024

@Brandon502 looks to me like your latest approach is the best compromise between overdraw and FX compatibility. Having overdraw for some FX is acceptable as its still better than the current 0.15 variant. Is your repo up to date? How can we make sure there are no holes on any selected ray number on any size? Users also use things like 8x256 or 7x33 for example. Manually checking all combos is not feasible.

Just pushed one last change. It seems your setPinwheelParameters function doesn't always distribute angles between each step to completely wrap around the matrix in certain cases. Likely due to rounding errors. This didn't pop up until I made ray count dynamic to vW/vH, capped at 90 there was never a gap.

Here's an example of 128x64 with 136 rays.

Screenshot_20241020_130958_Gallery

The final ray doesn't quite wrap back around to 0 leaving certain pixels off. This can get better/worse with different ray counts. I made a change to ensure the final ray always wraps back to 0, you may know a better way to make sure all angles are used. This method can create a noticeable wider band at the end if using a super high ray count (user adjustable if implemented). With that change I'm not seeing any holes on weird matrix sizes like 127x57 / 128x5.

Edit: Found a better fix. Just needed to round base angle and everything is working better.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 21, 2024

Thx @Brandon502, this looks really good. I did notice that gap in one of my versions and solved it by simply increasing the angle a little, your fix is a better approach.
@softhack007 the current version from Brandon would benefit from a pre-draw color array you proposed for gPC as drawing could always be done starting at 0 and moving up, many checks could be simplified or removed. So I would like to tackle that as well, then do a squash and have it all in a new PR. To create the color array, the same approach I used for the bit-map could be used (create before calling FX functions, delete after), preferably on stack but I have seen crashes, so we need to limit the maximum number of rays. I don't know at what buffer-size it will run into trouble, especially ESP8266 may be problematic. A buffer of 128 colors would be 512 bytes which may still work. Any suggestions on how to do it? Can free stack-size be checked? Putting it in heap works for sure but there is the fragmentation problem.

@softhack007
Copy link
Collaborator

softhack007 commented Oct 21, 2024

preferably on stack but I have seen crashes, so we need to limit the maximum number of rays. I don't know at what buffer-size it will run into trouble, especially ESP8266 may be problematic. A buffer of 128 colors would be 512 bytes which may still work. Any suggestions on how to do it? Can free stack-size be checked? Putting it in heap works for sure but there is the fragmentation problem.

@DedeHai That's actually a tricky question. Basicially you could use alloca(), or use a fixed-sized local array (variable length array is not an option as it would be allocated to the heap).
The problem is indeed available stack space. On esp32 the "loopTask" has 8K of stackframe, and typically 3-4K are unused. However sPC might get called from other contexts - async_tcp (web server task) or async_udp (e131 receiver task) in particular. Both tasks have a very low amount of free stack ( around 1K) , and async_tcp is always at the limit. I don't know about 8266 stack frames - maybe @willmmiles has an idea?

On esp32 there is uxTaskGetStackHighWaterMark(NULL) but this tells you how much stack memory was used at max since startup.

@softhack007
Copy link
Collaborator

@DedeHai @Brandon502 another option - at least for esp32 - could be a static array of bytes with a fixed size (say 2Kb) as "scratch area". These 2K would come from the global ".BSS" section. The memory could be used locally by any function, provided that the function does not call other functions that need the same "scratch area".

@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 21, 2024

I like the idea of a "reserved" scratch area in ram, this would certainly help with fragmentation and it could also be used elsewhere. Spinning this a little further: why not use the buffer that is already reserved for FX? i.e. extend the ram management of that with the limitation that ram can only be used during "FX rendering" so if FX extend the buffer, nothing clashes and pointer management can be simple. I did not think this through all the way, just from the top of my head. But in general: the FX have more than enough memory available, especially in 1D with a short virtual width, but management of said memory needs to be implemented.
A separate scratch area is a bit tricky as it would need proper management to ensure no overlaps. could be as simple as a 'occupied' flag though.

@Brandon502
Copy link

@DedeHai How would your color buffer work for effects that only draw a few rays per frame?

@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 21, 2024

@DedeHai How would your color buffer work for effects that only draw a few rays per frame?

good point. I guess it has to be a persistent buffer per segment so my initial idea would not work but a buffer pointer per segment stored in segment data would work.

@softhack007
Copy link
Collaborator

I guess it has to be a persistent buffer per segment so my initial idea would not work but a buffer pointer per segment stored in segment data would work.

I haven't looked at the new code yet ... could this be the same buffer as i suggested for getPixelColor ? I.e. all pixels needed for a "full turn" of the wheel? Or something separate from gPC?

@Brandon502
Copy link

Brandon502 commented Oct 21, 2024

I guess it has to be a persistent buffer per segment so my initial idea would not work but a buffer pointer per segment stored in segment data would work.

I haven't looked at the new code yet ... could this be the same buffer as i suggested for getPixelColor ? I.e. all pixels needed for a "full turn" of the wheel? Or something separate from gPC?

I think they would have to be separate. The buffer would need to be reset each frame and gPC would have to persist.

Would it be a bad idea to just modify the 1D effects that use gPC and have them store colors in effect data? I think there are only around 7 effects, 5 of which are audio. Could store colors if SEGLEN is say <360 then you wouldn't need gPC for pinwheel/arc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effect in progress This pull request is currently being reviewed and, if necessary, modified optimization re-working an existing feature to be faster, or use less memory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants