Roadmap notes #11

Open
PaintYourDragon opened this issue May 22, 2020 · 3 comments

@PaintYourDragon

This isn’t really an issue, but putting some future roadmap notes here publicly for anyone’s input if there are different approaches or better ideas.

  • Make an arch directory and split out each architecture from arch.h into its own file. I’d initially pursued this but instead kept it single-file, as Murphy’s Law dictates someone will add a new architecture header file and wonder why it’s not working (there’s no wildcard #include). BUT, arch.h is already getting super unwieldy much quicker than expected, so at some point that directory will be added and arch.h will #include every file in it. This will mean editing two files for a new architecture instead of one; I’ll try to be super clear about that both in the guide and code.
  • Library is obscure enough that it might be a good guinea pig for testing “#pragma once” instead of oldschool include guards, see if it rains on anyone’s parade, in which case would switch back.
  • When splitting out architectures, make distinct files for SAMD21 and SAMD51. The two have a lot in common but next item will change that somewhat…
  • Even if an architecture supports an atomic bit-toggle register, might not want to use it. Reason being that (for most efficient memory use) this requires all 6 RGB bits AND the clock bit in the same byte (or word) of a PORT register. If no toggle register, the clock bit can be anywhere in the same PORT, making pin assignments easier. Bit-toggle is worth using on SAMD21 because the CPU clock is 48 MHz and each write op takes only 2 instructions this way…but on every single device faster than that, we’ve had to add NOP instructions to slow it down. So, skip the bit-toggles and some of the NOPs, have more pin freedom.
  • For similar reasons, don’t bother with loop unrolling on faster architectures. Aim for 12-16 MHz clock, anything faster and the matrix can’t keep up. “De-unrolling” the loops should implicitly throttle things back and require fewer NOP shenanigans.
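
To make the pin-freedom trade-off from the bit-toggle item concrete, here is a rough sketch against a mock PORT (plain C, no real registers). It is not the library's code: the bit positions, the function names and the three-write set/clear sequence are all assumptions for illustration. It only shows that both styles put the same RGB bits on the pins at each rising clock edge, with the toggle style needing the clock bit in the RGB byte and the set/clear style letting it sit anywhere in the PORT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock GPIO port. On real hardware set/clear/toggle are memory-mapped
   registers; here they are helper functions acting on one word. */
typedef struct {
  uint32_t out; /* current pin levels */
} mock_port_t;

static void port_set(mock_port_t *p, uint32_t m) { p->out |= m; }
static void port_clear(mock_port_t *p, uint32_t m) { p->out &= ~m; }
static void port_toggle(mock_port_t *p, uint32_t m) { p->out ^= m; }

#define RGB_MASK 0x3Fu /* six RGB bits in the low byte (assumed layout) */

/* Toggle style: two writes per pixel, but the clock bit must share the
   byte that holds the RGB bits. Returns the number of port writes. */
static unsigned shift_with_toggle(const uint8_t *rgb, size_t n,
                                  uint8_t *latched) {
  const uint32_t clk = 1u << 6; /* clock lives in the RGB byte */
  mock_port_t port = {clk};     /* clock idles high, RGB low */
  uint8_t prev = 0;             /* RGB bits currently on the pins */
  unsigned writes = 0;
  assert(rgb != NULL && latched != NULL);
  for (size_t i = 0; i < n; i++) {
    /* One write flips the RGB bits that changed and drops the clock. */
    port_toggle(&port, (uint32_t)((rgb[i] ^ prev) & RGB_MASK) | clk);
    /* Second write raises the clock; the panel samples RGB here. */
    port_toggle(&port, clk);
    latched[i] = (uint8_t)(port.out & RGB_MASK);
    prev = (uint8_t)(rgb[i] & RGB_MASK);
    writes += 2;
  }
  return writes;
}

/* Set/clear style: three writes per pixel, clock on any pin of the port. */
static unsigned shift_with_set_clear(const uint8_t *rgb, size_t n,
                                     uint8_t *latched) {
  const uint32_t clk = 1u << 17; /* arbitrary free pin (assumed) */
  mock_port_t port = {0};        /* everything starts low */
  unsigned writes = 0;
  assert(rgb != NULL && latched != NULL);
  for (size_t i = 0; i < n; i++) {
    port_set(&port, rgb[i] & RGB_MASK); /* raise the RGB bits that are 1 */
    port_set(&port, clk);               /* rising edge, panel samples RGB */
    latched[i] = (uint8_t)(port.out & RGB_MASK);
    port_clear(&port, RGB_MASK | clk);  /* all low, ready for next pixel */
    writes += 3;
  }
  return writes;
}
```

On a slow part the extra write costs throughput; on a part that already needs NOPs to slow down, the extra write is padding that would have been there anyway.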
@PaintYourDragon

Some notes regarding last two items (throttling software bitbang-out speed). Less an issue in the future if we start to optimize with DMA and timers, etc. But for now:

NOPs are probably good enough on SAMD21 (if even needed) and maybe nRF52840. Things get more complicated on faster devices, especially those where the CPU clock is configurable at compile time (e.g. M4, Teensy), as the number of cycles to delay is not constant across all configurations (requiring different NOP lists for different speeds), plus the number of NOPs gets really ridiculous on fast devices like Teensy 4. Some alternatives might include:

  • Writing a 0 to the clock PORT’s bit-set, -clear or -toggle register, IF the GPIO clock is distinct from the CPU core and constant regardless of F_CPU. Basically a GPIO-timed NOP (see the ESP32 code where something like this is done for other reasons, plus the gotcha that ESP32 must alternate between writes of the PORT set and clear registers if you want the thing to actually wait).
  • Small loop using a volatile variable that scales with F_CPU:
#define _PM_DELAYLET(n) \
  for(volatile uint32_t _PM_BLAT = F_CPU * (n) / _PM_SOME_CONSTANT; _PM_BLAT--;);

Must be volatile as even casual optimizer settings recognize and remove do-nothing loops. _PM_SOME_CONSTANT would be an empirically-derived value (per architecture) that gives enough resolution to tune the bit-banging to the target frequency range, not an actual established time interval like micro- or nano-seconds.
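
A minimal sketch of how the loop count would scale with the CPU clock, written as functions so it can be tried on a desktop compiler. The names and the value of the tuning constant are made up for illustration (the real one would be found empirically per architecture). It uses a 64-bit intermediate because F_CPU times n can exceed 32 bits on the fastest parts (for example 600 MHz times 8), which is an assumption about how one might guard against that, not something the notes above specify.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-architecture tuning value, found by experiment. */
#define DEMO_SOME_CONSTANT 16000000u

/* Loop iterations for a delay of n units at CPU frequency f_cpu (Hz).
   Fractions are truncated, as in the macro above. */
static uint32_t delaylet_count(uint32_t f_cpu, uint32_t n) {
  uint64_t scaled = (uint64_t)f_cpu * n; /* avoid 32-bit overflow */
  return (uint32_t)(scaled / DEMO_SOME_CONSTANT);
}

/* Busy-wait for the computed number of iterations and report how many
   times the loop body ran. The counter is volatile so the optimizer
   cannot delete the loop. */
static uint32_t delaylet(uint32_t f_cpu, uint32_t n) {
  uint32_t spins = 0;
  assert(f_cpu > 0);
  for (volatile uint32_t i = delaylet_count(f_cpu, n); i--;) {
    spins++;
  }
  return spins;
}
```

With these made-up numbers, the same n gives 9 iterations at 48 MHz and 22 at 120 MHz, so one delay setting would follow F_CPU without separate NOP lists per clock speed.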

Speaking of target frequency range, 12-16 MHz might be too conservative, and it might be OK to go a bit faster. Or not. Something to test with different matrices, wiring setups, etc.

@PaintYourDragon

Or
_PM_BLAT=(F_CPU * n + (_PM_SOME_CONSTANT / 2)) / _PM_SOME_CONSTANT
if you want to be persnickety and have it round to the nearest whole count instead of truncating.

In either case, everything’s a constant there and the compiler folds this to a single value at compile time; no math actually occurs at run time when setting up the _PM_DELAYLET loop, it’s just a load instruction.
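
A small check of both claims, again with placeholder names and values (120 MHz clock, 16,000,000 divisor; neither comes from the library). The static_assert lines and the static array initializer only compile if both expressions are integer constant expressions, so nothing is computed at run time. The ull suffixes are an added precaution so the multiply is done in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for demonstration only. */
#define DEMO_F_CPU 120000000ull
#define DEMO_CONSTANT 16000000ull

/* Truncating form: fractions are dropped. */
#define DEMO_COUNT_TRUNC(n) ((DEMO_F_CPU * (n)) / DEMO_CONSTANT)
/* Rounding form: adding half the divisor first rounds to nearest. */
#define DEMO_COUNT_ROUND(n) \
  ((DEMO_F_CPU * (n) + DEMO_CONSTANT / 2) / DEMO_CONSTANT)

/* 120e6 * 3 / 16e6 = 22.5, so the two forms differ by one here. */
static_assert(DEMO_COUNT_TRUNC(3) == 22, "22.5 truncates to 22");
static_assert(DEMO_COUNT_ROUND(3) == 23, "22.5 rounds to 23");

/* A static initializer must be a compile-time constant, so this line
   compiling is itself evidence that the expressions are folded. */
static const uint32_t demo_counts[2] = {
    (uint32_t)DEMO_COUNT_TRUNC(3),
    (uint32_t)DEMO_COUNT_ROUND(3),
};
```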

Variable and define names are hypothetical, just for demonstration purposes. Don’t actually use these, or they’ll stick just like the Protomatter name.

@PaintYourDragon

Oh also, don’t just go making an “arch” directory willy-nilly, that’ll break when compiling in Arduino IDE. Proper Layout™ for Arduino libs (and note of “arch” being deprecated) is described here (arch is probably fine if everything’s moved into a src subdirectory):
http://goo.gl/gfFJzU
