Working SPI ISR optimizations with ~30% performance improvement #10
No functional changes, only local optimizations in the ShiftPWM.h file.
Performance changes...
Saves 2 cycles per bit by replacing the existing loop, which enclosed a sequence of 8 calls to add_one_pin_to_byte(), with the single function send_spi_bytes(). Writing the whole send sequence as a single inline ASM block allowed for explicit pre-decrement indexing. This optimization did not happen naturally inside the loop because of compiler limitations.
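For readers unfamiliar with the per-bit work, here is a rough portable C model of how one output byte could be assembled with pre-decrement indexing. It is only a sketch and not the inline ASM from this change: the function name pack_eight_pins and its parameters are hypothetical, and the bit order is an assumption. Each pass loads a PWM value through a pre-decremented pointer, compares it against the counter, and rotates the result into the byte being built.

```c
#include <stdint.h>

/* Hypothetical model of building one SPI byte from 8 PWM values.
 * *end_ptr starts one past the last of the 8 values and walks backwards. */
static uint8_t pack_eight_pins(uint8_t counter, const uint8_t **end_ptr)
{
    uint8_t sendbyte = 0;
    for (int bit = 0; bit < 8; ++bit) {
        /* Pre-decrement load: step the pointer back, then read. */
        uint8_t value = *--(*end_ptr);
        /* Unsigned compare: carry is 1 when counter < value (pin is on). */
        uint8_t carry = (uint8_t)(counter < value);
        /* Rotate right through carry: shift right, carry enters bit 7. */
        sendbyte = (uint8_t)((sendbyte >> 1) | (carry << 7));
    }
    return sendbyte;
}
```

In this model the first value read ends up in bit 0 and the last in bit 7. On AVR, each of the three steps in the loop body would ideally map to a single instruction, which is why keeping the pointer decrement fused with the load matters for the per-bit cycle count.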
Old emitted ASM per bit:
New emitted ASM per bit...
There is also a saving of 1 cycle per byte, because the compiler-generated loop redundantly compared the loop variable to zero after decrementing it.
Non-performance changes...
ShiftPWM_balanceLoad) and then always adding this to the counter on each loop pass. This is performance neutral if the option is selected, and costs a single cycle per byte if it is not, since the code would have been statically eliminated in the old version.
These changes were motivated by keeping the ASM code clean and continuous. In order to allow static elimination based on a const variable, I would have either had to...
send_spi_bytes() function to cover each case of the options being selected. This is optimal, but ugly.
Overall, I think the per-bit savings get the code fast enough that it keeps up with the SPI hardware, so any additional savings would likely be wasted waiting for the SPI transmit to complete.
That said, if there was motivation to support 2X SPI mode, then there are some tricks we could use to keep up with that. Let me know if you think this is a relevant use case.
Oscilloscope traces of the before and after outputs are attached.
Thanks!
-josh