Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Working SPI ISR optimizations with ~30% performance improvement #10

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bigjosh
Copy link

@bigjosh bigjosh commented Dec 20, 2015

No functional changes, only local optimizations in the ShiftPWM.h file.

Performance changes...

Saves 2 cycles per bit by replacing the existing loop enclosing a sequence of 8 calls to
add_one_pin_to_byte() with the singe function send_spi_bytes().

Writing the whole send sequence in a single inline ASM allowed for explicit pre-decrement
indexing. This optimization did not happen naturally inside the loop because of compiler limitations.

Old emitted ASM per bit:

        add_one_pin_to_byte(sendbyte, counter, --ledPtr);
     3e2:   20 81           ld  r18, Z
     3e4:   52 17           cp  r21, r18
     3e6:   47 95           ror r20
     3e8:   31 97           sbiw    r30, 0x01   ; 1

New emitted ASM per bit...

     3d6:   02 90           ld  r0, -Z
     3d8:   20 15           cp  r18, r0
     3da:   37 95           ror r19

There is also a savings of 1 cycle per byte because the compiler redundantly compares the loop variable to zero after decrementing it.

Non-performance changes...

  1. Using an EOR against a preloaded register rather than a NEG invert the output bits instead of a compare and branch. This is performance neutral if the option is selected, and costs a single cycle per byte if it is not since the code would have been statically eliminated in the old version.
  2. Preloading the load balancing step factor into a register (0 or 8 depending on ShiftPWM_balanceLoad) and then always adding this to the counter on each loop pass. This is performance neutral if the option is selected, and costs a single cycle per byte if it is not since the code would have been statically eliminated in the old version.

These changes were motivated by keeping the ASM code clean and continuous. In order to allow static elimination based on a const variable, I would have either had to...

  1. Break up the asm() into multiple parts. In my experience, this increases the chances that the compiler will mess up the emitted code, especially on older versions of the Arduino IDE.
  2. Make 4 versions of the send_spi_bytes() function to cover each case of the options being selected. This is optimal, but ugly.

Overall, I think the per-bit savings get the code fast enough that it is keeping up with the SPI hardware, so any additional savings would likely be wasted waiting for the SPI transmit to complete.

That said, if there was motivation to support 2X SPI mode, then there are some tricks we could use to keep up with that. Let me know if you think this is a relevant use case.

Ossilicope traces of the before and after outputs attached.
shiftpwm_rgb_example before
shiftpwm_rgb_example after

Thanks!

-josh

…o functional changes

Replaced the existing loop enclosing a sequence of 8 calls to
add_one_pin_to_byte() with the singe function send_spi_bytes(). Writing
the whole send sequence in a single inline ASM allowed for pre-decrement
indexing and some other smaller optimization in count checking and bit
inversion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant