Feature: Accelerate BMPremote SPI data phase by removing inter-byte gaps #1946
4cf226f to a23c853
Rebased to main.
Apologies that this fell by the wayside. I've done an initial review because, with the revised platform configurations and such, this is worth getting in for v2.0.
src/include/platform_support.h
Outdated
@@ -77,6 +77,7 @@ bool platform_spi_deinit(spi_bus_e bus);

bool platform_spi_chip_select(uint8_t device_select);
uint8_t platform_spi_xfer(spi_bus_e bus, uint8_t value);
+void platform_spi_xfer_block(spi_bus_e bus, uint8_t *const data, size_t count);
The `const` here is correct on the buffer in the function definition itself, but should be dropped from this declaration per the clang-tidy lint about useless `const`.
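A minimal sketch of what this comment asks for: the prototype omits the top-level `const` on the pointer parameter, while the definition keeps it. The `spi_bus_e` enum and the byte-inverting body are stand-ins invented for this example and are not the project's code. The lint in question is presumably clang-tidy's `readability-avoid-const-params-in-decls`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the project's bus enum, declared here only to keep the sketch self-contained */
typedef enum { SPI_BUS_EXTERNAL, SPI_BUS_INTERNAL } spi_bus_e;

/* Declaration, as it would appear in a header: no top-level const on the parameters */
void platform_spi_xfer_block(spi_bus_e bus, uint8_t *data, size_t count);

/* Definition: const on the pointer itself documents that `data` is never re-pointed */
void platform_spi_xfer_block(const spi_bus_e bus, uint8_t *const data, const size_t count)
{
	(void)bus;
	assert(data != NULL || count == 0U);
	/* Placeholder body for illustration: invert every byte in place */
	for (size_t i = 0; i < count; ++i)
		data[i] = (uint8_t)~data[i];
}
```

Both forms declare the same function type, because C ignores top-level qualifiers on parameters when checking compatibility. The `const` therefore only carries meaning inside the definition.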
@@ -67,9 +67,13 @@ void bmp_spi_read(const spi_bus_e bus, const uint8_t device, const uint16_t comm
	bmp_spi_setup_xfer(bus, device, command, address);
	/* Now read back the data that elicited */
	uint8_t *const data = (uint8_t *const)buffer;
+#if 0
If you're adding this new functionality, please just add it. If you want to preserve working with the old API, then introduce a `#define` in the platform header that can be tested for here to switch to the new implementation. This will then also fix the builds.
Yes, I felt like `PLATFORM_HAS_SPI` and `PLATFORM_HAS_SPI_BLOCKWISE` or so would be nice to add. The first macro would guard dummy implementations in all but two platforms; the second would dispatch to a block transfer function instead of the slow byte-wise call chains.
But first I wanted to evaluate the flash size increase from this feature.
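A rough sketch of how the second macro could dispatch the data phase. `PLATFORM_HAS_SPI_BLOCKWISE` is the name floated in this thread, not an existing define. The `spi_read_data` helper, the stub platform functions, the `platform_calls` counter and the `spi_bus_e` stand-in are all hypothetical and exist only to make the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the project's bus enum, declared here only to keep the sketch self-contained */
typedef enum { SPI_BUS_EXTERNAL, SPI_BUS_INTERNAL } spi_bus_e;

/* Hypothetical: a platform header would define this when it provides platform_spi_xfer_block() */
#define PLATFORM_HAS_SPI_BLOCKWISE 1

/* Counts calls into the platform layer so the two paths can be compared */
static size_t platform_calls = 0;

/* Stub byte-wise transfer: pretends the target always answers 0x5a */
uint8_t platform_spi_xfer(const spi_bus_e bus, const uint8_t value)
{
	(void)bus;
	(void)value;
	++platform_calls;
	return 0x5aU;
}

#ifdef PLATFORM_HAS_SPI_BLOCKWISE
/* Stub block transfer: a single call into the platform layer for the whole buffer */
void platform_spi_xfer_block(const spi_bus_e bus, uint8_t *const data, const size_t count)
{
	(void)bus;
	++platform_calls;
	for (size_t i = 0; i < count; ++i)
		data[i] = 0x5aU;
}
#endif

/* Data phase of a read: dispatch on the platform capability macro */
static void spi_read_data(const spi_bus_e bus, uint8_t *const data, const size_t length)
{
	assert(data != NULL);
#ifdef PLATFORM_HAS_SPI_BLOCKWISE
	platform_spi_xfer_block(bus, data, length);
#else
	/* Fallback: one platform call per byte */
	for (size_t i = 0; i < length; ++i)
		data[i] = platform_spi_xfer(bus, 0U);
#endif
}
```

With the macro defined, a 256-byte page read makes one call into the platform layer instead of 256, which is the call-chain overhead the comment refers to. Platforms that do not define the macro keep the byte-wise path unchanged.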
	return;
}

+#if 0
What are the benefits and drawbacks of these two approaches? Can the simpler, more expressive loop get similar performance if interrupts are suspended with an atomic context?
No, it can't, because it blocking-waits for the entire duration of an 8/16-bit SPI word in https://github.com/libopencm3/libopencm3/blob/201f5bcfb3fa70ee34818152463e7139f24db377/lib/stm32/common/spi_common_all.c#L189-L190
But thanks to that, it does not submit an extra word in flight to keep data pumping, and hence cannot miss an Rx byte.
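To illustrate the trade-off, here is a host-side toy model, not the PR's code and not a register-accurate STM32 simulation. It assumes a one-word transmit buffer behind the shift register, a simulated target that answers with the inverted byte, and a model that drops a received word when the previous one has not been read. All names (`spi_model_s`, `spi_tick`, `spi_xfer_block_model`, `stall_at`) are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy SPI master: shift register plus one-word TX and RX buffers */
typedef struct {
	uint8_t tx_buf, shift, rx_buf;
	bool tx_full, shifting, rx_full, overrun;
} spi_model_s;

/* Advance by one word time: finish the word being shifted, then start the queued one */
static void spi_tick(spi_model_s *const spi)
{
	if (spi->shifting) {
		if (spi->rx_full)
			spi->overrun = true; /* previous word was never read, the model drops the new one */
		else {
			spi->rx_buf = (uint8_t)~spi->shift; /* simulated target echoes the inverted byte */
			spi->rx_full = true;
		}
		spi->shifting = false;
	}
	if (spi->tx_full) {
		spi->shift = spi->tx_buf;
		spi->tx_full = false;
		spi->shifting = true;
	}
}

/* Write a word: straight into the shift register if idle, else into the TX buffer */
static void spi_write(spi_model_s *const spi, const uint8_t value)
{
	assert(!spi->tx_full);
	if (!spi->shifting) {
		spi->shift = value;
		spi->shifting = true;
	} else {
		spi->tx_buf = value;
		spi->tx_full = true;
	}
}

/* Wait for a received word and return it */
static uint8_t spi_read(spi_model_s *const spi)
{
	while (!spi->rx_full) {
		assert(spi->shifting); /* nothing in flight would mean waiting forever */
		spi_tick(spi);
	}
	spi->rx_full = false;
	return spi->rx_buf;
}

/*
 * Pipelined block transfer: word i + 1 is queued while word i is still shifting.
 * `stall_at` simulates an interrupt lasting two word times before reading word `stall_at`.
 * Returns false as soon as an overrun is detected.
 */
static bool spi_xfer_block_model(spi_model_s *const spi, uint8_t *const data, const size_t count, const size_t stall_at)
{
	if (count == 0U)
		return true;
	spi_write(spi, data[0]);
	for (size_t i = 0; i < count; ++i) {
		if (i + 1U < count)
			spi_write(spi, data[i + 1U]);
		if (i == stall_at) {
			spi_tick(spi);
			spi_tick(spi);
		}
		data[i] = spi_read(spi);
		if (spi->overrun)
			return false;
	}
	return true;
}
```

Passing `SIZE_MAX` as `stall_at` runs the transfer without any stall, and every word arrives back-to-back. A two-word stall at any index loses a received word in this model, which is the failure the byte-wise loop cannot hit because it never has a second word queued.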
OK, fair enough. Then please drop the simpler loop, as there's no point keeping it in this new code.
* Existing implementation has to walk up and down the function stack per byte, which is fine for commands and general poking
* 256-byte long page reads and writes can be accelerated because the length is known ahead of time
* Keep a byte in flight on stm32f1/f4 SPI (this is simpler than IRQ or DMA)
f5f65a1 to 6a88dbf
Detailed description
Tested to reduce `bmpflash read -b int` dump times from 35 to 31 seconds for an 8192 KiB w25q64 chip (at 12 MHz). The atomic section blocks interrupts for 170 microseconds (one 256-byte read); without it, my patch made the board hang reliably (no read timeouts). I will likely rewrite this once more using direct register manipulation instead of the libopencm3 SPI API. Short reads, like SFDP, still show normal gaps between command bytes (I didn't change them) but no gaps in the data page phase.

The acceleration is achieved by keeping a byte (actually an 8/16-bit SPI word) in flight behind the DR shadow register, which is how the peripheral is intended to be used. DMA bindings are harder and may result in channel/stream conflicts.
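As a quick check of the figures above, the gap-free clocking time follows from bytes × 8 bits ÷ clock frequency. The helper below is a hypothetical name introduced only for this calculation.

```c
#include <assert.h>
#include <stdint.h>

/* Time in microseconds to clock `bytes` bytes back-to-back at `clock_hz`, ignoring all gaps */
static double spi_transfer_time_us(const uint32_t bytes, const uint32_t clock_hz)
{
	assert(clock_hz != 0U);
	return (double)bytes * 8.0 * 1e6 / (double)clock_hz;
}
```

For a 256-byte page at 12 MHz this gives roughly 170.7 µs, which matches the length of the atomic section quoted above. For the whole 8192 KiB chip it gives roughly 5.6 seconds, so the measured 31 seconds is still well above the pure clocking time. Where the remaining time goes is not stated in this description.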