Reduce SPI overhead by removing extra calls #19123
Bugfix 2.0.x
[cron] Bump distribution date (2020-08-18)
beginTransaction normally handles SPI device selection and setup. For example, on STM32F1 the Nano V1 uses SPI 1 for the TFT and SPI 2 for touch, so each time an SPI transfer is needed, the correct SPI for that peripheral must be selected. That doesn't mean the current beginTransaction is right, though. I had similar issues on STM32F1 when porting the Marlin TFT: I noticed that the code was wrongly restarting the SPI device each time. I separated the init and begin methods and have had no issues since. I'm currently working on a full SPI class for LPC. I haven't checked the other HALs yet, but I will take a look at your PR. Anyway, I don't think removing the begin entirely is the right thing to do, because we really need to select the SPI device before using it. Maybe we should just fix the begin code. I will take a look. |
Another issue that could be happening is a double begin. I will take a look. |
In |
Yes, that's what I suspect: a double init/begin. But I really think the begin/end is the part that must stay, as its purpose is just to select and configure the SPI that will be used. Without it, we may break any board that has more than one SPI peripheral. |
There is a PR started which aims to allocate SPI bus + CS pin combinations to devices and to make sure the right CS pins are activated at the right times, but it needs more work. |
It's not only about CS pins. It's about: which SPI (1, 2 or 3)? What speed? What data mode? What bit order? Each time we talk to an SPI peripheral, we must set up and activate the right SPI with the right config. So it's common (and correct) to call begin/end when using SPI, and not only because of the CS pin. |
But not for EACH transferred byte! This already causes a HARDWARE problem in some cases, like mine. |
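To make the "which SPI, what speed, what data mode, what bit order" point concrete, here is a minimal sketch of a per-device configuration and a begin that skips the work when the same configuration is already active (the double-begin case suspected above). Everything in it is hypothetical: SpiConfig, BitOrder and SpiBusModel are illustration-only names, not Marlin's HAL or the Arduino SPI API, and the model only counts reconfigurations instead of touching hardware.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only; not Marlin's or Arduino's API.
enum class BitOrder : uint8_t { MsbFirst, LsbFirst };

struct SpiConfig {
  uint8_t  bus;        // which SPI peripheral (1, 2 or 3)
  uint32_t clock_hz;   // bus speed
  uint8_t  mode;       // SPI data mode 0..3
  BitOrder bit_order;  // MSB or LSB first
};

inline bool operator==(const SpiConfig &a, const SpiConfig &b) {
  return a.bus == b.bus && a.clock_hz == b.clock_hz &&
         a.mode == b.mode && a.bit_order == b.bit_order;
}

// Models "select + configure" without any real hardware access.
class SpiBusModel {
public:
  // Applies cfg only when it differs from the active configuration.
  // Returns true if a reconfiguration actually happened.
  bool begin(const SpiConfig &cfg) {
    assert(cfg.mode <= 3);
    if (configured_ && active_ == cfg) return false;  // redundant begin: skip
    active_ = cfg;
    configured_ = true;
    ++reconfig_count_;
    return true;
  }
  unsigned reconfigCount() const { return reconfig_count_; }

private:
  SpiConfig active_{};
  bool configured_ = false;
  unsigned reconfig_count_ = 0;
};
```

With a TFT on SPI 1 and touch on SPI 2, calling begin twice in a row for the TFT would reconfigure only once, while switching to touch reconfigures again. The clock rates a caller passes in are its own choice; none are taken from this thread.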
If an SPI bus has only a single device on it then you can set its CS pin and forget it, and all SPI transfers can go ahead with no preparation and no cleanup. The main complication comes from having more than one device on the same bus in a multi-threaded environment. Remember that Marlin doesn't access SPI from interrupt context. And it is a single-threaded program. SPI is very robust against interruptions, which is why we don't need to bracket our SPI reads and writes with interrupt-disabling. And, because of Marlin's single-threaded nature, we don't need to worry about the wrong CS pin(s) being set and breaking an ongoing SPI transaction. So, the changes proposed by this PR are quite reasonable.
I agree that this needs to be optimized properly, but with some care. All of Marlin's SPI communication is atomic and there is no case where Marlin is going off in the middle of an SPI communication to do another SPI communication. However, we can't account for everything that a library might do. But, again, as long as all SPI communication is done in the same (main) thread there should be no issues with SPI bus transfers getting hijacked, even from libraries. |
Sure. But if the begin/end calls are removed from the "each byte function", ALL other functions that call the "each byte function" MUST make the begin/end calls themselves. Without that, it is almost certain that something will break, or at least we will have another PR putting the begin/end back in the right place. We have more than one peripheral that uses a different SPI config: SD, TFT, SPI Flash and TOUCH. And this PR changes 6 different HALs! For LVGL, for example, in the same "loop", the code reads from SPI flash, writes to the SPI TFT and reads touches. All using HW SPI. All with different configs (speed or device in this case). So the calling code always makes the begin/end calls. The point of this PR is very valid. It's clearly inefficient to call begin/init for each byte, but it must be called somewhere by the code using those SPI functions... Is that guaranteed? |
The SKR Pro cannot currently use hardware SPI because it needs to operate the onboard SD in Mode 3 to avoid problems with extra edges on the clock line. I wonder if this board has the same issue causing this. @Serhiy-K, does your board work without this if you define Which board are you using? |
It would be fairly straightforward to add debug flags to the SPI classes in the HAL to keep track that I'm fuzzy on whether all SPI activity goes through the HAL or whether we go straight to the SPI library for some devices. I'm also fuzzy on whether hardware SPI differs from software SPI in how Marlin interfaces with them. As I understand it, every part of Marlin that needs to communicate over SPI does the full setup, loop, and completion in place, and there is no "background" SPI activity regardless of whether the actual bit-banging is done by hardware or software. |
I also think so, which is why it makes sense to remove unnecessary initialization as I suggested.
HAL_SPI.cpp for AVR, STM32F1, LPC and DUE does not contain begin/end in the data transfer functions, and it has worked well for many years with different peripherals.
I don't use SOFTWARE_SPI because hardware SPI works well for my TFT panel and SD card connected to the same SPI bus, with separate CS pins of course. This works with AVR, STM32F1 and LPC without any changes in the HAL, but with STM32F411 I had a problem. |
The point is:
Calling
For example, in STM32F1, in a fast search, I found:
And those are just the ones I'm aware of, because I have been working with SPI TFT for a few weeks.
Did you try lowering the speed of the SPI? From a quick look at Sd2Card, it seems to have redundant calls, calling I came to comment on this PR because I'm working on exactly that: fixing/completing an SPI class for LPC that allows changing the SPI bus as needed, using begin/end. |
A practical example from my recent experience: When I started working on the LVGL UI, I only had my Chitu board. So I went to test on the MKS Nano V2. Nothing worked. Often, even in the same function, it is necessary to use 2 peripherals at the same time: for example, reading an image from the flash and sending it to the TFT. At that moment I understood and went to read more about SPI and how it works with multiple peripherals with different configurations. That's why this PR caught my attention a lot. |
We should definitely avoid doing things like running loops where we read a byte from one source, then write a byte to a destination, because it might be the same SPI bus with a different CS and it will require preparing the SPI bus too frequently. Instead, a loop like this should buffer and read/write bigger chunks of data so there is less overhead for setup between transactions. Perhaps it would help to add an intermediate I see that the HALs do contain functions to send a whole buffer at once, but these are not used by any Marlin code. |
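The overhead argument can be put into numbers with a small sketch. It assumes a copy from one device to another on a shared bus, where each pass prepares the bus twice (once for the read, once for the write). busPreparations is a hypothetical helper written for this illustration, not a Marlin function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many times the bus must be prepared to copy
// `total` bytes between two devices on the same bus, when `chunk` bytes
// are buffered per pass. Each pass needs one preparation for the read
// and one for the write.
inline std::size_t busPreparations(std::size_t total, std::size_t chunk) {
  assert(chunk > 0);
  const std::size_t passes = (total + chunk - 1) / chunk;  // round up
  return 2 * passes;
}
```

For 512 bytes, a 1-byte buffer needs 1024 preparations and a 16-byte buffer needs 64, which is 16 times fewer. How much time that saves depends on what each preparation costs on a given HAL, which this sketch does not model.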
We have no option when the image is 32 KB... A 480x320 screen needs 480 x 320 x 2 = 307,200 bytes (300 KB). But even if the data were smaller, we may not have direct control of the flush logic of the UI renderer.
The SPIClass works this way. Each instance of that class keeps all the available SPI devices configured. @sjasonsmith and @p3p can correct me if I'm wrong, but the change from one SPI setup to another is just a few flag and pointer operations. So far, I have not found any bottleneck when switching between different SPI configurations. The LVGL UI is a good benchmark.
For TFT and SPI Flash, we already use 16-bit SPI DMA to transfer data faster. I'm finishing the same SPIClass for LPC. It will allow all those operations, and anyone will be able to switch between different SPI configs as needed, without any problem or conflict (I hope). |
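The frame-size arithmetic in this comment can be checked with a short sketch. frameBytes and transfersPerFrame are hypothetical helpers, and the 960-byte buffer used in the note below (one 480-pixel line at 2 bytes per pixel) is an illustrative assumption, not a value taken from Marlin.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for one full frame at the given resolution and pixel size.
inline std::size_t frameBytes(std::size_t width, std::size_t height,
                              std::size_t bytes_per_pixel) {
  assert(bytes_per_pixel > 0);
  return width * height * bytes_per_pixel;
}

// Number of buffer-sized transfers needed to push one frame (rounded up).
inline std::size_t transfersPerFrame(std::size_t frame_bytes,
                                     std::size_t buffer_bytes) {
  assert(buffer_bytes > 0);
  return (frame_bytes + buffer_bytes - 1) / buffer_bytes;
}
```

frameBytes(480, 320, 2) gives 307,200 bytes, which is 300 KB at 1024 bytes per KB. With a one-line buffer of 960 bytes, a full frame takes 320 transfers.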
Even if the loop reads only 16 bytes into a buffer before writing out 16 bytes, that will still be more efficient than a 1 byte buffer. With certain configurations it could be as much as … 32 times … more efficient … maybe.
That would be kewl if it does. I haven't looked at it closely in a long time. It will be good to do a full audit of all of Marlin's SPI communication, see how each one is handling its I/O, and make sure all peripherals are operating on the same assumptions. |
This is fine as long as all SPI transfers (in Marlin) have a startTransaction (endTransaction is not really necessary unless you want to use DMA but block until it finishes). This assumes that a transaction includes making sure the pins are configured and, in hardware mode, that the peripheral is set up correctly. The transaction call should also block until a previous DMA transfer is complete. The problem with keeping track of SPI state in a manager has always been that libraries just do their own thing and we have no control over that. |
But I didn't say that we read one byte at a time. We read a buffer from SPI (I don't remember the size), and write at least one image "line" to the TFT. :-)
Yes, it does, and it works very well :-) I'm sure that the current spi* functions have room for improvement. I'm just pointing out that the current PR changes remove code in a way that may break the callers of those functions, including at least one init that I'm sure cannot be removed. What I'm asking is:
|
Having looked at what this PR is addressing: it will probably break things as is, but removing the transactions from the HAL SPI function calls is the correct call in the long run. The transaction should be handled by the client code that calls these functions, so it can encapsulate the entire SPI transaction and not just single transfers. So each usage of SPI in Marlin would need to be wrapped in a transaction of some kind, probably not just an Arduino
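One possible shape for "the client code owns the transaction" is a scope guard. The sketch below is an assumption about how that could look, not code from this PR: MockSpi is a hypothetical stand-in that only counts calls, and SpiTransaction and sendBlock are illustration-only names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a HAL SPI driver; it only counts calls.
struct MockSpi {
  unsigned begins = 0;
  unsigned ends = 0;
  unsigned bytes = 0;
  bool in_transaction = false;

  void beginTransaction() { ++begins; in_transaction = true; }
  void endTransaction()   { ++ends;   in_transaction = false; }

  // Raw transfer with no setup of its own: the caller must already
  // have opened a transaction.
  uint8_t transfer(uint8_t b) {
    assert(in_transaction);
    ++bytes;
    return b;  // loopback, just so the function returns something
  }
};

// Scope guard: begins the transaction on construction, ends it on
// destruction, so a whole block of transfers shares one setup.
class SpiTransaction {
public:
  explicit SpiTransaction(MockSpi &spi) : spi_(spi) { spi_.beginTransaction(); }
  ~SpiTransaction() { spi_.endTransaction(); }
  SpiTransaction(const SpiTransaction &) = delete;
  SpiTransaction &operator=(const SpiTransaction &) = delete;

private:
  MockSpi &spi_;
};

// Client code: one transaction around the entire block, not per byte.
inline void sendBlock(MockSpi &spi, const uint8_t *data, std::size_t len) {
  SpiTransaction guard(spi);
  for (std::size_t i = 0; i < len; ++i) spi.transfer(data[i]);
}
```

Sending a 4-byte buffer through sendBlock results in one begin, one end and four transfers. The assert inside transfer is what would catch a caller that forgot to open a transaction, which is the breakage the earlier comments worry about.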
Description
I had a problem initializing the SD card on my TFT panel connected to an STM32F411 board. Using an oscilloscope and step-by-step debugging, I found the cause of the problem: an additional pulse on the SCK line before the first real SCK pulse. This pulse was created by calling SPI.beginTransaction() before the transfer of EACH byte, so the SCK line carried 9 SCK pulses for one byte. In Sd2Card.cpp this function is called when executing spiSend, cardCommand, spiRead and spiRec. SPI.beginTransaction() performs a full initialization of the SPI bus, during which the additional SCK pulse appears.
I also found an unnecessary SPI bus initialization in Sd2Card::chipSelect().
After deleting the lines containing SPI.beginTransaction() from HAL_SPI.cpp and the line spiInit(spiRate_); from Sd2Card::chipSelect() in Sd2Card.cpp, the card worked correctly.
Even where SPI.beginTransaction() does not cause hardware problems with the SD card, it greatly slows down data exchange with the card.
The same problem may affect ESP32, SAMD51, STM32, STM32_F4_F7 and TEENSYxx.
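The pulse count described above can be written down as a tiny model. It relies on one assumption taken from this description: each call to SPI.beginTransaction() adds exactly one spurious SCK pulse on the affected board. sckPulses is a hypothetical helper for illustration and does not measure anything.

```cpp
#include <cassert>

// Model only: 8 clock pulses per transferred byte, plus one extra
// pulse per beginTransaction() call (the behavior observed on the
// oscilloscope in this description; assumed constant here).
inline unsigned long sckPulses(unsigned long bytes, unsigned long begin_calls) {
  assert(begin_calls <= bytes || bytes == 0);  // at most one begin per byte
  return 8UL * bytes + begin_calls;
}
```

With one begin per byte, a single byte produces 9 pulses, matching the observation. For a 512-byte block, per-byte begins give 4608 pulses, while a single begin for the whole block gives 4097, with the one extra pulse occurring before any data byte.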
Benefits
Preventing hardware problems with SD card in some cases
Increasing the speed of exchange with the SD card for some architectures