Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce SPI overhead by removing extra calls #19123

Closed
wants to merge 21 commits into from

Conversation

Serhiy-K
Copy link
Contributor

Description

I had a problem initializing the SD card on my TFT panel connected to STM32F411 board. With oscilloscope and step-by-step debugging I found the cause of the problem – it was an additional pulse on the SCK line before first main SCK pulse. This pulse was created by calling SPI.beginTransaction() before transfer EACH byte. So SCK line contain 9 CSK pulses for one byte. In Sd2Card.cpp this function is called when executing commands spiSend, cardCommand, spiRead and spiRec. SPI.beginTransaction() performs full initialization for SPI bus, during which an additional SCK pulse appears.
I also found unnecessary SPI bus initialization in Sd2Card::chipSelect().
After deleting the lines containing SPI.beginTransaction() from HAL_SPI.cpp and line spiInit(spiRate_); from Sd2Card::chipSelect() in Sd2Card.cpp the card worked good.
Even SPI.beginTransaction() does not create hardware problems with SD card it greatly slows down the speed of data exchange with SD card.
Mentioned problem may be for ESP32, SAMD51, STM32, STM32_F4_F7, TEENSYxx.

Benefits

Preventing hardware problems with SD card in some cases
Increasing the speed of exchange with the SD card for some architectures

@Serhiy-K Serhiy-K changed the title Bugfix 2.0.x Fix for HAL_SPI.cpp Aug 23, 2020
@rhapsodyv
Copy link
Member

The beginTransaction, normally refers to the spi device selection and setup.
It is needed, always, as we might have some peripheral using the spi device 1, other the spi 2, and so on.

For example: in stm32f1, the nano v1 uses spi 1 for tft and spi 2 for touch. So, each time a spi transfer is needed, it needs to select the correct spi for that peripheral.

But, it doesn’t mean that the beginTransaction is currently right.

I had similar issues in stm32f1 when porting the marlin tft. I noted that in stm32f1, the code was wrongly restarting the spi device each time. I separated init and begin methods and I had no issue now.

I’m currently working in a full spi class for LPC. I didn’t check others HAL yet.

But I will take a look in your PR. Anyway, I don’t think that remove the begin totally is the right thing to do, because we really need select the spi device before use it. May just fix the begin code.

I will take a look.

@rhapsodyv
Copy link
Member

Other issue that could be happing is double begin. I will take a look.

@Serhiy-K
Copy link
Contributor Author

Serhiy-K commented Aug 23, 2020

In Sd2Card::init() function spiInit(spiRate_) runs before any next transfers. We don't need next init for SPI
before finishing work with bus in current cycle. Especially for the transmission of EACH byte. For architectures not listed above, include LPC, we have no bus reinitialization in data transfer functions.

@rhapsodyv
Copy link
Member

In Sd2Card::init() function spiInit(spiRate_); runs before any next transfers. We don't need next init for SPI
before finishing work with bus in current cycle. Especially for the transmission of EACH byte. For architectures not listed above, include LPC, we have no bus reinitialization in data transfer functions.

Yes, it's what I suspect: double init/begin. But I really think that the begin/end is the part that must stay, as its objective is just select+configure the current spi that will be used. Without it, we may break any board that have more than one spi peripheral.

@thinkyhead
Copy link
Member

Without it, we may break any board that have more than one SPI peripheral.

There is a PR started which aims to allocate SPI bus + CS pin combinations to devices and to make sure the right CS pins are activated at the right times, but it needs more work.

@rhapsodyv
Copy link
Member

Without it, we may break any board that have more than one SPI peripheral.

There is a PR started which aims to allocate SPI bus + CS pin combinations to devices and to make sure the right CS pins are activated at the right times, but it needs more work.

Is not only about CS pins. Is about the: which spi (1, 2 or 3) ? what speed? what data mode? what bit order?

Each time we talk with a SPI peripheral, we must setup and active the right spi with right config. So, its common (and right) call begin/end when using SPI. And is not only because the CS pin.

@Serhiy-K
Copy link
Contributor Author

Each time we talk with a SPI peripheral, we must setup and active the right spi with right config

But not for EACH transfered byte! This already makes HARDWARE problem in some cases like mine.
In addition, I saw a decrease in the exchange rate when working with a SD card by about half, and this is at a speed 300000. At high speed, the fall can be more than ten times.

@thinkyhead
Copy link
Member

Is not only about CS pins. Is about the: which spi (1, 2 or 3) ? what speed? what data mode? what bit order?

If an SPI bus has only a single device on it then you can set its CS pin and forget it, and all SPI transfers can go ahead with no preparation and no cleanup. The main complication comes from having more than one device on the same bus in a multi-threaded environment.

Remember that Marlin doesn't access SPI from interrupt context. And it is a single-threaded program. SPI is very robust against interruptions, which is why we don't need to bracket our SPI reads and writes with interrupt-disabling. And, because of Marlin's single-threaded nature, we don't need to worry about the wrong CS pin(s) being set and breaking an ongoing SPI transaction.

So, the changes proposed by this PR are quite reasonable.

But not for EACH transfered byte!

I agree that this needs to be optimized properly, but with some care. All of Marlin's SPI communication is atomic and there is no case where Marlin is going off in the middle of an SPI communication to do another SPI communication. However, we can't account for everything that a library might do. But, again, as long as all SPI communication is done in the same (main) thread there should be no issues with SPI bus transfers getting hijacked, even from libraries.

@rhapsodyv
Copy link
Member

Each time we talk with a SPI peripheral, we must setup and active the right spi with right config

But not for EACH transfered byte! This already makes HARDWARE problem in some cases like mine.
In addition, I saw a decrease in the exchange rate when working with a SD card by about half, and this is at a speed 300000. At high speed, the fall can be more than ten times.

Sure. But if the begin/end are removed from the "each byte function", ALL other functions that calls the "each byte function" MUST do the begin/end calls. Without it, it almost sure that something will break. Or at least we will have another PR putting back the begin/end in the right place.

We have more than one peripheral that uses a different SPI config: SD, TFT, SPI Flash and TOUCH. And this PR changes 6 different HAL!

For LVGL, for example, in the same "loop", the code: reads from spi flash, write to spi TFT and read touches. All using HW SPI. All with different configs (speed or device in this case). So, the calling code always do the begin/end calls.

The point of this PR is very valid. It's clearly inefficient call begin/init for each byte, but it must be called somewhere by the code using those SPI funcions... Is it assured?

@sjasonsmith
Copy link
Contributor

The SKR Pro cannot currently use hardware SPI because it needs to operate the onboard SD in Mode 3 to avoid problems with extra edges on the clock like. I wonder if this board has the same issue causing this.

@Serhiy-K, does your board work without this if you define SOFTWARE_SPI? If so, that could be a temporary solution while a more completed solution is figured out. I agree with @rhapsodyv that a change across so many devices has a high risk of accidentally breaking something without a lot of care and testing.

Which board are you using?

@thinkyhead
Copy link
Member

The point of this PR is very valid. It's clearly inefficient [to] call begin/init for each byte, but it must be called somewhere by the code using those SPI [functions]... Is it assured?

It would be fairly straightforward to add debug flags to the SPI classes in the HAL to keep track that SPI.beginTransaction has been called, then clearing the flag when SPI.endTransaction is called. From that we can see what the effect of the changes in this PR would be, and trace back to the points where SPI.beginTransaction should have been called.

I'm fuzzy on whether all SPI activity goes through the HAL or whether we go straight to the SPI library for some devices. I'm also fuzzy on whether hardware SPI differs from software SPI in how Marlin interfaces with them. As I understand it, every part of Marlin that needs to communicate over SPI does the full setup, loop, and completion in place, and there is no "background" SPI activity regardless of whether the actual bit-banging is done by hardware or software.

@Serhiy-K
Copy link
Contributor Author

Serhiy-K commented Aug 24, 2020

As I understand it, every part of Marlin that needs to communicate over SPI does the full setup, loop, and completion in place, and there is no "background" SPI activity regardless of whether the actual bit-banging is done by hardware or software.

I also think so, which is why it makes sense to remove unnecessary initialization as I suggested.

But if the begin/end are removed from the "each byte function", ALL other functions that calls the "each byte function" MUST do the begin/end calls.

HAL_SPI.cpp for AVR, STM32F1, LPC and DUE does not contain begin/end in data transfer functions and all works good many ears with different peripherals.

@Serhiy-K, does your board work without this if you define SOFTWARE_SPI?

I don't use SOFTWARE_SPI because hardware SPI works good for my TFT panel and SD card connected to the same SPI bus. Of course with separated CS. This works with AVR, STM32F1 and LPC without any changes in HAL but with STM32F411 I had a problem.

@rhapsodyv
Copy link
Member

rhapsodyv commented Aug 24, 2020

As I understand it, every part of Marlin that needs to communicate over SPI does the full setup, loop, and completion in place, and there is no "background" SPI activity regardless of whether the actual bit-banging is done by hardware or software.

The point is:

  1. If the setup is done only once in some "init" code, it needs the begin/end call when using spi.
  2. OR, it call spiInit right before the use, always (a lot of places don't use begin, but call spiInit, that does call the begin)

Calling spiInit in some init method, I never calling it again, neither calling spi begin, is prone to problem. It may work fine without any problem until we have two devices that need different spi needs (speed, device, data mode and so on). And is what I'm working right now on LPC.

HAL_SPI.cpp for AVR, STM32F1, LPC and DUE does not contain begin/end in data transfer functions and all works good many ears with different peripherals.

For example, in STM32F1, in a fast search, I found:

  • 17 Calls to W25QXX.init => that calls spi begin
  • 4 SPIx?.begin
  • 14 tft.*write(M|S) => that call spi begin to write to TFT

And those are just what I aware of because I have working with SPI TFT for a few weeks.

I don't use SOFTWARE_SPI because hardware SPI works good for my TFT panel and SD card connected to the same SPI bus. Of course with separated CS. This works with AVR, STM32F1 and LPC without any changes in HAL but with STM32F411 I had a problem.

Did you try lowering the speed of the SPI?


In a fast look at the Sd2Card, seems it have redundant calls, calling spiInit(spiRate_); on Sd2Card::chipSelect() and begin Transation() is called for each byte. But the PR is removing both - the init in chipselect and the begin. One of them must stay. At least chipselect must have the init call.
But, it need to make sure that every other place that uses those spi* functions are aware that they are not doing the begin/end. Maybe only sd2card use those spi* functions... I don't know.

I came comment in this PR, because I working exactly on that: fixing/completing a SPI class for LPC that allow change the SPI bus as needed, using begin/end.
In the port of the marlin UIs for LPC, I need to use TOUCH and TFT with different velocities. But the current SPI code in LPC doesn't support it. So I'm filling the gaps and creating a class to support that multi bus.
And to support multi bus, we need begin/end calls :-)

@rhapsodyv
Copy link
Member

A practical example from my recent experience:

When I started working on the LVGL UI, I only had my Chitu board.
The LVGL code was full of SPI init, every time I was going to use the flash.
The first thing I did was to remove those duplicate calls, keeping only one in the begin of the TFT setup function.
And it worked perfectly.
The detail: Chitu's TFT is FSMC and the touch is sw spi. Only Flash is SPI.
So, really, I just needed a spi init / begin and it would work forever.

So I went to test on the MKS Nano V2. Nothing worked.
The nano v2 card uses SPI for: TFT, TOUCH and FLASH!
There are 3 different SPI configurations (speed and device).
I put all the spi init calls back, and everything went back to normal.

Often, even in the same function, it is necessary to use 2 peripherals at the same time: for example, reading an image from the flash and sending it to the TFT.

At that moment I understood and went to read more about SPI and how it works with multiple peripherals with different configurations.

That's why this PR caught my attention a lot.

@thinkyhead
Copy link
Member

for example, reading an image from the flash and sending it to the TFT

We should definitely avoid doing things like running loops where we read a byte from one source, then write a byte to a destination, because it might be the same SPI bus with a different CS and it will require preparing the SPI bus too frequently. Instead, a loop like this should buffer and read/write bigger chunks of data so there is less overhead for setup between transactions.

Perhaps it would help to add an intermediate SPIManager class to Marlin that you have to go through to associate a feature/device with an SPI bus. This intermediate class can keep track of the last feature/device that a (shared) SPI bus was set up for, and then it can take care of any needed setup before the transaction proceeds.

I see that the HALs do contain functions to send a whole buffer at once, but these are not used by any Marlin code.

@rhapsodyv
Copy link
Member

for example, reading an image from the flash and sending it to the TFT

We should definitely avoid doing things like running loops where we read a byte from one source, then write a byte to a destination, because it might be the same SPI bus with a different CS and it will require preparing the SPI bus too frequently. Instead, a loop like this should buffer and read/write bigger chunks of data so there is less overhead for setup between transactions.

We have no option when the image have 32kb... A 480x320 screen have: 480x320x2 = 300kb. But even if the data were smaller, we may not have direct control of the flush logic of the UI renderer.

Perhaps it would help to add an intermediate SPIManager class to Marlin that you have to go through to associate a feature/device with an SPI bus. This intermediate class can keep track of the last feature/device that a (shared) SPI bus was set up for, and then it can take care of any needed setup before the transaction proceeds.

The SPIClass work this way. Each instance of that class keep configured all the SPI devices that are available. @sjasonsmith and @p3p can correct if I'm wrong, but the change from one SPI setup to another, it's just some flags and pointers operation. Until now, I could not find any bottleneck by switching between different SPI configurations. The LVGL UI is a good benchmark.

I see that the HALs do contain functions to send a whole buffer at once, but these are not used by any Marlin code.

For TFT and SPI Flash, we already use SPI DMA 16 bit to transfer data faster. I'm finishing the same SPIClass for LPC. It will allow all those operations and anyone will be able to switch between different spi configs as needed, without any problem or conflict (I hope).
This same class already exist in STM32F1 and I think that in Arduino too.

@thinkyhead
Copy link
Member

We have no option when the image have 32kb.

Even if the loop reads only 16 bytes into a buffer before writing out 16 bytes, that will still be more efficient than a 1 byte buffer. With certain configurations it could be as much as … 32 times … more efficient … maybe.

The SPIClass work this way…

That would be kewl if it does. I haven't looked at it closely in a long time. It will be good to do a full audit of all of Marlin's SPI communication, see how each one is handling its I/O, and make sure all peripherals are operating on the same assumptions.

@p3p
Copy link
Member

p3p commented Aug 24, 2020

As long as all SPI transfers (in Marlin) have a startTransaction, (endTransaction is not really necessary unless you wan't to use dma but block until it finishes), this assumes that a transaction includes making sure the pins are configured and in hardware mode the peripheral is set up correctly, the transaction call should also block until a previous dma transfer is complete. The problem with keeping track of SPI state in a Manager has always been that libraries just do there own thing and we have no control over that.

@rhapsodyv
Copy link
Member

rhapsodyv commented Aug 24, 2020

We have no option when the image have 32kb.

Even if the loop reads only 16 bytes into a buffer before writing out 16 bytes, that will still be more efficient than a 1 byte buffer. With certain configurations it could be as much as … 32 times … more efficient … maybe.

But I didn't say that we read one by each time. We read a buffer from SPI (don't remember the size), and write at least one image "line" to TFT. :-)

The SPIClass work this way…

That would be kewl if it does. I haven't looked at it closely in a long time. It will be good to do a full audit of all of Marlin's SPI communication, see how each one is handling its I/O, and make sure all peripherals are operating on the same assumptions.

Yes, it does and works very well :-)

I'm sure that the current spi* functions have room for improvement. I'm just pointing out that the current PR changes are removing code that may break the callers of those functions, and at least one init that I'm sure that cannot be removed.

What I'm asking is:

  1. ALL code that uses those changed spi* functions, do they call the init/begin in a "closer" context of the spi* function usage across the 6 different HALs that are being changed ?

  2. The Sd2Card::chipSelect() must keep a call for spiInit, or at least spiBegin (if the setup remains the same). Without this init/begin in this method, as soon as we use more than one SPI in the same time of Sd2Card, it will break.

@p3p
Copy link
Member

p3p commented Aug 24, 2020

Having looked at what this PR is addressing, this will probably break things as is but removing the transactions from the HAL SPI function calls is the correct call in the long run, the transaction should be handled by the client code that is calling these functions so it can encapsulate the entire SPI transaction not just single transfers. so each usage of SPI in Marlin would need wrapped in a transaction of some kind, probably not just a Arduino SPI.beginTransaction().

@thinkyhead thinkyhead changed the title Fix for HAL_SPI.cpp Reduce SPI overhead by removing extra calls Aug 26, 2020
@thisiskeithb thisiskeithb added the Needs: Work More work is needed label Aug 26, 2020
@thinkyhead thinkyhead closed this Aug 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants