Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hw,spi_host,rtl] Add a 32-bit command length register #25199

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Razer6
Copy link
Member

@Razer6 Razer6 commented Nov 19, 2024

Currently, the SPI host only supports a 9-bit command length, meaning an SPI transaction can only transfer 512b. This is especially problematic for integrated devices, where large FW blobs (>5MB) need to be fetched from the external SPI flash. When used in conjunction with the DMA (hardware-handshake mode), only a few chunks can be transferred before re-programing the SPI host and DMA.

To overcome this problem, we extend the command length to 20
Bit. This allows a more flexible use case and allows the system to read large FW blobs with a single SPI and DMA configuration.

/cc @rivos-eblot

@Razer6 Razer6 marked this pull request as ready for review November 19, 2024 10:35
@Razer6 Razer6 requested review from a team as code owners November 19, 2024 10:35
@Razer6 Razer6 requested review from hcallahan-lowrisc, timothytrippel, GregAC and andreaskurth and removed request for a team and timothytrippel November 19, 2024 10:35
@Razer6 Razer6 force-pushed the spi-len branch 8 times, most recently from 2d52326 to 17bf30b Compare November 19, 2024 14:47
@andreaskurth
Copy link
Contributor

@a-will: Your feedback / review of this would be highly appreciated. TIA!

Copy link
Contributor

@a-will a-will left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, there are alternatives that use existing unused COMMAND bits, so the interface with software doesn't become more complex.

// Exclude from RW tests
"excl:CsrAllTests:CsrExclWrite"]
},
{ name: "COMMAND_LEN",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An entirely separate CSR for the length seems excessive and loses the atomicity of single-CSR command writes, which was also used as the trigger to write the command FIFO.

There are many bits available in the COMMAND CSR. Why not add another 10 bits to the length there?

Even if we manage to create a spi_host implementation that is 8 pins wide and runs at 200 MHz DDR, 512 KiB would still be a big enough chunk to only require servicing less often than every other millisecond. Also, that's assuming you are pretending there is no command queue, which enables having multiple chunks, and its depth could be extended to increase the service interval.

Copy link
Contributor

@rivos-eblot rivos-eblot Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An entirely separate CSR for the length seems excessive and loses the atomicity of single-CSR command writes, which was also used as the trigger to write the command FIFO.
There are many bits available in the COMMAND CSR. Why not add another 10 bits to the length there?

I have not suspected having a single write for a new command was a requirement / feature.

Even if we manage to create a spi_host implementation that is 8 pins wide and runs at 200 MHz DDR, 512 KiB would still be a big enough chunk to only require servicing less often than every other millisecond. Also, that's assuming you are pretending there is no command queue, which enables having multiple chunks, and its depth could be extended to increase the service interval.

This does not play nice with DMA: Increasing the command depth would not help.

However combining the unused bits (18) of the command register with the actual length field (9) would give enough command length bits for most use cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An entirely separate CSR for the length seems excessive and loses the atomicity of single-CSR command writes, which was also used as the trigger to write the command FIFO.
There are many bits available in the COMMAND CSR. Why not add another 10 bits to the length there?

I have not suspected having a single write for a new command was a requirement / feature.

It's not something that absolutely needs to be maintained, but this looked to me like throwing aside the benefit when suitable alternatives remain.

Even if we manage to create a spi_host implementation that is 8 pins wide and runs at 200 MHz DDR, 512 KiB would still be a big enough chunk to only require servicing less often than every other millisecond. Also, that's assuming you are pretending there is no command queue, which enables having multiple chunks, and its depth could be extended to increase the service interval.

This does not play nice with DMA: Increasing the command depth would not help.

Don't just hand-wave here. What is it about DMA that makes it need to know how the spi_host controller chunks a larger transfer? This sounds like a failure in the DMA design.

Is there a missing exported signal from spi_host, like when a full, multi-transfer transaction has completed? There are numerous I/O protocols that want DMA to support transfers across multiple small packets, so in general, the DMA design should be able to support it. Of course SPI isn't one of those protocols (just a possibility in practice, due to other limits).

However combining the unused bits (18) of the command register with the actual length field (9) would give enough command length bits for most use cases.

I'd feel just a little leery of using all the bits, but I haven't yet had the chance to think about how this controller might grow its feature set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not something that absolutely needs to be maintained, but this looked to me like throwing aside the benefit when suitable alternatives remain.

Yes, no trouble here.

This does not play nice with DMA: Increasing the command depth would not help.

Don't just hand-wave here. What is it about DMA that makes it need to know how the spi_host controller chunks a larger transfer? This sounds like a failure in the DMA design.

Yes sorry, I think I got confused with the initial DMA design and the previous SPI host implementation featuring the edge-triggered events.

I'd feel just a little leery of using all the bits, but I haven't yet had the chance to think about how this controller might grow its feature set.

I was thinking about the SW having to push many identical commands for the sake of completing a single read request. The whole 27 bits are not required, but 9 (512 bytes) are really too short to read n MiB from SPI flash (that would give about 10K commands to push if I'm not mistaken)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about making the length 20-bit wide, leaving 7-bit for any future extensions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about making the length 20-bit wide, leaving 7-bit for any future extensions.

Sounds good to me!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a-will I updated the implementation to 20-bit within the command register. I needed to re-shuffle the bits though. Could you take another look on this?

hw/ip/spi_host/rtl/spi_host.sv Outdated Show resolved Hide resolved
Currently the SPI host only supports a 9-bit command length, meaning a SPI
transaction can only transfer 512b. This especially problematic, for integrated
devices, where large FW blobs (>5MB) need to be fetched from the external SPI
flash. When using in conjunction with the DMA (hardware-handshake mode), only
a few chunks can be transferred before needing to re-program the SPI host and
DMA.

To overcome this problem, extend the command length to 20-bit.

Signed-off-by: Robert Schilling <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants