-
Notifications
You must be signed in to change notification settings - Fork 779
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[hw,spi_host,rtl] Add a 32-bit command length register #25199
base: master
Are you sure you want to change the base?
Conversation
2d52326
to
17bf30b
Compare
@a-will: Your feedback / review of this would be highly appreciated. TIA! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, there are alternatives that use existing unused COMMAND
bits, so the interface with software doesn't become more complex.
hw/ip/spi_host/data/spi_host.hjson
Outdated
// Exclude from RW tests | ||
"excl:CsrAllTests:CsrExclWrite"] | ||
}, | ||
{ name: "COMMAND_LEN", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An entirely separate CSR for the length seems excessive and loses the atomicity of single-CSR command writes, which was also used as the trigger to write the command FIFO.
There are many bits available in the COMMAND
CSR. Why not add another 10 bits to the length there?
Even if we manage to create a spi_host implementation that is 8 pins wide and runs at 200 MHz DDR, 512 KiB would still be a big enough chunk to only require servicing less often than every other millisecond. Also, that's assuming you are pretending there is no command queue, which enables having multiple chunks, and its depth could be extended to increase the service interval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An entirely separate CSR for the length seems excessive and loses the atomicity of single-CSR command writes, which was also used as the trigger to write the command FIFO.
There are many bits available in theCOMMAND
CSR. Why not add another 10 bits to the length there?
I have not suspected having a single write for a new command was a requirement / feature.
Even if we manage to create a spi_host implementation that is 8 pins wide and runs at 200 MHz DDR, 512 KiB would still be a big enough chunk to only require servicing less often than every other millisecond. Also, that's assuming you are pretending there is no command queue, which enables having multiple chunks, and its depth could be extended to increase the service interval.
This does not play nice with DMA: Increasing the command depth would not help.
However combining the unused bits (18) of the command register with the actual length field (9) would give enough command length bits for most use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An entirely separate CSR for the length seems excessive and loses the atomicity of single-CSR command writes, which was also used as the trigger to write the command FIFO.
There are many bits available in theCOMMAND
CSR. Why not add another 10 bits to the length there?I have not suspected having a single write for a new command was a requirement / feature.
It's not something that absolutely needs to be maintained, but this looked to me like throwing aside the benefit when suitable alternatives remain.
Even if we manage to create a spi_host implementation that is 8 pins wide and runs at 200 MHz DDR, 512 KiB would still be a big enough chunk to only require servicing less often than every other millisecond. Also, that's assuming you are pretending there is no command queue, which enables having multiple chunks, and its depth could be extended to increase the service interval.
This does not play nice with DMA: Increasing the command depth would not help.
Don't just hand-wave here. What is it about DMA that makes it need to know how the spi_host controller chunks a larger transfer? This sounds like a failure in the DMA design.
Is there a missing exported signal from spi_host, like when a full, multi-transfer transaction has completed? There are numerous I/O protocols that want DMA to support transfers across multiple small packets, so in general, the DMA design should be able to support it. Of course SPI isn't one of those protocols (just a possibility in practice, due to other limits).
However combining the unused bits (18) of the command register with the actual length field (9) would give enough command length bits for most use cases.
I'd feel just a little leery of using all the bits, but I haven't yet had the chance to think about how this controller might grow its feature set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not something that absolutely needs to be maintained, but this looked to me like throwing aside the benefit when suitable alternatives remain.
Yes, no trouble here.
This does not play nice with DMA: Increasing the command depth would not help.
Don't just hand-wave here. What is it about DMA that makes it need to know how the spi_host controller chunks a larger transfer? This sounds like a failure in the DMA design.
Yes sorry, I think I got confused with the initial DMA design and the previous SPI host implementation featuring the edge-triggered events.
I'd feel just a little leery of using all the bits, but I haven't yet had the chance to think about how this controller might grow its feature set.
I was thinking about the SW having to push many identical commands for the sake of completing a single read request. The whole 27 bits are not required, but 9 (512 bytes) are really too short to read n MiB from SPI flash (that would give about 10K commands to push if I'm not mistaken)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about making the length 20-bit wide, leaving 7-bit for any future extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about making the length 20-bit wide, leaving 7-bit for any future extensions.
Sounds good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@a-will I updated the implementation to 20-bit within the command register. I needed to re-shuffle the bits though. Could you take another look on this?
Currently the SPI host only supports a 9-bit command length, meaning a SPI transaction can only transfer 512b. This especially problematic, for integrated devices, where large FW blobs (>5MB) need to be fetched from the external SPI flash. When using in conjunction with the DMA (hardware-handshake mode), only a few chunks can be transferred before needing to re-program the SPI host and DMA. To overcome this problem, extend the command length to 20-bit. Signed-off-by: Robert Schilling <[email protected]>
Currently, the SPI host only supports a 9-bit command length, meaning an SPI transaction can only transfer 512b. This is especially problematic for integrated devices, where large FW blobs (>5MB) need to be fetched from the external SPI flash. When used in conjunction with the DMA (hardware-handshake mode), only a few chunks can be transferred before re-programing the SPI host and DMA.
To overcome this problem, we extend the command length to 20
Bit. This allows a more flexible use case and allows the system to read large FW blobs with a single SPI and DMA configuration.
/cc @rivos-eblot