Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[spi_device] TPM poorly handles aborted transactions #20654

Closed
a-will opened this issue Dec 14, 2023 · 8 comments · Fixed by #21322
Closed

[spi_device] TPM poorly handles aborted transactions #20654

a-will opened this issue Dec 14, 2023 · 8 comments · Fixed by #21322
Assignees
Labels
Component:RTL Earlgrey-PROD Candidate Temporary label to triage issues into Earlgrey-PROD Milestones IP:spi_device Priority:P2 Priority: medium Type:Bug Bugs

Comments

@a-will
Copy link
Contributor

a-will commented Dec 14, 2023

Description

In spi_device, the TPM state machine and FIFOs poorly handle aborted transactions, creating ambiguity and race conditions that cannot be readily solved by the information left behind.

For write commands, if a sequence of two commands enters the command FIFO and data is in the write FIFO, there is no way to know which command is associated with the data. It could be the first or the second command that was aborted, and the separate FIFOs for commands and data cause that association to be dropped.

For read commands, the read data FIFO is reset whenever CSB is de-asserted, allowing it to clear any data written in the FIFO for a past command. However, in a sequence of two read commands, if the first read command is aborted, software can potentially write a response to the first read command while the second is active (CSB is asserted). Thus, the host may see read data for the wrong command.

@a-will
Copy link
Contributor Author

a-will commented Dec 14, 2023

CC @msfschaffner

@msfschaffner msfschaffner added this to the Earlgrey-PROD.M2 milestone Dec 19, 2023
@msfschaffner msfschaffner added the Earlgrey-PROD Candidate Temporary label to triage issues into Earlgrey-PROD Milestones label Dec 19, 2023
@msfschaffner msfschaffner added the Priority:P2 Priority: medium label Dec 19, 2023
@jesultra
Copy link
Contributor

CC @mazurek-michal
This sounds as if we may need a comprehensive review of the software/hardware interface of the TPM SPI component. I have not taken the time to myself study the details of the interface. I suggest that we regroup in the beginning of January, and see if we can work out some reasonable modifications.

@mazurek-michal
Copy link
Contributor

mazurek-michal commented Dec 20, 2023

@jesultra ACK
Current tests, even in the modified version by @a-will for hyperdebug310, do not support sending stand-alone read transaction. We need test program that work on device and give more control over what happens with transaction.

I am actively working on more robust device code for SPI-TPM test.

@jesultra
Copy link
Contributor

jesultra commented Jan 25, 2024

I have not yet looked into the details of the SPI TPM device hardware functionality. But at first glance, it sounds as if some of the features we put into the I2C device IP block could be appropriate.

Basically, the I2C block has an "incoming" queue of requests received from the AP, which could be reads or writes. For writes, the data is inlined into the same queue as the transaction headers (commands).

Then we added a funny feature, which was to make the hardware treat the data read FIFO (outgoing data from OT) as if it was empty, at any time when the incoming queue was non-empty. The intent was the same as clearing the data read FIFO when chip select is deasserted, but as Alex points out, doing so at the edge is insufficient to guard against the software populating a response just after the AP has given up and deasserted CS. If instead the hardware would decide based on the incoming queue being non-empty, then a second read by the AP, which the software has not yet seen or processed, would mean that the incoming queue would be non-empty, with the effect that any stale data put in the data read FIFO by software would be ignored by the hardware, until the software has had a chance to respond to the second read command.

Not having a foolproof way of telling requests apart, or guarding against stale data being transmitted, would potentially seriously impact the ability of ChromeOS to use the OT chip in designs that use the SPI bus for TPM. In my opinion, this needs to be addressed before the first production tapeout.

@mazurek-michal
Copy link
Contributor

@jesultra Is it possible to manually abort spi transaction (in-flight) in current hyperdebug firmware?

@jesultra
Copy link
Contributor

jesultra commented Jan 25, 2024

@jesultra Is it possible to manually abort spi transaction (in-flight) in current hyperdebug firmware?

HyperDebug firmware just knows about SPI read/write transactions. Opentitantool has low-level commands e.g. opentitantool spi raw-write ..., and it has a higher level commands such as opentitantool spi tpm .... The latter will instruct HyperDebug do several separate writes and reads on the spi bus, to e.g. execute a TPM command, or fewer to read/write a single TPM register. The high-level commands to not have support for issuing "invalid" or truncated sequences of SPI operations on the SPI TPM bus.

You could probably manually compose the SPI sequence to read a TPM register with opentitantool spi raw-write-read ..., though it would not properly do the repeated polling to see when the actual response begun. You could certainly issue a request to read a register, without reading any data (aborting the register read), using just opentitantool spi raw-write ....

If you want more advanced things, like aborting a TPM command after doing part of the dance around the STATUS and FIFO register, you may need to implement something in Rust that interfaces with opentitanlib, since opentitantool does not have support for keeping chip select asserted across multiple separate calls to raw-write and raw-read. (It is possible that such support could be added, if needed.)

@msfschaffner
Copy link
Contributor

Note: if we're changing this part we should probably look at fixing #13996 and #14045 at the same time.

@a-will
Copy link
Contributor Author

a-will commented Feb 10, 2024

The plan currently in motion (and generally completed, except for the WrFIFO buffer release CSR and the little shift in RdFIFO reset release):

  • The WrFIFO and RdFIFO move to the SRAM.
    • The command/address FIFO is still separate from the SRAM.
    • Saves 128 Bytes of flops.
  • For write commands, the command/address FIFO gets written only when the last bit of the last byte is clocked in.
    • There is then no need for a separate interrupt for the WrFIFO. All required data is present when the command/address FIFO entry arrives.
    • Write commands continue to be gated by a non-empty command/address FIFO.
    • Aborted write commands never reach the FIFO.
  • The WrFIFO's interface changes to a simple addressable memory, just like the flash modes' payload upload.
    • Unlike the flash mode case, the payload size is known ahead of time.
    • With the command/address FIFO only getting written at the time of the last bit, the entry in that FIFO describes the size of the payload in the memory.
  • The WrFIFO / write payload buffer changes to have an explicit "buffer release" CSR.
    • The "buffer busy" status is another input that blocks write commands at wait states.
    • The RAM interface + "buffer release" CSR is a little simpler for handling resets to the WrFIFO. When released to begin the data phase, the next command always begins at offset 0. Relying purely on circular buffer occupancy appears to require more awkward back-and-forth communication between the two domains.
  • The RdFIFO reset is extended until a read command pops from the command / address FIFO, and that FIFO is empty.
    • Previously, it would reset only while CSB was inactive.
    • This prevents firmware from writing stale data to the RdFIFO.
  • Otherwise...
    • The interfaces to the command/address FIFO and RdFIFO remain the same.

I believe this will take care of the two hazards mentioned here. Only one command can be active, and both command types should have the synchronization points necessary to prevent ambiguous associations of commands and data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component:RTL Earlgrey-PROD Candidate Temporary label to triage issues into Earlgrey-PROD Milestones IP:spi_device Priority:P2 Priority: medium Type:Bug Bugs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants