[RFC] tdm: Trusted Device Manager architecture definition #290

Open

sameo
Member

@sameo sameo commented Jul 24, 2023

This is the initial commit for the Trusted Device Manager (TDM). It only contains an architecture document, no implementation is provided yet.

@sameo sameo requested review from mythi, zvonkok and fitzthum July 24, 2023 10:40
@sameo
Member Author

sameo commented Jul 24, 2023

cc @cclaudio

@larrydewey

This is great work @sameo! Thanks for putting it together! I have some questions about a few of the approaches, but I think you have successfully captured the vision!

Member

@jialez0 jialez0 left a comment

@sameo This is an exciting field, thanks for putting this together!

3. The **Linux CoCo ABI plugins** abstract the vendor-specific CoCo Linux
kernel ABIs in order to expose a vendor-agnostic internal interface for the TDM
core to consume.
4. The **Device Attestation** is actually a relying party implementation, for
Member

If we use local attestation within the guest, how can we set the attestation policy and reference values? Does this mean they need to be built into the rootfs image?

Member Author

When running local attestation, I think that the guest can be provisioned with attestation policies and reference values in two ways:

  1. Statically: The guest image is built with these pieces of data.
  2. Dynamically: The guest could use an attestation token to retrieve the policy and RV from a relying party and run a local attestation service.

The TDM should support both attestation methods (local and remote) through a TDM configuration file.
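
As a rough illustration of what such a configuration could carry, here is a minimal sketch. Every name in it (`TdmConfig`, `AttestationMode`, `Provisioning`, the `mode` and `provisioning` keys) is hypothetical and not part of the RFC, and defaulting local attestation to static provisioning is an assumption made only for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AttestationMode(Enum):
    LOCAL = "local"
    REMOTE = "remote"


class Provisioning(Enum):
    STATIC = "static"    # policies and reference values baked into the guest image
    DYNAMIC = "dynamic"  # fetched from a relying party with an attestation token


@dataclass(frozen=True)
class TdmConfig:
    mode: AttestationMode
    # Only used for local attestation; stays None in remote mode.
    provisioning: Optional[Provisioning] = None


def parse_config(raw: dict) -> TdmConfig:
    """Build a TdmConfig from an already-decoded configuration mapping."""
    mode = AttestationMode(raw["mode"])
    if mode is AttestationMode.REMOTE:
        return TdmConfig(mode=mode)
    # Assumption: local attestation defaults to static provisioning.
    provisioning = Provisioning(raw.get("provisioning", "static"))
    return TdmConfig(mode=mode, provisioning=provisioning)
```

For example, `parse_config({"mode": "local", "provisioning": "dynamic"})` would select local attestation with policies and reference values retrieved at run time.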

TDM is a Relying Party.
TDM to use part of the Attestation Service crates

##### Remote Attestation
Member

@jialez0 jialez0 Jul 25, 2023

@sameo For remote attestation, I have a design suggestion: the device attestation module of the TDM can be responsible for calling the get_token API of the Attestation Agent (we are about to implement this API) and passing the device evidence to the AA to perform remote attestation. Then the TDM only needs to verify the signature of the attestation result token. This aligns with the AA's current role in guest components.

Member Author

@jialez0 Would the TDM provide the attestation agent with the device attestation evidence? And then let the AA run remote attestation to finally get an attestation results token? If so, that makes sense to me, and I started to add something like that in this document.

Member

@sameo Now AA has two interfaces: Get Evidence and Get Token
I think a better design would be:

  • In local attestation mode: TDM Attestation Module is responsible for calling AA's Get Evidence API to obtain Evidence and verifying evidence with locally integrated AS.
  • In remote attestation mode: TDM Attestation Module is responsible for calling AA's Get Token API to perform remote attestation and ultimately obtain an attestation results token.

In both modes, the collection of Attestation Evidence is completed by the integrated attester drivers in AA. That way, to support device attestation we only need to add a new attester for AA and a new verifier for AS, just like when we add support for a new type of TEE.
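
To make the "new attester plus new verifier" idea concrete, here is a minimal sketch of a plugin registry keyed by TEE or device type. It does not reflect the actual AA or AS code: `ATTESTERS`, `VERIFIERS`, `register`, `attest`, and the `toy-device` type are all hypothetical names, and the evidence format is a toy.

```python
from typing import Callable, Dict

Attester = Callable[[bytes], bytes]        # nonce -> evidence
Verifier = Callable[[bytes, bytes], bool]  # (evidence, nonce) -> verdict

# Hypothetical registries, one entry per supported TEE or device type.
ATTESTERS: Dict[str, Attester] = {}
VERIFIERS: Dict[str, Verifier] = {}


def register(kind: str, attester: Attester, verifier: Verifier) -> None:
    """Adding support for a new type means registering one attester and one verifier."""
    ATTESTERS[kind] = attester
    VERIFIERS[kind] = verifier


def attest(kind: str, nonce: bytes) -> bool:
    """Collect evidence with the attester, then check it with the matching verifier."""
    evidence = ATTESTERS[kind](nonce)
    return VERIFIERS[kind](evidence, nonce)


# Toy device type: the "evidence" is just a tag followed by the nonce.
register(
    "toy-device",
    attester=lambda nonce: b"toy-device:" + nonce,
    verifier=lambda evidence, nonce: evidence == b"toy-device:" + nonce,
)
```

The calling code never changes when a new type is added, which is the property the comment above is after.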

Member

@fitzthum fitzthum left a comment

I guess some of my questions are sort of addressed later in the doc

┃ ║┌─────────────────────┐║ ┃
┃ ║│ Relying Party │║ ┃
┃ ┌───▷║│(Attestation Service)│║ ┃
┃ │ ║└─────────────────────┘║ ┃
Member

It's a bit confusing to me to have the Relying Party / AS drawn inside the guest here. Should this be the AA, which is a trusted proxy to those services? I assume the idea is that the TDM can get trusted configuration or secrets from the client?

Also, should we differentiate between client provided trusted information and other artifacts that might come from the manufacturer (with signatures)?

@bodzhang bodzhang Jul 31, 2023

@sameo , is the Relying Party block in the diagram specific for "local attestation" scheme, where the Trusted Device Manager performs the TEE-IO device Attestation evidence verification?

Member Author

Sorry the diagram was specific to local attestation. In my mind, the device attestation can either be local or remote and both paths should be handled through the attestation module. For local attestation, this module would expect to be provided (statically or dynamically) with attestation policies and reference values. For remote attestation, the module would use the AA API and use it as an attestation proxy. Note that this creates a dependency between the TDM and the AA (i.e. the TDM could not be used outside of the confidential-containers context without pulling the AA dependency), which is fine with me.

┃ │ ║ ╔════════╗ ╔════════╗ ║ ┃
┃ │ ╚═╣ TEE ╠═╣ TEE ╠═╝ ┃
┃ │ ║ Plugin ║ ║ Plugin ║ ┃
┃ udev ╚════════╝ ╚════════╝ ┃
Member

So the TDM will be modular at the level of the TEE? What about different devices? Will they all be attested with the same code?

Member Author

Yes, I see the TEE interface as the ABI between the guest and the TSM. Today we use it mostly for fetching attestation evidence, but I think it should/will be extended to also support the TEE-IO flows.
All assigned devices will be managed by the TSM, on behalf of the untrusted VMM, so the TSM will be the single interface for getting any device attestation evidence, and for guests to accept devices into their TCBs.
Does that make sense?

Member

@sameo @fitzthum I think a better approach may be to unify the TEE plugin here with the existing attester plugin in AA, both of which are at the same level as the attester clause module. Here, we can import the attester crate to obtain evidence or call AA's get_evidence API to obtain evidence.

verification process, the TDM must run a local or remote attestation of the
device attestation evidence. To support local attestation, the TDM
practically becomes a Relying Party.
5. Notify the TSM of its TDI acceptance decision. Here as well, this relies on

Assuming the proposal here is to perform TEE-IO device attestation evidence verification before accepting the device assignment to the confidential guest, an alternative for at least some of the CoCo usages is to capture the TEE-IO device attestation evidence, supply it to a remote Relying Party, together with the confidential guest attestation evidence (including containers loaded or allowed to be loaded), to be verified at the time of secret provisioning from the remote Relying Party.

Member Author

> Assuming the proposal here is to perform TEE-IO device attestation evidence verification before accepting the device assignment to the confidential guest, an alternative for at least some of the CoCo usages is to capture the TEE-IO device attestation evidence, supply it to a remote Relying Party, together with the confidential guest attestation evidence (including containers loaded or allowed to be loaded), to be verified at the time of secret provisioning from the remote Relying Party.

The evidence verification must happen before the guest accepts the assigned device, because once it's accepted the device is trusted and can directly access confidential memory. With that in mind, combining the device and guest attestation evidence together is appealing but has the following drawbacks:

  • It can only happen when running guest attestation, i.e. device hotplug is not really supported (Kata relies on device hotplug for direct assignment)
  • The remote relying party is able to verify both the guest evidence and the device one. I can see how vendors may want to maintain and provision those services separately because a device attestation evidence is basically orthogonal to a guest one. On the other hand, there may be cases where workload owners may want to support only certain combinations of guest stacks and devices (e.g. I would only want kernel version A with GPU driver version B to work with GPU firmware version C) and we need to allow for supporting that use case as well.

I think an unverified device can be accepted by the TEE-protected guest (itself also unverified until remote attestation), as long as the device identity/configuration can be captured in Attestation Evidence as "tamper-proof" against potential attack from the unverified device itself.

This requirement rules out an Attestation Evidence design where the SW running inside the CPU-TEE-protected guest simply appends the TEE-IO device attestation evidence stored in guest memory to the CPU-TEE-produced confidential guest attestation evidence. A malicious device once attached would be able to modify the memory content. But a design that captures the TEE-IO device attestation evidence in CPU-TEE-produced attestation evidence, for example, in TDX RTMRs, can make sure the Relying Party detects an untrusted TEE-IO device attached to the environment during Remote Attestation. Designs not relying on RTMR-type capability (as some CPU-TEEs don't have RTMR-type capability) require more exploration. Maybe SEV-SNP HostData or TDX MRConfigID can be used to capture the expected device evidence at guest creation time.

With regard to Relying Party verifying both the guest evidence and device evidence, I agree that the flexibility of maintaining separate service/infrastructure for CPU and Devices verification is valuable. Combining the device and guest attestation evidence together does not force any specific implementation of monolithic CPU and Device verification service. A Relying Party verifying a combined device and guest evidence can consult CPU and Device verification service separately.

Member Author

> I think an unverified device can be accepted by the TEE-protected guest (itself also unverified until remote attestation), as long as the device identity/configuration can be captured in Attestation Evidence as "tamper-proof" against potential attack from the unverified device itself.
>
> This requirement rules out an Attestation Evidence design where the SW running inside the CPU-TEE-protected guest simply appends the TEE-IO device attestation evidence stored in guest memory to the CPU-TEE-produced confidential guest attestation evidence. A malicious device once attached would be able to modify the memory content.

Not only could a malicious device tamper with confidential memory, but the host VMM could also use an attached device to access and modify confidential memory. Keep in mind that the host VMM emulates the guest PCI bus, and thus controls the MMIO mappings between the guest physical addresses and the host physical ones. TDISP, through the TDI interface report, allows guests to verify that those mappings are correctly set by the TSM, i.e. that MMIO GPAs do map to the actual device, not to other devices or to host-controlled memory.
Moreover, the device directly accesses the TVM private memory through IOMMU controlled translation tables, and it can do so at any point in time, i.e. not only when the TVM is actually running.
This leads to a bunch of threat vectors that are different from plain confidential computing ones, as assigning and accepting a device into a TVM allows for untrusted host software components to access private memory indirectly (through the device MMIO and DMA). FWIW We tried to capture those threats here: https://github.com/riscv-non-isa/riscv-ap-tee-io/blob/main/specification/security_model.adoc#threats

Device attestation, whether captured through the CPU attestation evidence or run as a separate attestation flow, cannot mitigate those threats alone. A device attestation result will tell the guest about a device's trustworthiness, but it won't tell whether the host-emulated PCI mappings and DMA translation tables are valid. In other words, your relying party could very well show that a perfectly trustworthy device is attached to your TVM, but the host software could have programmed the actual IO (MMIO and/or DMA) in such a way that the guest believes it's talking to this trusted device while part or all of the IO traffic is captured by the host. This is why a guest must verify the device before accepting it. And by verifying we mean checking with the TSM about I/O mappings, the state of the physical link, the device identity and finally the device trustworthiness (i.e. the actual attestation). Skipping any of these steps could allow the host or the device itself to trick the guest and leak confidential data. And they should all be completed before the device is accepted (i.e. the IO mappings are enabled) and the guest can use it.
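
The "every step must pass before acceptance" rule can be sketched as a simple gate. This is only an illustration of the ordering described above: `DeviceReport`, its fields, and the step names are hypothetical, and in a real TDM each boolean would come from the TSM or from the attestation module rather than being supplied directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceReport:
    io_mappings_valid: bool      # MMIO/DMA mappings match what the TSM reports
    physical_link_secure: bool   # state of the physical link
    identity_valid: bool         # device identity (certificate chain)
    attestation_passed: bool     # device trustworthiness (the actual attestation)


# Checks run in the order listed in the comment above.
CHECKS = (
    ("io_mappings", "io_mappings_valid"),
    ("physical_link", "physical_link_secure"),
    ("device_identity", "identity_valid"),
    ("attestation", "attestation_passed"),
)


def first_failed_check(report: DeviceReport) -> Optional[str]:
    """Return the name of the first failing step, or None if the device may be accepted."""
    for name, field in CHECKS:
        if not getattr(report, field):
            return name
    return None
```

The device would only be accepted (i.e. the IO mappings enabled) when `first_failed_check` returns `None`.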

This is a fundamental difference between guest and device attestation. The former can be done asynchronously, before secrets are delivered to the guest, and is sufficient to guarantee that confidential data is protected. The latter only concludes the device verification process and is not by itself sufficient for guaranteeing confidential data protection against malicious devices or host components that could then access it even while the guest is not running.

> But a design that captures the TEE-IO device attestation evidence in CPU-TEE-produced attestation evidence, for example, in TDX RTMRs,

I'm not sure I understand how this is different than appending to the CPU attestation evidence? The device attestation evidence is integrity protected as it's signed by a device manufacturer endorsed key. Maybe I'm not getting your point, apologies if that's the case.

@bodzhang bodzhang Aug 23, 2023

By capturing the TEE-IO device attestation evidence in TDX RTMRs (for example, extending an RTMR with the measurement of the device attestation evidence) before accepting the device in the guest VM, the guest makes sure the device, if malicious or compromised, cannot alter the attestation evidence without detection (by the Relying Party). The device attestation evidence can be appended to the TEE-CPU produced guest attestation evidence and bound to the RTMR value. If an attack from a malicious or compromised device alters the guest memory (code/data) to spoof legitimate device attestation evidence, it won't match the RTMR value and will be detected when the Relying Party verifies the guest and device attestation evidence. If the TEE-IO attestation evidence is instead simply stored in guest memory, but not bound with "tamper-proof" RTMRs, the potential alteration mentioned above won't be detected by the Relying Party.
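
The binding described above can be illustrated with a software emulation of the extend operation, where the new register value is the SHA-384 hash of the old value concatenated with the digest of the data being measured. This is a sketch only: it does not call into TDX, the function names are hypothetical, and it assumes the Relying Party knows the RTMR value before the extension and that only the device evidence was extended into it.

```python
import hashlib
import hmac

RTMR_SIZE = 48  # SHA-384 digest size in bytes


def extend(rtmr: bytes, evidence: bytes) -> bytes:
    """Emulate an extend: new = SHA-384(old || SHA-384(evidence))."""
    digest = hashlib.sha384(evidence).digest()
    return hashlib.sha384(rtmr + digest).digest()


def evidence_matches(initial_rtmr: bytes, reported_rtmr: bytes, evidence: bytes) -> bool:
    """Relying Party side: replay the extension and compare with the reported value."""
    return hmac.compare_digest(extend(initial_rtmr, evidence), reported_rtmr)
```

If the evidence appended to the guest report is modified after the extension, the replayed value no longer matches the reported RTMR and the mismatch is visible to the Relying Party.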

Before the guest accepts the device, it should perform many checks using static code logic, for example, IO mapping must match TSM provided info. Capturing the TSM provided info, I/O mapping info, etc as part of device attestation evidence to be verified by Relying Party later only moves the static checking logic to the Relying Party, without the flexibility benefit. For checks using "dynamic" reference data, such as device cert revocation list, expected firmware measurement and configuration values, etc., performing the verification by Relying Party during remote attestation instead of "local" check, allows much more flexibility on acquiring/updating the "dynamic" reference data used in verification. For remote check approach, the guest should capture the relevant info provided by TSM in the device attestation evidence bound to RTMRs.

Sorry for not including more details in my previous post.

@sameo sameo force-pushed the topic/tdm branch 7 times, most recently from 6f7f470 to 6389ef9 Compare August 8, 2023 10:46
3. The **Linux CoCo ABI plugins** abstract the vendor-specific CoCo Linux
kernel ABIs in order to expose a vendor-agnostic internal interface for the TDM
core to consume.
4. The **Device Attestation** module is responsible for verifying a device
Member Author

Note to myself (Thanks @jyao1): The attestation module may have to support vendor specific attestation results, so it may need to be modularized as well.

Member

We may leverage here: confidential-containers/confidential-containers#135

  • Interoperability outside the CoCo flows - for example to reuse the AR computed by CoCo in subsequent TLS exchanges involving the CoCo attester,
  • Ease the task of defining and computing authorization policies by relying parties due to the reliance on AR4SI's "trustworthiness vector" to present a normalized view of the evaluation results. An example showing how to use OPA to evaluate an EAR in just a few lines of Rego can be found here (click on the Evaluate button.)
  • Code reuse via existing open-source libraries,
  • Normalised and attester-agnostic result format, which allows to accommodate all currently supported formats as well as any other kinds of attester / CC technologies in the future.

Member Author

Oh, I'm all for it for sure. There may be cases where we would not talk to KBS but to vendor specific attestation services that return non EAR formatted attestation results.

Member

I treat EAR to be one possible option.
But I am not sure if that is the only option.

Do we want to allow different formats?

@zvonkok
Member

zvonkok commented Aug 11, 2023

@sameo Some PCIe devices need additional functionality beyond attestation. In the case of the GPU e.g. if attestation succeeds, the GPU is still in a "Not Ready" state. A Relying Party has to set the GPU explicitly into a "Ready" state. Are we planning to add vendor-specific "hook" or "callbacks" to the TDM to enable such behavior?

@sameo
Member Author

sameo commented Aug 11, 2023

> @sameo Some PCIe devices need additional functionality beyond attestation. In the case of the GPU e.g. if attestation succeeds, the GPU is still in a "Not Ready" state. A Relying Party has to set the GPU explicitly into a "Ready" state.

How does the RP set the GPU into this state? Is that documented anywhere?

> Are we planning to add vendor-specific "hook" or "callbacks" to the TDM to enable such behavior?

We may have to, and that's partly why I started this as an RFC, to collect feedback and build a flexible enough design.

@jyao1
Member

jyao1 commented Aug 11, 2023

@sameo, good idea to have a dedicated TDM!
I will review and comment in more detail.

#### TDI Security Attributes Check

#### Device Attestation

Member

We have different attestation models, according to RATS RFC 9334 (https://datatracker.ietf.org/doc/rfc9334/).

Do we want to support different patterns, such as the Passport Model and the Background-Check Model?

@zvonkok
Member

zvonkok commented Aug 11, 2023

@sameo It needs to be done via a specific call to `nvidia-smi`. Currently we do local-gpu-attestation with bundled golden-measurement RIMs, and if this succeeds we do `nvidia-smi conf-compute -srs 1` (srs = set ready state).
It is all done inside the VM so that a GPU workload container does not see anything. If attestation fails we do not set it "Ready" and a user will have a GPU in the Container but not in the preferred state. A Container cannot set "Ready" state.
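
A vendor-specific hook of the kind discussed above could look like the following sketch. The only thing taken from this thread is the command line `nvidia-smi conf-compute -srs 1`; the function name, the injectable `run` parameter, and the overall hook shape are hypothetical and are not an existing TDM or NVIDIA API.

```python
import subprocess
from typing import Callable, Sequence

# Command quoted in the comment above (srs = set ready state).
SET_READY_CMD = ("nvidia-smi", "conf-compute", "-srs", "1")


def _run_checked(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise if it exits with an error."""
    subprocess.run(list(cmd), check=True)


def on_attestation_result(
    passed: bool,
    run: Callable[[Sequence[str]], None] = _run_checked,
) -> bool:
    """Hypothetical post-attestation hook: only a successful attestation sets the GPU ready."""
    if not passed:
        # Leave the GPU in its "Not Ready" state.
        return False
    run(SET_READY_CMD)
    return True
```

Passing the runner as a parameter keeps the hook testable inside the guest without a GPU present, and leaves room for other vendors to plug in their own post-attestation step.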

@zvonkok
Member

zvonkok commented Aug 11, 2023

@sameo @fitzthum @fidencio @stevenhorsman One thing we're missing is, I think crucial, how to provide a channel to the user for important messages that he can react upon.
What if the Relying Party is the user creating the Pod or the Container, all the other parts can do their job KBS/KBC etc, but the user wants to do e.g.

echo "Ready" > /run/..../whatever 

Or read from a dedicated channel that e.g. attestation failed, without going through a ton of journald messages

cat /run/.../whatever

Also other entities could read like operators or side-car containers to provide important messages to Prometheus/Kubelet/3rd party logging on which decision could be made.

@sameo
Member Author

sameo commented Aug 11, 2023

> @sameo It needs to be done via a specific call to nvidia-smi, currently we do local-gpu-attestation with bundled golden-measurement RIMs and if this succeeds we do nvidia-smi conf-compute -srs 1 (srs = set ready state). It is all done inside the VM so that a GPU workload container does not see anything. If attestation fails we do not set it "Ready" and a user will have a GPU in the Container but not in the preferred state. A Container cannot set "Ready" state.

Thanks for the clarification. It seems to me that nvidia-smi conf-compute -srs 1 is equivalent to accepting the device into the TVM TCB. Is that a stretch?

Also, do you support remote device attestation? If so, would the flow be similar i.e. the guest VM would need to call nvidia-smi based on the retrieved attestation results? What would be the attestation results format, if that path is supported?

@sameo sameo force-pushed the topic/tdm branch 2 times, most recently from 8361964 to 3b3d38a Compare August 12, 2023 13:55
@jialez0
Member

jialez0 commented Aug 15, 2023

@sameo I have the following design suggestions for the attestation:

In the current design, the more accurate name of TDM's attestation module should be evidence verification module, but the complete attestation process should include two stages: obtaining evidence and verifying evidence.
Therefore, in TDM, there should be two corresponding modules:

  1. Attester module: for evidence acquisition.
  2. Verifier module: for evidence verification.

TDM Attester Module

This module is responsible for collecting device attestation evidence, and it will only be used in local attestation mode.
The TDM Attester Module here should provide two ways to use it:

  1. Integration: Directly integrate the attester library crate to obtain device attestation evidence. The evidence collection logic plugin that is specific to device TEE should have the same status as the plugins of other CPU TEEs, implemented as a plugin of attester crate.
  2. Call AA: Call the API of the AA gRPC/ttRPC service to obtain device attestation evidence.

TDM Verifier Module

This module is responsible for verifying device attestation evidence and has two modes:

  1. Local attestation: In this mode, the TDM Verifier module actually becomes a Relying Party, which integrates the attestation-service crate and RVPS for evidence verification.
  2. Remote attestation: In this mode, TDM no longer needs to collect device attestation evidence. Instead, the TDM Verifier module directly calls the API of the AA gRPC/ttRPC service, where AA is responsible for collecting device attestation evidence and sending it to the remote AS for verification. After verification, a signed Attestation Results Token will be returned to TDM.

What do you think?

┃ │ │ ┃
┃ │ ▽ ┃
┃ │ ┌───────┐ ┃
┃ │ ┌───────────────┤TEE ABI├────┐ ┃

I accept that there are cases where userspace needs an override for the evidence acceptance flow, especially in these early days where there might not be unification across devices on evidence formats, but I want to advocate for an end goal where the kernel only needs a certificate chain to be able to make acceptance decisions. I am reluctant to sign off on a design direction that makes the kernel beholden to userspace for device usage, as I worry about scenarios like error recovery and power management where userspace upcalls are not feasible.

Member Author

> I accept that there are cases where userspace needs an override for the evidence acceptance flow, especially in these early days where there might not be unification across devices on evidence formats, but I want to advocate for an end goal where the kernel only needs a certificate chain to be able to make acceptance decisions.

The evidence format heterogeneity is not the main problem here, and actually I am pretty sure we'll have to deal with different manufacturers using different, non-standard formats for a while ;-)

The kernel can get a cert chain from the device, but the only thing that would provide us with is the device authenticity. The kernel would know that what it's talking to is a genuine NVIDIA/AMD/Intel/etc device. But that does not tell the guest workload about:

  • The state of the physical link between the device and the host (Is it encrypted? Is it a selective link? Are PCI switches in the TCB?)
  • The whole firmware stack the device is running
  • The state of the device itself (Debug? RMA? Production?)
  • The nature of the logical link between the TSM and the device firmware (Are we running a secure SPDM session?)
  • The MMIO mappings between the host VMM emulated guest PCI GPA and the actual HPA (Are we mapping the right PCI BARs into the expected device host physical ranges? Are those mapped in the right order?)
  • The DMA translation tables for the attached device (Have those been setup by the TSM, into a trusted IOMMU?)

Without getting answers to all the above questions, the guest workload cannot realistically decide whether it can trust the device it's attached to. Some of those questions can only be answered by the TSM while others are arguably not something you'd want your guest kernel to do (Do we want the kernel to handle attestation evidence and attestation results tokens?).

> I am reluctant to sign off on a design direction that makes the kernel beholden to userspace for device usage, as I worry about scenarios like error recovery and power management where userspace upcalls are not feasible.

That's a fair concern although I think that once userspace accepts a trusted device into its TCB, at least the PM flows should be the same as regular devices. For error recovery, this may be slightly different as e.g. a host VMM trying to tamper with a locked/running trusted PCIe device must move the device TDISP state machine into an unrecoverable (from a guest perspective) TDISP error state, i.e. it would no longer be assigned to the TVM until the host decides to re-assign it.

@sameo
Member Author

sameo commented Aug 22, 2023

> @sameo I have the following design suggestions for the attestation:
>
> In the current design, the more accurate name of TDM's attestation module should be evidence verification module, but the complete attestation process should include two stages: obtaining evidence and verifying evidence. Therefore, in TDM, there should be two corresponding modules:
>
>   1. Attester module: for evidence acquisition.
>   2. Verifier module: for evidence verification.
>
> TDM Attester Module
>
> This module is responsible for collecting device attestation evidence, and it will only be used in local attestation mode. The TDM Attester Module here should provide two ways to use it:
>
>   1. Integration: Directly integrate the attester library crate to obtain device attestation evidence. The evidence collection logic plugin that is specific to device TEE should have the same status as the plugins of other CPU TEEs, implemented as a plugin of attester crate.
>   2. Call AA: Call the API of the AA gRPC/ttRPC service to obtain device attestation evidence.
>
> TDM Verifier Module
>
> This module is responsible for verifying device attestation evidence and has two modes:
>
>   1. Local attestation: In this mode, the TDM Verifier module actually becomes a Relying Party, which integrates the attestation-service crate and RVPS for evidence verification.
>   2. Remote attestation: In this mode, TDM no longer needs to collect device attestation evidence. Instead, the TDM Verifier module directly calls the API of the AA gRPC/ttRPC service, where AA is responsible for collecting device attestation evidence and sending it to the remote AS for verification. After verification, a signed Attestation Results Token will be returned to TDM.
>
> What do you think?

I think it makes a lot of sense. Let me try to summarize how the TDM attestation module should behave (Regardless of it being split in 2 separate modules):

  • If configured to run local attestation
    1. Either call into the AA gRPC API or directly use the attester crate to collect the device attestation evidence.
    2. Use the attestation-service crate to verify the collected attestation evidence.
  • If configured to run remote attestation
    1. Use the AA gRPC API to get an attestation result token.

If that's correct, this simplifies things quite a bit. I'd like for the TDM to not depend on an attestation agent to be running as I think this would make it usable outside of the confidential-containers context. If we want to fully support that, we'd need to be able to use the AA as a standalone crate. I think we're not too far from being able to do that.
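
The behaviour summarized above can be sketched as a small dispatch function. This is an illustration under assumptions, not the TDM design: `AttestationBackends`, `attest_device`, and the returned dictionary shape are hypothetical, and the three callables stand in for the AA gRPC API or attester crate, the attestation-service crate, and the AA token API respectively.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class AttestationBackends:
    get_evidence: Callable[[], bytes]         # AA gRPC API or attester crate
    verify_evidence: Callable[[bytes], bool]  # attestation-service crate
    get_token: Callable[[], str]              # AA gRPC API (remote attestation)


def attest_device(mode: str, backends: AttestationBackends) -> Dict[str, Union[str, bool]]:
    """Run the local or the remote flow, depending on the configured mode."""
    if mode == "local":
        evidence = backends.get_evidence()
        return {"mode": "local", "verified": backends.verify_evidence(evidence)}
    if mode == "remote":
        # The AA collects the evidence and talks to the remote AS on the TDM's behalf.
        return {"mode": "remote", "token": backends.get_token()}
    raise ValueError(f"unknown attestation mode: {mode!r}")
```

In remote mode the returned token would still need its signature verified before the device is accepted, as suggested earlier in this conversation; that step is left out of the sketch.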

@jialez0
Member

jialez0 commented Aug 23, 2023

If that's correct, this simplifies things quite a bit. I'd like for the TDM to not depend on an attestation agent to be running as I think this would make it usable outside of the confidential-containers context. If we want to fully support that, we'd need to be able to use the AA as a standalone crate. I think we're not too far from being able to do that.

@sameo Exactly. As you mentioned, our AA can now both provide a gRPC API for remote calls and be integrated as a standalone crate. Therefore, in either local or remote mode, we can freely choose between the gRPC and native integration modes of AA through configuration.

@sameo
Member Author

sameo commented Aug 23, 2023

If that's correct, this simplifies things quite a bit. I'd like for the TDM to not depend on an attestation agent to be running as I think this would make it usable outside of the confidential-containers context. If we want to fully support that, we'd need to be able to use the AA as a standalone crate. I think we're not too far from being able to do that.

@sameo Exactly. As you mentioned, our AA can now both provide a gRPC API for remote calls and be integrated as a standalone crate. Therefore, in either local or remote mode, we can freely choose between the gRPC and native integration modes of AA through configuration.

Thanks @jialez0 . I'll update the diagrams accordingly.

@sameo
Member Author

sameo commented Aug 28, 2023

If that's correct, this simplifies things quite a bit. I'd like for the TDM to not depend on an attestation agent to be running as I think this would make it usable outside of the confidential-containers context. If we want to fully support that, we'd need to be able to use the AA as a standalone crate. I think we're not too far from being able to do that.

@sameo Exactly. As you mentioned, our AA can now both provide a gRPC API for remote calls and be integrated as a standalone crate. Therefore, in either local or remote mode, we can freely choose between the gRPC and native integration modes of AA through configuration.

Thanks @jialez0 . I'll update the diagrams accordingly.

@jialez0 Let me know if the diagram looks better now.

@sameo sameo force-pushed the topic/tdm branch 4 times, most recently from fd24795 to 20f504c Compare August 28, 2023 13:58
@sameo sameo force-pushed the topic/tdm branch 2 times, most recently from a615b2e to 7df8acd Compare December 4, 2023 05:43
This is the initial commit for the Trusted Device Manager (TDM).
It only contains an architecture document, no implementation is provided
yet.

Signed-off-by: Samuel Ortiz <[email protected]>
@sameo
Member Author

sameo commented Dec 4, 2023

@jialez0 @djbw @larrydewey @zvonkok @jyao1 I updated the PR to move to a hybrid approach, where the kernel relies on the TDM for the device attestation part.

The TDM <-> kernel interaction is now synchronous, and the TDI ownership model is clearer: the kernel owns the TDI, and optionally decides to call into the TDM to attest it. This would happen after the kernel detects the TDI, and before it probes it.
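
As a rough illustration of that ordering, here is a small Rust sketch. The names (`TdiState`, `KernelPolicy`, `bring_up`) are hypothetical, and the behavior on a failed attestation (the TDI is not probed) is my assumption for the sake of the example, not something the architecture document pins down here.

```rust
/// Stages a TDI goes through on the guest side (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TdiState {
    Detected,
    Attested,
    Rejected,
    Probed,
}

/// Whether the kernel decides to involve the TDM (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KernelPolicy {
    SkipAttestation,
    AttestViaTdm,
}

/// Returns the sequence of states for one TDI.
/// `attest` stands in for the synchronous call into the TDM and
/// returns the TDM verdict.
pub fn bring_up<F: FnOnce() -> bool>(policy: KernelPolicy, attest: F) -> Vec<TdiState> {
    // The kernel owns the TDI from the moment it detects it.
    let mut trace = vec![TdiState::Detected];
    if policy == KernelPolicy::AttestViaTdm {
        // Synchronous: nothing else happens to the TDI until the TDM answers.
        if attest() {
            trace.push(TdiState::Attested);
        } else {
            // Assumption: a TDI that fails attestation is never probed.
            trace.push(TdiState::Rejected);
            return trace;
        }
    }
    // Probing only happens after the optional attestation step.
    trace.push(TdiState::Probed);
    trace
}
```

In this sketch the attestation step always sits between detection and probing, and it is skipped entirely when the kernel chooses not to call into the TDM.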

@imlk0
Contributor

imlk0 commented Dec 4, 2023

Hi @sameo. The new diagrams are great.

There seems to be a problem in the flow.

I think it makes a lot of sense. Let me try to summarize how the TDM attestation module should behave (Regardless of it being split in 2 separate modules):

  • If configured to run local attestation

    1. Either call into the AA gRPC API or directly use the attester crate to collect the device attestation evidence.
    2. Use the attestation-service crate to verify the collected attestation evidence.
  • If configured to run remote attestation

    1. Use the AA gRPC API to get an attestation result token.

In the previous discussion, the attester crate was responsible for collecting the device attestation evidence.

  1. The guest kernel requests the previously registered TDM to attest the
    assigned device. The device attestation evidence is added to the attestation
    request message sent by the guest kernel to the TDM.

And in the new "synchronous flow", the device attestation evidence is sent by the guest kernel to the TDM (I don't know the exact mechanism; I'm assuming it's by polling an ioctl fd). The device evidence would then be passed on to the attester crate via kernel -> TDI Management Module -> Device Attestation Module -> attester crate. This looks confusing, since the attester crate is actually an "evidence provider", and the flow above doesn't match the attester crate interface.
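
To picture the mismatch, here is a minimal Rust sketch. The `EvidenceProvider` trait is a simplified, hypothetical stand-in for an attester-style "pull" interface (it is not the real attester crate API), and `KernelAttestationRequest` and `KernelSuppliedEvidence` are made-up names for the "push" side of the new flow.

```rust
/// Pull model: the caller asks the provider to produce evidence,
/// typically bound to caller-supplied report data such as a nonce.
/// Hypothetical stand-in, not the real attester crate trait.
pub trait EvidenceProvider {
    fn get_evidence(&self, report_data: &[u8]) -> Vec<u8>;
}

/// Push model: the request sent by the guest kernel already carries
/// the device evidence (hypothetical message layout).
pub struct KernelAttestationRequest {
    pub device_id: String,
    pub evidence: Vec<u8>,
}

/// Adapter that forces kernel-supplied evidence behind the pull interface.
pub struct KernelSuppliedEvidence {
    evidence: Vec<u8>,
}

impl From<KernelAttestationRequest> for KernelSuppliedEvidence {
    fn from(req: KernelAttestationRequest) -> Self {
        KernelSuppliedEvidence { evidence: req.evidence }
    }
}

impl EvidenceProvider for KernelSuppliedEvidence {
    fn get_evidence(&self, _report_data: &[u8]) -> Vec<u8> {
        // The evidence was produced before this call, so the report data
        // passed in here cannot influence it. This is the interface mismatch.
        self.evidence.clone()
    }
}
```

With such an adapter the provider returns the same bytes whatever report data it is given, which is why pushing kernel-collected evidence through an "evidence provider" interface looks odd to me.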

Please correct me if there are any problems in my understanding. Thanks.
