Feat | Implement initdata for bare-metal/qemu hypervisor #10163
Conversation
@Xynnn007, I will look into deleting SetPolicy after we get the Init Data working.
I suppose the plan is to perform the initdata verification during the launch of attestation-agent, before it creates a socket? Or do we keep running the agent until a remote attestation is attempted?
```rust
const CDH_CONFIG_PATH: &str = "/run/cdh.toml";

/// Path of ocicrypt config file. This is used by image-rs when decrypting image.
const OCICRYPT_CONFIG_PATH: &str = "/tmp/ocicrypt_config.json";
```
can we move that to /run?
Yes we can. What's the reason behind this?
On a rootfs /tmp might not be writable, and for consistency, since we usually store files created at runtime in /run.
```rust
#[serde(rename = "cdh.toml")]
cdh_config: Option<String>,
#[serde(rename = "policy.rego")]
policy: Option<String>,
```
That should probably be behind the agent-policy feature flag?
The feature flag is here. IMO the structure is fine with these three fields, but we should log an info message that the rego field will not be used when agent-policy is not enabled. WDYT? @danmihai1
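For illustration, a minimal sketch of how that info hint could be emitted when the field is present but the feature is compiled out; the struct and function names here are hypothetical stand-ins, not the PR's actual code:

```rust
use serde::Deserialize;

/// Assumed shape of the deserialized initdata `[data]` section.
#[derive(Debug, Deserialize)]
struct InitDataConfigs {
    #[serde(rename = "aa.toml")]
    aa_config: Option<String>,
    #[serde(rename = "cdh.toml")]
    cdh_config: Option<String>,
    #[serde(rename = "policy.rego")]
    policy: Option<String>,
}

fn hint_if_policy_unused(configs: &InitDataConfigs) {
    let _ = configs; // silence the unused warning when agent-policy is enabled
    // Without the agent-policy feature the rego field is silently
    // dropped; surface that to the user instead.
    #[cfg(not(feature = "agent-policy"))]
    {
        if configs.policy.is_some() {
            eprintln!(
                "initdata contains policy.rego, but the agent-policy \
                 feature is disabled; the policy will not be applied"
            );
        }
    }
}
```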
@Xynnn007, I would:
- Use a new init-data optional feature (perhaps enabled by default in Kata CI images), and
- Automatically enable agent-policy when init-data is enabled.
I prefer the former. What are your thoughts on this?
```rust
    req: protocols::agent::SetInitdataRequest,
) -> ttrpc::Result<Empty> {
    trace_rpc_call!(ctx, "set_initdata", req);
    crate::initdata::do_set_initdata(&req).await.map_err(|e| {
```
Looks like it will overwrite the initdata set by process-user-data in PeerPod, implemented in confidential-containers/cloud-api-adaptor#1895. Is there a plan to support an initdata hostpath mount in the longer term?
yup, I'd have the same preference.
This commit adds the implementation of initdata setting on the runtime side. If the initdata annotation is given, the kata-runtime will ensure that the first RPC called on the kata-agent is SetInitdata. This commit also implements initdata setting for both TDX and SNP platforms. To align with both legacy Kata and the CoCo scenarios where initdata is not used, an empty SetInitdata call will be performed if no cc_init_data annotation is set. Signed-off-by: Xynnn007 <[email protected]>
Force-pushed from 5b2173e to fc7950e.
This commit adds the handling of initdata on the kata-agent side, and changes the launch logic of guest-components. As we assume that the first API called by kata-runtime is SetInitData, the guest-components are launched inside the handler of SetInitData. On the runtime side, if no initdata annotation is set, an empty SetInitData request will be performed, so the guest-components will be launched with the default config. Signed-off-by: Xynnn007 <[email protected]>
Force-pushed from fc7950e to 56dd95d.
How can we extend this work to also support CAA configuration that needs to be measured, just like the CDH configuration or AA configuration? See confidential-containers/cloud-api-adaptor#2072
This PR will not allow configuring PeerPods' agent-protocol-forwarder (APF), since APF is active prior to the kata-agent. Further, under peer-pods SecureComms, the communication between the runtime and the kata-agent is performed via the cloud-api-adaptor (CAA) and APF, and is delayed until after the CAA connects to the APF and completes podvm attestation and retrieval of keys from Trustee. This PR apparently requires that the order of things will be:
Peer-pods now rely on the following order:
A) Can you please clarify how this PR can work with peer-pods, and what the plan for peer-pods is? B) This problem apparently can be fixed by not implementing an InitData RPC, and instead using a separate initialization procedure for setting InitData prior to running the kata-agent. This would remove the ordering issue introduced by this PR and keep the kata-agent out of the process of bringing up the podvm. Please reconsider whether it is the right thing to bake podvm initialization into the kata-agent, and what the alternatives may be.
Following up on a discussion in the Trustee community call on 2024-10-21, I tried to assess how the approach presented in this PR (and the related PR on guest-components) can be reconciled with the requirements for peerpods. The point was made that peerpods should aim to not diverge from an integrated initdata approach in kata, guest-components and trustee. IMO that's reasonable, if peerpods wants to minimize maintenance costs, and considering the further evolution of the attestation stack in CoCo.

What peerpods does currently

Peerpods has a somewhat preliminary implementation of the functionality that is presented in this PR. Initdata is extracted from a pod spec by the cloud-api-adaptor daemonset and provided to the Virtual Machine via User-Data (Cloud-Config) as a parameter to a CreateVM call. Since a peerpod image leverages systemd, kata-agent is not the launcher/supervisor of the guest-component processes. Those are launched as systemd services, in a dependency cascade. A dedicated one-shot service (process-user-data) is:

Once the files are provisioned, kata-agent and the guest-components are launched. Init-Data can be verified as part of the TEE verification process in trustee/attestation-service. (A nice side-effect of this implementation is the option to specify a coco-wide default init-data that the daemonset will inject into each coco pod.)

How peerpods' impl could be reconciled with this PR

- The CAA daemonset is not concerned with init-data.
- Kata-agent still does not manage the guest-component processes directly.
- Kata-agent processes init-data:
- Systemd units watch the config files and launch AA.

Problems

Binding of initdata to TEE

Currently there is an assumption encoded in initdata-related code: initdata is bound to a TEE by performing an equality check with either HOSTDATA on SNP or MRCONFIG on TDX architectures (verify_init_data). This field can be set at VM launch and is part of the signed attestation report. Peerpod TEEs like IBM SE and Azure CVMs do not provide a similar facility, but allow extension of TEE evidence with a measurement, in place of a verification of a pre-calculated digest. We would need to generalize "verify_init_data()" to "bind_init_data()", with TEE-specific implementations to perform the binding.

Secure Communication

Secure communication is a feature that leverages KBS to secure the tunnel between a peerpod worker-node and the kata-agent process on the remote VM (which would be just vsock on bare-metal). If this feature is enabled, init-data cannot be "monopolized" by kata-runtime and kata-agent, as init-data would be required to bootstrap kata-agent in the first place. I don't see a trivial way out of this. Either peerpod follows a bespoke approach in handling init-data, or the suggested implementation is adjusted to accommodate peerpod's design (e.g. replacing RPC with a host-provided init-data block device). A non-trivial way out of this would maybe be to separate the concerns of kata+init-data from what peerpods+secure-communications needs. One of the various kbs cli tools or the new plugin architecture in kbs could maybe help with that.
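A rough sketch of the suggested generalization; the trait, types and helper functions are hypothetical stand-ins, not actual guest-components code:

```rust
use anyhow::Result;

/// Hypothetical generalization of verify_init_data(): each TEE backend
/// decides how the initdata digest is bound to its hardware evidence.
trait InitDataBinder {
    fn bind_init_data(&self, digest: &[u8]) -> Result<()>;
}

struct SnpBinder; // SNP: equality check against HOSTDATA
struct SeBinder;  // IBM SE / Azure CVM style: extend a runtime measurement

impl InitDataBinder for SnpBinder {
    fn bind_init_data(&self, digest: &[u8]) -> Result<()> {
        // The digest was fixed at VM launch; verify the attestation
        // report's HOSTDATA field matches digest(initdata).
        verify_hostdata_equals(digest)
    }
}

impl InitDataBinder for SeBinder {
    fn bind_init_data(&self, digest: &[u8]) -> Result<()> {
        // No launch-time field exists; extend the TEE evidence with the
        // measurement and let the verifier check it via the event log.
        extend_runtime_measurement(digest)
    }
}

// Stubs standing in for platform-specific plumbing.
fn verify_hostdata_equals(_digest: &[u8]) -> Result<()> { Ok(()) }
fn extend_runtime_measurement(_digest: &[u8]) -> Result<()> { Ok(()) }
```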
```rust
if gc_procs != GuestComponentsProcs::None {
    if !attestation_binaries_available(logger, &gc_procs) {
        warn!(
            logger,
            "attestation binaries requested for launch not available"
        );
    }
}

// skip launch of any guest-component
if agent_config.guest_components_procs == GuestComponentsProcs::None {
    return Ok(());
}
```
Suggested change:

```rust
// skip launch of any guest-component
if agent_config.guest_components_procs == GuestComponentsProcs::None {
    return Ok(());
}

if !attestation_binaries_available(logger, &gc_procs) {
    warn!(
        logger,
        "attestation binaries requested for launch not available"
    );
}
```
```rust
}

async fn launch_attestation_agent(aa_config: Option<String>, initdata: String) -> Result<()> {
    if Path::new(AA_PATH).exists() {
```
We have already established at the caller site that the AA binary exists, so I think it's fair to fail hard here; non-presence of that file would be a bug. In any case I would return early to reduce nesting:

```rust
if !Path::new(AA_PATH).exists() {
    return Ok(());
}
```
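Putting both suggestions together, the function could be shaped roughly like this. This is only a sketch under the assumption that failing hard is acceptable; the path constants are assumed values, not taken from the PR:

```rust
const AA_PATH: &str = "/usr/local/bin/attestation-agent"; // assumed value
const AA_CONFIG_PATH: &str = "/run/attestation-agent.toml"; // assumed value

async fn launch_attestation_agent(aa_config: Option<String>) -> anyhow::Result<()> {
    // The caller already verified binary availability, so absence here
    // would indicate a bug: fail hard instead of silently skipping.
    if !std::path::Path::new(AA_PATH).exists() {
        anyhow::bail!("attestation-agent binary not found at {AA_PATH}");
    }
    if let Some(config) = aa_config {
        tokio::fs::write(AA_CONFIG_PATH, config.as_bytes()).await?;
    }
    // ... spawn the attestation-agent process here ...
    Ok(())
}
```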
```rust
    sl!(),
    "Launch Attestation Agent with config delivered via Initdata"
);
tokio::fs::write(AA_CONFIG_PATH, config.as_bytes()).await?;
```
Can we move the logic of writing an AA config file out of the launch-process fn? We will also need to write a config file if we launch AA via systemd.
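One possible shape for that split, as a sketch: a dedicated provisioning step that both the process-spawning path and a systemd-based path could share. The function name and /run paths are assumptions:

```rust
// Write all initdata-delivered config files in one dedicated step, so the
// same code serves both launch modes: kata-agent spawning the daemons
// directly, or systemd units picking the files up.
async fn provision_initdata_configs(
    aa_config: Option<&str>,
    cdh_config: Option<&str>,
) -> anyhow::Result<()> {
    if let Some(cfg) = aa_config {
        tokio::fs::write("/run/attestation-agent.toml", cfg).await?;
    }
    if let Some(cfg) = cdh_config {
        tokio::fs::write("/run/cdh.toml", cfg).await?;
    }
    Ok(())
}
```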
```rust
}

async fn launch_confidential_data_hub(cdh_config: Option<String>) -> Result<()> {
    if Path::new(CDH_PATH).exists() {
```
see other comment on launch_aa fn
```rust
    sl!(),
    "Launch Confidential Data Hub with config delivered via Initdata"
);
tokio::fs::write(CDH_CONFIG_PATH, config.as_bytes()).await?;
```
Can we move the logic of writing a CDH config file out of the launch-process fn? We will also need to write a config file if we launch CDH via systemd.
```rust
    }
};

args.push("--initdata".into());
```
Can we specify the initdata digest as a file with raw bytes (e.g. /run/initdata.digest)? That would make it easier if AA is launched via systemd. Otherwise, the logic of picking the hash algorithm and calculating the digest has to be reimplemented and performed elsewhere.
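A sketch of what that could look like on the producing side, assuming the sha2 crate and SHA-384 as the hash algorithm (both assumptions; the initdata spec allows other algorithms) and the /run path from the comment above:

```rust
use sha2::{Digest, Sha384};

// Compute the initdata digest once and persist the raw bytes at a
// well-known path, so a systemd-launched AA can consume it without
// reimplementing hash selection.
async fn write_initdata_digest(initdata: &[u8]) -> anyhow::Result<()> {
    let digest = Sha384::digest(initdata);
    tokio::fs::write("/run/initdata.digest", digest.as_slice()).await?;
    Ok(())
}
```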
As I understand it, this PR couples initdata handling with the kata-agent, so any Kata VM entities starting after kata-agent and requiring measured bootstrap config can use this approach. There are a few other use cases as well. Eventually the key requirement boils down to providing measured bootstrap configuration to entities that cannot rely on the configuration provided by the kata-agent, because these entities must initialise before kata-agent initialisation. Any recommendations? Since the PR is still a draft, I hope we can collectively work to address the different requirements.
@bpradipt we will be covering these questions at the community meeting this Thursday. My basic view is that this PR is a good approach for implementing init-data in Kata. I don't think it makes much sense to introduce a new component for Kata or significantly rework the flow. That said, this PR doesn't necessarily define how peer pods should work. Ideally the two codebases would converge, but it's possible that peer pods has unique requirements here and needs a different solution. This is something we'll have to discuss, although I think it's mainly a question for the peer pods community. Finally, I think one of the reasons that there is some tension around protecting the API of the kata-agent (i.e. secure comms) is that this simply is not part of the trust model that CoCo with Kata uses. It's a separate question whether it should be, but it's not too surprising that it is creating some issues.
I may be missing some history with this particular feature, but let me ask here: were there discussions around the pros and cons of having a new component to handle initdata?
As mentioned earlier, my requirements are not solely tied to Kata-remote (peer-pods). I have similar requirements for Kata Qemu (bare metal) as well. My comments in this issue intend to bring forward the requirements as we learn from potential users of CoCo and its components, so that collectively we can make conscious decisions for the project. I understand that this implementation is specific to the kata-agent, but there is also the possibility of having reusable components in the CoCo project to provide measured bootstrap configuration to kata-agent or any other entities. Returning to peer-pods, we do have a way to provide a measured bootstrap config that is not dependent on this PR but is specific to peer-pods. The question that remains is how we can provide a consistent bootstrap configuration to entities that need to run in the Kata VM, without being dependent on the kata-agent. As a community we may decide not to address the requirement of providing measured bootstrap config to any entity other than the kata-agent. That's fair as well (but restrictive). But it should be a well-thought-out decision.
The initial discussion started by David was specific to secure comms, but let's not discount the other use cases which are unrelated to securecomms or the Kata remote hypervisor (peer-pods).
From the meeting, it seems like the main concern with this approach is that it makes it difficult for something that runs before the kata-agent to be configured by the kata-agent. I see two options.
I lean more towards 1, but I am open to either approach. If we go with the second one, we would probably want to add this new thing to guest-components, or maybe just extend the AA.
Personally, I think 2) is more elegant, rather than the kata-agent waiting for an RPC to happen and then spawning auxiliary processes in this case. It doesn't feel much cleaner than a half-initialized attestation-agent, but this is bikeshedding to a degree; it's also not something that cannot be revised later. I think peerpods should be able to accommodate either (with a few minor fixes suggested in this PR). It should not configure the agent dynamically, especially via untrusted, unmeasured values. It might be tempting to ship all sorts of configuration options via user-data to the TEE, but that has a questionable security posture that can undermine confidentiality. If we pick 1), the secure communication feature cannot be built on top of CoCo initdata. I would recommend implementing this scheme independent of Kata. It aims to bootstrap an SSH communication channel between two trusted nodes, facilitated by remote attestation and secure key release. This should generalize well and doesn't need to be coupled to Kata; it might be useful beyond confidential-containers, and I imagine it would be a good fit for the plugin feature in kbs, similar to what is planned with nebula.
I believe that initdata should not depend upon kata-agent having started, and that kata-agent should not be the control point for measuring, attesting and distributing configuration to all other code within the VM. For me, initdata delivers measurable initialisation to the VM, a portion of which is needed/used by kata-agent. We already have examples where forcing the kata-agent to perform the role of controlling all measured initdata causes friction. For me, the kata-agent should focus on the management and running of containers within the VM, and therefore the measurement and attestation of configuration relevant to this. We observed Trustee is not specific to CoCo, Kubernetes or Kata, and I'd want to avoid binding the attestation-agent to the kata-agent so tightly. If we acknowledge that other components within the VM should be allowed to use the attestation-agent at later points, then why do we restrict this from happening earlier, before the kata-agent has started? In terms of what can be configured for kata-agent or any other component, that to me is a different discussion on a per-component basis, and separate from the ability to deliver initdata to the VM which will be measured and can be attested from very early in the VM initialisation cycle (before kata-agent starts). For us, a key example beyond things already mentioned is that with Secure Execution we have the ability to use encrypted VM images. We can initialise our VM with a portion of initdata encrypted, which can be decrypted at boot by a secret baked into the image. The encrypted initdata can be measured and attested; however, a key thing for us is that decrypting this portion of initdata must happen before any other component starts. This allows us to protect the decryption process from any other component that may eventually start.
Would it be a nicer compromise if, rather than RPC, we used a file-based approach where the kata-agent expects to find the init-data in a certain place? Either the agent mounts a block device there (for bare metal) or some other thing prepares it. Kata would enforce checking the init-data and propagating it to the various config files. Another thing to think about is that we could potentially check the init-data more than once. People who want to introduce new components running before the agent could use a one-shot version of the AA to check the init-data before carrying on with the usual flow. We'd need to be careful about cases where checking the init-data has side-effects, but that should be possible.
Following @magowan's comment, does kata-agent necessarily have to meddle in attestation affairs? I imagine we can still perform the binding of init-data to the TEE as part of attestation-agent startup logic (like proposed in this PR). kata-agent can be enlightened to understand initdata to extract the policy (e.g. via an initdata crate in guest-components), and attestation-agent (+CDH) would also be configured with initdata, pulling out the values it needs.

The current initdata spec sort of reifies that initdata is an arbitrary map of filename/content, and that there will be some mechanism (in the agent) that will provision content to files at a well-known location on the fs, for the policy engine in the agent and the child daemons to pick up later. It works, but there's a lot of nesting, and it incurs potential redundancies (like the same kbs cert body in both AA and CDH config files):

```toml
[data]
"cdh.toml" = '''
kbs_cert = """
01a...2bc
"""
'''
"policy.rego" = '''
{}
'''
```

Since we own the whole stack and its configuration schema, we could remove the file indirection and just specify initdata more explicitly:

```toml
[data]
policy = """
{}
"""

[data.kbs]
cert = """
01a...2bc
"""
```

edit: but this is not required. attestation-agent spawned w/
Yes, I like this compromise idea of a file-based approach. The idea being that the initdata can be present very early in the boot process, before pretty much any "component" needs to start. And the principle of being able to check init-data more than once sounds good. We can be very clear what kata-agent expects to find in initdata and what kata-agent will take responsibility for configuring, leaving it open for there to be other components that can be configured by init-data of which the kata-agent is not aware (though their configuration will still form part of attested initdata).
That's tricky if the binding involves side effects, like Tobin mentioned. For the HOSTDATA == digest(initdata) check, that's not a problem. If you do extend_pcr(initdata), you'd have to go through an event log though, to check whether you already performed a binding to the TEE. Doable, but a nuisance. I would prefer attestation-agent to have a monopoly on TEE-HW interactions, at least within the kata + guest-components realm.
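To make the concern concrete, a toy sketch of the bookkeeping an event log would add for a measurement-register binding; the log shape and all names are invented for illustration:

```rust
// Extending a PCR is not idempotent, so a second check must detect a
// prior binding in the event log instead of extending again.
fn bind_init_data_once(
    event_log: &mut Vec<(String, Vec<u8>)>,
    digest: &[u8],
) -> anyhow::Result<()> {
    let already_bound = event_log
        .iter()
        .any(|(event, d)| event == "initdata-binding" && d.as_slice() == digest);
    if already_bound {
        return Ok(()); // a re-check is a no-op
    }
    extend_pcr(digest)?; // side-effectful TEE interaction
    event_log.push(("initdata-binding".to_string(), digest.to_vec()));
    Ok(())
}

fn extend_pcr(_digest: &[u8]) -> anyhow::Result<()> { Ok(()) } // stub
```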
I would even go a step further: the kata-agent should focus on management and running of containers, period. Measuring the agent configuration, generating attestation statements and management of VM-level processes should be done in separate, dedicated components ("process-init-data", attestation-agent and systemd, respectively). Tightly scoped components are easier to reason about, especially when it comes to security. Having fewer optional dependencies and less conditional execution in the agent would also play nicer with non-CoCo use cases.
Inspiring discussion, folks. It seems that we may converge on the idea of a separate component to handle the initdata file, not coupled with the kata-agent. Let me try to sum it up, and hopefully I will not break any hearts. @burgerdev gives a good design principle: different components should take different, decoupled responsibilities. Combined with @fitzthum's suggestion (the fs way, supported by @magowan) and @mkulke's suggestion (AA handles all TEE HW interactions), we could have an overall design for bare-metal kata:

I think this is what @huoqifeng once suggested in confidential-containers/confidential-containers#171 (comment). Some potential questions:

We might not need to recheck the PCR binding, as when we do attestation it can be lazily checked by Trustee. Before that, the extended event cannot be tampered with.

The rootfs integrity is already protected by dm-verity. Thus the

So sorry to be late to the implementation process. I am struggling with setting up a CoCo dev env these days. Much thanks to @fidencio for the help.
We discussed init-data at the community meeting. Generally people seem happy to move away from the RPC approach to the block device. The big question about the proposal above is how it would work with deployments that don't use systemd. Currently initrd-based guests don't use systemd. Instead, the kata-agent is the init process. This mainly follows from the existing Kata implementation. We could potentially decide that CoCo will only support deployments with systemd. This would mean that the PR would also have to add systemd to the initrds. I'm not sure what the Kata upstream view on this would be. In some ways only supporting systemd is appealing. The kata-agent might not be as well-suited to starting child processes as systemd, which is designed to do exactly that. In our discussions so far most people have seemed to find systemd appealing. On the other hand, systemd adds size and complexity to the guest. The bigger problem is that adopting systemd everywhere would probably take a few more rounds of discussion that aren't guaranteed to converge. So the real question might not be whether we should use systemd, but whether it's worth deciding if we should use systemd. If we don't want to mandate systemd, we have some options. We could still use a block device to pass in the init-data. The kata-agent can check if it is the init process, so it could either read the initdata from a file (if it is not init) or mount the block device itself. This is very similar to what @Xynnn007 suggested above, but with a tweak to make sure things still work without systemd. This would also allow us to gracefully transition to only using systemd if we decide to do that in the future.
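A sketch of that tweak, with hypothetical paths, an is-init test via PID 1, and the mount plumbing stubbed out:

```rust
use std::path::Path;

const INITDATA_FILE: &str = "/run/initdata/initdata.toml"; // hypothetical
const INITDATA_DEVICE: &str = "/dev/disk/by-id/initdata"; // hypothetical

async fn load_initdata() -> anyhow::Result<Option<String>> {
    if Path::new(INITDATA_FILE).exists() {
        // Provisioned before the agent started (systemd case).
        return Ok(Some(tokio::fs::read_to_string(INITDATA_FILE).await?));
    }
    if std::process::id() == 1 && Path::new(INITDATA_DEVICE).exists() {
        // Agent is PID 1: mount the device and read the file itself.
        mount_initdata_device(INITDATA_DEVICE, "/run/initdata")?;
        return Ok(Some(tokio::fs::read_to_string(INITDATA_FILE).await?));
    }
    Ok(None)
}

// Stub standing in for the actual mount syscall plumbing.
fn mount_initdata_device(_dev: &str, _target: &str) -> anyhow::Result<()> {
    Ok(())
}
```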
Regarding the size, do we have numbers? AFAIK, systemd is suitable for and being used in embedded contexts; I don't think systemd, the core binary, adds a lot of weight. systemd can also run in the initrd. Regarding the complexity, I would disagree. The complexity is inherent; we wouldn't have to introduce systemd for that. We
Personally, from a decision-engineering perspective, I think that's an OK way forward. It's a path that is open to future refinements and doesn't venture into the wrong direction. Does it address the principal bootstrapping problem that was initially raised here regarding the secure communication feature in peerpod? I think it does. The agent is still provisioning files, but it doesn't receive the to-be-provisioned data via RPC. It will process the initdata, launch a bunch of processes and listen on the agent socket for RPC. The secure communication feature should have everything it needs: guest-component services and kata-agent are configured, launched and listening, so an SSH tunnel to the k8s node can be facilitated, through which the agent will be able to receive RPC.
I need to read the complete thread, but what about introducing a VM service module like coconut-svsm or OpenHCL to handle initdata properly?
Could make sense, but it will take a while for that to be available and integrated with CoCo.
OK. Let me close this PR and open a new one following the results of the discussion.
New PR to follow: #10610
Much thanks to @arronwy and @ChengyuZhu6 for their great help on this. This PR resolves #9468, mainly including both runtime (qemu) and kata-agent side changes to support initdata (TDX and SNP in this PR).

There are breaking changes:
- A new SetInitData RPC. For deployments whose io.katacontainers.config.runtime.cc_init_data annotation is empty, an empty SetInitData call will be performed to launch guest-components with default configs; this also works for legacy Kata.
- policy.rego is included inside the initdata toml.

Btw, I did not delete the SetPolicy API in this PR, but it seems we need to.
Still testing and debugging, and not sure it works for peerpod integration. So let's mark it as draft.
cc @fitzthum @danmihai1 @huoqifeng @stevenhorsman