
Feat | Implement initdata for bare-metal/qemu hypervisor #10163

Closed

Conversation

Xynnn007
Contributor

@Xynnn007 Xynnn007 commented Aug 15, 2024

Many thanks to @arronwy and @ChengyuZhu6 for their great help on this. This PR resolves #9468. It includes both runtime (qemu) and kata-agent side changes to support initdata (TDX and SNP in this PR).

There are breaking changes:

  1. The guest-components initialization code moves into the handler of the SetInitData RPC. For deployments where io.katacontainers.config.runtime.cc_init_data is empty, an empty SetInitData call is performed to launch guest-components with default configs; this also works for legacy Kata.
  2. SetPolicy is called by SetInitData if policy.rego is included inside the initdata TOML.

Btw, I did not delete the SetPolicy API in this PR, but it seems we need to.

Still testing and debugging, and not sure it works for peer-pod integration, so let's mark it as draft.

cc @fitzthum @danmihai1 @huoqifeng @stevenhorsman
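
For illustration only: per the CoCo initdata spec, the value carried by the io.katacontainers.config.runtime.cc_init_data annotation is (after decoding) a TOML document roughly along these lines. The nested documents here are placeholders, not real configurations:

algorithm = "sha384"
version = "0.1.0"

[data]
"aa.toml" = '''
# attestation-agent configuration (placeholder)
'''
"cdh.toml" = '''
# confidential-data-hub configuration (placeholder)
'''
"policy.rego" = '''
# agent policy (placeholder)
'''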

@katacontainersbot added the size/huge label (Largest and most complex task; probably needs breaking into small pieces) Aug 15, 2024
@danmihai1
Member

Btw, I did not delete the SetPolicy API in this PR, but it seems we need to.

@Xynnn007, I will look into deleting SetPolicy after we get the Init Data working.

Contributor

@mkulke mkulke left a comment


I suppose the plan is to perform the initdata verification as part of the launch of the attestation-agent, before it creates a socket? Or do we keep running the agent until a remote attestation is attempted?

const CDH_CONFIG_PATH: &str = "/run/cdh.toml";

/// Path of ocicrypt config file. This is used by image-rs when decrypting image.
const OCICRYPT_CONFIG_PATH: &str = "/tmp/ocicrypt_config.json";
Contributor

can we move that to /run?

Contributor Author

Yes we can. What's the reason behind this?

Contributor

On a rootfs, /tmp might not be writable, and for consistency, since we usually store files created at runtime in /run.
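
A minimal sketch of the change being asked for (the /run paths are the reviewer's suggestion, not necessarily what the PR ends up with):

const CDH_CONFIG_PATH: &str = "/run/cdh.toml";

/// Path of the ocicrypt config file. This is used by image-rs when decrypting images.
const OCICRYPT_CONFIG_PATH: &str = "/run/ocicrypt_config.json";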

src/agent/src/initdata.rs (outdated; resolved)
#[serde(rename = "cdh.toml")]
cdh_config: Option<String>,
#[serde(rename = "policy.rego")]
policy: Option<String>,
Contributor

that should probably be behind the agent-policy feature flag?

Contributor Author

The feature flag is here. IMO it's fine for the structure to have these three fields, but we should log an info message that the rego field will not be used when agent-policy is not enabled. wdyt? @danmihai1

Member

@Xynnn007 , I would:

  • Use a new init-data optional feature (perhaps enabled by default in Kata CI images), and
  • Automatically enable agent-policy when init-data is enabled.
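
A rough sketch of that feature wiring in the agent's Cargo.toml (illustrative only, not the actual manifest):

[features]
# perhaps enabled by default in Kata CI images
default = ["init-data"]
agent-policy = []
# enabling init-data automatically enables agent-policy
init-data = ["agent-policy"]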

@Xynnn007
Contributor Author

I suppose the plan is to perform the initdata verification as part of the launch of the attestation-agent, before it creates a socket? Or do we keep running the agent until a remote attestation is attempted?

I prefer the former. What is your consideration on this?

req: protocols::agent::SetInitdataRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "set_initdata", req);
crate::initdata::do_set_initdata(&req).await.map_err(|e| {


Looks like it will overwrite the initdata set by process-user-data in PeerPod, implemented in confidential-containers/cloud-api-adaptor#1895. Is there a plan to support initdata's hostpath mount in the longer term?

@mkulke
Contributor

mkulke commented Sep 2, 2024

I suppose the plan is to perform the initdata verification as part of the launch of the attestation-agent, before it creates a socket? Or do we keep running the agent until a remote attestation is attempted?

I prefer the former. What is your consideration on this?

yup, I'd have the same preference.

This commit adds the implementation of initdata setting on the runtime
side. If the initdata annotation is given, the kata-runtime will ensure
that the first RPC called on the kata-agent is SetInitdata. This commit
also implements initdata setting for both TDX and SNP platforms.

To align with both legacy Kata and the CoCo scenarios where initdata is
not used, an empty SetInitdata call will be performed if no cc_init_data
annotation is set.

Signed-off-by: Xynnn007 <[email protected]>
This commit adds the handling of initdata on the kata-agent side. It
also changes the launch logic of guest-components.

As we assume that the first API called by the kata-runtime is
SetInitData, the guest-components are launched inside the handler of
SetInitData. On the runtime side, if no initdata annotation is set, an
empty SetInitData request will be performed, so the guest components
are launched with default config.

Signed-off-by: Xynnn007 <[email protected]>
@davidhadas

How can we extend this work to also support CAA configuration that needs to be measured, the same as CDH configuration or AA configuration?

See confidential-containers/cloud-api-adaptor#2072
And https://cloud-native.slack.com/archives/C04A2EJ70BX/p1729067445890829
Cc @bpradipt , @stevenhorsman

@davidhadas

davidhadas commented Oct 20, 2024

This PR will not allow configuring PeerPods' agent-protocol-forwarder (APF), since APF is active prior to the kata-agent. Further, under peer-pods SecureComms, the communication between the runtime and the kata-agent is performed via the cloud-api-adaptor (CAA) and APF, and is delayed until after the CAA connects to the APF and completes podvm attestation and retrieval of keys from Trustee.

This PR apparently requires that the order of things will be:

  1. runtime connects to the kata-agent and uses the InitData RPC
  2. AA and CDH initialize
  3. Attestation is done and keys retrieved

Peer-pods now rely on the following order:

  1. CAA connects to APF and uses InitData sent via user-data
  2. AA and CDH initialize
  3. Attestation is done and keys retrieved
  4. runtime connects to the kata-agent - note that this is done after attestation has already completed and keys have already been retrieved, so setting/updating InitData at this point seems not to be what we are after...

A) Can you please clarify how this PR can work with peer-pods, and what is the plan for peer-pods?

B) This problem apparently can be fixed by not implementing an InitData RPC and instead using a separate initialization procedure for setting InitData prior to running the kata-agent. This would remove the ordering issue introduced by this PR and keep the kata-agent out of the process of bringing up the podvm. Please reconsider whether it is the right thing to bake podvm initialization into the kata-agent, and what the alternatives may be.

@mkulke
Contributor

mkulke commented Oct 28, 2024

Following up on a discussion in the Trustee community call on 2024-10-21, I tried to assess how the approach presented in this PR (and the related PR on guest-components) can be reconciled with the requirements for peerpods.

The point was made that peerpods should aim not to diverge from an integrated initdata approach in kata, guest-components and trustee. IMO that's reasonable, if peerpods wants to minimize maintenance costs, and considering the further evolution of the attestation stack in CoCo.

What peerpods does currently

Peerpods has a somewhat preliminary implementation of the functionality that is presented in this PR. Initdata is extracted from a pod spec by the cloud-api-adaptor daemonset and provided to the virtual machine via User-Data (Cloud-Config) as a parameter to a CreateVM call.

Since a peerpod image is leveraging systemd, kata-agent is not the launcher/supervisor of the guest-component processes. Those are launched as systemd services, in a dependency cascade.

A dedicated one-shot service (process-user-data) is:

  • provisioning the InitData files (policy, guest-component configuration files)
  • measuring the InitData Body
  • extending the TEE evidence with the measurement (binding init-data to the TEE).

Once the files are provisioned, kata-agent and the guest-components are launched.

Init-Data can be verified as part of the TEE verification process in trustee/attestation-service.

(A nice side-effect of this implementation is the option to specify a coco-wide default init-data, that the daemonset will inject in each coco pod)

How peerpod's impl could be reconciled with this PR

CAA daemonset is not concerned with Init-data.
Peerpod's process-user-data is not concerned with init-data.

Kata-agent is still not managing guest-component process directly

  • AFAIK this is a requirement due to network namespace isolation specifics in peerpod VMs

Kata-agent is processing init-data:

  • setting the policy
  • provisioning guest-component config files
  • producing an init-data-digest
  • extending TEE evidence with the init-data digest

Systemd units watch the config files and launch AA

  • with --initdata base64(init-data-digest)

Problems

Binding of initdata to TEE

Currently there is an assumption encoded in initdata-related code: initdata is bound to a TEE by performing an equality check with either HOSTDATA on SNP or MRCONFIG on TDX architectures (verify_init_data). This field can be set at vm launch and is part of the signed attestation report.

Peerpod TEEs like IBM SE and Azure CVMs do not provide a similar facility, but allow extension of TEE evidence with a measurement in place of a verification of a pre-calculated digest. We would need to generalize "verify_init_data()" to "bind_init_data()" with TEE-specific implementations to perform the binding.
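
A hedged sketch of what such a generalization could look like; the trait and type names are illustrative, not the actual guest-components API:

use anyhow::{bail, Result};

/// How initdata was tied to the TEE evidence.
pub enum InitDataBinding {
    /// A launch-time field (SNP HOSTDATA / TDX MRCONFIG) matched the digest.
    Verified,
    /// The evidence was extended with the digest (e.g. IBM SE, Azure CVM vTPM).
    Extended,
}

pub trait InitDataBinder {
    fn bind_init_data(&self, digest: &[u8]) -> Result<InitDataBinding>;
}

/// SNP-style binder: compare the host-provided field with the pre-calculated digest.
struct SnpBinder {
    hostdata: Vec<u8>,
}

impl InitDataBinder for SnpBinder {
    fn bind_init_data(&self, digest: &[u8]) -> Result<InitDataBinding> {
        if self.hostdata.as_slice() != digest {
            bail!("HOSTDATA does not match initdata digest");
        }
        Ok(InitDataBinding::Verified)
    }
}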

Secure Communication

Secure communication is a feature that leverages KBS to secure the tunnel between a peerpod worker-node and the kata-agent process on the remote VM (which would be just vsock on bare-metal). If this feature is enabled, init-data cannot be "monopolized" by kata-runtime and kata-agent, as init-data would be required to bootstrap kata-agent in the first place.

I don't see a trivial way out of this. Either peerpod follows a bespoke approach in handling init-data, or the suggested implementation is adjusted to accommodate peerpod's design (e.g. replacing RPC with a host-provided init-data block device).

A non-trivial way out of this would maybe be to separate the concerns of kata+init-data from what peerpods+secure-communications needs. One of the various KBS CLI tools or the new plugin architecture in KBS could maybe help with that.

Comment on lines +328 to +340
if gc_procs != GuestComponentsProcs::None {
if !attestation_binaries_available(logger, &gc_procs) {
warn!(
logger,
"attestation binaries requested for launch not available"
);
}
}

// skip launch of any guest-component
if agent_config.guest_components_procs == GuestComponentsProcs::None {
return Ok(());
}
Contributor

Suggested change (do the early return first, then the availability check without the extra nesting):

// skip launch of any guest-component
if agent_config.guest_components_procs == GuestComponentsProcs::None {
return Ok(());
}
if !attestation_binaries_available(logger, &gc_procs) {
warn!(
logger,
"attestation binaries requested for launch not available"
);
}

}

async fn launch_attestation_agent(aa_config: Option<String>, initdata: String) -> Result<()> {
if Path::new(AA_PATH).exists() {
Contributor

we have already established that the AA bin exists at the caller site, so I think it's fair to fail hard here; non-presence of that file would be a bug.

In any case I would return early to reduce nesting:

if !Path::new(AA_PATH).exists() {
  return Ok(());
}

sl!(),
"Launch Attestation Agent with config delivered via Initdata"
);
tokio::fs::write(AA_CONFIG_PATH, config.as_bytes()).await?;
Contributor

Can we move the logic of writing an AA config file out of the launch-process fn? We will also need to write a config file if we launch AA via systemd.

}

async fn launch_confidential_data_hub(cdh_config: Option<String>) -> Result<()> {
if Path::new(CDH_PATH).exists() {
Contributor

see other comment on launch_aa fn

sl!(),
"Launch Confidential Data Hub with config delivered via Initdata"
);
tokio::fs::write(CDH_CONFIG_PATH, config.as_bytes()).await?;
Contributor

Can we move the logic of writing a CDH config file out of the launch-process fn? We will also need to write a config file if we launch CDH via systemd.

}
};

args.push("--initdata".into());
Contributor

can we specify the initdata digest as a file w/ raw bytes? (e.g. /run/initdata.digest).

That would make it easier if AA is launched via systemd. Otherwise, the logic of picking the hash alg and calculating the digest has to be reimplemented and performed elsewhere.
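
A rough sketch of that idea, assuming a hypothetical /run/initdata.digest path: the launcher picks the hash algorithm named in the initdata header, computes the digest once, and writes the raw bytes for a systemd-launched AA (or anything else) to read:

use anyhow::{bail, Result};
use sha2::{Digest, Sha256, Sha384, Sha512};

const INITDATA_DIGEST_PATH: &str = "/run/initdata.digest"; // assumed path

async fn write_initdata_digest(algorithm: &str, initdata_body: &[u8]) -> Result<()> {
    // pick the algorithm declared in the initdata header and hash the body
    let digest = match algorithm {
        "sha256" => Sha256::digest(initdata_body).to_vec(),
        "sha384" => Sha384::digest(initdata_body).to_vec(),
        "sha512" => Sha512::digest(initdata_body).to_vec(),
        other => bail!("unsupported initdata hash algorithm: {other}"),
    };
    // raw bytes, so consumers don't have to re-derive the digest themselves
    tokio::fs::write(INITDATA_DIGEST_PATH, &digest).await?;
    Ok(())
}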

@bpradipt
Contributor

bpradipt commented Nov 4, 2024

Many thanks to @arronwy and @ChengyuZhu6 for their great help on this. This PR resolves #9468. It includes both runtime (qemu) and kata-agent side changes to support initdata (TDX and SNP in this PR).

There are breaking changes:

  1. The guest-components initialization code moves into the handler of the SetInitData RPC. For deployments where io.katacontainers.config.runtime.cc_init_data is empty, an empty SetInitData call is performed to launch guest-components with default configs; this also works for legacy Kata.
  2. SetPolicy is called by SetInitData if policy.rego is included inside the initdata TOML.

Btw, I did not delete the SetPolicy API in this PR, but it seems we need to.

Still testing and debugging, and not sure it works for peer-pod integration, so let's mark it as draft.

cc @fitzthum @danmihai1 @huoqifeng @stevenhorsman

@Xynnn007 @fitzthum

As I understand it, this PR couples initdata handling with the kata-agent. So any Kata VM entities starting after the kata-agent and requiring a measured bootstrap config can use this approach.
However, we have some use cases where we would want to use initdata before the kata-agent starts. For example, in the case of kata-remote (peer pods) we would want to use initdata to provide a measured bootstrap configuration to retrieve certificates for establishing an SSH tunnel. Likewise, for kata bare-metal we would want to use the initdata to provide a measured bootstrap configuration to retrieve keys to decrypt an encrypted partition.

There are a few other use cases as well. Ultimately, the key requirement boils down to providing a measured bootstrap configuration to entities that cannot rely on the configuration provided by the kata-agent, because these entities must initialise before the kata-agent does.

Any recommendations ?

Since the PR is still a draft, I hope we can collectively work to address the different requirements.

@fitzthum
Contributor

fitzthum commented Nov 4, 2024

@bpradipt we will be covering these questions at the community meeting this Thursday.

My basic view is that this PR is a good approach for implementing init-data in kata. I don't think it makes much sense to introduce a new component for Kata or significantly rework the flow.

That said, this PR doesn't necessarily define how peer pods should work. Ideally the two codebases would converge but it's possible that peer pods has unique requirements here and needs a different solution. This is something we'll have to discuss although I think it's mainly a question for the peer pods community.

Finally, I think one of the reasons that there is some tension around protecting the API of the kata agent (i.e. secure comms) is that this simply is not part of the trust model that coco with kata uses. It's a separate question whether it should be, but it's not too surprising that it is creating some issues.

@bpradipt
Contributor

bpradipt commented Nov 5, 2024

@bpradipt we will be covering these questions at the community meeting this Thursday.

My basic view is that this PR is a good approach for implementing init-data in kata. I don't think it makes much sense to introduce a new component for Kata or significantly rework the flow.

I may be missing some history with this particular feature. But let me ask it here. Were there discussions around the pros and cons of having a new component to handle initdata?

That said, this PR doesn't necessarily define how peer pods should work. Ideally the two codebases would converge but it's possible that peer pods has unique requirements here and needs a different solution. This is something we'll have to discuss although I think it's mainly a question for the peer pods community.

As mentioned earlier, my requirements are not solely tied to Kata-remote (peer-pods). I have similar requirements for Kata Qemu (bare metal) as well. My comments in this issue intend to bring forward the requirements as we learn from potential users of CoCo and its components so that collectively, we can make conscious decisions for the project.

I understand that this implementation is specific to the kata-agent. But there is also a possibility of having reusable components in the CoCo project to provide measured bootstrap configuration to the kata-agent or any other entities.

Returning to peer-pods, we do have a way to provide a measured bootstrap config that is not dependent on this PR but is specific to peer-pods.
Imho everyone building their own solution for the same requirement is not a good approach.

The question that remains is how we can provide a consistent bootstrap configuration to entities that need to run in the Kata VM without being dependent on the kata-agent.

As a community we may decide not to address the requirement of providing measured bootstrap config to any entity other than the kata-agent. That's fair as well (but restrictive). But it should be a well-thought-out decision.

Finally, I think one of the reasons that there is some tension around protecting the API of the kata agent (i.e. secure comms) is that this simply is not part of the trust model that coco with kata uses. It's a separate question whether it should be, but it's not too surprising that it is creating some issues.

The initial discussion started by David was specific to secure comms, but let's not discount the other use cases that are unrelated to securecomms or the Kata remote hypervisor (peer-pods).

@fitzthum
Contributor

fitzthum commented Nov 8, 2024

From the meeting it seems like the main concern with this approach is that it makes it difficult for something that runs before the Kata Agent to be configured by the Kata Agent. I see two options.

  1. Don't support these use cases for the time being. We've been working on init data for an extremely long time, so maybe we should just get something relatively simple (i.e. this PR) in. @bpradipt mentioned some use cases that might not be supported, but it seems like we're still in the early stages of designing these integrations. I'm not totally sure how firm or immediate these requirements are. Perhaps it would be reasonable to simply say that extra daemons added to the guest that require configuration via init-data need to start after the Kata Agent.

  2. Introduce an additional component that runs before the Kata Agent and handles the init-data. @burgerdev @Xynnn007 and @magowan have all suggested something along these lines. There are some advantages. For instance, we would be able to use the init-data to configure the kata agent itself and we might be able to share more code with peer pods. On the other hand this would pull the coco implementation away from kata a little bit. It also probably wouldn't be compatible with the agent running as init. We'd need to rethink the startup sequences a little bit and it seems like it would be more complex.

I lean more towards 1, but I am open to either approach. If we go with the second one we would probably want to add this new thing to guest components or maybe just extend the AA.

@mkulke
Contributor

mkulke commented Nov 12, 2024

personally, I think 2) is more elegant, rather than the kata-agent waiting for an RPC to happen and then spawning auxiliary processes in this case. it doesn't feel much cleaner than a half-initialized attestation-agent, but this is bikeshedding to a degree. it's also not something that cannot be revised later.

I think peerpods should be able to accommodate either (w/ a few minor fixes suggested in this PR). it should not configure the agent dynamically, especially via untrusted, unmeasured values. it might be tempting to ship all sorts of configuration options via user-data to the TEE, but that has a questionable security posture that can undermine confidentiality.

If we pick 1) the secure communication feature cannot be built on top of coco initdata. I would recommend implementing this scheme independent of kata. It aims to bootstrap an SSH communication channel between 2 trusted nodes, facilitated by remote attestation and secure key release. this should generalize well and doesn't need to be coupled to kata, it might be useful beyond confidential-containers and I imagine it would be a good fit for the plugin feature in kbs, similar to what is planned w/ nebula.

@magowan

magowan commented Nov 13, 2024

I believe that initdata should not depend upon kata-agent having started, and that kata-agent should not be the control point for measuring, attesting and distributing configuration to all other code within the VM. For me, initdata delivers measurable initialisation to the VM, a portion of which is needed/used by kata-agent.

We already have examples where forcing the kata-agent to perform the role of controlling all measured initdata causes friction. For me the kata-agent should focus on the management and running of containers within the VM, and therefore the measurement and attestation of configuration relevant to this.
In different environments there will be other components running within the VM beyond the kata-agent which will need configuring. And this configuration would need to be measured and attested in some way.
This means, yes, CoCo is different from Kata, and it is CoCo that considers the security and initialisation of all components that may be used within the VM.

We observed that Trustee is not specific to CoCo, Kubernetes or Kata. I'd want to avoid binding the attestation-agent to the kata-agent so tightly; if we acknowledge that other components within the VM should be allowed to use the attestation-agent at later points, then why do we restrict this from happening earlier, before the kata-agent has started?

In terms of what can be configured for the kata-agent or any other component, that to me is a different discussion, on a per-component basis, and separate from the ability to deliver initdata to the VM, which will be measured and can be attested from very early in the VM initialisation cycle (before the kata-agent starts).

For us, a key example beyond the things already mentioned is that with Secure Execution we have the ability to use encrypted VM images. We can initialise our VM with a portion of initdata encrypted, which can be decrypted at boot by a secret baked into the image. The encrypted initdata can be measured and attested; however, a key thing for us is that decrypting this portion of initdata must happen before any other component starts. This allows us to protect the decryption process from any other component that may eventually start.

@fitzthum
Contributor

Would it be a nicer compromise if rather than RPC, we used a file-based approach where the kata agent expects to find the init-data in a certain place? Either the agent mounts a block device there for bare metal or some other thing prepares it. Kata would enforce checking the init-data and propagating it to the various config files.

Another thing to think about is that we could potentially check the init-data more than once. For people who want to introduce new components running before the agent, they could use a one-shot version of the AA to check the init-data before carrying on with the usual flow. We'd need to be careful about cases where checking the init-data has side-effects, but that should be possible.

@mkulke
Contributor

mkulke commented Nov 14, 2024

Would it be a nicer compromise if rather than RPC, we used a file-based approach where the kata agent expects to find the init-data in a certain place? Either the agent mounts a block device there for bare metal or some other thing prepares it. Kata would enforce checking the init-data and propagating it to the various config files.

following @magowan's comment, does kata-agent necessarily have to meddle in attestation affairs? I imagine we can still perform the binding of init-data to the TEE as part of attestation-agent startup logic (like proposed in this PR).

kata-agent can be enlightened to understand initdata to extract the policy (e.g. via an initdata crate in guest-components) and attestation-agent (+CDH) would also be configured with initdata, pulling out the values it needs.

the current initdata spec sort of reifies that initdata is an arbitrary map of filename/content and that there will be some mechanism (in the agent) that will provision content to files at a well-known location on the fs, for the policy engine in the agent and the child daemons to pick it up later.

it works, but there's a lot of nesting and it incurs potential redundancies (like the same kbs cert body in both AA and CDH config files):

[data]
"cdh.toml" = '''
kbs_cert = """
01a...2bc
"""
'''
"policy.rego" = '''
{}
'''

since we own the whole stack and its configuration schema, we could remove the file indirection and just specify initdata more explicitly

[data]
policy = """
{}
"""
[data.kbs]
cert = """
01a...2bc
"""

edit: but this is not required. attestation-agent spawned w/ --initdata /run/initdata could still perform the unnesting of aa.toml itself.
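
For illustration, the explicit layout above could be deserialized with plain serde structs instead of an opaque filename/content map; the field names mirror the snippet and are not an agreed schema:

use serde::Deserialize;

#[derive(Deserialize)]
struct ExplicitInitData {
    data: Data,
}

#[derive(Deserialize)]
struct Data {
    // inline policy text instead of a nested "policy.rego" file body
    policy: Option<String>,
    kbs: Option<Kbs>,
}

#[derive(Deserialize)]
struct Kbs {
    // shared KBS cert, referenced once instead of being duplicated in AA and CDH configs
    cert: Option<String>,
}

fn parse(toml_str: &str) -> Result<ExplicitInitData, toml::de::Error> {
    toml::from_str(toml_str)
}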

@magowan

magowan commented Nov 14, 2024

Would it be a nicer compromise if rather than RPC, we used a file-based approach where the kata agent expects to find the init-data in a certain place? Either the agent mounts a block device there for bare metal or some other thing prepares it. Kata would enforce checking the init-data and propagating it to the various config files.

Another thing to think about is that we could potentially check the init-data more than once. For people who want to introduce new components running before the agent, they could use a one-shot version of the AA to check the init-data before carrying on with the usual flow. We'd need to be careful about cases where checking the init-data has side-effects, but that should be possible.

Yes, I like this compromise idea of a file-based approach. The idea being that the initdata can be present very early in the boot process, before pretty much any "component" needs to start.

And the principle of being able to check init-data more than once sounds good.

We can be very clear about what the kata-agent expects to find in initdata and what the kata-agent will take responsibility for configuring, leaving it open for there to be other components that can be configured by init-data of which the kata-agent is not aware (though their configuration will still form part of the attested initdata).

@mkulke
Contributor

mkulke commented Nov 14, 2024

And the principle of being able to check init-data more than once sounds good.

that's tricky if the binding involves side effects, like Tobin mentioned. for HOSTDATA == digest(initdata) check that's not a problem. If you do extend_pcr(initdata) you'd have to go through an event log though, to check whether you already performed a binding to the TEE. doable, but a nuisance.

I would prefer attestation-agent to have a monopoly on TEE-HW interactions, at least within the kata + guest components realm.

@burgerdev
Contributor

[...] the kata-agent should focus on the management and running of containers within the VM and therefore measurement and attestation of configuration relevant to this.

We observed that Trustee is not specific to CoCo, Kubernetes or Kata. I'd want to avoid binding the attestation-agent to the kata-agent so tightly; if we acknowledge that other components within the VM should be allowed to use the attestation-agent at later points, then why do we restrict this from happening earlier, before the kata-agent has started?

I would even go a step further: the Kata agent should focus on management and running of containers - period. Measuring the agent configuration, generating attestation statements and management of VM-level processes should be done in separate, dedicated components ("process-init-data" [1], attestation agent and systemd, respectively). Tightly scoped components are easier to reason about, especially when it comes to security. Having fewer optional dependencies and less conditional execution in the agent would also play nicer with non-CoCo use cases.

Footnotes

  1. https://cloud-native.slack.com/archives/C039JSH0807/p1731073209673509?thread_ts=1730991933.184159&cid=C039JSH0807

@Xynnn007
Contributor Author

Xynnn007 commented Nov 21, 2024

Inspiring discussion, folks. It seems that we may converge on the idea of a separate component to handle the initdata file, not coupled with the kata-agent. Let me try to sum it up, and hopefully I will not break any hearts.

@burgerdev gives a good design principle: different components should take on different, decoupled responsibilities.

Combined with @fitzthum's suggestion (the fs way, supported by @magowan) and @mkulke's suggestion (AA handles all TEE HW interactions), we could have an overall design for bare-metal kata:

  1. The kata-runtime will create a block device that includes the content of initdata. We use a block device rather than virtio-fs for security reasons raised by @fidencio. See @arronwy's comments.
  2. A oneshot systemd service (like process-user-data in peer-pods) checks if the block device exists. If so, it parses the initdata and extracts it into config files at specific paths.
  3. From this point on, kata bare-metal is the same as Peerpods. AA will then run with the --initdata parameter due to the PR, and the same for CDH.

I think this is what @huoqifeng once suggested in confidential-containers/confidential-containers#171 (comment)
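
A rough sketch of steps 1 and 2 above, with assumed device and target paths (the real ones would be defined in the new PR): the one-shot helper reads the initdata TOML from the host-provided block device and extracts each [data] entry into its config file before AA/CDH start (digest calculation and TEE binding omitted here):

use std::collections::HashMap;

use anyhow::Result;
use serde::Deserialize;

const INITDATA_DEVICE: &str = "/dev/disk/by-id/initdata"; // assumed device path
const CONFIG_DIR: &str = "/run/confidential-containers";  // assumed target dir

#[derive(Deserialize)]
struct InitData {
    algorithm: String,
    version: String,
    data: HashMap<String, String>, // filename -> content, per the current spec
}

async fn provision_initdata() -> Result<()> {
    let raw = tokio::fs::read(INITDATA_DEVICE).await?;
    // a real implementation would strip any device padding/framing here
    let initdata: InitData = toml::from_str(std::str::from_utf8(&raw)?)?;
    tokio::fs::create_dir_all(CONFIG_DIR).await?;
    for (name, content) in &initdata.data {
        tokio::fs::write(format!("{}/{}", CONFIG_DIR, name), content).await?;
    }
    Ok(())
}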

Some potential questions

  1. @mkulke mentioned the extend_pcr(initdata) way for AA.

We might not need to recheck the PCR binding, as when we do attestation it can be lazily checked by Trustee. Before that, the extended event cannot be tampered with.

  2. process-user-data does not check the initdata binding. What if, for example, an attacker modifies the configuration file of CDH and the systemd configuration of CDH to point to the malicious configuration, but puts the legitimate configuration under the path that AA inspects?

The rootfs integrity is already protected by dm-verity, thus process-user-data cannot be a tampered one. When doing remote attestation, the rootfs hash will be checked. This rootfs hash record cannot be tampered with at the guest level.

So sorry to be late with the implementation process. I am struggling with setting up a CoCo dev env these days. Many thanks for the help from @fidencio.

@fitzthum
Contributor

We discussed init-data at the community meeting.

Generally people seem happy to move away from the RPC approach to the block device.

The big question about the proposal above is how it would work with deployments that don't use systemd. Currently initrd-based guests don't use systemd. Instead, the Kata agent is the init process. This mainly follows from the existing Kata implementation.

We could potentially decide that CoCo will only support deployments with systemd. This would mean that the PR would also have to add systemd to the initrds. I'm not sure what the Kata upstream view on this would be.

In some ways only supporting systemd is appealing. The kata agent might not be as well-suited to starting child processes as systemd, which is designed to do exactly that. In our discussions so far most people have seemed to find systemd appealing. On the other hand, systemd adds size and complexity to the guest. The bigger problem is that adopting systemd everywhere would probably take a few more rounds of discussion that aren't guaranteed to converge. So the real question might not be whether we should use systemd, but whether it's worth deciding if we should use systemd.

If we don't want to mandate systemd, we have some options. We could still use a block device to pass in the init-data. The Kata Agent can check if it is the init process, so it could either read the initdata from a file (if it is not init) or mount the block device itself. This is very similar to what @Xynnn007 suggested above, but with a tweak to make sure things still work without systemd. This would also allow us to gracefully transition to only using systemd if we decide to do that in the future.
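
A hedged sketch of that tweak; the PID-1 check and the paths are illustrative only:

use std::path::Path;

use anyhow::Result;

const INITDATA_FILE: &str = "/run/initdata.toml";         // assumed: provisioned by systemd one-shot
const INITDATA_DEVICE: &str = "/dev/disk/by-id/initdata"; // assumed device path

async fn read_initdata() -> Result<Option<Vec<u8>>> {
    if std::process::id() != 1 {
        // Not running as init: systemd (or another init) already extracted the initdata.
        if Path::new(INITDATA_FILE).exists() {
            return Ok(Some(tokio::fs::read(INITDATA_FILE).await?));
        }
        return Ok(None);
    }
    // Agent as init (initrd case): read the initdata block device directly.
    if Path::new(INITDATA_DEVICE).exists() {
        return Ok(Some(tokio::fs::read(INITDATA_DEVICE).await?));
    }
    Ok(None)
}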

@mkulke
Contributor

mkulke commented Nov 21, 2024

On the other hand, systemd adds size and complexity to the guest.

regarding the size, do we have numbers? afaik, systemd is suitable for and being used in embedded contexts, I don't think systemd, the core binary, adds a lot of weight. systemd can also run in the initrd.

Regarding the complexity, I would disagree. The complexity is inherent, we wouldn't have to introduce systemd for that. we have to orchestrate a few components in a dependency tree that perform IPC among each other. we are doing this imperatively in code and are cutting a few corners (logging, restarts, cgroups, ...), but the complexity doesn't go away.

If we don't want to mandate systemd, we have some options. We could still use a block device to pass in the init-data. The Kata Agent can check if it is the init process, so it could either read the initdata from a file (if it is not init) or mount the block device itself. This is very similar to what @Xynnn007 suggested above, but with a tweak to make sure things still work without systemd. This would also allow us to gracefully transition to only using systemd if we decide to do that in the future.

Personally, from a decision engineering perspective, I think that's an ok way forward. It's a path that is open to future refinements and doesn't venture into the wrong direction.

Does it address the principal bootstrapping problem that was initially raised here regarding the secure communication feature in peerpod?

I think it does. The agent is still provisioning files, but it doesn't receive the to-be-provisioned data via RPC. it will process the initdata, launch a bunch of processes and listen on the agent socket for RPC. the secure communication feature should have everything it needs: guest-component services and kata-agent are configured, launched and listening, so an SSH tunnel to the k8s node can be facilitated, through which the agent will be able to receive RPC.

@zvonkok
Contributor

zvonkok commented Nov 21, 2024

Need to read the complete thread, but what about introducing a VM service module like coconut-svsm or OpenHCL to handle initdata properly?
AFAIK OpenHCL e.g. can populate the vTPM with a state.

@fitzthum
Contributor

Need to read the complete thread, but what about introducing a VM service module like coconut-svsm or OpenHCL to handle initdata properly?
AFAIK OpenHCL e.g. can populate the vTPM with a state.

Could make sense but will take a while for that to be available and integrated with CoCo.

@Xynnn007
Contributor Author

Xynnn007 commented Dec 3, 2024

Ok. Let me close this PR and open a new one following the discussed results

@Xynnn007
Contributor Author

Xynnn007 commented Dec 3, 2024

New PR to follow #10610
