podvm-mkosi: dm-verity, reproducible builds & measurements #1606
Conversation
This is a big step in our remote attestation story. With introducing a dm-verity protected root file system, the content of our root FS is measured, including binaries and configuration that is part of the image. The root FS is now read-only. Signed-off-by: Paul Meyer <[email protected]>
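For anyone reviewing on a booted image, a quick sanity check could look like this (a sketch; the verity mapping name 'root' is an assumption and depends on how the image is assembled):

findmnt -o TARGET,SOURCE,OPTIONS /   # root mount should list 'ro'
veritysetup status root              # inspect the dm-verity mapping and its root hash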
Force-pushed from b593296 to 77ce6e6
As the root file system is now read-only, files that need to be written must reside in /run. Signed-off-by: Paul Meyer <[email protected]>
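For a service that needs a writable path at boot, one option is a systemd-tmpfiles entry that creates it under /run (a sketch; the file and directory names are hypothetical):

# /usr/lib/tmpfiles.d/peerpod.conf
d /run/peerpod 0755 root root -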
sshd generates server keys on boot. This change moves the generated keys to /run, as their original destination is now read-only. Fedora uses a custom script, 'sshd-keygen', to generate the keys; it cannot be configured with a target path, so we need to vendor the script and fix the path there. Signed-off-by: Paul Meyer <[email protected]>
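The essence of the change, sketched for a single key type (the /run/sshd path is an assumption; the vendored sshd-keygen script handles all configured key types):

install -d -m 0755 /run/sshd
ssh-keygen -q -t ed25519 -N "" -f /run/sshd/ssh_host_ed25519_key
# sshd_config then references: HostKey /run/sshd/ssh_host_ed25519_key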
Working with the new images, two pieces of information about the image are of fundamental interest, both for devs and for users: the vTPM PCR values of the image, and whether the image is a debug image or intended for production use. This change adds printing capabilities for that information. As production images don't allow access via SSH or serial console, we print everything to the serial console after boot. For dev convenience, the debug image will also print the image info on SSH login. Signed-off-by: Paul Meyer <[email protected]>
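Reading the vTPM PCR values from inside the guest could look like this (a sketch, assuming tpm2-tools is available in the image):

tpm2_pcrread sha256:0,1,2,3,4,5,6,7,8,9,10,11   # dump the boot-relevant PCRs from the SHA-256 bank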
This enables foremost the serial console. It is helpful for debug images, as they allow access through the serial console for debugging, as well as for production images, which display PCR values and don't allow login. Signed-off-by: Paul Meyer <[email protected]>
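Enabling the serial console typically boils down to something like this (a sketch; ttyS0 is the usual device on x86, but it is platform dependent):

systemctl enable serial-getty@ttyS0.service
# plus console=ttyS0 on the kernel command line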
Explicitly adding packages and enabling units that were running before. Signed-off-by: Paul Meyer <[email protected]>
Force-pushed from 77ce6e6 to 938b8e0
I built a standalone image for testing and it worked without any issues. Currently testing out a peer-pod scenario. Will update.
I failed to successfully test the e2e flow. The agent-protocol-forwarder is failing to start.
This shouldn't have happened, as the code changes are there. I'll have to recheck my environment and redo this next week.
podvm-mkosi/Makefile
Outdated
--build-arg CAA_SRC=https://github.com/katexochen/cloud-api-adaptor \
--build-arg CAA_SRC_REF=feat/mkosi-dmverity-nix \
--no-cache \
We could parameterize those in the Makefile or just drop them before merge. I opt for the latter to reduce the noise.
Yes, that commit should be dropped.
So the binaries agent-protocol-forwarder and process-user-data under …
One quick observation was an increase in pod startup time. Earlier, with Ubuntu on t3.small, it took roughly 60s; now it took close to 2min.
Will investigate this further next week.
Great work, also, the ssh debug shell works now and there's a super-cute issue screen 😃
I built an image and ran a pod successfully on it. When failing to launch an encrypted image, I noticed that we were using the wrong config file for kata-agent.
cat /etc/systemd/system/kata-agent.service | grep agent-config
ExecStart=/usr/local/bin/kata-agent --config /etc/agent-config.toml
ExecStopPost=/usr/local/bin/kata-agent-clean --config /run/peerpod/agent-config.toml
# Now specified in the agent-config.toml Environment="KATA_AGENT_SERVER_ADDR=unix:///run/kata-containers/agent.sock"
fixed the line, rebuilt the image, pushed it to a gallery and checked the fix - all within 10min (90% of that is gallery publishing). It's hard to overstate the improvement to the developer experience!
...ofc I forgot to build w/ AA_KBC but it should work 🙄
Last login: Fri Dec 1 18:32:19 2023 from 172.16.0.2
...
.-=**++*+=: PodVM image for cloud-api-adaptor
:=**=:. .-+*+-. ---------------------------------
.-+*+-. :=+*=: Part of Confidential Containers project
.-+*=: .-+*+-
-+*=: .-: -+*=. Image variant: debug
:*+ :=*****+-. :**
=*- .-+************=: +*. OS: Fedora Linux 38 (Thirty Eight) x86_64
+*: +****************#%- +*: Host: Virtual Machine Hyper-V UEFI Release v4.1
+*. +************##%@@@= =*: Kernel: 6.6.2-101.fc38.x86_64
+*. +*********#%%@@@@@@= =*: Shell: bash 5.2.21
+*. +********#@@@@@@@@@= =*: Terminal: /dev/pts/0
+*. +********#@@@@@@@@@= =*: CPU: 19/01 (2) @ 2.445GHz
=*: =********#@@@@@@@@%: +*: Memory: 341MiB / 7477MiB (4%)
=*- :=*****#@@@@@%+: **. Disk (/): 243M / 243M (100%)
:*+. :-+*#@@#=. :*+
:+*+: -: .-+*=.
.-+*+-. :=+*=:
.:=*+-: .:=*+=:
:-+*+-. .:=+*=-.
:=+*+**+-.
..
[root@fedora ~]# journalctl -u kata-agent | grep AA
Dec 01 18:30:51 fedora kata-agent[672]: [2023-12-01T18:30:51Z ERROR attestation_agent::rpc::getresource::ttrpc] Call AA-KBC to get resource failed: AA does not support the given KBC module! Module: cc_kbc
Dec 01 18:30:52 fedora kata-agent[672]: [2023-12-01T18:30:52Z ERROR attestation_agent::rpc::getresource::ttrpc] Call AA-KBC to get resource failed: AA does not support the given KBC module! Module: cc_kbc
I noticed the same and opened this issue to look into it
I can confirm, w/ the fix and a rebuilt AA, image layer decryption also works 🎉
Force-pushed from 938b8e0 to 313b5bd
Link check failure is unrelated, caused by #1508
LGTM
@@ -5,9 +5,11 @@ Wants=process-user-data.service
[email protected] process-user-data.service

[Service]
ExecStart=/usr/local/bin/kata-agent --config /etc/agent-config.toml
# Temporary fix for
Leftover?
Not really, just haven't opened that issue/PR yet.
Retesting on AWS. I'll update my findings. Overall this looks good to me.
ExecStart=/usr/local/bin/kata-agent --config /etc/agent-config.toml
# Temporary fix for
ExecStartPre=mkdir -p /run/kata-containers
ExecStart=/usr/local/bin/kata-agent --config /run/peerpod/agent-config.toml
A question came to my mind when going through the changes again.
@katexochen can the systemd config files for mkosi based builds be kept under the podvm-mkosi directory?
This will give us some time to cleanly deprecate packer based builds and eventually remove them or move them out to a separate repo.
Also, having the required dependencies for mkosi under a separate directory will make subsequent changes to mkosi based builds easier.
> katexochen can the systemd config files for mkosi based builds be kept under podvm-mkosi directory?

Not sure I understand your proposal here. You don't want this PR to have any changes inside of /podvm? The container files for the binaries used by mkosi are in that directory, and mkosi uses the same /podvm/files as the legacy images.
There is no way to scope these changes to /podvm-mkosi, as other components are touched, like process-user-data, adaptor and forwarder, as well as the guest-components.
> katexochen can the systemd config files for mkosi based builds be kept under podvm-mkosi directory?
> Not sure I understand your proposal here. You don't want this PR to have any changes inside of /podvm? The container files for the binaries used by mkosi are in that directory, and mkosi uses the same /podvm/files as the legacy images.

Yes, I'm looking at having all mkosi related configs and deps under podvm-mkosi to help with deprecating packer based builds.
I created a related issue to track this - #1615

> There is no way to scope these changes to /podvm-mkosi, as other components are touched, like process-user-data, adaptor and forwarder, as well as the guest-components.

I'm only referring to configs (like the systemd scripts, Dockerfile, Makefile). Apologies if it was not clear.
At a high level, this is what I was thinking:
- Copy the systemd config files in podvm/files/etc to podvm-mkosi
- Include Makefile.inc in the podvm-mkosi/Makefile to build the binaries (will need some adjustments)
- Move the fedora dockerfiles under podvm-mkosi as fedora is only used for podvm-mkosi builds
I plan to handle these in a separate PR as part of 1615. However, since this PR is touching the systemd files, can these files be copied to podvm-mkosi and used from there?
If it's too much work, then leave it. Will handle it separately.
> Move the fedora dockerfiles under podvm-mkosi as fedora is only used for podvm-mkosi builds

okay for me, as you said this is independent and has no impact either way

> Copy the systemd config files in podvm/files/etc to podvm-mkosi
> Include Makefile.inc in the podvm-mkosi/Makefile to build the binaries (will need some adjustments)

I think duplicating this stuff will lead to a mess. There are no mkosi specific modifications to the Makefile.inc, so why would we duplicate that? Regarding the /files, I don't see benefits either. The changes that this PR is making must be made to the legacy podvm files, too, otherwise the legacy images will just break. There are no mkosi specific updates to these files. If you really don't want these changes in the legacy images, you would basically need to fork guest-components and cloud-api-adaptor at least.
In general, I don't think it is a good approach to isolate the two image builds more than necessary from each other. We want to migrate -- making both approaches fully independent will rather lead to a fork than to a migration.
The migration is going to take time, and I am exploring ways to make it smooth. I'm also looking at ways to have a path forward for ongoing experimentation (by end users of the upstream project and developers) with workloads on peer-pods. This is important to identify functional issues with the peer-pods approach.
Currently, I can't experiment with AI use cases involving large models and image sizes, as that breaks with this PR due to the removal of the bind mount. We don't have a way to embed the container images in the podvm yet. And spinning up large-memory instances and keeping them running for experiments and debugging is very costly. Also, the required instances are not always available in the specific region where the cluster is deployed.
Experimenting with GPU workloads is another example. Supported packages are available only for Ubuntu and RHEL. We can enable GPU support only for packer based builds and let users experiment with their workloads and see whether the peer-pods approach works for them or not.
In parallel, it gives us time to test F37 nvidia packages on F38 mkosi images and, if that works, enable it for the mkosi built image. Or wait for F38 packages to be released to enable it in the mkosi built image.
I feel isolating the two builds will help with the velocity of the project at this stage and, most importantly, give us time to focus on mkosi based builds, knowing that packer based images are available to continue with experiments, both for us and for end users of the upstream peer-pods project.
Are there other alternatives that we can look into?
I'll be able to take a more thorough look into it tomorrow, but I agree, this PR shouldn't break packer builds. (If that were the case, we'd need more reviewers to ack.) I'm optimistic we're able to introduce this iteratively without having to mirror everything.
> Currently, I can't experiment with AI use cases involving large models and image sizes as it gets broken with this PR due to removal of the bind mount.
Ah, now this is actually a concrete problem we can work on. I asked about the bind mounts a while ago in Slack, and my understanding after that conversation was that the mounts weren't used; that's why I removed them. There are some easy ways to unblock you in this regard:
- We can copy the files like you've said (fine for me if we have a reason like this)
- We can disable the units in the mkosi build
- We can remove the files in the docker build step of the binaries so they are not included in the tar ball (my preferred way)
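A sketch of the third option as a Dockerfile step (the path and the glob are assumptions about where the units live in the tarball):

RUN rm -f /podvm/files/etc/systemd/system/*.mount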
> In parallel it gives us time to test F37 nvidia packages on F38 mkosi images and if it works, we enable it for mkosi built image. Or wait for F38 packages to be released to enable it in mkosi built image.
I would need to check, but I don't think the image has any f38 specific dependencies. We could try to downgrade the full image to f37, but note that f37 has been EOL since yesterday (if I understand their schedule correctly).
And we could also reconsider building both fedora + ubuntu images with mkosi if we can't get AI stuff running on fedora.
> We can remove the files in the docker build step of the binaries so they are not included in the tar ball (my preferred way)
Since we clone + build the repo in the docker builds, I guess binaries + config files are somewhat coupled, so there might be inconsistencies if we retrieve /etc/** from somewhere in the file tree (HEAD), while the binaries are copied from a docker layer (built from some specific ref).
Since there is a plan to keep packer-based builds in-tree for the time being, masking the units could be a viable option.
Is the removal of the bind mounts relevant for the functionality of this PR or is this a drive-by fix?
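For reference, masking a unit is a one-liner in the image build (the unit name here is a placeholder):

systemctl mask example.mount   # equivalent to: ln -s /dev/null /etc/systemd/system/example.mount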
I've re-added the units in /podvm and added some lines to remove them during the mkosi build.
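A sketch of how such a removal could look in an mkosi postinstall script (script name per mkosi conventions; the unit name is a placeholder):

#!/bin/sh
# mkosi.postinst: drop the legacy bind-mount units from the mkosi image
rm -f /etc/systemd/system/example.mount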
I still can't get a working pod deployment on AWS. This time I get an error
Most likely this error has nothing to do with this PR. @stevenhorsman or @mkulke, any idea about the above error? I'm using v0.8.0 on EKS and deployed containerd (v1.7) via the operator.
Able to run a pod finally. I had to install fuse.
@bpradipt yes, that’s a known issue w/ nydus and coco 0.8. it will not work ootb on EKS, you need to install fuse |
I could also successfully spin up an AMD SEV-SNP instance. So my basic pod creation workflow is working fine with both CVM and non-CVM.
Force-pushed from 6cbbc9f to 988912f
Removing the mount units as there currently isn't a way to secure them. This triggered a bug in kata-agent/ttrpc-rust as the path of the socket is not created automatically. Adding a workaround to the kata-agent unit until this is fixed upstream. Signed-off-by: Paul Meyer <[email protected]>
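The workaround, as visible in the diff above, creates the socket directory before the agent starts:

[Service]
ExecStartPre=mkdir -p /run/kata-containers
ExecStart=/usr/local/bin/kata-agent --config /run/peerpod/agent-config.toml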
Disabling the newer systemd-measure services as they are not needed and use a lot of PCRs. Signed-off-by: Paul Meyer <[email protected]>
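Disabling those services could look like this (a sketch; these are the pcrphase unit names that recent systemd versions ship, to be checked against the image's systemd):

systemctl mask systemd-pcrphase.service systemd-pcrphase-sysinit.service systemd-pcrphase-initrd.service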
It is a common task to check the PCRs when using measured boot, so we are providing an alias to do that with ease. Signed-off-by: Paul Meyer <[email protected]>
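Such an alias could be as simple as this (a sketch, assuming tpm2-tools; the alias name is illustrative):

alias pcrs='tpm2_pcrread'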
Force-pushed from 988912f to 3f221c4
/lgtm
Thanks @katexochen
I couldn't test the latest changes due to issues on my side. But based on my previous tests and discussions this looks good.
Force-pushed from 3f221c4 to 8a347e5
To include confidential-containers/guest-components#401 Signed-off-by: Paul Meyer <[email protected]>
Force-pushed from 8a347e5 to 555648a
Fixed the commit message, which was too long.
This introduces dm-verity protection for the root disk as well as fully reproducible builds.
Developers should note that / is now read-only. All writes at runtime should happen to /run.
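A quick way to confirm the read-only root on a running image (a sketch):

findmnt -o TARGET,OPTIONS /   # should list 'ro'
touch /probe                  # expected to fail: read-only file system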