Switch to virtiofs #3428

Merged
merged 4 commits into from
Sep 7, 2023

Conversation

cgwalters
Member

@cgwalters cgwalters commented Apr 14, 2023

qemu: Refactor memory to actually use memfd

The previous change here didn't break anything, but didn't
actually work for virtiofs because we need to actually reference
the device. It appeared to work for me in local testing
because I accidentally duplicated the logic in the larger virtiofs
PR.
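A minimal sketch of what "actually referencing" the memfd backend can look like on the QEMU command line. The `memfdArgs` helper and the object id `mem` are hypothetical; mantle's real code may wire the backend differently (for example through the machine type rather than a NUMA node). Only the QEMU options themselves (`memory-backend-memfd`, `share=on`, `-numa node,memdev=`) are standard.

```go
package main

import (
	"fmt"
	"strings"
)

// memfdArgs returns QEMU arguments that back guest RAM with a shared memfd
// and then reference that backend from a NUMA node. Defining the object
// alone is not enough: vhost-user devices such as virtiofs need guest
// memory to really live in the shared backend.
// The helper name and the id "mem" are illustrative assumptions.
func memfdArgs(memoryMiB int) []string {
	return []string{
		"-m", fmt.Sprintf("%d", memoryMiB),
		"-object", fmt.Sprintf("memory-backend-memfd,id=mem,size=%dM,share=on", memoryMiB),
		"-numa", "node,memdev=mem",
	}
}

func main() {
	fmt.Println(strings.Join(memfdArgs(2048), " "))
}
```

The backend size has to match the `-m` value, which is why both are derived from the same parameter.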


mantle/kola: Add COSA_VIRTIOFS=1 and dual 9p/virtiofs support

In #3428 I tried
a wholesale switch to virtiofs. That's a large change in one go;
it wasn't hard to factor things out so that it becomes a dynamic
choice, keeping the previous 9p support.

This way for e.g. local development one can now
env COSA_VIRTIOFS=1 cosa run --qemu-image rhcos.qcow2 --bind-ro ...

Further work can then build on this to switch to virtiofs by default,
and allow falling back to 9p. Once we're confident in virtiofs
we can drop the 9p support.
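The dynamic choice described above could be sketched roughly as follows. This is not the code from the commit: `chooseBackend` and `guestMountArgs` are hypothetical names, and treating exactly `COSA_VIRTIOFS=1` as "on" is an assumption taken from the example invocation. The guest-side mount options are the usual ones for each filesystem type.

```go
package main

import (
	"fmt"
	"os"
)

type mountBackend string

const (
	backend9p       mountBackend = "9p"
	backendVirtiofs mountBackend = "virtiofs"
)

// chooseBackend picks virtiofs when COSA_VIRTIOFS is "1" and keeps 9p
// otherwise. The getenv parameter makes the function easy to test.
// (Assumption: only the literal value "1" enables virtiofs.)
func chooseBackend(getenv func(string) string) mountBackend {
	if getenv("COSA_VIRTIOFS") == "1" {
		return backendVirtiofs
	}
	return backend9p
}

// guestMountArgs returns the arguments for mount(8) inside the guest for
// a share exported under the given tag.
func guestMountArgs(b mountBackend, tag, target string) []string {
	if b == backendVirtiofs {
		return []string{"-t", "virtiofs", tag, target}
	}
	return []string{"-t", "9p", "-o", "trans=virtio,version=9p2000.L", tag, target}
}

func main() {
	b := chooseBackend(os.Getenv)
	fmt.Println(b, guestMountArgs(b, "workdir", "/mnt/workdir"))
}
```

Keeping the decision in one function means the later steps (default to virtiofs, then drop 9p) only touch a single place.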


Switch to virtiofs by default

Closes: #1812

The key benefits here are:

  • This also works for RHEL (this is a big deal for my dev workflow
    parity)
  • It correctly handles symlinks
  • It's more maintained (e.g. used for kata containers) and hopefully
    we won't hit obscure bugs like the 9p OOM ones.

mantle: Drop 9p support

Per request to avoid carrying it as tech debt.


@dustymabe
Member

Closes: #1812

Nice! Thanks for working on this. I guess we figured out if it would run without root privileges.

The key benefits here are:

* This also works for RHEL (this is a _big deal_ for my dev workflow parity)

* It correctly handles symlinks

* It's more maintained (e.g. used for kata containers) and hopefully we won't hit obscure bugs like the 9p OOM ones.

I always found the performance of 9p to be lacking too. Any anecdotal results you'd like to share on whether this improves performance?

mantle/platform/qemu.go (outdated review thread, resolved)
@dustymabe
Member

dustymabe commented Apr 14, 2023

Does this require running with a RHEL9+ kernel? i.e. in our pipeline we're running on OpenShift which is still RHEL8 based.

If so that could explain the pipeline failures.

@cgwalters cgwalters force-pushed the virtiofs branch 2 times, most recently from fa81b3c to 5b96688 Compare April 15, 2023 14:52
@cgwalters
Member Author

cgwalters commented Apr 15, 2023

Does this require running with a RHEL9+ kernel?

The host kernel is not relevant here AFAIK; this is just a protocol between the qemu process and the guest kernel. virtiofs is also enabled in RHEL8, so my dev workflow of running e.g. cosa run --qemu-image rhcos-4.12.3-qemu.qcow2 --bind-ro /var/srv/walters/,/run/walters to bind mount in my dev container now works.

If so that could explain the pipeline failures.

Nah, it was just racy because we were forking virtiofsd which will asynchronously create the socket, and effectively in parallel forking qemu which looks for it. It worked pretty reliably on my idle 16 core workstation, but the CI environment is much more likely to provoke races like this. I added code to synchronously wait for virtiofsd to create the socket, though a better fix would be to pre-allocate a socketpair.

@cgwalters
Member Author

cgwalters commented Apr 15, 2023

Ah I see, looks like there's another unshare(CLONE_FS) that needs to be dropped here. Filed a MR https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/159

@cgwalters cgwalters marked this pull request as draft May 1, 2023 13:16
@cgwalters
Member Author

travier
travier previously approved these changes May 10, 2023
Member

@travier travier left a comment

Not tested but code LGTM. Nice to see this happening!

arighi pushed a commit to arighi/virtiofsd that referenced this pull request May 11, 2023
See coreos/coreos-assembler#3428 (comment)

In `--sandbox none` scenarios, the calling process is already in
an isolated container, and we may not be capable of invoking `unshare`
again.

Signed-off-by: Colin Walters <[email protected]>
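For context, a hedged sketch of how a caller that is already inside a container might start virtiofsd with `--sandbox none`. The `virtiofsdCommand` helper is hypothetical, and resolving the binary as plain `virtiofsd` via PATH is an assumption (distributions often install it under a libexec directory). The `--socket-path`, `--shared-dir`, and `--sandbox` flags are those of the Rust virtiofsd.

```go
package main

import (
	"fmt"
	"os/exec"
)

// virtiofsdCommand builds (but does not start) a virtiofsd invocation that
// exports sharedDir over the vhost-user socket at socketPath. Sandboxing is
// disabled because the caller is assumed to be in an isolated,
// unprivileged container already.
func virtiofsdCommand(socketPath, sharedDir string) *exec.Cmd {
	return exec.Command("virtiofsd",
		"--socket-path", socketPath,
		"--shared-dir", sharedDir,
		"--sandbox", "none",
	)
}

func main() {
	cmd := virtiofsdCommand("/tmp/virtiofsd.sock", "/srv")
	fmt.Println(cmd.Args)
}
```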
@cgwalters cgwalters marked this pull request as ready for review July 23, 2023 14:36
@cgwalters
Member Author

re #1812 (comment)

Yep, rebased to pull in virtiofsd from updates-testing! And I've sanity tested that this still works locally (when running fcos/rhcos with --bind-ro).

@cgwalters
Member Author

That's exciting...not seeing this locally

[2023-07-23T14:47:36.887Z] error: kvm run failed Bad address
[2023-07-23T14:47:36.887Z] RAX=0000000000000035 RBX=00000000059ec83c RCX=0000000000000006 RDX=0000000000000003
[2023-07-23T14:47:36.887Z] RSI=0000000000000002 RDI=0000000000000032 RBP=0000000076d80024 RSP=0000000005a1dad0
[2023-07-23T14:47:36.887Z] R8 =0000000076d81076 R9 =0000000076d820c8 R10=0000000000000009 R11=0000000076d8311a
[2023-07-23T14:47:36.887Z] R12=04f08b982bd31a7d R13=4f483ab4550a041a R14=000000000000000b R15=2950c34860101a0c
[2023-07-23T14:47:36.887Z] RIP=00000000059c3014 RFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
[2023-07-23T14:47:36.887Z] ES =0000 0000000000000000 00000000 00000000
[2023-07-23T14:47:36.887Z] CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
[2023-07-23T14:47:36.887Z] SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
[2023-07-23T14:47:36.887Z] DS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
[2023-07-23T14:47:36.887Z] FS =0000 0000000000000000 00000000 00000000
[2023-07-23T14:47:36.887Z] GS =0000 0000000000000000 00000000 00000000
[2023-07-23T14:47:36.887Z] LDT=0000 0000000000000000 00000000 00000000
[2023-07-23T14:47:36.887Z] TR =0020 0000000000000000 00000fff 00008b00 DPL=0 TSS64-busy
[2023-07-23T14:47:36.887Z] GDT=     00000000059e8950 0000002f
[2023-07-23T14:47:36.887Z] IDT=     00000000059e8990 000001ff
[2023-07-23T14:47:36.887Z] CR0=80050033 CR2=0000000000000000 CR3=0000000005a2d000 CR4=00000020
[2023-07-23T14:47:36.887Z] DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
[2023-07-23T14:47:36.887Z] DR6=00000000ffff0ff0 DR7=0000000000000400
[2023-07-23T14:47:36.887Z] EFER=0000000000000500
[2023-07-23T14:47:36.887Z] Code=24 10 c4 e2 fb f7 ff 4c 8d 14 7b 41 0f b6 7a 01 45 0f b6 12 <41> 88 78 fc 41 01 f2 c4 c2 f1 f7 f4 c4 e2 fb f7 f6 48 8d 3c 73 0f b6 77 01 0f b6 3f 41 88
[2023-07-23T18:06:09.334Z] Sending interrupt signal to process

travier
travier previously approved these changes Jul 25, 2023
Member

@travier travier left a comment

Didn't look at CI failures but the code change LGTM

@travier
Member

travier commented Jul 25, 2023

/retest

@travier
Member

travier commented Jul 26, 2023

OK, looks like there is a bug in QEMU or in our QEMU setup in this PR:

[2023-07-25T15:03:15.151Z] Running: rpm-ostree compose tree --touch-if-changed /srv/tmp/treecompose.changed --cachedir=/srv/cache --unified-core /srv/tmp/override/coreos-assembler-override-manifest.yaml --download-only --ex-lockfile=/srv/src/config/manifest-lock.x86_64.json --ex-lockfile=/srv/src/config/manifest-lock.overrides.yaml --ex-lockfile-strict
[2023-07-25T15:03:15.151Z] info: Missing CAP_SYS_ADMIN; using virt
[2023-07-25T15:03:15.151Z] Formatting 'cache2.qcow2.tmp', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16
[2023-07-25T15:03:25.092Z] error: kvm run failed Bad address
[2023-07-25T15:03:25.092Z] RAX=0000000000000035 RBX=00000000059ec83c RCX=0000000000000006 RDX=0000000000000003
[2023-07-25T15:03:25.092Z] RSI=0000000000000002 RDI=0000000000000032 RBP=0000000046d80024 RSP=0000000005a1dad0
[2023-07-25T15:03:25.092Z] R8 =0000000046d81076 R9 =0000000046d820c8 R10=0000000000000009 R11=0000000046d8311a
[2023-07-25T15:03:25.092Z] R12=04f08b982bd31a7d R13=4f483ab4550a041a R14=000000000000000b R15=2950c34860101a0c
[2023-07-25T15:03:25.092Z] RIP=00000000059c3014 RFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
[2023-07-25T15:03:25.092Z] ES =0000 0000000000000000 00000000 00000000
[2023-07-25T15:03:25.092Z] CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
[2023-07-25T15:03:25.092Z] SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
[2023-07-25T15:03:25.092Z] DS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
[2023-07-25T15:03:25.092Z] FS =0000 0000000000000000 00000000 00000000
[2023-07-25T15:03:25.092Z] GS =0000 0000000000000000 00000000 00000000
[2023-07-25T15:03:25.092Z] LDT=0000 0000000000000000 00000000 00000000
[2023-07-25T15:03:25.092Z] TR =0020 0000000000000000 00000fff 00008b00 DPL=0 TSS64-busy
[2023-07-25T15:03:25.092Z] GDT=     00000000059e8950 0000002f
[2023-07-25T15:03:25.092Z] IDT=     00000000059e8990 000001ff
[2023-07-25T15:03:25.092Z] CR0=80050033 CR2=0000000000000000 CR3=0000000005a2d000 CR4=00000020
[2023-07-25T15:03:25.092Z] DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
[2023-07-25T15:03:25.092Z] DR6=00000000ffff0ff0 DR7=0000000000000400
[2023-07-25T15:03:25.092Z] EFER=0000000000000500
[2023-07-25T15:03:25.092Z] Code=24 10 c4 e2 fb f7 ff 4c 8d 14 7b 41 0f b6 7a 01 45 0f b6 12 <41> 88 78 fc 41 01 f2 c4 c2 f1 f7 f4 c4 e2 fb f7 f6 48 8d 3c 73 0f b6 77 01 0f b6 3f 41 88
[2023-07-26T09:02:29.376Z] Sending interrupt signal to process

@dustymabe
Member

OK, looks like there is a bug in QEMU or in our QEMU setup in this PR:

FWIW I am seeing this kvm run failed Bad address when trying it out locally so maybe the failure isn't specific to the pipeline.

build.sh (outdated review thread, resolved)
@cgwalters cgwalters added the needs-work/failing-ci Legitimate CI failure label Aug 29, 2023
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Sep 1, 2023
In coreos#3428 I tried
a wholesale switch to virtiofs.  There are some kernel/KVM crashes
we haven't yet debugged.

It wasn't hard to factor things out so that it becomes a dynamic
choice, keeping the previous 9p support.

This way for e.g. local development one can now
`env COSA_VIRTIOFS=1 cosa run --qemu-image rhcos.qcow2 --bind-ro ...`
@dustymabe
Member

dustymabe commented Sep 6, 2023

One thing I hit was an error when I set FORCE_UNPRIVILEGED=1:

[coreos-assembler]$ export FORCE_UNPRIVILEGED=1
[coreos-assembler]$ cosa fetch && cosa build ostree && cosa buildextend-qemu
Config commit: eca47d715e66128d2fa6d0c35267c65023825aae
Using manifest: /srv/src/config/manifest.yaml
/srv/tmp/override/coreos-assembler-local-overrides.repo  /srv/tmp/override/fedora-next.repo     /srv/tmp/override/fedora.repo
/srv/tmp/override/fedora-coreos-pool.repo                /srv/tmp/override/fedora-rawhide.repo
Committing 05core: /srv/src/config/overlay.d/05core ... 8e906b3387223fe9c541b2ca8ecd621e5f7ee6bd0a1146533b552b39ad4be5da
Committing 08nouveau: /srv/src/config/overlay.d/08nouveau ... 156b0298d3da52748c939e58ec439422ea292c096c64b02a075b596bf485f787
Committing 09misc: /srv/src/config/overlay.d/09misc ... 61faa030c0f8e9be61499688ab172193ccae45f7a8311ebc83b3c654ec52bd3f
Committing 15fcos: /srv/src/config/overlay.d/15fcos ... d056c0e1ccee79412ba2fe030b39e6a353de55594668d313ae3fabfe6c5b7db4
Committing 16disable-zincati: /srv/src/config/overlay.d/16disable-zincati ... 08f3fa1941d61f806cf174cc04da857cbcfad7737dd163903451ee4eabaa07fb
Committing 17fedora-modularity: /srv/src/config/overlay.d/17fedora-modularity ... ce51c5c7cb00bc70bddecdd1c721d68e1e493206deefd70c32f0ccec98351313
Committing 18fwupd-refresh-timer: /srv/src/config/overlay.d/18fwupd-refresh-timer ... ac3417e6d85082c86956b2539e2b8af2de2e2d6be30af93917fb0dd3d284eefc
Committing 20platform-chrony: /srv/src/config/overlay.d/20platform-chrony ... 79173ee2c3ddc65079225e400be25a2c2246da6dd57ef8146cf3309f2ee8e559
Committing 25azure-udev-rules: /srv/src/config/overlay.d/25azure-udev-rules ... 864c5124b055918a52022be63957ab7d4da51e8fca61d102ce339d11676c74dd
Committing 30lvmdevices: /srv/src/config/overlay.d/30lvmdevices ... 1295ee2c30460c40018347c244dd52420c726472b1ed48a89eb871715f7e7f21
Committing cosa-image-json: /srv/tmp/override/imagejson ... f3de510f1e0ae2b8c5f0533e856bb4be8c7dbe8e41442763cb2f77362979bac5
Running: rpm-ostree compose tree --touch-if-changed /srv/tmp/treecompose.changed --cachedir=/srv/cache --unified-core /srv/tmp/override/coreos-assembler-override-manifest.yaml --download-only --ex-lockfile=/srv/src/config/manifest-lock.x86_64.json --ex-lockfile=/srv/src/config/manifest-lock.overrides.yaml
info: Detected FORCE_UNPRIVILEGED; using virt
/srv/tmp/build/cmd.sh: line 2: /usr/lib/coreos-assembler/compose.sh: Permission denied
failed to execute cmd-fetch: exit status 126

anyone else seeing something similar? If not I'll try to debug my setup.

@dustymabe
Member

other than the FORCE_UNPRIVILEGED=1 issue all is looking good in local testing.

@dustymabe
Member

I kicked off a testing-devel run using the code from this PR. Let's see if it makes it all the way through with no issues.

@travier
Member

travier commented Sep 7, 2023

I kicked off a testing-devel run using the code from this PR. Let's see if it makes it all the way through with no issues.

Completed with success!

Member

@travier travier left a comment

LGTM overall! Let's figure out the answers to the pending questions and we should be good to go.

@cgwalters
Member Author

cgwalters commented Sep 7, 2023

/srv/tmp/build/cmd.sh: line 2: /usr/lib/coreos-assembler/compose.sh: Permission denied

Well, is it executable for you? Not reproducing this here.

The continuous-integration/jenkins/pr-merge and ci/prow/rhcos jobs here also exercised the unprivileged path successfully.

@dustymabe
Member

Well, is it executable for you? Not reproducing this here.

Ahh. It's not an executable file in the git repo and I have that volume mounted in. I'll open a PR to change that.

This LGTM.

@cgwalters cgwalters merged commit dedb310 into coreos:main Sep 7, 2023
2 checks passed
cgwalters added a commit that referenced this pull request Sep 7, 2023
In #3428 I tried
a wholesale switch to virtiofs.  That's a large change in one go;
it wasn't hard to factor things out so that it becomes a dynamic
choice, keeping the previous 9p support.

This way for e.g. local development one can now
`env COSA_VIRTIOFS=1 cosa run --qemu-image rhcos.qcow2 --bind-ro ...`

Further work can then build on this to switch to virtiofs by default,
and allow falling back to 9p.  Once we're confident in virtiofs
we can drop the 9p support.
@cgwalters
Member Author

Ah yes, so #3087 would have helped you...

@cgwalters
Member Author

Likely fallout in coreos/rpm-ostree#4584 (comment)

@cgwalters
Member Author

OK looks like this is scoped to just [rpm-ostree and bootupd](https://github.com/search?q=org%3Acoreos%20%22cosaPod(runAsUser%3A%200%2C%22&type=code). Trying to drop the runAsUser: 0 in the former.

Will look at a virtiofsd patch too.

@cgwalters
Member Author

cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Sep 7, 2023
cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Sep 8, 2023
cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Sep 8, 2023
cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Sep 8, 2023
cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Sep 8, 2023
jlebon added a commit to jlebon/ostree that referenced this pull request Sep 9, 2023
jlebon added a commit to jlebon/ostree that referenced this pull request Sep 9, 2023
@jlebon
Member

jlebon commented Sep 11, 2023

OK looks like this is scoped to just [rpm-ostree and bootupd](github.com/search?q=org%3Acoreos%20%22cosaPod(runAsUser%3A%200%2C%22&type=code). Trying to drop the runAsUser: 0 in the former.

Looks like ostree too. Applied the same workaround as part of ostreedev/ostree#2054, but can split it out.

jlebon added a commit to coreos/coreos-installer that referenced this pull request Sep 11, 2023
jlebon added a commit to coreos/coreos-installer that referenced this pull request Sep 11, 2023
jlebon added a commit to coreos/coreos-installer that referenced this pull request Sep 11, 2023
jlebon added a commit to jlebon/coreos-assembler that referenced this pull request Sep 13, 2023
PR coreos#3593 was written and opened before coreos#3428 but merged after it. CI had
passed but was not rerun, so this went in. This would be fixed by using
a merge bot on this repo that reruns CI before merge (See also
https://bors.tech/essay/2017/02/02/pitch/.)

Fixes: d812406 ("kola: Add `--qemu-bind-ro`")
dustymabe pushed a commit that referenced this pull request Sep 13, 2023
PR #3593 was written and opened before #3428 but merged after it. CI had
passed but was not rerun, so this went in. This would be fixed by using
a merge bot on this repo that reruns CI before merge (See also
https://bors.tech/essay/2017/02/02/pitch/.)

Fixes: d812406 ("kola: Add `--qemu-bind-ro`")
Development

Successfully merging this pull request may close these issues.

switch from 9p to virtiofs
4 participants