
HVM is significantly slower than PVH (Xen 4.14) #6174

Closed
Labels
affects-4.1 This issue affects Qubes OS 4.1. C: Fedora C: templates C: Xen diagnosed Technical diagnosis has been performed (see issue comments). eol-4.1 Closed because Qubes 4.1 has reached end-of-life (EOL) P: major Priority: major. Between "default" and "critical" in severity. pr submitted A pull request has been submitted for this issue. r4.1-bullseye-stable r4.1-buster-stable r4.1-centos8-stable r4.1-dom0-stable r4.1-fc31-stable r4.1-fc32-stable r4.1-fc33-cur-test r4.1-stretch-cur-test

Comments

@marmarek
Member

marmarek commented Nov 1, 2020

Observation

openQA test in scenario qubesos-4.1-pull-requests-x86_64-system_tests_pvgrub_salt_storage@64bit fails in
TC_41_HVMGrub_fedora-32

qubes.exc.QubesVMError: Cannot connect to qrexec agent for 90 seconds, see /var/log/xen/console/guest-test-inst-vm1.log for details

Possibly related VM console entries:

[2020-11-01 05:08:54] [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.8.16-200.fc32.x86_64 root=/dev/xvda3 ro root=/dev/mapper/dmroot console=tty0 console=hvc0 swiotlb=8192 noresume xen_scrub_pages=0
...
[2020-11-01 05:09:52] [  OK  ] Found device /dev/mapper/dmroot.
[2020-11-01 05:09:52] [  OK  ] Reached target Initrd Root Device.
[2020-11-01 05:09:53] [  OK  ] Finished dracut initqueue hook.
[2020-11-01 05:09:53] [  OK  ] Reached target Remote File Systems (Pre).
[2020-11-01 05:09:53] [  OK  ] Reached target Remote File Systems.
[2020-11-01 05:09:53]          Starting dracut pre-mount hook...
[2020-11-01 05:09:53] [   56.786209] audit: type=1130 audit(1604207392.968:15): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-initqueue comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[2020-11-01 05:09:53] [  OK  ] Finished dracut pre-mount hook.
[2020-11-01 05:09:53]          Starting File System Check on /dev/mapper/dmroot...
[2020-11-01 05:09:53] [   57.021420] audit: type=1130 audit(1604207393.190:16): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-pre-mount comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[2020-11-01 05:09:53] [  OK  ] Finished File System Check on /dev/mapper/dmroot.
[2020-11-01 05:09:53] [   57.516444] audit: type=1130 audit(1604207393.708:17): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-fsck-root comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[2020-11-01 05:09:54]          Mounting /sysroot...
[2020-11-01 05:09:54] [   57.852351] EXT4-fs (xvda3): mounted filesystem with ordered data mode. Opts: (null)
[2020-11-01 05:09:54] [  OK  ] Mounted /sysroot.
[2020-11-01 05:09:54] [  OK  ] Reached target Initrd Root File System.
[2020-11-01 05:09:54]          Starting Reload Configuration from the Real Root...
[2020-11-01 05:09:54] [   58.138307] audit: type=1334 audit(1604207394.330:18): prog-id=5 op=UNLOAD
[2020-11-01 05:09:54] [   58.176370] audit: type=1334 audit(1604207394.346:19): prog-id=4 op=UNLOAD
[2020-11-01 05:09:54] [   58.200801] audit: type=1334 audit(1604207394.346:20): prog-id=3 op=UNLOAD
[2020-11-01 05:09:54] [   58.228330] audit: type=1334 audit(1604207394.360:21): prog-id=7 op=UNLOAD
[2020-11-01 05:09:54] [   58.252140] audit: type=1334 audit(1604207394.361:22): prog-id=6 op=UNLOAD

Note the double root=. That isn't necessarily the root cause.

The same test works on debian-10.

Test suite description

Set up a fedora-32 StandaloneVM (HVM) with 'kernel' set to none.

Reproducible

Fails since (at least) Build 2020103122-4.1 (current job)

Expected result

Last good: 2020103116-4.1 (or more recent)

Further details

Always latest result in this scenario: latest

@marmarek
Member Author

marmarek commented Nov 1, 2020

I cannot reproduce it locally, but it seems HVM is significantly slower than PVH (at least in terms of boot time). This may be slow enough on openQA (thanks to nested virt) to hit the timeout.

@icequbes1

Attempted to test this, just providing observations:

On the very first boot, the VM did not start, and the errors observed on the console involved systemd-udevd invoking the oom-killer 7 seconds after boot.

Unsure if this is related, as the logs do not line up with what was originally posted. Attached: fed32-hvm-oom.txt

Could not reproduce after that very first boot, even when changing memory values and keeping the in-VM kernel:

  • 64MB: locked memory results in Triple fault in hypervisor.log
  • 128MB: kernel page faults 2 seconds after kernel boot
  • 256MB: successful boot (though 100MB of swap in use in-vm right after boot)
  • other values: no issues booting, though cosmetic issues with app-linux-input-proxy v1.0.21 (r4.1) updates-status#2166 were observed only for HVM with the in-VM kernel.

@marmarek
Member Author

marmarek commented Nov 2, 2020

Ok, so this is an OOM in the initramfs, where swap is not enabled yet. And apparently Fedora now runs fsck there...
Perhaps we should adjust our dracut module to enable swap already in the initramfs? AFAIR @DemiMarie already suggested that in a private email.
I don't want to mess too much with the system setup for the dom0-provided kernel (it has to work with any distribution, so there is a great risk of incompatibilities), but for the initramfs generated by Fedora's dracut inside the VM it should be fine.
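A minimal sketch of what such a dracut tweak could look like (the module name, hook priority, and the /dev/xvdc1 swap device are assumptions for illustration; inst_hook/inst_multiple are dracut's helpers for pulling a hook script and the swapon binary into the initramfs):

# File: /usr/lib/dracut/modules.d/90early-swap/module-setup.sh (hypothetical module)
check()   { return 0; }
depends() { return 0; }
install() {
    inst_multiple swapon
    # run our hook in the "pre-mount" phase, i.e. before the root fs is fsck'ed and mounted
    inst_hook pre-mount 40 "$moddir/enable-swap.sh"
}

# File: /usr/lib/dracut/modules.d/90early-swap/enable-swap.sh
# Enable swap so fsck of the root filesystem has room to work.
[ -b /dev/xvdc1 ] && swapon /dev/xvdc1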

That said, I did now once get the oom-killer killing systemd-udevd in a fedora-32 based VM (after the initramfs phase, but still before swap is enabled), with the dom0-provided kernel.

Where are the times when Linux was happy with 128MB RAM in total...

@DemiMarie

I suspect it largely depends on config options. There are Linux devices (Azure Sphere) running happily with 4MiB of RAM. And fsck is known to be somewhat RAM intensive, especially on large partitions. That’s why OpenBSD turns on swap first, and I think we should do the same.

@DemiMarie

Ideally, we should enable swap before doing just about anything else.

marmarek added a commit to marmarek/qubes-core-agent-linux that referenced this issue Nov 2, 2020
Grub scripts are very persistent in trying to use whatever is currently
mounted as /. Even if (in a TemplateVM) /dev/xvda3 is currently mounted
directly, all the configuration should use /dev/mapper/dmroot, so that it
also works in an AppVM.
GRUB_DEVICE is used in various places as the root device (including
constructing the root= parameter in some versions). Force it to
/dev/mapper/dmroot.

QubesOS/qubes-issues#6174
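For illustration, one possible way to get the same effect by hand (a sketch assuming grub-mkconfig sources /etc/default/grub after probing the root device, so an assignment there overrides the probed value; the actual commit may wire this up differently):

grep -q '^GRUB_DEVICE=' /etc/default/grub || \
    echo 'GRUB_DEVICE=/dev/mapper/dmroot' >> /etc/default/grub
grub2-mkconfig -o /boot/grub2/grub.cfg   # Fedora's path; adjust for other distros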
marmarek added a commit to marmarek/qubes-core-agent-linux that referenced this issue Nov 2, 2020
fsck may require a significant amount of RAM; enable swap earlier to avoid
an out-of-memory condition.

QubesOS/qubes-issues#6174
marmarek added a commit to marmarek/qubes-core-agent-linux that referenced this issue Nov 3, 2020
fsck may require a significant amount of RAM; enable swap earlier to avoid
an out-of-memory condition. Implement this as a separate service unit, not
a swap unit, because the latter requires udev to be running (implicit
dependency on dev-xvdc1.device), which is not the case before the root
filesystem is remounted read-write.

QubesOS/qubes-issues#6174
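A minimal sketch of such a unit (the unit name and exact ordering are assumptions; DefaultDependencies=no avoids the implicit device/udev dependency described above, and /dev/xvdc1 is the Qubes swap volume the message refers to):

# Hypothetical /usr/lib/systemd/system/early-swap.service
cat > /usr/lib/systemd/system/early-swap.service <<'EOF'
[Unit]
Description=Enable swap before the root filesystem is remounted read-write
DefaultDependencies=no
Before=systemd-remount-fs.service local-fs-pre.target
ConditionPathExists=/dev/xvdc1

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/swapon /dev/xvdc1

[Install]
WantedBy=sysinit.target
EOF
systemctl enable early-swap.service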
@andrewdavidwong andrewdavidwong added diagnosed Technical diagnosis has been performed (see issue comments). P: major Priority: major. Between "default" and "critical" in severity. labels Nov 3, 2020
@marmarek
Member Author

marmarek commented Nov 8, 2020

Swap may be a factor here, but it is definitely not the only one. HVM seems to boot significantly slower even with the same kernel from dom0.
Here are two boot logs, one PVH, the other one HVM:

  • PVH took 6s to boot
  • HVM took 24s to boot

Part of it is the stubdomain startup (10s between the Logfile Opened line and the first log from the kernel), but that still doesn't explain the whole difference.
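For anyone who wants to reproduce the comparison, a rough way to time it from dom0 (hypothetical VM name; qvm-run starts the VM if needed and returns once qrexec is up, which is the same readiness point the failing test waits for):

qvm-prefs testvm virt_mode pvh
time qvm-run testvm true
qvm-shutdown --wait testvm
qvm-prefs testvm virt_mode hvm
time qvm-run testvm true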

@andrewdavidwong andrewdavidwong added needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. pr submitted A pull request has been submitted for this issue. and removed diagnosed Technical diagnosis has been performed (see issue comments). labels Nov 8, 2020
@marmarek marmarek reopened this Nov 13, 2020
@qubesos-bot

Automated announcement from builder-github

The package linux-utils has been pushed to the r4.1 testing repository for the CentOS centos8 template.
To test this update, please install it with the following command:

sudo yum update --enablerepo=qubes-vm-r4.1-current-testing

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package python3-qubesimgconverter-4.1.12-1.fc32 has been pushed to the r4.1 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@marmarek
Member Author

marmarek commented Jun 4, 2021

One more thing worth testing: iommu=noforce on the VM's cmdline. Linux tries to reserve 64MB for a DMA bounce buffer, based on a heuristic that is unclear to me. If a VM has a PCI device assigned, we add the swiotlb=8192 argument, which lowers that to 16MB. But in fact, I think, this should not be needed at all, since we require a working IOMMU anyway (I think it was needed in practice only for PV domains with PCI devices).
So, iommu=noforce should (in theory) free a significant part (compared to 400MB) of the VM's memory.
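A quick way to try this on a test VM (hypothetical VM name; kernelopts only applies when the VM boots the dom0-provided kernel):

qvm-prefs testvm kernelopts "iommu=noforce"
qvm-start testvm
# then, inside the VM, check how much memory the bounce buffer actually grabbed:
dmesg | grep -i 'software io tlb'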

@icequbes1

From quick testing, iommu=noforce didn't appear to make a difference in VM startup time. I believe some reference indicated that iommu=noforce is the default on Intel machines, but I don't know enough about what is being tested/modified.

What does appear to be constant is a ~4 second pause during Waiting for /dev/xvda* devices (with the dom0-provided kernel). However, I see that on R4.0 too, and I don't want to lose focus on the R4.1 HVM issue.

@andrewdavidwong andrewdavidwong added diagnosed Technical diagnosis has been performed (see issue comments). needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. and removed needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. diagnosed Technical diagnosis has been performed (see issue comments). labels Jun 13, 2021
@qubesos-bot

Automated announcement from builder-github

The package xen-hvm-stubdom-linux-1.2.0-1.fc32 has been pushed to the r4.1 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@andrewdavidwong andrewdavidwong added diagnosed Technical diagnosis has been performed (see issue comments). and removed needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. labels Jun 23, 2021
jandryuk pushed a commit to jandryuk/qubes-vmm-xen-stubdom-linux that referenced this issue Aug 3, 2021
Fixes QubesOS/qubes-issues#6174

Conflicts:
	qemu/patches/series

Fixup since other branch has more patches.
@qubesos-bot

Automated announcement from builder-github

The package xen-hvm-stubdom-linux-1.2.0-1.fc32 has been pushed to the r4.1 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

@adrelanos
Member

[0.048xxx] random: crng done (trusting CPU's manufacturer)

This! I've just rechecked the failed log, and I don't see the trusting CPU's manufacturer part there. And indeed, that CPU does not support RDRAND. This means the extreme issue I see applies only to quite old systems (and hopefully does not affect the majority of our users - even the good old x230 already has RDRAND). So I'm lowering the priority. But it's still worth improving the situation.

Relying on RDRAND for security / entropy quality is strongly discouraged anyhow, as per:
https://www.whonix.org/wiki/Dev/Entropy#RDRAND
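For reference, a quick check of whether a given CPU exposes RDRAND at all:

grep -m1 -o -w rdrand /proc/cpuinfo || echo "no RDRAND"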

@marmarek
Member Author

marmarek commented Oct 6, 2021

Relying on RDRAND for security / entropy quality is strongly discouraged anyhow, as per:

In the context of this issue, it is not a problem, because the stubdomain does not use the RNG for any security-critical task. There is no crypto involved, etc. One could argue it may make ASLR for qemu less effective, but we don't consider qemu trusted, so it is not a huge deal (and remember the RDRAND issues are still very hypothetical - see below).

In the broader context of RDRAND, I don't think we should worry about backdoors there. Or rather: if you consider intentional backdoors in your CPU a valid threat, throw away that CPU. There is really no difference in how such a hypothetical backdoor could work - whether it would be a predictable RDRAND, a reaction to some magic values in any other instruction, or anything else. We could worry about its effectiveness - unintentional bugs, that is - which indeed is hard to reason about, since it is opaque.

marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jan 18, 2023
Linux inside HVM will allocate 64MB for bouncing DMA (SWIOTLB) by
default. If no real PCI device is assigned, that's way too much, and
wastes over 15% of VM's initial memory.
With real PCI devices, it's usually too much too, but it's very device
specific, so don't risk breaking it. In other cases, reduce default to
4MB.

Note PVH domain will not allocate SWIOTLB anyway, as no PCI devices are
there at all. This difference contributes to the VM start time, so
reducing SWIOTLB should also improve that part.

QubesOS/qubes-issues#6174
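For reference, the swiotlb= parameter counts 2 KiB slabs, so the sizes discussed in this thread work out as follows (manual equivalent shown with a hypothetical VM name):

#   default      : 32768 slabs * 2 KiB = 64 MiB  (what HVM currently gets)
#   swiotlb=8192 :  8192 slabs * 2 KiB = 16 MiB  (current value with PCI devices)
#   swiotlb=2048 :  2048 slabs * 2 KiB =  4 MiB  (the reduced default described above)
qvm-prefs testvm kernelopts "swiotlb=2048"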
marmarek added a commit to QubesOS/qubes-core-admin that referenced this issue May 13, 2023
Linux inside HVM will allocate 64MB for bouncing DMA (SWIOTLB) by
default. If no real PCI device is assigned, that's way too much, and
wastes over 15% of VM's initial memory.
With real PCI devices, it's usually too much too, but it's very device
specific, so don't risk breaking it. In other cases, reduce default to
4MB.

Note PVH domain will not allocate SWIOTLB anyway, as no PCI devices are
there at all. This difference contributes to the VM start time, so
reducing SWIOTLB should also improve that part.

QubesOS/qubes-issues#6174

(cherry picked from commit c774fd4)
@andrewdavidwong andrewdavidwong added the affects-4.1 This issue affects Qubes OS 4.1. label Aug 8, 2023
@andrewdavidwong andrewdavidwong removed this from the Release 4.1 updates milestone Aug 13, 2023
@andrewdavidwong andrewdavidwong added the eol-4.1 Closed because Qubes 4.1 has reached end-of-life (EOL) label Dec 7, 2024

github-actions bot commented Dec 7, 2024

This issue is being closed because:

  • Qubes 4.1 has reached end-of-life (EOL).

If anyone believes that this issue should be reopened, please leave a comment saying so.
(For example, if a bug still affects Qubes OS 4.2, then the comment "Affects 4.2" will suffice.)

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 7, 2024