Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/qemu-vm: fix diskless VMs (round 2) #233847

Closed
wants to merge 1 commit into from

Conversation

RaitoBezarius
Copy link
Member

@RaitoBezarius RaitoBezarius commented May 24, 2023

Description of changes

This is a revert of #230386 and reintroduction of #228047.

I spent all the day debugging this with symbols and shit. This is a Nix bug, renameat2 is called with the wrong arguments and causes a XDEV error, if you patch Nix with:

From 4886c5db115ac4c8d8a78dbaa62729f070efe6f9 Mon Sep 17 00:00:00 2001
From: Raito Bezarius <[email protected]>
Date: Wed, 24 May 2023 21:16:44 +0200
Subject: [PATCH] localStore: do not cleanup for disk full situations for now

---
 src/libstore/build/local-derivation-goal.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/libstore/build/local-derivation-goal.cc b/src/libstore/build/local-derivation-goal.cc
index 7929cfe35..0c78710db 100644
--- a/src/libstore/build/local-derivation-goal.cc
+++ b/src/libstore/build/local-derivation-goal.cc
@@ -315,6 +315,7 @@ void LocalDerivationGoal::cleanupPostChildKill()
 
 bool LocalDerivationGoal::cleanupDecideWhetherDiskFull()
 {
+    return false;
     bool diskFull = false;
 
     /* Heuristically check whether the build failure may have
-- 
2.40.1

You can discover that the problem is that you never had kbd.dev in your closure and no bootstrap seeds stuff, so you cannot rebootstrap everything. :)

So while it's useful to perform the cleanup I am doing, I will remove this and reintroduce diskless VMs without them for now.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Original thoughts

Now I have more time to deal with the cross-device link issues that arised.

Context:

Cross-device link errors are: Hard links performed across different "filesystems" or "mountpoints".

Source of "different filesystems":

  • In a VM context, we mount /nix/.ro-store and /nix/.rw-store. That's already 2 different filesystems and mountpoints.
  • Then, we tie them together into a /nix/store via overlayfs (again another filesystem) via postDeviceCommand, but, an installation CD-DVD profile, will do it too! via mkImageMediaOverride in its filesystem entries.
  • Then, under boot.readOnlyNixStore = true;, we bind mount /nix/store on /nix/store

As you can see, we have multiple threats from multiple angles.~~~

@RaitoBezarius RaitoBezarius requested review from nikstur and vcunat and removed request for nikstur May 24, 2023 16:06
@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels May 24, 2023
Copy link
Member

@alyssais alyssais left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks virtualisation.mountHostNixStore:

nixosTest {
  name = "test";
  nodes.machine = {
    virtualisation.mountHostNixStore = false;
  };
  testScript = "";
}
error: dynamic attribute '"/nix/store"' already defined at /home/qyliss/src/nixpkgs/nixos/modules/virtualisation/qemu-vm.nix:1099:9

       at /home/qyliss/src/nixpkgs/nixos/modules/virtualisation/qemu-vm.nix:1089:9:

         1088|         };
         1089|         "/nix/${if cfg.writableStore then ".ro-store" else "store"}" = lib.mkIf cfg.useNixStoreImage {
             |         ^
         1090|           device = "${lookupDriveDeviceName "nix-store" cfg.qemu.drives}";
(use '--show-trace' to show detailed location information)

@nikstur
Copy link
Contributor

nikstur commented May 24, 2023

More and more, I feel like we should leverage filesystem labels to identify devices instead of relying on /dev/vda, /dev/vda1 etc.

@alyssais
Copy link
Member

More and more, I feel like we should leverage filesystem labels to identify devices instead of relying on /dev/vda, /dev/vda1 etc.

For VM tests, we could use the PCI location of the disk, because QEMU lets us set that.

@RaitoBezarius
Copy link
Member Author

Or we could do like #233350 and rename all of them via udev. :)

@RaitoBezarius
Copy link
Member Author

This breaks virtualisation.mountHostNixStore:

nixosTest {
  name = "test";
  nodes.machine = {
    virtualisation.mountHostNixStore = false;
  };
  testScript = "";
}
error: dynamic attribute '"/nix/store"' already defined at /home/qyliss/src/nixpkgs/nixos/modules/virtualisation/qemu-vm.nix:1099:9

       at /home/qyliss/src/nixpkgs/nixos/modules/virtualisation/qemu-vm.nix:1089:9:

         1088|         };
         1089|         "/nix/${if cfg.writableStore then ".ro-store" else "store"}" = lib.mkIf cfg.useNixStoreImage {
             |         ^
         1090|           device = "${lookupDriveDeviceName "nix-store" cfg.qemu.drives}";
(use '--show-trace' to show detailed location information)

This is now fixed via "let's not do that overlay cleanup now" because it's not the root cause of the revert which is unrelated to the contents of that PR.

@RaitoBezarius
Copy link
Member Author

Pushing e6e049b to evaluate a baseline for this PR.

Copy link
Member

@alyssais alyssais left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but this absolutely must be verified not to cause any test regressions on Hydra when compared to its base.

@RaitoBezarius
Copy link
Member Author

Pushed 3e5e088 to https://hydra.nixos.org/jobset/nixos/python-test-refactoring ; it should appear there in ~ 45 minutes.

@RaitoBezarius
Copy link
Member Author

This is failing for the exact same reasons as before kbd.dev is missing in the closure.

@nikstur
Copy link
Contributor

nikstur commented Jun 1, 2023

So what's the way forward here? If we reintroduce the reverted changes, won't it be a channel blocker again?

@RaitoBezarius
Copy link
Member Author

So what's the way forward here? If we reintroduce the reverted changes, won't it be a channel blocker again?

I am working towards a solution to make it safe. :)

@mweinelt mweinelt added the 12.approvals: 2 This PR was reviewed and approved by two reputable people label Jun 2, 2023
@RaitoBezarius RaitoBezarius marked this pull request as draft June 8, 2023 15:09
@nikstur
Copy link
Contributor

nikstur commented Jun 20, 2023

Can this be closed now? #236656 should've fixed it.

On a separate note, we should probably add a test for diskless VMs down the road.

@RaitoBezarius
Copy link
Member Author

RaitoBezarius commented Jun 20, 2023 via email

@nikstur
Copy link
Contributor

nikstur commented Jun 20, 2023

This should prove that diskless VMs with a volatile root work now: #238848

@nikstur
Copy link
Contributor

nikstur commented Jun 27, 2023

This can be closed now after #238848 has been merged (which proves that diskless images work). Please also note that #240175 is not relevant here because the systemd initrd never worked with a volatile root. I checked that even when the option to run a diskless VM was first introduced in this commit bf41254 it doesn't work with he systemd initrd.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 1-10 10.rebuild-darwin: 1 10.rebuild-linux: 1-10 12.approvals: 2 This PR was reviewed and approved by two reputable people
Projects
Development

Successfully merging this pull request may close these issues.

4 participants