
Prepare R4.0 -> R4.1 upgrade procedure #5685

Closed
9 of 11 tasks
marmarek opened this issue Feb 23, 2020 · 30 comments
Labels
C: dist upgrade The code and tools that support upgrading in-place from one Qubes OS release to another C: doc P: major Priority: major. Between "default" and "critical" in severity. T: task Type: task. An action item that is neither a bug nor an enhancement.
Milestone

Comments

@marmarek
Member

marmarek commented Feb 23, 2020

Prepare and test the upgrade procedure; it will be similar to https://www.qubes-os.org/doc/upgrade-to-r3.2/

Things that will need special care:

  • packages in R4.1 are compressed with zstd, which requires rpm >= 4.14; R4.0 has rpm 4.13
  • UEFI boot in R4.1 uses grub2, while R4.0 uses xen.efi only - the conversion will require a partition change (a separate /boot, not only /boot/efi) and installing+configuring a new bootloader
  • adjust LVM settings - especially the size of the metadata volume

Based on the above, the tasks here are:

  • write the procedure
  • prepare packages to upgrade rpm before the main upgrade
  • prepare a script to carve out /boot out of /boot/efi, and install+configure grub2
  • prepare a script to adjust LVM settings
  • order sys-usb startup before login
  • upgrade the LUKS1 header to LUKS2 - impossible from a running system, and the benefits are too small to justify an offline approach
  • call qvm-appmenus --all --update at the end
  • handle /etc/yum.repos.d/qubes-dom0.repo.rpmnew (otherwise "STAGE 4" won't use fc32 repos)
  • handle getting Xen cmdline for grub2-efi better - executing xl info before reboot may not work - save it at the beginning of upgrade?
  • verify the qrexec policy before reboot? any issue now prevents all calls, not just a single service
  • shutdown templates after upgrading them

All of the above should be reasonably safe to use - scripts should check assumptions before continuing, and it is better to refuse to start (for example, on a custom partition layout that doesn't have space for a separate /boot) than to leave the system with a broken bootloader.
Also, the upgrade procedure should document the supported configurations.
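The qubes-dom0.repo.rpmnew handling from the task list above amounts to a guarded rename; a minimal sketch (whether the .rpmnew file can simply replace the old one depends on any local edits you made):

```shell
# If dnf left the new repo definition as .rpmnew, move it into place so
# "STAGE 4" picks up the fc32 repositories. No-op when the file is absent.
repo=/etc/yum.repos.d/qubes-dom0.repo
if [ -f "$repo.rpmnew" ]; then
    mv -v "$repo.rpmnew" "$repo"
fi
```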

@marmarek marmarek added T: task Type: task. An action item that is neither a bug nor an enhancement. P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. labels Feb 23, 2020
@marmarek marmarek added this to the Release 4.1 milestone Feb 23, 2020
@marmarek
Member Author

Here is an example /etc/default/grub: https://gist.github.com/marmarek/b7717e71e3a72a7ab09aa3907b8afeb9
It needs to be adjusted to preserve Xen boot options (see xl info) and Linux boot options (see /proc/cmdline).
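The gist holds the actual contents; as a sketch of the shape such a file takes, a hypothetical /etc/default/grub could look like the fragment below. Every value here is an illustrative placeholder, not the gist's contents - adjust the Xen line from xl info and the Linux line from /proc/cmdline, as noted above:

```shell
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Qubes"
GRUB_DEFAULT=saved
GRUB_TERMINAL_OUTPUT="gfxterm"
# Xen options: copy from xen_commandline in `xl info` on the running R4.0 system
GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_mem=min:1024M dom0_max_vcpus=4"
# Linux options: copy from /proc/cmdline; LUKS/LVM references must match your disks
GRUB_CMDLINE_LINUX="rd.luks.uuid=luks-<UUID> rd.lvm.lv=qubes_dom0/root rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
GRUB_DISABLE_OS_PROBER="true"
```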

marmarek added a commit to marmarek/qubes-linux-dom0-updates that referenced this issue Feb 23, 2020
This set of packages will allow running rpm 4.14 in a fc25-based dom0
(R4.0), which is necessary to update to fc31-based dom0 (R4.1).

QubesOS/qubes-issues#5685
@marmarta
Member

Procedure I followed to achieve successful upgrade:

For dom0 update itself:

  1. Update all templates and standalone VMs (the updater widget works here)
  2. Shut all unnecessary VMs down (sys-net, sys-firewall, sys-usb can stay)
  3. Open terminal in dom0
  4. Update dom0: sudo qubes-dom0-update
  5. Update qubes-release: sudo qubes-dom0-update --releasever=4.1 --enablerepo=*testing* --enablerepo=*unstable* qubes-release
  6. Fix a key that fails to import by itself: sudo rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-31-primary
  7. Update everything: sudo qubes-dom0-update --action=distro-sync --allowerasing --enablerepo=*testing* --enablerepo=*unstable*
  8. Reboot

And then, to fix grub not being there:

  1. Open terminal in dom0, sudo -s for convenience
  2. Copy /boot somewhere safe: cp -r /boot ~/
  3. umount /boot/efi
  4. fdisk /dev/sda (or whatever your disk is called) and there do the following:
  • check your current partition table (p)
  • delete the EFI system partition (by default it will be the 1st partition, so perform d 1)
  • create a new partition 1 (n, use default start, size +25M, remove the signature)
  • create another new partition (n, default start, default end, default the rest); it should get number 3 on a default install
  • change the partition type of partition 1 to EFI system (t, EFI system is type 1)
  • commit the changes and exit fdisk (w)
  5. create a VFAT file system on the EFI partition you just created; for me, mkfs.vfat /dev/sda1
  6. create an ext4 file system on the other partition you just created; for me, mkfs.ext4 /dev/sda3
  7. mount the ext4 partition in /mnt: mount /dev/sda3 /mnt
  8. copy your safely stored /boot there: cp -r ~/boot/* /mnt/
  9. umount /mnt
  10. mount /dev/sda1 /boot/efi
  11. now comes the annoying part: use blkid to find out the UUID of this partition, then edit /etc/fstab to fix it there; then add another line to /etc/fstab that references the ext4 partition - I copied the one for /boot/efi, fixed its UUID with one taken from blkid, changed the mount point to /boot, the filesystem type to ext4, and the options to just defaults
  12. qubes-dom0-update grub2-efi-x64 --enablerepo=*testing*
  13. grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg
  14. efibootmgr -v, find the Qubes EFI boot entry (it was 0003 for me)
  15. efibootmgr -b 0003 -B; efibootmgr -c -L Qubes -l '\EFI\qubes\grubx64.efi' (replace 0003 with your boot entry number if needed)
  16. cp /usr/share/grub/unicode.pf2 /boot/efi/EFI/qubes/fonts/
  17. plymouth-set-default-theme qubes-dark
  18. dracut -f
  19. finally, vim /etc/default/grub and fix the UUID in GRUB_CMDLINE_LINUX
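The fstab edit described above (the "annoying part") could end up looking roughly like the fragment below. The UUIDs are made-up placeholders; replace them with the values blkid prints for your new partitions, and note the fstab field order (device, mount point, fs type, options, dump, fsck pass):

```
# /etc/fstab fragment (illustrative UUIDs only)
UUID=1234-ABCD                             /boot/efi  vfat  umask=0077,shortname=winnt  0 2
UUID=0f0f0f0f-0000-4000-8000-123456789abc  /boot      ext4  defaults                    0 2
```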

@marmarek
Member Author

@fepitre can you write a script that handles the grub part?
The above order works, but I'd recommend first installing grub2-efi-x64 and its configuration, and only then repartitioning - otherwise you have a window with no bootloader at all, and if the download fails, you're screwed.
Also, the script should check if there is really enough space for that /boot - the default /boot/efi should be 500M, but on some early R4.0 builds it could be smaller - in that case, better to abort early.
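A pre-flight check along these lines could be sketched as below. The 500M threshold, the function names, and the probing via findmnt/blockdev are assumptions for illustration, not the actual qubes-dist-upgrade code:

```shell
#!/bin/bash
# Refuse to repartition unless the current ESP is large enough to be
# carved into a separate /boot plus a new, smaller /boot/efi.
MIN_EFI_BYTES=$((500 * 1024 * 1024))

esp_large_enough() {
    # $1: partition size in bytes; succeeds when a split looks feasible
    [ "$1" -ge "$MIN_EFI_BYTES" ]
}

check_boot_efi() {
    local dev size
    dev=$(findmnt -n -o SOURCE /boot/efi) || { echo "no /boot/efi mount" >&2; return 1; }
    size=$(blockdev --getsize64 "$dev") || return 1
    if esp_large_enough "$size"; then
        echo "OK: $dev ($size bytes) has room for a separate /boot"
    else
        echo "ABORT: $dev is only $size bytes; some early R4.0 ESPs are too small" >&2
        return 1
    fi
}
```

The point of the two-function split is that the policy (the size threshold) can be tested without touching real block devices.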

@fepitre
Member

fepitre commented Feb 23, 2020

Sure.

marmarek added a commit to marmarek/qubes-linux-dom0-updates that referenced this issue Feb 23, 2020
This set of packages will allow running rpm 4.14 in a fc25-based dom0
(R4.0), which is necessary to update to fc31-based dom0 (R4.1).

QubesOS/qubes-issues#5685
marmarek added a commit to marmarek/qubes-builder-rpm that referenced this issue Feb 23, 2020
This allows using various --with/--without options (or rather
--define="_with_* 1" and --define="_without_* 1" - for which
--with/--without are aliases).
For now it is used only for package rebuilt from a different Fedora
version (linux-dom0-updates repo).

Also adjust the legacy build code that calls dnf builddep manually - this is why it needs to be --define.
Change dependencies handling for src.rpm there, to extract the spec file and
operate on it directly, because otherwise the --define options take no
effect (dnf normally looks at src.rpm dependencies in the header, not within the
spec file).
This issue does not apply to mock, because mock first rebuilds src.rpm
package, including various --define.

QubesOS/qubes-issues#5685
marmarek added a commit to marmarek/qubes-linux-dom0-updates that referenced this issue Feb 23, 2020
This set of packages will allow running rpm 4.14 in a fc25-based dom0
(R4.0), which is necessary to update to fc31-based dom0 (R4.1).

QubesOS/qubes-issues#5685
marmarek added a commit to marmarek/qubes-qubes-release that referenced this issue Feb 24, 2020
New repositories require rpm 4.14 (packages compressed with zstd), and
the new rpm requires qubes-core-dom0-linux >= 4.0.23 (different output
of rpm -K). Prevent installing the new qubes-release unless those other
updates are installed too; otherwise, updating the system will stop
working.

QubesOS/qubes-issues#5685
marmarek added a commit to QubesOS/qubes-release-configs that referenced this issue Feb 24, 2020
marmarek added a commit to QubesOS/qubes-release-configs that referenced this issue Feb 25, 2020
@0spinboson

0spinboson commented Feb 26, 2020

I'm still using legacy boot, so updating dom0 went swimmingly, after I noticed I had to adjust the repo links in /etc/yum.repos.d/qubes-dom0.repo (to fc31) after step 5.
Only niggle: the grub boot screens were missing. :)
And of course I had to add gnttab_max_frames and gnttab_max_maptrack_frames values to the Xen cmdline in /etc/default/grub, then regenerate grub.cfg, before rebooting.
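That tweak amounts to appending the two options to the Xen line in /etc/default/grub and regenerating the config. The values below are illustrative placeholders, not a recommendation:

```shell
# In /etc/default/grub, extend the Xen command line (example values):
GRUB_CMDLINE_XEN_DEFAULT="console=none gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096"
```

followed by regenerating grub.cfg (on legacy boot, typically grub2-mkconfig -o /boot/grub2/grub.cfg) before rebooting.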

Updating the fedora-30 template led to some issues, as dnf refused to update xen-libs because of dependencies; I didn't pursue that very far because I already had a fedora-31-xfce template configured.

@marmarek
Member Author

I'm still using legacy boot, so updating dom0 went swimmingly, after I noticed I had to adjust the repo links in /etc/yum.repos.d/qubes-dom0.repo (to fc31) after step 5.

That's probably because of a not-yet-updated qubes-release package. Do you know which qubes-release package version you had at that step (was it updated in step 5 or not)?

@0spinboson

I seem to recall seeing a package with 4.1 in the name, but I'm not sure, sorry.

@marmarek
Member Author

marmarek commented Apr 1, 2020

There are some upgrade problems described in a duplicate issue here: #4867

@DemiMarie

I will add that the upgrade process should fail safe: if there is a failure during install, the system should still be usable.

@marmarek
Member Author

marmarek commented Jun 1, 2020

Generally yes (and the script already tries to do that), but making a backup before is still recommended.
The current WIP script: https://github.com/fepitre/qubes-migration

@0spinboson

0spinboson commented Jun 1, 2020

Left after the upgrade to fc32:

fedora-obsolete-packages
python2-futures
python2-iniparse
python2-msgpack
python2-nose
python2-pycurl
python2-pygpgme
python2-systemd
python2-backports
python2-zmq
python2-chardet
python2-crypto
python2-qubesadmin
python2-singledispatch
python2-backports_abc

All are listed under 'removing dependent packages' when running qubes-dom0-update, yet all are left in place.

@marmarek
Member Author

marmarek commented Jun 3, 2020

During the --dist-upgrade phase I got an error like the one below. I started from a default R4.0.3 install, in UEFI mode:

Error: Transaction check error:
  file /usr/bin/rst2html from install of python3-docutils-0.15.2-4.fc32.noarch conflicts with file from package python2-docutils-0.13.1-3.fc25.noarch
  file /usr/bin/rst2html5 from install of python3-docutils-0.15.2-4.fc32.noarch conflicts with file from package python2-docutils-0.13.1-3.fc25.noarch
  file /usr/bin/rst2latex from install of python3-docutils-0.15.2-4.fc32.noarch conflicts with file from package python2-docutils-0.13.1-3.fc25.noarch
  file /usr/bin/rst2man from install of python3-docutils-0.15.2-4.fc32.noarch conflicts with file from package python2-docutils-0.13.1-3.fc25.noarch
  file /usr/bin/rst2odt from install of python3-docutils-0.15.2-4.fc32.noarch conflicts with file from package python2-docutils-0.13.1-3.fc25.noarch
  file /usr/bin/rst2odt_prepstyles from install of python3-docutils-0.15.2-4.fc32.noarch conflicts with file from package python2-docutils-0.13.1-3.fc25.noarch
  file /usr/bin/rst2pseudoxml from install of python3-docutils-0.15.2-4.fc32.noarch conflicts with file from package python2-docutils-0.13.1-3.fc25.noarch
  file /usr/bin/rst2s5 from install of python3-docutils-0.15.2-4.fc32.noarch conflicts with file from package python2-docutils-0.13.1-3.fc25.noarch
  file /usr/bin/rst2xetex from install of python3-docutils-0.15.2-4.fc32.noarch conflicts with file from package python2-docutils-0.13.1-3.fc25.noarch
  file /usr/bin/rst2xml from install of python3-docutils-0.15.2-4.fc32.noarch conflicts with file from package python2-docutils-0.13.1-3.fc25.noarch
  file /usr/bin/rstpep2html from install of python3-docutils-0.15.2-4.fc32.noarch conflicts with file from package python2-docutils-0.13.1-3.fc25.noarch
  file /usr/share/doc/python-systemd/NEWS from install of python3-systemd-234-12.fc32.x86_64 conflicts with file from package python2-systemd-232-1.fc25.x86_64
  file /usr/share/doc/python-systemd/README.md from install of python3-systemd-234-12.fc32.x86_64 conflicts with file from package python2-systemd-232-1.fc25.x86_64
  file /etc/sysconfig/network-scripts/ifdown-ppp from install of network-scripts-ppp-2.4.7-35.fc32.x86_64 conflicts with file from package ppp-2.4.7-9.fc24.x86_64
  file /etc/sysconfig/network-scripts/ifup-ppp from install of network-scripts-ppp-2.4.7-35.fc32.x86_64 conflicts with file from package ppp-2.4.7-9.fc24.x86_64
  file /etc/sysconfig/network-scripts/ifdown-Team from install of network-scripts-teamd-1.30-2.fc32.x86_64 conflicts with file from package teamd-1.27-1.fc25.x86_64
  file /etc/sysconfig/network-scripts/ifdown-TeamPort from install of network-scripts-teamd-1.30-2.fc32.x86_64 conflicts with file from package teamd-1.27-1.fc25.x86_64
  file /etc/sysconfig/network-scripts/ifup-Team from install of network-scripts-teamd-1.30-2.fc32.x86_64 conflicts with file from package teamd-1.27-1.fc25.x86_64
  file /etc/sysconfig/network-scripts/ifup-TeamPort from install of network-scripts-teamd-1.30-2.fc32.x86_64 conflicts with file from package teamd-1.27-1.fc25.x86_64
  file /usr/lib64/qt4/plugins/designer/libpyqt4.so from install of python3-PyQt4-4.12.3-11.fc32.x86_64 conflicts with file from package PyQt4-4.11.4-15.fc25.x86_64
  file /etc/dracut.conf.d/plymouth-missing-fonts.conf from install of qubes-artwork-plymouth-4.1.4-2.fc32.noarch conflicts with file from package qubes-artwork-4.0.1-2.fc25.noarch
  file /usr/share/plymouth/themes/qubes-dark/qubes-logo-outline.png from install of qubes-artwork-plymouth-4.1.4-2.fc32.noarch conflicts with file from package qubes-artwork-4.0.1-2.fc25.noarch
  file /usr/share/plymouth/themes/qubes-dark/qubes-logo-solid.png from install of qubes-artwork-plymouth-4.1.4-2.fc32.noarch conflicts with file from package qubes-artwork-4.0.1-2.fc25.noarch

@andrewdavidwong andrewdavidwong added P: major Priority: major. Between "default" and "critical" in severity. and removed P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. labels Jun 4, 2020
marmarek added a commit to QubesOS/qubes-linux-dom0-updates that referenced this issue Jun 5, 2020
The upgrade path to fc32 depends on a packaging change in
python-systemd-234-4. Otherwise, python2-systemd conflicts with
python3-systemd on files, but lacks the appropriate package metadata about
it.

QubesOS/qubes-issues#5685
marmarek added a commit to marmarek/qubes-dist-upgrade that referenced this issue Nov 24, 2021
marmarek added a commit to marmarek/qubes-dist-upgrade that referenced this issue Nov 24, 2021
marmarek added a commit to QubesOS/openqa-tests-qubesos that referenced this issue Nov 29, 2021
- legacy file pool on LVM
- Debian (default one, and 10) as default template, including sys-net and
  sys-firewall

QubesOS/qubes-issues#5685
marmarek added a commit to marmarek/qubes-core-admin-linux that referenced this issue Nov 30, 2021
The updatevm already downloads packages only (either using the
--downloadonly option internally, or using yumdownloader). The option
from the user matters only for the dnf call in dom0.

In particular, sending --downloadonly breaks downloading updates via
debian-10, which uses yumdownloader, as it doesn't have a --downloadonly
option.

This is mostly relevant for the R4.0 -> R4.1 upgrade (and maybe some future
ones too?), as the process uses the --downloadonly option internally.

QubesOS/qubes-issues#5685
marmarek added a commit to marmarek/qubes-core-admin-linux that referenced this issue Dec 3, 2021
The updatevm already downloads packages only (either using the
--downloadonly option internally, or using yumdownloader). The option
from the user matters only for the dnf call in dom0.

In particular, sending --downloadonly breaks downloading updates via
debian-10, which uses yumdownloader, as it doesn't have a --downloadonly
option.

This is mostly relevant for the R4.0 -> R4.1 upgrade (and maybe some future
ones too?), as the process uses the --downloadonly option internally.

QubesOS/qubes-issues#5685

(cherry picked from commit 0018597)
@marmarek
Member Author

marmarek commented Jan 17, 2022

  1. Stage 4 completely fails with debian-10 updatevm templates. Error: yumdownloader: no such option: --downloadonly
    This resulted in the system breaking (I couldn't switch the template anymore).

I tried to get it working with debian-10 as the updatevm, but sadly the changes required are too severe. In short: yumdownloader cannot parse "rich dependencies", which are used in some newer packages, and there is no way to get dnf into debian-10. So, I'll adjust the tool to complain early about this case and let the user change the updatevm (template) while it is still possible.
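Such an early complaint could probe whether the updatevm has dnf at all before starting anything. This qvm-run probe is an illustration of the idea, not the actual check added to the tool:

```shell
# Fail fast if the updatevm cannot download R4.1 packages: yumdownloader
# (used when dnf is absent, e.g. in debian-10) has no --downloadonly and
# cannot parse rich dependencies.
check_updatevm_can_download() {
    local updatevm
    updatevm=$(qubes-prefs updatevm) || return 1
    if ! qvm-run -p "$updatevm" 'command -v dnf >/dev/null'; then
        echo "updatevm '$updatevm' has no dnf; change the updatevm (template)" >&2
        echo "while it is still possible, then retry the upgrade" >&2
        return 1
    fi
}
```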

@DemiMarie

I recommend taking a (LVM or BTRFS) snapshot of the dom0 root filesystem, so that in the event of a disaster the user has a chance to recover.

@marmarek
Member Author

I recommend taking a (LVM or BTRFS) snapshot of the dom0 root filesystem, so that in the event of a disaster the user has a chance to recover.

It may help in some situations, but actually not that many. To restore from such snapshot, you'd need to reboot. At this point (when recovering from failed stage 4), you'll have all the templates and standalones already upgraded to R4.1, and it won't be compatible with reverted dom0 (we provide only backward compatibility, not forward compatibility, and both GUI and qrexec protocols have changed). So, you may get dom0 back to running, but not much more. You could try to restore templates from snapshots too, if you really want (and have one)...

The dist upgrade tool tries to check assumptions before performing the upgrade and to refuse to continue while there is still a chance (this now includes all the cases reported in this issue too). And it also prints a huge warning about making a backup beforehand. The cases that are easy to recover from are handled already; otherwise, the recommended way is to either retry directly (which should work if the failure was some intermittent download issue), or reinstall (possibly R4.1 already) and restore from backup - like in the other upgrade procedure.
With complex fallback options, I would really want to avoid a situation where a mitigation that would help in very few cases anyway risks breaking the common case. And that is the case here, because, for example, an extra snapshot could make the upgrade run out of disk space (especially when the VMs are stored in the dom0 filesystem too, but it is a risk even without that).

@DemiMarie

I recommend taking a (LVM or BTRFS) snapshot of the dom0 root filesystem, so that in the event of a disaster the user has a chance to recover.

It may help in some situations, but actually not that many. To restore from such snapshot, you'd need to reboot. At this point (when recovering from failed stage 4), you'll have all the templates and standalones already upgraded to R4.1, and it won't be compatible with reverted dom0 (we provide only backward compatibility, not forward compatibility, and both GUI and qrexec protocols have changed). So, you may get dom0 back to running, but not much more. You could try to restore templates from snapshots too, if you really want (and have one)...

I wonder if we should intentionally reboot dom0 after upgrading it, and only upgrade the templates and standalones afterwards. What do you think?

@marmarek
Member Author

I wonder if we should intentionally reboot dom0 after upgrading it, and only upgrade the templates and standalones afterwards. What do you think?

That could work, but it would make the upgrade process less convenient. Currently we exploit the fact that running VMs still work even after upgrading their templates, to perform the upgrade in the opposite order, requiring just one reboot instead of two. This way, you can call qubes-dist-upgrade --all --yes and leave it alone for some time, then just reboot and everything is done. With an extra reboot, you'd need to sit there and do the reboot in the middle of the process.

@DemiMarie

I wonder if we should intentionally reboot dom0 after upgrading it, and only upgrade the templates and standalones afterwards. What do you think?

That could work, but it would make the upgrade process less convenient. Currently we exploit the fact that running VMs still work even after upgrading their templates, to perform the upgrade in the opposite order, requiring just one reboot instead of two. This way, you can call qubes-dist-upgrade --all --yes and leave it alone for some time, then just reboot and everything is done. With an extra reboot, you'd need to sit there and do the reboot in the middle of the process.

Personally, I consider the improved reliability to be worthwhile. Rebooting dom0 after a successful upgrade means that we can take a snapshot of the dom0 filesystem before the upgrade, and revert to it if the upgrade fails. We could also use Fedora’s offline system upgrade feature. We can also clone at least one of the templates before upgrading them, ensuring that the user will always be able to recover their system.

@marmarek
Member Author

marmarek commented Feb 1, 2022

While some of those ideas are okay, it is way too late to do a major restructuring of the upgrade process, which has been confirmed to work for many users already. Please propose such major structural changes earlier, when we prepare the upgrades for R4.2.

marmarek added a commit to marmarek/qubes-dist-upgrade that referenced this issue Feb 3, 2022
@marmarek marmarek closed this as completed Feb 4, 2022
@andrewdavidwong andrewdavidwong added C: dist upgrade The code and tools that support upgrading in-place from one Qubes OS release to another and removed C: other labels Jul 27, 2022
@ViliusSutkus89

I think I've managed to upgrade qvm-appmenus before the 4.0 -> 4.1 dist upgrade, which breaks the dist upgrade.
qubes-dist-upgrade --post-reboot fails with an error:

usage: qvm-appmenus [--verbose] [--quiet] [--help] [--init] [--create]
                    [--remove] [--update] [--get-available] [--get-whitelist]
                    [--set-whitelist PATH | --set-default-whitelist PATH]
                    [--source SOURCE] [--force]
                    [--i-understand-format-is-unstable] [--file-field FIELDS]
                    VMNAME [VMNAME ...]
qvm-appmenus: error: the following arguments are required: VMNAME

@andrewdavidwong
Member

@ViliusSutkus89: Please note that the issue tracker (qubes-issues) is not intended to serve as a help desk or tech support center. Instead, we've set up other venues where you can ask for help and support, ask questions, and have discussions. (By contrast, the issue tracker is more of a technical tool intended to support our developers in their work.) Thank you for your understanding!

@ViliusSutkus89

@andrewdavidwong Oh no, I don't need help or tech support, I just wanted to report a possible failure scenario. No idea why I didn't open a new issue. Sorry about that.

BTW, Qubes is extra cool, thanks for working on this project.
