-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade build chroot and debian containers to buster #464
Conversation
a8728be
to
f7bfa96
Compare
When the build chroot fails to install, print out some of the debootstrap output, so that more information is available when the build fails only in CI.
The prometheus metric 'node_systemd_system_running' reports the state of 'systemctl is-system-running', which requires that no units have failed. As such, this will only succeed if all of our other checks succeed. It provides less helpful information than a service-by-service list of which services are wrong. So move it to the end of the function, after we do the checks that would give us more detailed reporting.
Previously, we didn't run the command to clean up malformed fakechroot links after installing debian packages. This meant that installing iptables for flannel would create a malformed link at the path /usr/sbin/iptables, which would prevent flannel from resolving it. This was uncovered by the upgrade to buster, in which iptables is now routed through /etc/alternatives/iptables.
Because we immediately would jump into issuing commands to the supervisor, it was possible that we would do so slightly before the SSH server was actually online, which would cause the install to abort. Introduce an explicit wait so that we don't proceed until the server is ready for us to do so.
8e8bcf4
to
3100ea8
Compare
ad4022e
to
79f29a2
Compare
@@ -23,3 +23,7 @@ | |||
continue | |||
os.remove(path) | |||
os.symlink(os.path.join("/", rootrel), path) | |||
|
|||
if os.path.islink(os.path.join(rootfs, "proc")): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add a comment or todo in the code as well.
We have a Jenkins-only bug where /proc is a symlink to itself in the flannel container, which breaks a lot of things. We don't know why, exactly, this is happening -- but we need to mitigate it.
We can't have flannel monitors until we have OCI launches, which we can't have until we have flannel working, which we can't have until we have OCI pulling. Split up the order so that debugging information is more useful.
79f29a2
to
3261875
Compare
Current status, so that I remember: I'm currently running into a problem where the newer qemu available on buster objects to having the same CD inserted into multiple VMs simultaneously:
This is reasonable, because they really should be read-only, but when I try to set them read-only, I get another error:
|
It looks like it's impossible to set |
@krawthekrow I believe this is ready for review now whenever you have the chance |
Can we see if we can use the |
We pull down the host key for the supervisor via a process involving ssh-keyscan and a set of fingerprints scraped from the MOTD of the supervisor. For some reason, on Buster, ssh isn't always running at the point the MOTD is printed -- so we don't always find the SSH hostkeys we need. Add a loop to wait for the SSH server to be up before continuing with the pull process.
This updates the script used to validate new versions of the debian ISO provided by upstream, along with the current version specification. This update is part of the overall upgrade from stretch to buster. Without it, the version of CRI-O we compile will crash due to the newer libc being missing.
- Remove transitional realpath package (replaced by coreutils) - Replace transitional btrfs-tools package with btrfs-progs - Replace unsupported openjdk-8-jdk with supported openjdk-11-jdk - Add libtinfo5 because it's now needed for container fakeroot setup - Add perl-openssl-defaults and dbus-user-session because they seem to be required for the chroot to install, though it's not quite clear why. - Add libbtrfs-dev, because it needs to be pulled in explicitly for the CRI-O build now. Note that this commit will not build correctly at this point, because using a buster chroot means the fakeroot/fakechroot invocation used to set up debian containers will not have the correct host libraries. As well, a buggy version of debootstrap is currently in the debian repo, which also breaks the debian container scripts. The next two commits will be required for a successful build.
debootstrap bug #946783 means that the latest version on buster does not function correctly with both --unpack-tarball and --foreign. Pin an older version without this bug until it's fixed on stable. Note that this commit is not expected to build; the next commit will be required for reasons previously described.
Stretch is an entire release behind -- and we're over a year out of date. Get us to the latest version of the latest release. This makes it possible to build again, because the build chroot now matches the container release closely enough that the fakeroot/ fakechroot combo can have the right libraries available.
As part of the larger upgrade effort.
There are still places where we use Stretch systems, so in some cases we accept using Stretch instead of Buster.
When we're using a buster deploy-chroot with a non-buster host, it's likely that the kernel versions might not match, which would lead to a failure to run modprobe in the chroot. When we actually do need to run modprobe, there's not much we can do, but when we don't actually need to run the command, we can easily bypass it.
Buster doesn't come with gpg by default. Make sure it's installed. I'm not sure if Stretch did, but I assume so.
Add instructions to set up a build on RHOMBI, including local rsync, but not including two minor issues that will be addressed in the next commits.
Currently, builds started on RHOMBI cannot upload their binaries to RHOMBI directly, because the relevant section of the filesystem is not accessible. Add an optional helper to set up a bind mount, and document how to use it.
When we have different versions of GPG inside and outside of the build chroot, which can occur when the host is stretch and the chroot is buster, the //upload target may fail to sign packages through reprepro. This is because the keys transferred in by the host might not be compatible with the new gpg version; this can be fixed by running any gpg command, so run gpg --list-keys on entry. This was confusing to debug, because the error was a "key not found" error, and if you then ran gpg --list-keys to check, you'd see it did actually exist, and then if you ran the upload again, it would succeed... but only until the next time you reopened the chroot.
It is frequently useful for debugging to have access to the exact invocations of the qemu instances that spire executes, but learning that information usually requires modifying the code to add debugging statements, or carefully extracting the info from /proc/<pid>/cmdline. Add an option to print the command lines before the VMs are launched, in a form that can be immediately executed in the debug shell.
It turns out that, if we send all the characters at once, newer qemus will end up dropping the characters sent after the first 15-byte buffer of input. So add a short delay between each character in order to let the launched VM handle them all.
Previously, the conditionals for user-grant were ordered such that, if there were the appropriate temporary credentials, it would attempt to verify that the service worked, even if it was disabled in the configuration. (This could happen if, for example, you were using spire with user-grant enabled, and then disabled it.) Reorder the conditionals so that user-grant is not verified in this case, because there's no way for verification to succeed.
The newer version of qemu provided by default on buster turns on file locking by default, which prevents us from installing multiple nodes at the same time. Turn on snapshot mode, so that none of the VMs will write back changes to the USB drive -- which is what we want, anyway. This lets us keep installing multiple VMs at the same time. We unfortunately can't just set the disk to read-only, because IDE apparently doesn't support that except for CDROMs, so qemu won't let us enable the option.
f5fbb0b
to
5e2d884
Compare
Merged according to @cryslith's approval of the non-rebased version. |
This PR upgrades us to the latest version of Debian, including working around bug 946783 in debootstrap.
Note that the deploy chroot does not have an automated test, and that I'm not intending to add one in this PR.
Fixes #428.
Checklist: