Release Flatcar Container Linux Alpha 3480.0.0, Beta 3446.1.0, Stable 3374.2.2 #946

sayanchowdhury · 2023-01-11T10:01:35Z

sayanchowdhury · 2023-01-11T10:03:29Z

alpha 3480.0.0 http://jenkins.infra.kinvolk.io:8080/job/container/job/sdk/516/cldsv/
beta 3446.1.0 http://jenkins.infra.kinvolk.io:8080/job/container/job/packages_all_arches/1045/cldsv/
stable 3374.2.2 http://jenkins.infra.kinvolk.io:8080/job/container/job/packages_all_arches/1046/cldsv/
lts 3033.3.9 http://jenkins.infra.kinvolk.io:8080/job/container/job/packages_all_arches/1034/cldsv/

schweinchendick · 2023-01-12T08:36:32Z

The alpha 3480.0.0 currently causes an install boot loop.
Coming from 3446.0.0, locksmithd issues a reboot and the machine tries to install the new system. It then fails and restores the previous alpha.
Investigation is ongoing, but updates have worked on this machine for years.

schweinchendick · 2023-01-12T09:39:44Z

Switching the machine to the beta channel: same loop.
Investigation is currently impossible, as the machine writes no logs during the update, only in the "real" system.
More investigation will follow on Sunday during the planned maintenance window.

jepio · 2023-01-12T10:19:10Z

Thanks for testing alpha & beta. We're trying to repro but so far this seems isolated. Any chance you see something happening on the serial console that would hint at what is going wrong?

schweinchendick · 2023-01-12T10:26:42Z

I can not (currently). On Sunday no problem.
The machine is from around 2016 (the CoreOS days) and was automatically migrated since then. Unverified assumption: boot partition too small. Currently 11MB left. I'm currently running more tests on fresh installs, but too early to confirm anything.

pothos · 2023-01-12T11:44:05Z

Are you able to briefly log in with SSH? Would be great to get the journalctl output.
Yes, the boot partition problem has been on the radar for some time and I'll look into freeing up some larger chunks.

schweinchendick · 2023-01-12T12:05:11Z

I can log in as long and as often as you like, but can't afford long reboots and failing installs for now. But as mentioned above, there is no journalctl output for the time period in which the (failing) update takes place. The last entries are "systemd[1]: Reached target System Reboot. (...) systemd-journald[694]: Journal stopped", then silence, and after ~10 minutes "kernel: Linux version 5.15.81-flatcar (...)".

pothos · 2023-01-12T12:18:27Z

Thanks for the quick response, so it doesn't seem to boot - journalctl -u update-engine --no-pager would be good to see if anything suspicious happened during download/installation of the new update.

pothos · 2023-01-12T12:22:49Z

I wonder if it could be related to the change about running systemd-tmpfiles and flatcar-tmpfiles in the initrd. Sidenote: Currently it seems to be a mix of gracefully continuing and hard boot failures, depending on the type of error the root setup script encounters. Both approaches have pros and cons…

schweinchendick · 2023-01-12T12:50:17Z

The logs since July 2022 always show "Update successfully applied, waiting to reboot.", even for this release. The only difference, which is not meaningful, is the ever-increasing "Payload Attempt Number = 61" for this release. But as I have stopped the update-engine, it does not try again. Resuming on Sunday...

On fresh installs the update works flawlessly. It must be something isolated which I cannot pin down at the moment. I will report back as soon as I have full control over the machine.

schweinchendick · 2023-01-15T12:41:12Z

We recorded a boot sequence video available at https://lohmann.ml/sl/hausbesuch-wahnsinnig
The video starts with the shutdown sequence after successful installation. Then it boots right into the emergency shell without any visible errors. After the timeout expires, it just reboots into the old system.

During boot one can see the grub menu containing "CoreOS" entries, as this is a former CoreOS machine. Maybe this could be a problem, as the /boot partition is not upgraded at all. It still has all the files from 2016, when the machine was installed. I have never seen a CoreOS or FlatcarOS machine upgrading its /boot partition. Shouldn't that be done sometimes?

If you want I can try to provide an image of that machine without the confidential docker images and volumes for further analysis.

jepio · 2023-01-16T09:38:27Z

Can you press enter on that console, type journalctl and find out what happens after the rootfs is mounted (about 2.0 seconds into the boot in that video)? That's when the system decides to not proceed with the boot and fail in the initrd.

A snapshot of that vm without any of the docker images/volumes would work too, but we would need all other state in the root filesystem.

schweinchendick · 2023-01-16T20:24:41Z

Hitting Enter does not result in a real emergency shell. The system then enters a 15 second boot loop which can only be resolved by a (virtual) reset.
Therefore here is the machine disk image without the docker images and volumes: https://lohmann.ml/sl/gesamtjahr-rechtslage
Volume A contains the "good" system and Volume B is this release which shows the odd behavior.

pothos · 2023-01-16T20:48:57Z

Thanks a lot!
Hitting Enter works for me, maybe it depends on the console setup, I used QEMU via serial console and console=ttyS0.
I have to apologize, my change for running systemd-tmpfiles not only on first boot introduced the regression because it can't resolve the core group for some reason. Thanks a lot for reporting so quickly and preventing it from hitting Stable.

[    2.226858] systemd[1]: Starting Root filesystem setup...
[    2.243882] systemd-tmpfiles[503]: /sysroot/usr/lib/tmpfiles.d/baselayout-home.conf:1: Failed to resolve group 'core'.
[    2.245208] systemd-tmpfiles[503]: /sysroot/usr/lib/tmpfiles.d/baselayout-home.conf:2: Failed to resolve group 'core'.
[    2.246496] systemd-tmpfiles[503]: /sysroot/usr/lib/tmpfiles.d/baselayout-home.conf:3: Failed to resolve group 'core'.
[    2.247895] systemd-tmpfiles[503]: /sysroot/usr/lib/tmpfiles.d/baselayout-home.conf:4: Failed to resolve group 'core'.
[    2.249156] systemd-tmpfiles[503]: /sysroot/usr/lib/tmpfiles.d/baselayout-home.conf:5: Failed to resolve group 'core'.
[    2.250974] systemd[1]: initrd-setup-root.service: Main process exited, code=exited, status=65/DATAERR
[FAILED] Failed to start Root filesystem setup.
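
For anyone triaging a similar failure, the unresolved names can be pulled out of such a console log mechanically. The following Python sketch is not part of Flatcar; the function name `unresolved_names` is hypothetical, and the regular expression is an assumption based only on the message format in the excerpt above (the `user` variant is assumed by analogy with the `group` messages shown).

```python
import re

# Matches lines like:
#   systemd-tmpfiles[503]: /path/file.conf:1: Failed to resolve group 'core'.
# The pattern is an assumption derived from the log excerpt above.
_PATTERN = re.compile(
    r"systemd-tmpfiles\[\d+\]: (?P<path>\S+):(?P<line>\d+): "
    r"Failed to resolve (?P<kind>user|group) '(?P<name>[^']+)'"
)


def unresolved_names(log_text):
    """Return a sorted list of unique (kind, name, config_path) tuples."""
    found = set()
    for match in _PATTERN.finditer(log_text):
        # Several config lines may fail on the same name; keep one entry each.
        found.add((match["kind"], match["name"], match["path"]))
    return sorted(found)
```

Fed the excerpt above, this would reduce the five repeated messages to a single entry for the group `core` in baselayout-home.conf.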

pothos · 2023-01-16T20:56:40Z

As a workaround you can do echo core:x:500: | sudo tee -a /etc/group once and then the update will boot.

I guess we could fix https://github.com/flatcar/baselayout/blob/flatcar-master/scripts/flatcar-tmpfiles to also copy the list of wanted users over if they don't exist, not only if the file doesn't exist.
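
The idea of copying over only the missing entries can be sketched roughly as follows. This is a Python illustration, not the flatcar-tmpfiles script itself and not the change that was eventually merged; the function name `merge_missing_entries` and the string-in, string-out interface are hypothetical, chosen so the sketch does not touch any real file.

```python
def merge_missing_entries(existing_text, defaults_text):
    """Append entries from defaults_text whose name is absent from existing_text.

    Both arguments use the colon-separated passwd/group format, where the
    first field is the entry name. Existing entries are never modified.
    """
    existing_lines = existing_text.splitlines()
    known = {line.split(":", 1)[0] for line in existing_lines if line.strip()}

    merged = list(existing_lines)
    for line in defaults_text.splitlines():
        # Skip blank lines and comments in the defaults.
        if not line.strip() or line.startswith("#"):
            continue
        name = line.split(":", 1)[0]
        if name not in known:
            merged.append(line)
            known.add(name)

    return "\n".join(merged) + "\n" if merged else ""


# Example: a group database that predates the 'core' group.
existing = "root:x:0:\nwheel:x:10:\n"
defaults = "root:x:0:\ncore:x:500:\n"
print(merge_missing_entries(existing, defaults), end="")
```

Because entries are matched by name only, running the merge a second time on its own output changes nothing, and locally modified entries (for example a changed member list) are left alone. For a database that lacks `core`, the result matches what the manual workaround above produces.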

schweinchendick · 2023-01-17T09:46:30Z

Actually I don't need a workaround right now. I would rather like to see it fixed in the next release so that it is helpful for everybody. For me there is no real need to switch to this release. I'll skip it until the underlying issue is fixed.

schweinchendick · 2023-01-17T14:36:33Z

On the cloned machine the workaround solved the problem. On the real machine I'm waiting for a new release by halting the update-engine service.

sayanchowdhury added the kind/release label Jan 11, 2023

dongsupark changed the title ~~Release Flatcar Container Linux Alpha 3480.0.0, Beta 3446.1.0, Stable 3374.2.2, LTS 3033.3.9~~ Release Flatcar Container Linux Alpha 3480.0.0, Beta 3446.1.0, Stable 3374.2.2 Jan 11, 2023

pothos mentioned this issue Jan 17, 2023

flatcar-tmpfiles: Always copy missing entries over to the database flatcar/baselayout#26

Merged

pothos closed this as completed in flatcar/baselayout#26 Jan 20, 2023
