CoreOS won't start, ends with audit log type 1131 #483

AlexGustafsson · 2020-05-18T18:25:28Z

Backstory

I've been running CoreOS for only a few weeks, it's mostly been working great, but a breaking issue has popped up every now and then.

I've installed CoreOS on bare metal on four different types of consumer grade hardware. During the installation, this issue came up a lot, causing me to reinstall the OS several times on different machine types.

Since the installation, it has occurred during reboot twice for two different machine types.

Issue description

It happens during boot (first boot after install, or a regular reboot)
The boot process seems to come a long way, but gets stuck at a log line which is always (mostly) the same
The line may be repeated, but nothing else is ever shown
If this is the first reboot, the machine will have to be reinstalled
If it is a "regular" boot, the machine can be reinstalled and will boot again

Here is a somewhat difficult-to-read image of one occurrence of the issue:

A transcription of the last log line:

audit: type=1131 audit(1588149656.742:68): pid=1 uid=0 auid=4294967295 ses=4294967295 msg=´unit=systemd-rfkill comm="systems" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success´

Here is another yet more difficult-to-read image of another occurrence on another machine:

A transcription of the last log lines:

audit: type=1131 audit(1589731456.755:68): pid=1 uid=0 auid=4294967295  ses=4294967295  subj=system_u:system_r:init_t:s0 msg=´unit=emergency comm="systems" comm="systems" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success´

audit: type=1131 audit(1589731461.010:69): pid=1 uid=0 auid=4294967295  ses=4294967295  subj=system_u:system_r:init_t:s0 msg=´unit=systemd-rfkill comm="systems" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success´

Output of sudo rpm-ostree status on one of the machines (identical setup):

$ sudo rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 31.20200420.3.0 (2020-05-06T16:19:21Z)
                BaseCommit: b3fc3a3e8513d7e424d0ced1e2517484cb766d238951f2fdec3da2fed3522efb
              GPGSignature: Valid signature by 7D22D5867F2A4236474BF7B850CB390B3C3359C4
           LayeredPackages: glusterfs-server

  ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 31.20200420.3.0 (2020-05-06T16:19:21Z)
                BaseCommit: b3fc3a3e8513d7e424d0ced1e2517484cb766d238951f2fdec3da2fed3522efb
              GPGSignature: Valid signature by 7D22D5867F2A4236474BF7B850CB390B3C3359C4
           LayeredPackages: glusterfs

The text was updated successfully, but these errors were encountered:

lucab · 2020-05-19T08:04:27Z

Thanks for the report! I have a feeling that your machine may be spitting more output on the primary console, which is not the one you are looking at.

Can you please try tweaking the kernel args in grub to only keep the VGA console (as described here coreos/fedora-coreos-docs#74), and report back if there are some useful details?

lucab · 2020-05-19T08:06:04Z

Additionally, it looks like this machine has some further customizations (I see multiple deployments and some layered packages).
Can you please attach the Ignition config being used (redacting any secrets first)?

AlexGustafsson · 2020-05-19T08:31:03Z

The following ignition file was used:

variant: fcos
version: 1.0.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - "ssh-ed25519 AAA..."

The following commands were used to install the packages seen in the status report above:

sudo rpm-ostree install glusterfs
# Realizing it was the wrong package
sudo rpm-ostree uninstall glusterfs
# Installing the correct package
sudo rpm-ostree install glusterfs-server

These commands were run manually after some time. The issue still occurred during installation, but I'm pretty sure the second time it occurred were after rebooting after having installed the glusterfs-server package. One of the affected machines, however had restarted fine after installing it. So there might not be too much of a correlation.

As for the VGA console, I'll try to get back ASAP if I get more helpful output. During boot, there is a lot of output which ends with something along the line of "switching to colored terminal", then most of the output is removed, leaving either the prompt of the SSH keys used, or the screen shown in the images above.

AlexGustafsson · 2020-05-23T14:27:27Z

There were some splitting going on, just as you mentioned. I was able to get into the emergency mode by only leaving console=tty0 as the console option in the grub entry.

There is one failed unit:

ostree-remount.service

The only error during boot, via journalctl -xb

The last lines of journalctl -xb

It seems to me that my audit type=1131 is merely an affect of the mount error.

AlexGustafsson · 2020-05-23T14:40:39Z

I'll add that this time it happened at least twice in a row while rebooting, first during a "normal" reboot (I've been running a script to randomly restart servers throughout the day in order to test this), and then when rebooting after having changed the grub setting. Rebooting once more let me into the system. Without warnings or errors.

Would it be sensible to mitigate this by editing the systemd unit for the ostree mount and add a restart if it fails, instead of the current behavior discussed here ostreedev/ostree#1697 (I think)?

lucab · 2020-05-23T15:12:49Z

Thanks for the followup! I think this is another symptom of the same mount-race issue at #488.
If that is true, next testing release should be not suffering this anymore.

AlexGustafsson · 2020-05-23T15:34:58Z

@lucab Do you happen to know when the next testing release is scheduled to arrive?

dustymabe · 2020-05-23T22:13:53Z

@lucab Do you happen to know when the next testing release is scheduled to arrive?

You can follow along at coreos/fedora-coreos-streams#108. Note that this is the first time we're planning to switch over to Fedora 32, so it will be a big update.

If you just want to test to see if the problem goes away you can use a development build from https://builds.coreos.fedoraproject.org/browser. The latest testing-devel is 31.20200521.20.0.

jlebon · 2020-05-25T20:48:36Z

Yes, this is the same issue as #488. Let's close this one and track the fix there.

dustymabe · 2020-06-04T14:03:49Z

Yes, this is the same issue as #488. Let's close this one and track the fix there.

Assuming the above statement is correct, the change to correct this problem landed in coreos/fedora-coreos-config#416.

The fix for this went into testing stream release 32.20200601.2.1. Please try out the new release and report issues.

dustymabe · 2020-06-19T16:27:57Z

The fix for this went into stable stream release 32.20200601.3.0.

lucab added the needs/more-information label May 19, 2020

lucab added component/rpm-ostree kind/bug status/pending-testing-release Fixed upstream. Waiting on a testing release. and removed needs/more-information labels May 23, 2020

lucab mentioned this issue May 23, 2020

Read-only /sysroot remount racing with other systemd units #488

Closed

jlebon closed this as completed May 25, 2020

dustymabe added status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. and removed status/pending-testing-release Fixed upstream. Waiting on a testing release. labels Jun 4, 2020

dustymabe removed the status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. label Jun 19, 2020

khuedoan mentioned this issue Jun 17, 2021

CoreOS randomly failed to boot when install to disk on bare metal #870

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CoreOS won't start, ends with audit log type 1131 #483

CoreOS won't start, ends with audit log type 1131 #483

AlexGustafsson commented May 18, 2020

lucab commented May 19, 2020

lucab commented May 19, 2020

AlexGustafsson commented May 19, 2020

AlexGustafsson commented May 23, 2020

AlexGustafsson commented May 23, 2020 •

edited

Loading

lucab commented May 23, 2020

AlexGustafsson commented May 23, 2020

dustymabe commented May 23, 2020

jlebon commented May 25, 2020

dustymabe commented Jun 4, 2020

dustymabe commented Jun 19, 2020

CoreOS won't start, ends with audit log type 1131 #483

CoreOS won't start, ends with audit log type 1131 #483

Comments

AlexGustafsson commented May 18, 2020

Backstory

Issue description

lucab commented May 19, 2020

lucab commented May 19, 2020

AlexGustafsson commented May 19, 2020

AlexGustafsson commented May 23, 2020

AlexGustafsson commented May 23, 2020 • edited Loading

lucab commented May 23, 2020

AlexGustafsson commented May 23, 2020

dustymabe commented May 23, 2020

jlebon commented May 25, 2020

dustymabe commented Jun 4, 2020

dustymabe commented Jun 19, 2020

AlexGustafsson commented May 23, 2020 •

edited

Loading