flatcar boot.mount fails after restart with 3815.2.0 #1417

Open
aqilbeig opened this issue Apr 4, 2024 · 11 comments
Labels
kind/bug Something isn't working

Comments

@aqilbeig

aqilbeig commented Apr 4, 2024

Description

We are migrating our k8s workers to Flatcar 3815.2.0; however, we found that the boot.mount unit fails whenever the VM is rebooted:

× boot.mount - Boot partition
     Loaded: loaded (/usr/lib/systemd/system/boot.mount; static)
     Active: failed (Result: exit-code) since Thu 2024-04-04 16:41:42 UTC; 9min ago
TriggeredBy: ● boot.automount
      Where: /boot
       What: /dev/disk/by-label/EFI-SYSTEM
        CPU: 3ms

Apr 04 16:41:42 ip-10-71-12-10.ec2.internal systemd[1]: Mounting boot.mount - Boot partition...
Apr 04 16:41:42 ip-10-71-12-10.ec2.internal mount[1892]: mount: /boot: unknown filesystem type 'vfat'.
Apr 04 16:41:42 ip-10-71-12-10.ec2.internal mount[1892]:        dmesg(1) may have more information after failed mount system call.
Apr 04 16:41:42 ip-10-71-12-10.ec2.internal systemd[1]: boot.mount: Mount process exited, code=exited, status=32/n/a
Apr 04 16:41:42 ip-10-71-12-10.ec2.internal systemd[1]: boot.mount: Failed with result 'exit-code'.
Apr 04 16:41:42 ip-10-71-12-10.ec2.internal systemd[1]: Failed to mount boot.mount - Boot partition.
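The "unknown filesystem type 'vfat'" message above means the running kernel has no vfat driver registered at mount time. A minimal sketch for confirming that on the node follows. It assumes the standard /proc/filesystems layout (an optional "nodev" column followed by the filesystem name); fs_registered is a hypothetical helper name, not an existing tool.

```shell
# Sketch: succeed if the kernel currently lists the given filesystem type.
# The optional second argument lets you point the check at a saved copy
# of /proc/filesystems instead of the live file.
fs_registered() {
    local fstype="$1" table="${2:-/proc/filesystems}"
    # Each line is "[nodev]<TAB>name", so the name is always the last field.
    awk -v fs="$fstype" '$NF == fs { found = 1 } END { exit !found }' "$table"
}
```

For example, `fs_registered vfat || echo "vfat is not registered"` prints the message when the driver is missing, which would be consistent with the mount failure in the log.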

Impact

This impacts other services such as systemd-boot-update and systemd-sysext, which fail as well. That in turn leaves the node marked NotReady after the reboot.

Failed Units: 6
  boot.mount
  bpf-insights.service
  crio.service
  systemd-boot-update.service
  systemd-sysext.service
× systemd-sysext.service - Merge System Extension Images into /usr/ and /opt/
     Loaded: loaded (/usr/lib/systemd/system/systemd-sysext.service; disabled; preset: disabled)
     Active: failed (Result: exit-code) since Thu 2024-04-04 16:41:42 UTC; 15min ago
       Docs: man:systemd-sysext.service(8)
    Process: 1873 ExecStart=systemd-sysext merge (code=exited, status=1/FAILURE)
   Main PID: 1873 (code=exited, status=1/FAILURE)
        CPU: 9ms

Apr 04 16:41:42 ip-10-71-12-10.ec2.internal systemd[1]: Starting systemd-sysext.service - Merge System Extension Images into /usr/ and /opt/...
Apr 04 16:41:42 ip-10-71-12-10.ec2.internal systemd-sysext[1873]: Failed to read metadata for image docker-flatcar: No such device
Apr 04 16:41:42 ip-10-71-12-10.ec2.internal systemd[1]: systemd-sysext.service: Main process exited, code=exited, status=1/FAILURE
Apr 04 16:41:42 ip-10-71-12-10.ec2.internal systemd[1]: systemd-sysext.service: Failed with result 'exit-code'.
Apr 04 16:41:42 ip-10-71-12-10.ec2.internal systemd[1]: Failed to start systemd-sysext.service - Merge System Extension Images into /usr/ and /opt/.

Flatcar version information:

ip-10-71-12-10 ~ # uname -a
Linux ip-10-71-12-10.ec2.internal 6.1.77-flatcar #1 SMP PREEMPT Mon Feb 12 21:16:07 -00 2024 aarch64 GNU/Linux
ip-10-71-12-10 ~ # cat /etc/os-release
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=3815.2.0
VERSION_ID=3815.2.0
BUILD_ID=2024-02-12-2202
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 3815.2.0 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="arm64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:3815.2.0:*:*:*:*:*:*:*"

Environment and steps to reproduce

  • Launch a node with ami Flatcar-stable-3815.2.0-hvm on AWS
  • cordon/drain the node
  • reboot the node

Expected behavior

boot.mount should be running after restart


@aqilbeig
Author

aqilbeig commented Apr 4, 2024

dmesg.txt

@jepio
Member

jepio commented Apr 4, 2024

Can you upload the full journal contents? sudo journalctl -b0

Can you share your ignition file as well?

@jepio
Member

jepio commented Apr 4, 2024

Are you blocking modprobe somehow? This line from dmesg suggests something is wrong with module loading in general.

[    4.447521] request_module fs-squashfs succeeded, but still no fs?
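To pull this kind of line out of a longer kernel log, something like the following sketch could be used. It assumes the message wording is exactly as in the line quoted above; failed_fs_requests is a hypothetical helper name. As far as I know the kernel emits this warning at most once per boot, so a single hit (here squashfs) does not rule out other affected filesystem modules such as vfat.

```shell
# Sketch: read kernel log text on stdin and print the filesystem names from
# "request_module fs-<name> succeeded, but still no fs?" warnings.
failed_fs_requests() {
    sed -n 's/.*request_module fs-\([^ ]*\) succeeded, but still no fs?.*/\1/p' | sort -u
}
```

Typical use would be `dmesg | failed_fs_requests`, or feeding it the attached dmesg.txt.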

@aqilbeig
Author

aqilbeig commented Apr 4, 2024

Output of cat /etc/modprobe.d/blacklist.conf:

blacklist cramfs  # CIS v2.0.0 1.1.1.1
blacklist freevxfs  # CIS v2.0.0 1.1.1.2
blacklist jffs2  # CIS v2.0.0 1.1.1.3
blacklist hfs  # CIS v2.0.0 1.1.1.4
blacklist hfsplus  # CIS v2.0.0 1.1.1.5
# Docker and Containerd are now sysext images built with squashfs
# blacklist squashfs  # CIS v2.0.0 1.1.1.6
blacklist udf  # CIS v2.0.0 1.1.1.7
blacklist vfat  # CIS v2.0.0 1.1.1.8
blacklist usb-storage  # CIS v2.0.0 1.1.23
blacklist dccp  # CIS v2.0.0 3.4.1
blacklist sctp  # CIS v2.0.0 3.4.2
blacklist rds  # CIS v2.0.0 3.4.3
blacklist tipc  # CIS v2.0.0 3.4.4

@jepio
Member

jepio commented Apr 4, 2024

Please remove these lines:

blacklist squashfs  # CIS v2.0.0 1.1.1.6
blacklist vfat  # CIS v2.0.0 1.1.1.8
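The change requested above can be previewed before touching the file. The sketch below prints a copy of a modprobe.d file with the blacklist entries for the named modules commented out and leaves the original untouched. unblacklist is a hypothetical helper name, and the sketch assumes module names are spelled exactly as in the file (it does not normalize "-" versus "_").

```shell
# Sketch: print FILE with "blacklist <module>" lines commented out for each
# module named after it. Output goes to stdout; FILE itself is not modified.
unblacklist() {
    local file="$1"; shift
    local mods
    # Join the module names into a regex alternation, e.g. "squashfs|vfat".
    mods=$(printf '%s\n' "$@" | paste -sd'|' -)
    # Prefix matching active entries with "# "; already-commented lines are skipped.
    sed -E "s/^[[:space:]]*blacklist[[:space:]]+(${mods})([[:space:]]|\$)/# &/" "$file"
}
```

For example, `unblacklist /etc/modprobe.d/blacklist.conf squashfs vfat` shows the result, which can be reviewed and then written back by hand.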

@jepio
Member

jepio commented Apr 4, 2024

And check that you don't also have an entry like this:

install squashfs /bin/true
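An entry like this makes modprobe run the given command (here /bin/true) instead of loading the module, so the load request appears to succeed while the driver never registers. A minimal sketch for locating such entries follows. install_overrides is a hypothetical helper name, and the directory to scan is passed in explicitly because the set of modprobe.d directories on a given system is an assumption.

```shell
# Sketch: list "install <module> ..." lines in DIR/*.conf for each module
# named after DIR. Output format is file:line:content, as printed by grep.
install_overrides() {
    local dir="$1"; shift
    local mod
    for mod in "$@"; do
        grep -HnE "^[[:space:]]*install[[:space:]]+${mod}[[:space:]]" "$dir"/*.conf 2>/dev/null
    done
    # No match is not an error for this check.
    return 0
}
```

For example, `install_overrides /etc/modprobe.d squashfs vfat` lists every file and line that would need to be removed.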

@aqilbeig
Author

aqilbeig commented Apr 4, 2024

install squashfs /bin/true

cpt-master-ethos11thrashor1-890 ~ # cat /etc/modprobe.d/squashfs.conf
install squashfs /bin/true

Do we have to remove it from here as well?

@aqilbeig
Author

aqilbeig commented Apr 4, 2024

@jepio thanks a lot for the quick replies.

@jepio
Member

jepio commented Apr 5, 2024

install squashfs /bin/true

cpt-master-ethos11thrashor1-890 ~ # cat /etc/modprobe.d/squashfs.conf
install squashfs /bin/true

Do we have to remove it from here as well?

Yes, definitely. These modifications are directly responsible for the errors you are seeing. Also remove any line like this:

install vfat /bin/true

May I ask why you have these config files?

@aqilbeig
Author

aqilbeig commented Apr 5, 2024

This is because of the CIS standards we are following, for example:
CIS-1.1.1.6 Ensure mounting of squashfs filesystems is disabled

@jepio
Member

jepio commented Apr 5, 2024

Can you share more? How can I verify for myself what change this CIS standard requires? And did you apply all of these changes manually, or is some tool generating the configs?

Please be careful with this kind of hardening approach; there may be more settings here that subtly break your system.

Projects
Status: 📝 Needs Triage
Development

No branches or pull requests

2 participants