
Talos 1.8.0+ initial boot fails in phase meta (6/12) #9776

Open
smauermann opened this issue Nov 21, 2024 · 9 comments

Comments

@smauermann

smauermann commented Nov 21, 2024

Bug Report

Hi team, I am unable to boot any Talos v1.8.0+ ISO from USB on an HP EliteDesk 800 G5 Mini (i5 9500T with vPro). The boot process fails in the meta phase at step 6/12.

I would really love to switch to Talos but I am at a loss right now on how to proceed. I would love any hints!

Description

Up to and including v1.7.7, I can boot just fine and apply the machine configs to get a functional Kubernetes cluster. However, if I try to boot any Talos version from v1.8.0 onward, the boot fails. Please see a screenshot of the failed boot process below.

I have tried various other images to verify that nothing is wrong with the node: Debian, Proxmox, CoreOS, and of course different Talos versions.

Besides the different versions, I have experimented with several extensions, namely intel-ucode, i915-ucode, mei, and util-linux-tools, including all permutations of the selected extensions.

Interestingly, I was able to create a cluster with 1.7.7 and upgrade it to 1.8.0. A further upgrade to 1.8.3 (no extensions) failed, though: the screen just went black after the reboot and never came back up. I am now trying this again with different extensions.

EDIT: I was able to upgrade from 1.7.7 (without any extensions) to 1.8.3 with the following extensions:

customization:
    systemExtensions:
        officialExtensions:
            - siderolabs/i915-ucode
            - siderolabs/intel-ucode
            - siderolabs/mei
            - siderolabs/util-linux-tools

Logs

"Screenshot" of the boot failure: [image: talos-boot-fail]

Environment

  • Talos version: v1.8.3
  • Kubernetes version: N/A
  • Platform: metal on HP EliteDesk 800 G5 Mini (i5 9500T with vPro)
@smira
Member

smira commented Nov 21, 2024

So there might be a mix of several issues here. With Talos 1.8 there's an unfortunate side effect for machines with i915 graphics: the i915-ucode extension must be included, otherwise the Linux kernel fails to boot (this will be fixed in 1.9+).
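Assuming, as this comment states, that only Talos 1.8.x is affected and that the fix lands in 1.9, a pre-build check of an Image Factory schematic (in the same `customization` shape as the config in the original report) could look like the sketch below. The helper name `missing_i915_ucode` and the `has_i915` flag are hypothetical; detecting whether a machine actually has i915 graphics is left to the caller.

```python
# Hypothetical pre-build check: flags a Talos 1.8.x schematic for an i915
# machine that does not list the i915-ucode extension.
I915_EXT = "siderolabs/i915-ucode"


def parse_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string like 'v1.8.3' or '1.8.3'."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def missing_i915_ucode(schematic: dict, talos_version: str, has_i915: bool) -> bool:
    """True if the schematic likely hits the Talos 1.8 i915 boot issue.

    Assumption (from the thread): only 1.8.x is affected; 1.9+ is fixed.
    """
    if not has_i915 or parse_minor(talos_version) != (1, 8):
        return False
    # Walk customization -> systemExtensions -> officialExtensions,
    # treating missing or empty sections as "no extensions".
    exts = (
        ((schematic.get("customization") or {})
         .get("systemExtensions") or {})
        .get("officialExtensions") or []
    )
    return I915_EXT not in exts
```

For example, the extension list from the original report passes (`False`), while a 1.8.3 schematic with only `siderolabs/intel-ucode` would be flagged (`True`).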

As for the error above in the screenshot, it is certainly a bug, but I don't understand how it ends up that way.

Does the disk contain any previous Talos install when booting from an ISO (USB)?

@smira
Member

smira commented Nov 21, 2024

Oh yeah, I misread the picture. I guess you might have a META partition somewhere on the disk.

Moreover, it might be related to an incomplete wipe of the system disk. Please try to wipe the disks before installation.
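A common cause of incomplete wipes is that GPT keeps a backup header and partition table at the very end of the disk, so clearing only the start leaves the backup behind. The following is a minimal sketch, assuming Linux and a disk or image path supplied by the caller, that zeroes both ends. The function name is hypothetical, it only clears the partition-table regions (not partition contents), and it is destructive, so point it only at a disk you intend to erase. Standard tools such as `wipefs -a` or `sgdisk --zap-all` cover the same ground.

```python
import os

MIB = 1024 * 1024


def wipe_gpt_regions(path: str, span: int = MIB) -> None:
    """Zero the first and last `span` bytes of a disk or image file.

    GPT stores a primary header/table at the start of the disk and a backup
    copy at the very end; 1 MiB covers both for 512-byte and 4K sectors.
    DESTRUCTIVE: hypothetical helper, use only on a disk you mean to erase.
    """
    with open(path, "r+b") as f:
        # seek() returns the new offset; for regular files and (on Linux)
        # block devices, seeking to the end yields the total size.
        size = f.seek(0, os.SEEK_END)
        span = min(span, size)
        zeros = bytes(span)
        f.seek(0)
        f.write(zeros)          # primary GPT (and protective MBR)
        f.seek(size - span)
        f.write(zeros)          # backup GPT at the end of the disk
        f.flush()
        os.fsync(f.fileno())    # make sure the zeros reach the device
```

Because only the table regions are cleared, any data inside the old partitions remains on disk but is no longer referenced by a partition table.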

@smauermann
Author

Hi @smira, thanks for your swift reply! I shredded both internal disks before installing Talos, and I performed a wipe via the disk settings in the machine config during the 1.7.7 install. I was pretty sure I had nuked everything before the installation. Is there any way to check for extraneous META partitions?
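One way to check, sketched below under the assumptions that Talos labels the partition `META` in the GPT and that the disk uses 512-byte logical sectors, is to read the primary GPT from a live Linux system and list the partition names. The function name is hypothetical; the sketch reads only the primary table and skips CRC validation. `lsblk -o NAME,PARTLABEL` gives a similar listing without any code.

```python
import struct

SECTOR = 512  # assumption: 512-byte logical sectors


def gpt_partition_names(path: str, sector: int = SECTOR) -> list[str]:
    """Return the names of all non-empty entries in the primary GPT."""
    with open(path, "rb") as f:
        f.seek(sector)              # primary GPT header lives at LBA 1
        hdr = f.read(92)
        if hdr[:8] != b"EFI PART":
            return []               # no primary GPT found
        # Offsets 72/80/84: entry-array LBA, entry count, entry size.
        entries_lba, count, entry_size = struct.unpack_from("<QII", hdr, 72)
        f.seek(entries_lba * sector)
        table = f.read(count * entry_size)
    names = []
    for i in range(count):
        entry = table[i * entry_size:(i + 1) * entry_size]
        if len(entry) < 128 or entry[:16] == bytes(16):
            continue                # truncated read or unused slot (zero type GUID)
        # Bytes 56..127 hold the partition name as UTF-16LE, NUL-padded.
        names.append(entry[56:128].decode("utf-16-le").rstrip("\x00"))
    return names
```

Usage would be something like `"META" in gpt_partition_names("/dev/sda")` (run as root, device path is an example), repeated for each disk in the machine.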

@smauermann
Author

Also, I'm happy to hear that the i915 issue will be fixed with the next minor version. Keep up the great work.

@smira
Member

smira commented Nov 21, 2024

I don't see the logs, but I wonder if there's a message somewhere further up from the VolumeManager controller about a META partition being found (it shouldn't be).

@smauermann
Author

I did not observe such a message, but then again the logs fly past pretty quickly and all I could capture is in the screenshot above 😄

@smira
Member

smira commented Nov 21, 2024

One option is to record a video; it sometimes makes it possible to see individual messages.

@smauermann
Author

Would a talosctl reset get rid of any META partitions that could mess with any subsequent installs?

@eugene-marchanka

I suspect I have a similar issue with an ASRock Z690D4U-2L2T/G5 motherboard.
Solved it by building and booting from an image built with @smauermann's parameters 👍🏻
I spent 2 days trying to get logs over SOL without success.
SOL works fine up to the point where Talos starts to boot 😞


3 participants