infinite boot loop #9720

Open
smst329 opened this issue Nov 13, 2024 · 6 comments


@smst329

smst329 commented Nov 13, 2024

Bug Report

The Talos ISO reboots in an infinite loop and never stops.

#9702
^ In that bug report they kept saying I needed to wipe the disk/previous install.

A funny thing happened today: a new hard drive came in the mail, and there is still an infinite boot loop. I didn't know hard drives came pre-installed with Talos.

I'm just reporting the bug, in case it affects any potential or current customers.

Description

Logs

Environment

  • Talos version: ???
  • Kubernetes version: ???
  • Platform: ???
@smira
Member

smira commented Nov 14, 2024

Without the logs, it's impossible to tell. If you have i915 by chance, it might be fixed by adding the i915-ucode system extension. (This is going to be fixed in 1.9.)

@erickuiper

I just ran into this same issue again after destroying and recreating a cluster running on 1.8.3 while having the extension enabled on the node.

Will this be resolved in the mentioned 1.9 fix?

@smst329
Author

smst329 commented Nov 15, 2024

If you have i915, maybe. If you don't, then probably not. I am not sure they fully understand all the causes of their boot loops. I don't have an i915, so their supposition is wrong again.

I'd like them to reconsider infinite boot loops as a strategy for responding to a problem. What conditions does a reboot change such that things work on the 13th reboot when they didn't on the 12th? Do 12 reboots clear a previous install? Do 12 reboots cause a USB stick to fly out of the machine? Do 12 reboots fix the DHCP server?

They don't have to agree, but I think infinite boot loops are bad design. There are other kinds of loops besides a boot loop. And they could even have a progressive backoff, like the k8s crash loop backoff, so it's not a hot loop.
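
To make the backoff idea concrete, here is a minimal sketch of a capped exponential delay schedule. It is only an illustration and not how Talos behaves today. The function name `backoff_delays` is hypothetical, and the defaults (10 s base, doubling, 300 s cap) are assumptions loosely modeled on the kubelet's crash loop backoff.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(
    base: float = 10.0, factor: float = 2.0, cap: float = 300.0
) -> Iterator[float]:
    """Yield the wait (in seconds) before each successive retry.

    The delay starts at `base`, is multiplied by `factor` after every
    attempt, and never exceeds `cap`. All defaults are assumptions chosen
    for illustration.
    """
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# Waits before the first seven retries with the assumed defaults.
first_seven = list(islice(backoff_delays(), 7))
print(first_seven)
```

With these defaults the wait grows from 10 seconds to the 300 second cap within six attempts, so a failing boot would retry less and less often instead of spinning in a hot loop.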

@rdenouden

rdenouden commented Nov 15, 2024

The i915 drivers have a bad history of boot loops and crashes.

You can get into your machine again by adding i915.modeset=0 to the kernel parameters, and it just runs fine for now.

@rdenouden

At the moment of writing I cannot add extensions to 1.8.3 machines. It was the same with the 1.8.2 upgrade for a while.
I added i915.modeset=0 as extraKernelArgs, and the Intel NUC nodes with i915 video are now stable.

I have to say that this should be a stern warning not to jump on the latest version until it settles down. In the short time I have been evaluating Talos with Omni, we have already seen such issues with the 1.8 releases. I love Talos, but it's the release process that worries me a bit.

@smira
Member

smira commented Nov 18, 2024

We plan to remove the i915 driver from base Talos in 1.9, so that it will use UEFI for the framebuffer (unless you want to add an extension). #9728

4 participants