
nocloud deployment panics controller-runtime due to inability to parse network-config #9578

Closed
SeanWallace opened this issue Oct 27, 2024 · 0 comments · Fixed by #9588

@SeanWallace

Bug Report

When deploying a nocloud cluster (in my case on a Harvester cluster), the network platform config controller continually panics while trying to read `network-config` from the cidata partition, even though the partition contains the correct file.

Description

I haven't had the opportunity to recompile Talos with more debugging enabled, but by all appearances #9351 not only made `cidata/network-config` a requirement, it also fails to find the file even when it is present (at least in my case). I mounted my cloud-init partition on a different Linux host and verified that the necessary files are there and valid (Ubuntu, for example, parses and uses them without complaint):

root@test:~# mount /dev/vdb /mnt
mount: /mnt: WARNING: source write-protected, mounted read-only.
root@test:~# cd /mnt
root@test:/mnt# ls
meta-data  network-config  user-data
root@test:/mnt# cat network-config
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: false

In truth, I have no interest in using the network-config part of cloud-init; I only want to use the user-data, which carries the network config from my machine template.

I should also note that this doesn't seem to prevent the cluster from becoming usable: after bootstrapping, everything works perfectly fine, but the logs are spammed with this failure.

Logs

 user: warning: [2024-10-27T02:11:58.936377322Z]: [talos] waiting for devices to be ready...
 user: warning: [2024-10-27T02:11:58.979707322Z]: [talos] found config disk (cidata) at /dev/vdb
 kern:   debug: [2024-10-27T02:11:58.981563322Z]: ISO 9660 Extensions: Microsoft Joliet Level 3
 kern:   debug: [2024-10-27T02:11:58.982009322Z]: ISO 9660 Extensions: RRIP_1991A
 user: warning: [2024-10-27T02:11:58.982071322Z]: [talos] fetching meta config from: cidata/meta-data
 user: warning: [2024-10-27T02:11:58.983794322Z]: [talos] fetching network config from: cidata/network-config
 user: warning: [2024-10-27T02:11:58.985538322Z]: [talos] failed to read network-config
 user: warning: [2024-10-27T02:11:58.986687322Z]: [talos] fetching machine config from: cidata/user-data
 kern:    info: [2024-10-27T02:11:58.989573322Z]: init[2139]: segfault at 0 ip 0000000000f36d8a sp 000000c000b67c18 error 4 in init[400000+2837000] likely on CPU 3 (core 3, socket 0)
 kern:    info: [2024-10-27T02:11:58.992675322Z]: Code: 0f 10 44 24 70 41 0f 11 40 10 48 89 f0 48 8b 8c 24 d0 00 00 00 48 8b 54 24 40 48 8b 9c 24 80 00 00 00 4c 8b 84 24 b8 00 00 00 <4d> 8b 08 49 83 f9 01 0f 84 a2 00 00 00 49 83 f9 02 75 42 4c 89 c3
 user: warning: [2024-10-27T02:11:58.996799322Z]: [talos] platform panicked {"component": "controller-runtime", "controller": "network.PlatformConfigController", "stack": "github.com/siderolabs/talos/internal/app/machined/pkg/controllers/network.(*PlatformConfigController).runWithPanicHandler.func1\n\t/src/internal/app/machined/pkg/controllers/network/platform_config.go:564\nruntime.gopanic\n\t/toolchain/go/src/runtime/panic.go:770\nruntime.panicmem\n\t/toolchain/go/src/runtime/panic.go:261\nruntime.sigpanic\n\t/toolchain/go/src/runtime/signal_unix.go:881\ngithub.com/siderolabs/talos/internal/app/machined/pkg/runtime/v1alpha1/platform/nocloud(*Nocloud).ParseMetadata\n\t/src/internal/app/machined/pkg/runtime/v1alpha1/platform/nocloud/nocloud.go:54\ngithub.com/siderolabs/talos/internal/app/machined/pkg/runtime/v1alpha1/platform/nocloud(*Nocloud).NetworkConfiguration\n\t/src/internal/app/machined/pkg/runtime/v1alpha1/platform/nocloud/nocloud.go:136\ngithub.com/siderolabs/talos/internal/app/machined/pkg/contr...
 user: warning: [2024-10-27T02:11:59.012959322Z]: [talos] restarting platform network config {"component": "controller-runtime", "controller": "network.PlatformConfigController", "interval": "1m0.8004681s", "error": "panic: runtime error: invalid memory address or nil pointer dereference"}
 user: warning: [2024-10-27T02:11:59.912971322Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}

Environment

  • Talos version: v1.8.1
  • Kubernetes version: v1.30.5
  • Platform: Harvester
smira self-assigned this Oct 28, 2024
smira added a commit to smira/talos that referenced this issue Oct 28, 2024
The bug was in the logic: the code first checked that at least one of the
values was non-nil, and afterwards assumed that a particular one of them
was non-nil, even though it could have been nil.

While fixing that, the linter pointed out that the raw metadata config is
never needed outside of `acquireConfig`, so it was dropped as well,
simplifying the code even further.

Fixes siderolabs#9578

Signed-off-by: Andrey Smirnov <[email protected]>
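The commit message describes a common Go bug shape: a guard that only rules out both values being nil, followed by code that dereferences one specific value. The sketch below reproduces that shape with hypothetical types (`metaConfig`, `networkConfig`) and hypothetical functions (`describeBuggy`, `describeFixed`). It illustrates the bug class only; it is not the Talos code changed by the fix.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for two parsed configuration sources; names and
// fields are illustrative, not Talos types.
type metaConfig struct{ InstanceID string }
type networkConfig struct{ Version int }

// describeBuggy only checks that at least one value is non-nil, then
// dereferences both, so it panics when meta is set but net is nil.
func describeBuggy(meta *metaConfig, net *networkConfig) (string, error) {
	if meta == nil && net == nil {
		return "", errors.New("no configuration found")
	}

	return fmt.Sprintf("%s v%d", meta.InstanceID, net.Version), nil
}

// describeFixed checks each value before using it, so a missing
// network-config is simply skipped.
func describeFixed(meta *metaConfig, net *networkConfig) (string, error) {
	if meta == nil && net == nil {
		return "", errors.New("no configuration found")
	}

	s := "no meta-data"
	if meta != nil {
		s = meta.InstanceID
	}

	if net != nil {
		s += fmt.Sprintf(" v%d", net.Version)
	}

	return s, nil
}

func main() {
	meta := &metaConfig{InstanceID: "node-1"}

	// Simulates network-config failing to load while meta-data succeeded.
	s, err := describeFixed(meta, nil)
	fmt.Println(s, err) // prints "node-1 <nil>"
}
```

Calling `describeBuggy(meta, nil)` with the same inputs would panic with a nil pointer dereference, because passing the combined guard says nothing about `net` on its own.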
smira added a commit to smira/talos that referenced this issue Nov 13, 2024
(cherry picked from commit 3a0a17a)