Raspberry Pi loses IPv6 connectivity after DHCP lease elapses #9697

Open
TobiasChen opened this issue Nov 11, 2024 · 6 comments


TobiasChen commented Nov 11, 2024

Bug Report

Description

A Raspberry Pi loses IPv6 connectivity after the valid_lft time expires.

After rebooting the node, IPv6 connectivity is restored: Talos shows both IPv4 and IPv6 addresses under `talosctl get nodeip`, and `ip addr`, run from a pod on the host network, outputs the following:
```
4: enxe45f01a83c47: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether e4:5f:01:a8:3c:47 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.3/22 brd 10.0.3.255 scope global enxe45f01a83c47
       valid_lft forever preferred_lft forever
    inet6 fd20:2:2:2:e65f:1ff:fea8:3c47/64 scope global dynamic mngtmpaddr
       valid_lft 4009sec preferred_lft 409sec
    inet6 2a02:8071:8281:fa0:e65f:1ff:fea8:3c47/64 scope global dynamic mngtmpaddr
       valid_lft 4009sec preferred_lft 409sec
    inet6 fe80::e65f:1ff:fea8:3c47/64 scope link
       valid_lft forever preferred_lft forever
```

After the valid_lft time expires, kubelet is restarted and `ip addr` displays the following:

```
4: enxe45f01a83c47: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether e4:5f:01:a8:3c:47 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.3/22 brd 10.0.3.255 scope global enxe45f01a83c47
       valid_lft forever preferred_lft forever
    inet6 fe80::e65f:1ff:fea8:3c47/64 scope link
       valid_lft forever preferred_lft forever
```
Afterwards, the node is not reachable via IPv6, and pings fail both to and from the node.

Other devices on the same network do not have this IPv6 connectivity issue, and I am a bit lost on how to debug it further.
I found some posts suggesting that DHCP services crash on Raspberry Pis when a large number of interfaces is present, but I would expect that to show up somewhere in the logs.
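One low-tech way to narrow this down is to capture `ip -6 addr show dev enxe45f01a83c47` periodically from a host-network pod and diff successive snapshots, so it is clear exactly which addresses disappear and what lifetimes they had beforehand. Below is a minimal Python sketch of such a diff. `parse_inet6` and `lost_addresses` are hypothetical helpers (not part of Talos or iproute2), the regex assumes iproute2's usual two-line `inet6`/`valid_lft` layout, and the sample snapshots are trimmed from the two `ip addr` outputs in this issue.

```python
import re

# Assumption: iproute2's usual layout, where each "inet6 ADDR/PREFIX flags..."
# line is immediately followed by a "valid_lft X preferred_lft Y" line.
INET6_RE = re.compile(
    r"inet6\s+(?P<addr>\S+)\s+(?P<flags>[^\n]*)\n\s*"
    r"valid_lft\s+(?P<valid>\S+)\s+preferred_lft\s+(?P<preferred>\S+)"
)


def parse_inet6(output: str) -> dict[str, tuple[str, str, str]]:
    """Map each inet6 address/prefix to (flags, valid_lft, preferred_lft)."""
    return {
        m["addr"]: (m["flags"].strip(), m["valid"], m["preferred"])
        for m in INET6_RE.finditer(output)
    }


def lost_addresses(before: str, after: str) -> list[str]:
    """Return inet6 addresses present in `before` but missing from `after`."""
    old, new = parse_inet6(before), parse_inet6(after)
    return sorted(addr for addr in old if addr not in new)


# Snapshots trimmed from the two `ip addr` outputs in this issue.
BEFORE = """\
    inet6 fd20:2:2:2:e65f:1ff:fea8:3c47/64 scope global dynamic mngtmpaddr
       valid_lft 4009sec preferred_lft 409sec
    inet6 2a02:8071:8281:fa0:e65f:1ff:fea8:3c47/64 scope global dynamic mngtmpaddr
       valid_lft 4009sec preferred_lft 409sec
    inet6 fe80::e65f:1ff:fea8:3c47/64 scope link
       valid_lft forever preferred_lft forever
"""

AFTER = """\
    inet6 fe80::e65f:1ff:fea8:3c47/64 scope link
       valid_lft forever preferred_lft forever
"""

if __name__ == "__main__":
    before = parse_inet6(BEFORE)
    # Report each vanished address with the lifetimes it had in the earlier snapshot.
    for addr in lost_addresses(BEFORE, AFTER):
        flags, valid, preferred = before[addr]
        print(f"lost {addr} ({flags}; valid_lft {valid}, preferred_lft {preferred})")
```

Run against these two snapshots, it reports that both global `dynamic mngtmpaddr` addresses (the `fd20:` and `2a02:` ones) were lost while the link-local address survived. Taking snapshots shortly before `valid_lft` reaches zero would show whether the lifetimes were ever refreshed.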

Logs

hestia.log

Environment

  • Talos version: 1.8.2
  • Platform: RB4b
  • Image: factory.talos.dev/installer/acd41da8ebb38e89794309e662d151b1bb5f3ddce6153b54cd00fd5b46c99582:v1.8.2
TobiasChen (Author) commented Nov 11, 2024

Network Config:

```yaml
network:
  hostname: HESTIA
  interfaces:
    - deviceSelector:
        busPath: "fd580000.ethernet"
      dhcp: true
      # addresses:
      #   - fd20:2:2:2:e65f:1ff:fea8:3c48
      dhcpOptions:
        ipv6: true
```

Support bundle: support.zip

smira (Member) commented Nov 11, 2024

In the log you provided, the address is still assigned, so it's not possible to tell why it gets lost.

There seems to be an issue with Talos trying to assign the address many times; I'm not sure what removes it in your setup.

TobiasChen (Author) commented Nov 11, 2024

Ah, I'm very sorry; I'll create a new support.zip once the node has timed out again. I honestly have no idea what would lead Talos to assign the address over and over again.

My setup is a home router (Fritzbox 6660) and a shared network for all devices.
All devices get a unique local address (in fd20:2:2:2::/64), and the Fritzbox assigns IPv6 addresses in the home network (2a02:8071:8281:fa0::/64) via DHCPv6. I also tried assigning only the DNS server via DHCPv6 and letting devices choose their own IPv6 addresses; however, the effect was the same (the IPv6 address gets assigned once and is then not renewed after the lease elapses).
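A side observation that may help debugging: the interface identifier shared by all three IPv6 addresses in the first `ip addr` output (`e65f:1ff:fea8:3c47`) appears to be the modified EUI-64 form of the NIC's MAC address `e4:5f:01:a8:3c:47`, which is how stateless autoconfiguration (SLAAC) builds an address from a router-advertised /64 prefix. Together with the `/64` prefix length and the `mngtmpaddr` flag, this suggests (but does not prove) that the global addresses may come from router advertisements rather than DHCPv6 leases; if so, their `valid_lft`/`preferred_lft` would be refreshed by RAs (RFC 4862) rather than by DHCPv6 renewals. The Python sketch below reproduces the derivation from RFC 4291, Appendix A, so it can be checked against the observed addresses; `eui64_interface_id` and `slaac_address` are hypothetical helper names, not part of Talos.

```python
import ipaddress


def eui64_interface_id(mac: str) -> str:
    """Build the modified EUI-64 interface identifier from a 48-bit MAC (RFC 4291, Appendix A)."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    octets[0] ^= 0x02  # invert the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert ff:fe between OUI and NIC bytes
    # Pack pairs of bytes into four 16-bit hex groups.
    return ":".join(f"{(eui64[i] << 8) | eui64[i + 1]:x}" for i in range(0, 8, 2))


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the MAC-derived interface identifier."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("SLAAC with EUI-64 identifiers expects a /64 prefix")
    iid = int(ipaddress.IPv6Address("::" + eui64_interface_id(mac)))
    return net.network_address + iid


if __name__ == "__main__":
    mac = "e4:5f:01:a8:3c:47"  # link/ether of enxe45f01a83c47
    for prefix in ("fd20:2:2:2::/64", "2a02:8071:8281:fa0::/64", "fe80::/64"):
        print(slaac_address(prefix, mac))
```

For the three prefixes above it prints `fd20:2:2:2:e65f:1ff:fea8:3c47`, `2a02:8071:8281:fa0:e65f:1ff:fea8:3c47` and `fe80::e65f:1ff:fea8:3c47`, matching the addresses from the issue description exactly.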

The only peculiar thing about my setup is a Pi-hole DNS server, which I run inside the cluster and announce from the Fritzbox via DHCP and DHCPv6. As a result, DNS resolution doesn't work after a restart until the cluster is available, which might also explain some of the "error serving DNS request" messages early in the log. I don't think this should affect IP address assignment, though.

TobiasChen (Author) commented

Here is the correct ZIP (sorry again for wasting your time):
support.zip

smira (Member) commented Nov 12, 2024

Thank you. I can see the address being removed now, even though it wasn't directly a Talos Linux action that removed it (it still might be a bug in the way Talos assigns the address).

I would like to look into Talos IPv6 support a bit more to cover all cases, but that might only come next year with the Talos 1.10 release.

TobiasChen (Author) commented

Hmm, interesting. Thank you for all the effort on this project; I'll eagerly await Talos 1.10.
If I can provide any more details or tests, feel free to contact me anytime.
