Unable to join workers with BGP Unnumbered RFC 5549 #2323

Closed
eknudtson opened this issue Oct 13, 2020 · 31 comments
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.
  • sig/network: Categorizes an issue or PR as relevant to SIG Network.
Milestone

Comments

@eknudtson

eknudtson commented Oct 13, 2020

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT

Versions

kubeadm version (use kubeadm version): 1.17.12

Environment:

  • Kubernetes version (use kubectl version): 1.17.12
  • Cloud provider or hardware configuration: baremetal
  • OS (e.g. from /etc/os-release): Centos 7
  • Kernel (e.g. uname -a): 3.10.0-1127.19.1.el7.x86_64
  • Others: Calico 3.13.3

What happened?

Related to the following:
#1156
kubernetes/kubernetes#83475

When I attempt to kubeadm join a worker with routing to the host (unicast IPv4 on the loopback, IPv6 link locals on interfaces), kubeadm fails to join workers with the following output:

$ kubeadm join --config=/etc/kubernetes/kubeadm-join.yaml
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: unable to select an IP from default routes.

verbose error:

[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I1009 00:03:38.372308    8018 interface.go:400] Looking for default routes with IPv4 addresses
I1009 00:03:38.372350    8018 interface.go:405] Default route transits interface "ens15f0.300"
I1009 00:03:38.372793    8018 interface.go:208] Interface ens15f0.300 is up
I1009 00:03:38.372911    8018 interface.go:256] Interface "ens15f0.300" has 1 addresses :[fe80::a236:9fff:fe80:8258/64].
I1009 00:03:38.372959    8018 interface.go:223] Checking addr  fe80::a236:9fff:fe80:8258/64.
I1009 00:03:38.372984    8018 interface.go:236] fe80::a236:9fff:fe80:8258 is not an IPv4 address
I1009 00:03:38.373017    8018 interface.go:400] Looking for default routes with IPv6 addresses
I1009 00:03:38.373034    8018 interface.go:405] Default route transits interface "ens15f1.300"
I1009 00:03:38.373375    8018 interface.go:208] Interface ens15f1.300 is up
I1009 00:03:38.373465    8018 interface.go:256] Interface "ens15f1.300" has 1 addresses :[fe80::a236:9fff:fe80:825a/64].
I1009 00:03:38.373502    8018 interface.go:223] Checking addr  fe80::a236:9fff:fe80:825a/64.
I1009 00:03:38.373523    8018 interface.go:233] Non-global unicast address found fe80::a236:9fff:fe80:825a
I1009 00:03:38.373542    8018 interface.go:405] Default route transits interface "ens15f0.300"
I1009 00:03:38.374553    8018 interface.go:208] Interface ens15f0.300 is up
I1009 00:03:38.374644    8018 interface.go:256] Interface "ens15f0.300" has 1 addresses :[fe80::a236:9fff:fe80:8258/64].
I1009 00:03:38.374696    8018 interface.go:223] Checking addr  fe80::a236:9fff:fe80:8258/64.
I1009 00:03:38.374720    8018 interface.go:233] Non-global unicast address found fe80::a236:9fff:fe80:8258
I1009 00:03:38.374741    8018 interface.go:416] No active IP found by looking at default routes
unable to select an IP from default routes.
unable to fetch the kubeadm-config ConfigMap
k8s.io/kubernetes/cmd/kubeadm/app/cmd.fetchInitConfiguration
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:526
k8s.io/kubernetes/cmd/kubeadm/app/cmd.fetchInitConfigurationFromJoinConfiguration
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:494
k8s.io/kubernetes/cmd/kubeadm/app/cmd.(*joinData).InitCfg
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:456
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join.runPreflight
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join/preflight.go:95
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:234
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:422
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdJoin.func1
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:170
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:826
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
	_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:203
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1357
error execution phase preflight
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:422
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdJoin.func1
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:170
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:826
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
	/workspace/anago-v1.17.12-rc.0.60+02c8616ca83844/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
	_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:203
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1357
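
The `interface.go` lines in the verbose log above reject every candidate for one of two reasons: the address is in the wrong family, or it is not global unicast. The following is a minimal sketch of that classification using only the Go standard library. The function name `classify` and its return strings are hypothetical and chosen for illustration; this is not the apimachinery code, only an approximation of the checks the log suggests.

```go
package main

import (
	"fmt"
	"net"
)

// classify reports whether an interface address (in CIDR form) would be
// usable when searching for a global unicast address of one family.
// Hypothetical helper: it approximates the checks seen in the log.
func classify(cidr string, wantIPv4 bool) string {
	ip, _, err := net.ParseCIDR(cidr)
	if err != nil {
		return "rejected: unparseable address"
	}
	isIPv4 := ip.To4() != nil
	if isIPv4 != wantIPv4 {
		return "rejected: wrong address family"
	}
	// Link-local (fe80::/10, 169.254.0.0/16) and loopback addresses
	// are not global unicast according to the net package.
	if !ip.IsGlobalUnicast() {
		return "rejected: non-global unicast"
	}
	return "accepted"
}

func main() {
	linkLocal := "fe80::a236:9fff:fe80:8258/64" // address from the log
	fmt.Println("IPv4 pass:", classify(linkLocal, true))
	fmt.Println("IPv6 pass:", classify(linkLocal, false))
	// The loopback /32 would pass, but the default route does not
	// transit the loopback interface, so it is never examined.
	fmt.Println("loopback /32:", classify("10.10.10.10/32", true))
}
```

With only link-local IPv6 addresses on the interfaces that carry the default route, both passes end without a usable address, which matches the "No active IP found by looking at default routes" line.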

What you expected to happen?

It looks like the fix introduced in this PR only works for control plane nodes. I'd like to be able to join workers with the node's unicast address being on the loopback interface.

How to reproduce it (as minimally and precisely as possible)?

On a node with BGP Unnumbered via RFC 5549 + a loopback unicast, run kubeadm join --config=/etc/kubernetes/kubeadm-join.yaml.

kubeadm-join.yaml:

apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  file:
    kubeConfigPath: /etc/kubernetes/discovery.conf
  tlsBootstrapToken: 'bootstrap-token-here'
nodeRegistration:
  name: worker-1.cluster.example.com
  kubeletExtraArgs:
    node_ip: 10.x.x.x

discovery.conf:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: b64-encoded-ca-cert-here
    server: https://api.cluster.example.com
  name: ""
contexts: []
current-context: ""
kind: Config
preferences: {}
users: []

Anything else we need to know?

@neolit123
Member

thanks for moving this ticket here and sorry for the delay.

BGP Unnumbered RFC 5549

i should point out that i'm not familiar with "BGP Unnumbered" and with big portions of RFC 5549.

having an overview here, i think this is a bug that should be fixed for 1.20, yet unclear whether we can backport it to older releases. probably not, due to the fact that the bug fix will introduce a change in behavior. to be discussed...

one overall problem here is that "kubeadm join" for a worker node, should not really fetch the "InitConfiguration" from the cluster, or at least i don't see a reason why it should. for CP nodes this is needed.
it then proceeds to apply dynamic defaults based on the apimachinery/interface code to find a public IP from the list of available interfaces on the node with the purpose to default the api server advertise address and it fails. this is truly not required for worker nodes.

@neolit123 neolit123 added the kind/bug Categorizes issue or PR as related to a bug. label Oct 14, 2020
@neolit123 neolit123 added this to the v1.20 milestone Oct 14, 2020
@neolit123 neolit123 added priority/backlog Higher priority than priority/awaiting-more-evidence. sig/network Categorizes an issue or PR as relevant to SIG Network. labels Oct 14, 2020
@neolit123
Member

neolit123 commented Oct 14, 2020

Q: in case you are trying an HA setup (more than one CP node), are you not seeing this problem there because you are being explicit about the advertiseAddress on such CP nodes via JoinConfiguration.controlPlane?

@neolit123
Member

and another Q: can you use kubeadm join --skip-phases=preflight ... to work around the problem?

@eknudtson
Author

Thanks for the help here!

We do have an HA setup; I'm actually unable to join controllers as well. I'm wondering if the fix mentioned in the PR I linked at the top was only for kubeadm init:

$ /usr/bin/kubeadm join --config=/etc/kubernetes/kubeadm-join.yaml --skip-phases=preflight
W1014 16:41:48.010041    5755 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase control-plane-prepare/download-certs: unable to fetch the kubeadm-config ConfigMap: unable to select an IP from default routes.
To see the stack trace of this error execute with --v=5 or higher

Here's the kubeadm-join config on a controller:

apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  file:
    kubeConfigPath: /etc/kubernetes/discovery.conf
  tlsBootstrapToken: 'token here'
nodeRegistration:
  name: controller-0.cluster.example.com
  kubeletExtraArgs:
    node_ip: 10.x.x.x
controlPlane:
  localAPIEndpoint:
    advertiseAddress: 10.x.x.x
    bindPort: 6443
  certificateKey: 'cert-key-here'

Here's a worker join w/ --skip-phases=preflight:

$ /usr/bin/kubeadm join --skip-phases=preflight --config=/etc/kubernetes/kubeadm-join.yaml
W1014 16:21:48.437511    4048 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase kubelet-start: unable to fetch the kubeadm-config ConfigMap: unable to select an IP from default routes.
To see the stack trace of this error execute with --v=5 or higher

We're currently manually patching + building kubeadm with this in order to use kubeadm: https://patch-diff.githubusercontent.com/raw/kubernetes/kubernetes/pull/69578.diff

@randomvariable
Member

@eknudtson this looks like a problem in API Machinery in terms of how the transport library selects the outbound interface for connecting to the API server. We should move this issue to k/k and mark it for SIG API Machinery for resolution.

Checking for unicast IP is acceptable for IPv4, but we should allow link local in the case of IPv6.

@eknudtson
Author

FWIW, here's an example of our controller network interface configs:

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 10.10.10.10/32 brd 10.10.10.10 scope global lo:10
       valid_lft forever preferred_lft forever
    inet 10.10.10.40/32 brd 10.10.10.40 scope global lo:20
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp3s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
4: enp3s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:yy brd ff:ff:ff:ff:ff:ff
6: enp3s0f0.300@enp3s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet6 fe80::xxxx:xxxx:xxxx:xxxx/64 scope link
       valid_lft forever preferred_lft forever
7: enp3s0f1.300@enp3s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet6 fe80::xxxx:xxxx:xxxx:xxxx/64 scope link
       valid_lft forever preferred_lft forever

Where we have BGP peering running off of the .300 VLAN subinterfaces. The 10.10.10.40 address is an anycast IP that functions as the control plane endpoint IP.

The setup there is anycast IP -> local haproxy running on each controller -> load balancing between controller IPs.

Kubeadm should select the first IPv4 on the loopback adapter, even better if we can just tell it which IP to use.

@eknudtson
Author

To be sure: IPv6 in this case is only for peering with the connected routers automatically w/ BGP Unnumbered. RFC 5549 then kicks in and we advertise and receive IPv4 routes via the peering.

@neolit123
Member

neolit123 commented Oct 14, 2020

Here's a worker join w/ --skip-phases=preflight:

$ /usr/bin/kubeadm join --skip-phases=preflight --config=/etc/kubernetes/kubeadm-join.yaml
W1014 16:21:48.437511 4048 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase kubelet-start: unable to fetch the kubeadm-config ConfigMap: unable to select an IP from default routes.
To see the stack trace of this error execute with --v=5 or higher

hm, are you sure this --config is not a control-plane config? the warning above indicates this is trying to join a CP node.
does skipping the preflight work for workers?

@neolit123
Member

neolit123 commented Oct 14, 2020

we had a discussion about this in the kubeadm office hours meeting today.

so there are a couple of issues here:

  1. kubeadm should stop fetching the configuration from the cluster in case of worker join
  2. the apimachinery helper functions for determining IPs fail:
I1009 00:03:38.372308    8018 interface.go:400] Looking for default routes with IPv4 addresses
I1009 00:03:38.372350    8018 interface.go:405] Default route transits interface "ens15f0.300"
I1009 00:03:38.372793    8018 interface.go:208] Interface ens15f0.300 is up
...

even if we make kubeadm tolerate your setup, @randomvariable had concerns that the rest of k8s will fail because they use the same utilities. so possibly we'd have to patch that code too (but that's not a kubeadm issue, per-se).

@randomvariable
Member

randomvariable commented Oct 14, 2020

even if we make kubeadm tolerate your setup, @randomvariable had concerns that the rest of k8s will fail because they use the same utilities.

neolit123 pipped me to the post, but going by

To be sure: IPv6 in this case is only for peering with the connected routers automatically w/ BGP Unnumbered. RFC 5549 then kicks in and we advertise and receive IPv4 routes via the peering

So, if I understand it correctly, the IPv4 routes back to the API server migrate to the IPv6 interfaces? That would explain why kubelet actually works. I admit I don't understand the RFC in detail.

@neolit123
Member

neolit123 commented Oct 14, 2020

cc @aojea do you happen to know if this use case is supported (k8s-wide, see OP)?

@eknudtson
Author

eknudtson commented Oct 14, 2020

So, if I understand it correctly, the IPv4 routes back to the API server migrate to the IPv6 interfaces? That would explain why kubelet actually works. I admit I don't understand the RFC in detail.

I'll try and clarify as best I can:

On a machine, its global unicast IPv4 addresses are present on the loopback interface as /32's.

Each interface that peers with the upstream router has IPv6 running with Router Advertisements and neighbor discovery enabled. Thus, each end of the link knows the fe80 (EUI64 autogenerated) link local address of the other end, and the MAC of the other end.

In FRR (the routing suite we use), you can define BGP peers by their interface instead of by IP address (v4 or v6). This allows peering relationships to form without assigning IP addresses, as FRR will just use the IPv6 link locals present on each interface, knowing that the other end of the link is known as well by router advertisements + ND.

Once these IPv6 BGP peering relationships are formed, you can then exchange routes between peers. IPv4 routes are exchanged and programmed in using RFC5549.

If a host receives an IPv4 route, since the machine already knows the MAC of the nexthop (learned over IPv6 neighbor discovery), FRR simply programs in a dummy entry into the ARP table:
169.254.0.1 dev eth0 lladdr xx:xx:xx:xx:xx:xx PERMANENT
169.254.0.1 dev eth1 lladdr xx:xx:xx:yy:yy:yy PERMANENT

Routes for 169.254.0.1 are added to the routing table by FRR for each route learned from a peer:

$ ip ro sho
default proto 186 metric 20
	nexthop via 169.254.0.1 dev eth0 weight 1 onlink
	nexthop via 169.254.0.1 dev eth1 weight 1 onlink

When a packet leaves the host via IPv4 along those default routes above (learned on each interface from the peers on the other end) it picks a path, consults the ARP table for 169.254.0.1 on the egress interface, finds that the MAC on the other end is already known (allowing us to skip an ARP lookup that would never succeed), and the packet is forwarded on its way.

Essentially, it allows us to learn the routes over IPv6 and then create fake ARP entries and routes that let us send the packet to the correct place with IPv4.

The IPv4 addresses on the loopback are used as the source address for these packets; I believe the default is to pick the first one present.

Let me know if I can clarify or add anything else!
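
The forwarding step described above (pick a next hop of the default route, then look up 169.254.0.1 in the neighbor table for that egress interface) can be modeled with a toy Go sketch. Everything here is hypothetical: the types `nextHop` and `neighKey`, the function `resolveMAC`, and the placeholder MAC addresses. It only models the lookup; it does not talk to the kernel or to FRR.

```go
package main

import "fmt"

// nextHop is one leg of a multipath route, for example
// "nexthop via 169.254.0.1 dev eth0 onlink".
type nextHop struct {
	gateway string
	dev     string
}

// neighKey identifies a neighbor-table entry. Entries are per interface,
// so the same dummy gateway IP can map to a different MAC on each link.
type neighKey struct {
	ip  string
	dev string
}

// resolveMAC looks up the link-layer address for a next hop, mimicking
// the neighbor (ARP) table lookup done before a packet is transmitted.
func resolveMAC(neigh map[neighKey]string, hop nextHop) (string, bool) {
	mac, ok := neigh[neighKey{ip: hop.gateway, dev: hop.dev}]
	return mac, ok
}

func main() {
	// Permanent entries programmed by the routing daemon (placeholder MACs).
	neigh := map[neighKey]string{
		{ip: "169.254.0.1", dev: "eth0"}: "aa:aa:aa:aa:aa:aa",
		{ip: "169.254.0.1", dev: "eth1"}: "bb:bb:bb:bb:bb:bb",
	}
	// The multipath default route from the routing table above.
	defaultRoute := []nextHop{
		{gateway: "169.254.0.1", dev: "eth0"},
		{gateway: "169.254.0.1", dev: "eth1"},
	}
	for _, hop := range defaultRoute {
		mac, ok := resolveMAC(neigh, hop)
		fmt.Printf("dev=%s mac=%s found=%v\n", hop.dev, mac, ok)
	}
}
```

Because the entry is already present for each interface, no ARP request for 169.254.0.1 ever needs to be sent, which is the point of the dummy entries.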

@eknudtson
Author

hm, are you sure this --config is not a control-plane config? the warning above indicates this is try to join a CP node.
does skipping the preflight work for workers?

I don't have any control plane config present in /etc/kubernetes/kubeadm-join.yaml :

apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  file:
    kubeConfigPath: /etc/kubernetes/discovery.conf
  tlsBootstrapToken: 'token_here'
nodeRegistration:
  name: worker-1.cluster.example.com
  kubeletExtraArgs:
    node_ip: 10.10.10.10

@randomvariable
Member

Ah, ok, found the caller path finally:
((FetchInitConfigurationFromCluster -> k8s.io/cmd/kubeadm/app/util/config/SetInitDynamicDefaults) or (LoadOrDefaultJoinConfiguration -> DefaultedJoinConfiguration/documentMapToJoinConfiguration -> SetJoinDynamicDefaults -> k8s.io/cmd/kubeadm/app/util/config/SetJoinControlPlaneDefaults)) ->
k8s.io/cmd/kubeadm/app/util/config/SetAPIEndpointDynamicDefaults ->
k8s.io/cmd/kubeadm/app/util/config/ChooseAPIServerBindAddress ->
k8s.io/apimachinery/pkg/util/net.ResolveBindAddress ->
k8s.io/apimachinery/pkg/util/net.chooseHostInterface

Thankfully, it's not the client code causing this.

So, if we remove the download of the init config, then that should fix worker node joins at least.

@eknudtson
Author

eknudtson commented Oct 14, 2020

FYI I'm also seeing errors on

/usr/bin/kubeadm config images pull
unable to select an IP from default routes.

@eknudtson
Author

I watched the office hours video on this, and it would be great to be able to tell kubeadm which egress IPv4 to use if we know autodetection will fail.

We have several clusters running with BGP unnumbered, and we currently patch kubeadm with https://patch-diff.githubusercontent.com/raw/kubernetes/kubernetes/pull/69578.diff to make upgrading and joining work. Outside of that, everything works well since we're able to explicitly tell other components (kubelet, apiserver) what IPv4 they should bind to.

@neolit123
Member

neolit123 commented Oct 14, 2020

@randomvariable

Thankfully, it's not the client code causing this.

yes, during the meeting i was trying to explain that this code is used in multiple places in k8s.
in kubeadm for picking a node IP, but also in the kube-apiserver for picking the bind address:
https://github.com/kubernetes/kubernetes/blob/9488fbef64afd580d053347e57819817e51ac0f9/pkg/kubeapiserver/options/serving.go#L63

but if you pass an explicit IP it should work, and local network clusters are something we should support.

@eknudtson

/usr/bin/kubeadm config images pull

ok, as i suspected the problem in kubeadm is deeper. during any command, the process constructs a configuration object that is passed around for common values (such as "imageRepository" for "images pull"). this object is defaulted with dynamic values from the node via the problematic SetInitDynamicDefaults function. however it does not make sense to call these defaults at all for some commands including "images pull" or "kubeadm join" (for workers).

@neolit123
Member

neolit123 commented Oct 14, 2020

similar problem to:
#2039 (comment)

where dynamic defaults break flag overrides of --cri-socket over the value in a config.

@fabriziopandini i think it's time to make the dynamic defaults apply only on demand and not by default during config fetch from cluster, config load from disk or even for commands like config images pull

maybe i can find time for this in 1.20.
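
To make the "dynamic defaults only on demand" idea concrete, here is a minimal Go sketch. All names (`joinConfig`, `applyDefaults`, `failingDetect`) are hypothetical and heavily simplified; they are not kubeadm types or functions. The sketch assumes that only a control-plane join without an explicit advertise address needs IP autodetection.

```go
package main

import (
	"errors"
	"fmt"
)

// joinConfig is a hypothetical, reduced stand-in for a join configuration.
type joinConfig struct {
	ControlPlane     bool
	AdvertiseAddress string
}

// failingDetect simulates autodetection on a host whose default-route
// interfaces carry only link-local addresses.
func failingDetect() (string, error) {
	return "", errors.New("unable to select an IP from default routes")
}

// applyDefaults runs autodetection only when its result is needed:
// for a control-plane join that has no explicit advertise address.
func applyDefaults(cfg joinConfig, detect func() (string, error)) (joinConfig, error) {
	if !cfg.ControlPlane {
		return cfg, nil // workers do not advertise an API endpoint
	}
	if cfg.AdvertiseAddress != "" {
		return cfg, nil // an explicit value always wins
	}
	addr, err := detect()
	if err != nil {
		return cfg, fmt.Errorf("defaulting advertise address: %w", err)
	}
	cfg.AdvertiseAddress = addr
	return cfg, nil
}

func main() {
	// A worker join never calls the detector, so the failure is not hit.
	worker, err := applyDefaults(joinConfig{}, failingDetect)
	fmt.Println(worker.AdvertiseAddress == "", err)

	// A control-plane join with an explicit address also skips detection.
	cp, err := applyDefaults(joinConfig{ControlPlane: true, AdvertiseAddress: "10.10.10.10"}, failingDetect)
	fmt.Println(cp.AdvertiseAddress, err)

	// Only a control-plane join without an address depends on detection.
	_, err = applyDefaults(joinConfig{ControlPlane: true}, failingDetect)
	fmt.Println(err)
}
```

Passing the detector in as a function keeps the defaulting step testable and makes it obvious which code paths can trigger host inspection at all.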

@aojea
Member

aojea commented Oct 14, 2020

I'm trying to catch up, so forgive me if I miss something

cc @aojea do you happen to know if this use case is supported (k8s-wide, see OP)?

why should we modify the kubernetes code to select the default route if BGP can install the routes?
I think they can use the loopback addresses as the advertise address for the apiserver
ok, got it, so the problem is that the worker needs to be able to select the loopback address

@neolit123
Member

why should we modify the kubernetes code to select the default route if BGP can install the routes?

i had to Google BGP when we logged this ticket so i'm really not familiar with this.

@neolit123
Member

neolit123 commented Oct 14, 2020

yeah, I am, so you use global address on the interface because those are always reachable.
If you use the address of one interface, it is going to depend on the interface status: if it is down and the other is up, you don't have connectivity.
Will take a look tomorrow, is a bit late now

thanks

the apimachinery logic that fails in this case is located here:
https://github.com/kubernetes/apimachinery/blob/master/pkg/util/net/interface.go#L425

@eknudtson showed example output above.

so today during the kubeadm office hours we discussed the potential to open a ticket in k/k and ask apimachinery (in particular @sttts) whether a change in that code will be acceptable..

@aojea
Member

aojea commented Oct 14, 2020

yeah, I am, so you use global address on the interface because those are always reachable.
If you use the address of one interface, it is going to depend on the interface status: if it is down and the other is up, you don't have connectivity even though one path is up
Will take a look tomorrow, is a bit late now
/assign

@fabriziopandini
Member

fabriziopandini commented Oct 14, 2020

Yes, as said today there are two problems at stake

  1. autodetecting the IP address.
  2. the overall config fetch & defaults management.

For 1, @eknudtson provided a possible fix, and I will ask people to have eyes on it (this is a tricky piece of code, so the more eyes, the better).

For 2, my suggestion is to avoid rushing a solution, and possibly track the problem in a separate issue.
My initial reaction is that we should avoid changing the whole chain of fetch / defaulting because of the possible blast radius; instead I will explore if we can have a better distinction between join control plane (which requires init configuration) and join worker (which does not require init configuration, but currently reads it regardless)

@neolit123
Member

neolit123 commented Oct 14, 2020

For 1, @eknudtson provided a possible fix, and I will ask people to have eyes on it (this is a tricky piece of code, so the more eyes, the better).

ok, i see it was linked here:

We have several clusters running with BGP unnumbered, and we currently patch kubeadm with https://patch-diff.githubusercontent.com/raw/kubernetes/kubernetes/pull/69578.diff to make upgrading and joining work

For 2, my suggestion is to avoid rushing a solution, and possibly track the problem in a separate issue.
My initial reaction is that we should avoid changing the whole chain of fetch / defaulting because of the possible blast radius; instead I will explore if we can have a better distinction between join control plane (which requires init configuration) and join worker (which does not require init configuration, but currently reads it regardless)

it's a bit of a mess, and sadly the problem is also present in commands such as kubeadm config image... where we also apply dynamic defaults. in general i'm +1 to a smallest possible refactor to fix this, but something tells me that it's not possible (for this refactor to be small)

@aojea
Member

aojea commented Oct 15, 2020

Sorry, I didn't mean to assign this to myself, just to watch the issue
/unassign
/cc

  1. autodetecting the IP address.

why does kubeadm join from a worker have to autodetect the IP address?

Can you point me to who is calling ResolveBindAddress()?
I was able to narrow it down to here
https://github.com/kubernetes/kubernetes/blob/e1fd2d7ff57af153023347d72d17226effd917c8/cmd/kubeadm/app/util/config/cluster.go#L64
but I'm stuck there

@neolit123
Member

neolit123 commented Oct 15, 2020 via email

@neolit123
Member

neolit123 commented Oct 15, 2020 via email

@eknudtson
Author

I just wanted to point out that BGP Unnumbered + RFC5549 do appear to be explicitly supported via kubernetes/kubernetes#83475

It appears the other functions in kubeadm weren't also fixed to work with it.

@neolit123
Member

neolit123 commented Oct 16, 2020

ResolveBindAddress is also used by the kube-apiserver - if one does not pass an explicit value to its --bind-address flag, so presumably it will fail there too. i'd like to see what the owners of kube-apiserver think about the change that you did here:
https://patch-diff.githubusercontent.com/raw/kubernetes/kubernetes/pull/69578.diff

so potentially:

  • we can log a separate issue (again) in kubernetes/kubernetes about ResolveBindAddress and tag it with /sig api-machinery proposing the change. you can create this one and explain the use case in good detail for the SIG API Machinery maintainers.
  • log a new ticket in kubernetes/kubeadm about the dynamic default / don't download configuration for workers problem. i will log this ticket now.
  • close this issue.

@neolit123
Member

neolit123 commented Oct 16, 2020

we can log a separate issue (again) in kubernetes/kubernetes about ResolveBindAddress and tag it with /sig api-machinery proposing the change. you can create this one and explain the use case in good detail for the SIG API Machinery maintainers.

also please ping me and @aojea on that ticket.

showing the full kubeadm output and the kubeadm use case is not directly relevant, only this part:

I1009 00:03:38.372308    8018 interface.go:400] Looking for default routes with IPv4 addresses
I1009 00:03:38.372350    8018 interface.go:405] Default route transits interface "ens15f0.300"
I1009 00:03:38.372793    8018 interface.go:208] Interface ens15f0.300 is up
I1009 00:03:38.372911    8018 interface.go:256] Interface "ens15f0.300" has 1 addresses :[fe80::a236:9fff:fe80:8258/64].
I1009 00:03:38.372959    8018 interface.go:223] Checking addr  fe80::a236:9fff:fe80:8258/64.
I1009 00:03:38.372984    8018 interface.go:236] fe80::a236:9fff:fe80:8258 is not an IPv4 address
I1009 00:03:38.373017    8018 interface.go:400] Looking for default routes with IPv6 addresses
I1009 00:03:38.373034    8018 interface.go:405] Default route transits interface "ens15f1.300"
I1009 00:03:38.373375    8018 interface.go:208] Interface ens15f1.300 is up
I1009 00:03:38.373465    8018 interface.go:256] Interface "ens15f1.300" has 1 addresses :[fe80::a236:9fff:fe80:825a/64].
I1009 00:03:38.373502    8018 interface.go:223] Checking addr  fe80::a236:9fff:fe80:825a/64.
I1009 00:03:38.373523    8018 interface.go:233] Non-global unicast address found fe80::a236:9fff:fe80:825a
I1009 00:03:38.373542    8018 interface.go:405] Default route transits interface "ens15f0.300"
I1009 00:03:38.374553    8018 interface.go:208] Interface ens15f0.300 is up
I1009 00:03:38.374644    8018 interface.go:256] Interface "ens15f0.300" has 1 addresses :[fe80::a236:9fff:fe80:8258/64].
I1009 00:03:38.374696    8018 interface.go:223] Checking addr  fe80::a236:9fff:fe80:8258/64.
I1009 00:03:38.374720    8018 interface.go:233] Non-global unicast address found fe80::a236:9fff:fe80:8258
I1009 00:03:38.374741    8018 interface.go:416] No active IP found by looking at default routes
...

your interface setup and the proposed DIFF are relevant, i guess.

log a new ticket in kubernetes/kubeadm about the dynamic default / don't download configuration for workers problem. i will log this ticket now.

this is now here:

#2328

close this issue.

/close

and sorry for the shuffle of issues, but your report surfaced a number of problems in old code.
thanks.

@k8s-ci-robot
Contributor

@neolit123: Closing this issue.

In response to this:

we can log a separate issue (again) in kubernetes/kubernetes about ResolveBindAddress and tag it with /sig api-machinery proposing the change. you can create this one and explain the use case in good detail for the SIG API Machinery maintainers.

also please ping me and @aojea on that ticket.

showing the full kubeadm output and the kubeadm use case is not directly relevant, only this part:

I1009 00:03:38.372308    8018 interface.go:400] Looking for default routes with IPv4 addresses
I1009 00:03:38.372350    8018 interface.go:405] Default route transits interface "ens15f0.300"
...

your interface setup and the proposed DIFF too, i guess.

log a new ticket in kubernetes/kubeadm about the dynamic default / don't download configuration for workers problem. i will log this ticket now.

this is now here:

#2328

close this issue.

/close

and sorry for the shuffle of issues, but your report surfaced a number of old problems.
thanks.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

6 participants