[BUG] Kubectl client outside of HA/multi-master Epiphany cluster fails to connect to server with invalid certificate #1520

Closed
ks4225 opened this issue Aug 3, 2020 · 8 comments

Comments

ks4225 commented Aug 3, 2020

Describe the bug
On an HA / multi-master cluster, issuing kubectl commands from a machine outside the cluster (e.g. a CI agent) will sometimes fail with a certificate error. The suspicion is that HAProxy on the k8s master machines routes the kubectl request to an apiserver whose certificate does not match the server address configured on the external machine.

To Reproduce
Steps to reproduce the behavior:

  1. Build an Epiphany cluster with HA / multi-master (3 masters in this case)
  2. Copy the kube config from one of the k8s master machines to an external machine (as part of this, localhost in the kube config needs to be replaced with a reachable master address; see the sketch after these steps)
  3. Issue kubectl commands from the external machine, which will fail periodically (depending on how traffic is routed)
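
For step 2, a minimal sketch of pointing the copied kubeconfig at a master; <master-ip> is a placeholder for whichever kubernetes_master address the external machine can reach, and the file name admin.conf is an assumption:

$ # replace localhost in the copied kubeconfig with a reachable master address
$ sed -i 's/localhost/<master-ip>/' admin.conf
$ kubectl --kubeconfig admin.conf get nodes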

Expected behavior
It should be possible to issue kubectl commands from the external machine that work consistently.

Config files
Key aspects of the config are:

components:
    kubernetes_master:
      count: 3
...
use_ha_control_plane: true

OS (please complete the following information):

  • OS: Ubuntu

Cloud Environment (please complete the following information):

  • Cloud Provider: MS Azure

Additional context
Add any other context about the problem here.

cc @jsmith085 @sunshine69

@ks4225
Copy link
Author

ks4225 commented Aug 3, 2020

Example error message is:
Unable to connect to the server: x509: certificate is valid for ###, ###, not ### (where ### are IPs)

sk4zuzu commented Aug 4, 2020

Thank you for reporting the issue, @ks4225 !

I've checked and, indeed, kubeconfig handling differs between non-HA and HA clusters.

I believe two things need to be done to fix the problem:

  1. In new clusters: during the kubeadm init run we need to provide additional cert SANs in the kubeadm config file/map (a rough sketch follows below).
  2. In existing clusters that already have this issue: we need to modify the config map and regenerate the certificates.

All this should be done during the epicli apply run.
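
A rough sketch of what the kubeadm side could look like; this is not the Epiphany implementation, and <client-address> is a placeholder for the address external machines use to reach the apiserver:

$ cat kubeadm-san.yml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
apiServer:
  certSANs:
  - "<client-address>"
$ # back up the old apiserver cert/key, then regenerate it with the extra SANs
$ sudo mkdir -p /root/pki-backup && sudo mv /etc/kubernetes/pki/apiserver.{crt,key} /root/pki-backup/
$ sudo kubeadm init phase certs apiserver --config kubeadm-san.yml
$ # restart the kube-apiserver static pod afterwards and mirror the change into the
$ # kubeadm-config ConfigMap, e.g. kubectl -n kube-system edit cm kubeadm-config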

As a temporary workaround, some kind of TCP proxy can be used, for example:

$ ssh -L 3446:localhost:3446 [email protected] -N
$ kubectl --kubeconfig admin.conf get nodes,pods -A
NAME        STATUS   ROLES    AGE     VERSION
node/x1a1   Ready    master   58m     v1.18.6
node/x1a2   Ready    master   10m     v1.18.6
node/x1a3   Ready    master   9m12s   v1.18.6
node/x1b1   Ready    <none>   56m     v1.18.6

NAMESPACE              NAME                                             READY   STATUS    RESTARTS   AGE
kube-system            pod/coredns-74c98659f4-5c6tj                     1/1     Running   0          57m
kube-system            pod/coredns-74c98659f4-hc7fw                     1/1     Running   0          57m
kube-system            pod/etcd-x1a1                                    1/1     Running   0          58m
kube-system            pod/etcd-x1a2                                    1/1     Running   0          10m
kube-system            pod/etcd-x1a3                                    1/1     Running   0          9m1s
kube-system            pod/kube-apiserver-x1a1                          1/1     Running   1          58m
kube-system            pod/kube-apiserver-x1a2                          1/1     Running   0          10m
kube-system            pod/kube-apiserver-x1a3                          1/1     Running   0          9m1s
kube-system            pod/kube-controller-manager-x1a1                 1/1     Running   2          58m
kube-system            pod/kube-controller-manager-x1a2                 1/1     Running   0          10m
kube-system            pod/kube-controller-manager-x1a3                 1/1     Running   0          9m1s
kube-system            pod/kube-flannel-ds-amd64-5cmmr                  1/1     Running   0          9m12s
kube-system            pod/kube-flannel-ds-amd64-9wk8s                  1/1     Running   0          58m
kube-system            pod/kube-flannel-ds-amd64-btbmt                  1/1     Running   1          10m
kube-system            pod/kube-flannel-ds-amd64-j7s4c                  1/1     Running   0          56m
kube-system            pod/kube-proxy-5zvck                             1/1     Running   1          56m
kube-system            pod/kube-proxy-nfgld                             1/1     Running   1          58m
kube-system            pod/kube-proxy-q5rnd                             1/1     Running   0          9m12s
kube-system            pod/kube-proxy-ww4tf                             1/1     Running   0          10m
kube-system            pod/kube-scheduler-x1a1                          1/1     Running   2          58m
kube-system            pod/kube-scheduler-x1a2                          1/1     Running   0          10m
kube-system            pod/kube-scheduler-x1a3                          1/1     Running   0          9m1s
kubernetes-dashboard   pod/dashboard-metrics-scraper-667d84869b-tv8d2   1/1     Running   0          57m
kubernetes-dashboard   pod/kubernetes-dashboard-78fbf9d49c-qs7nr        1/1     Running   0          57m

It's not very convenient though :(

atsikham commented Aug 11, 2020

Hello @ks4225,
There is a simple workaround: use kubectl with --insecure-skip-tls-verify, for example kubectl --insecure-skip-tls-verify get nodes.
I will continue working on the final solution.
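
If you would rather not pass the flag on every invocation, the same workaround can be persisted in the kubeconfig; a sketch, where <cluster-name> is a placeholder for the cluster entry in the copied admin.conf:

$ # the embedded CA data has to be removed first, because kubectl rejects a
$ # kubeconfig that combines a certificate authority with insecure mode
$ kubectl config unset clusters.<cluster-name>.certificate-authority-data
$ kubectl config set-cluster <cluster-name> --insecure-skip-tls-verify=true
$ kubectl get nodes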

ks4225 commented Aug 13, 2020

Thank you for the update @tolikt.

We have actually been using --insecure-skip-tls-verify already. Good to hear it's the recommended workaround.

mkyc commented Aug 25, 2020

@przemyslavic @atsikham why is it back in the pipeline? Can you leave a comment?

przemyslavic commented Aug 25, 2020

I did some testing by following the instructions posted here to reproduce the issue. I deployed an HA cluster with public IP addresses on Azure, then logged into one machine (other than a master/node), copied admin.conf from one of the masters, replaced localhost with the private IP address of the master node, and then ran kubectl. I am getting the same error that is described in this task. Support for public IPs will probably be removed here for security reasons, but I think @atsikham will be able to provide more details about the fix.
The result of running kubectl get nodes several times (it succeeds or fails depending on which master the request is routed to):

NAME                                            STATUS   ROLES    AGE   VERSION
ci-devhaazurubuflannel-kubernetes-master-vm-0   Ready    master   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-master-vm-1   Ready    master   22h   v1.18.6
ci-devhaazurubuflannel-kubernetes-master-vm-2   Ready    master   22h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-0     Ready    <none>   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-1     Ready    <none>   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-2     Ready    <none>   21h   v1.18.6
[operations@ci-devhaazurubuflannel-logging-vm-0 ~]$ kubectl get nodes
Unable to connect to the server: x509: certificate is valid for 10.96.0.1, 10.1.1.9, 51.xx.yy.72, 51.xx.yy.71, 51.xx.yy.68, 127.0.0.1, not 10.1.1.6
[operations@ci-devhaazurubuflannel-logging-vm-0 ~]$ kubectl get nodes
Unable to connect to the server: x509: certificate is valid for 10.96.0.1, 10.1.1.7, 127.0.0.1, 51.xx.yy.72, 51.xx.yy.71, 51.xx.yy.68, not 10.1.1.6
[operations@ci-devhaazurubuflannel-logging-vm-0 ~]$ kubectl get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
ci-devhaazurubuflannel-kubernetes-master-vm-0   Ready    master   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-master-vm-1   Ready    master   22h   v1.18.6
ci-devhaazurubuflannel-kubernetes-master-vm-2   Ready    master   22h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-0     Ready    <none>   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-1     Ready    <none>   21h   v1.18.6
ci-devhaazurubuflannel-kubernetes-node-vm-2     Ready    <none>   21h   v1.18.6

przemyslavic commented Aug 26, 2020

Reported a related issue: [BUG] Duplicated SANs for K8s apiserver certificate #1587

Should be fixed in kubeadm v1.19 - kubernetes/kubernetes#92753
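
To check which SANs an apiserver certificate actually carries, it can be inspected directly on a master; /etc/kubernetes/pki/apiserver.crt is the standard kubeadm location (assumed here to apply to Epiphany-built masters as well):

$ # list the Subject Alternative Names baked into this master's apiserver certificate
$ sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'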

@przemyslavic

The fix has been tested. Now there should be no issues with running kubectl commands on an HA cluster.
