
Kubeadm fails to bring up a HA cluster due to EOF error when uploading configmap #1321

Closed
iverberk opened this issue Dec 13, 2018 · 38 comments · Fixed by kubernetes/kubernetes#73093
Labels
area/HA help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone

Comments

@iverberk

iverberk commented Dec 13, 2018

What keywords did you search in kubeadm issues before filing this one?

  • EOF uploading config

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version : kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:02:01Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version : Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:04:45Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: Virtualbox VM
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.1 LTS
  • Kernel (e.g. uname -a): Linux controller-1 4.15.0-38-generic #41-Ubuntu SMP Wed Oct 10 10:59:38 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Other: I have three VMs running that are connected via a host network with VirtualBox (10.10.0.11, 10.10.0.12 and 10.10.0.13). There is a Docker container running on my host that binds to the gateway address for the host network (10.10.0.1) to provide a control plane endpoint that the controller nodes can use. This worked flawlessly with the 1.12 version of Kubernetes (also a kubeadm install).

What happened?

I'm trying to set up an HA cluster with three control plane nodes. I can successfully bootstrap the first controller, but when I try to join the second controller it fails. After writing the etcd pod manifest it tries to write the new kubeadm-config (I guess with the updated controller API endpoints) but fails with:

error uploading configuration: Get https://10.10.0.1:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: unexpected EOF

I'm using an HAProxy load balancer in front of the three (to-be) API server nodes. HAProxy is querying the health endpoint of the API server and getting successful responses. Before joining the second controller I can successfully curl the endpoint with:

watch -n0.5 curl -k https://10.10.0.1:6443/api/v1/namespaces/kube-public/configmaps/cluster-info

When the second controller joins, the above curl fails with an EOF and only starts working again about 20-30 seconds later. In the meantime the join command tries to upload the new config and crashes.
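
For reference, a rough way to measure how long this window lasts is to poll the load-balanced endpoint in a loop while the join runs. A minimal sketch using the endpoint from this report (address and interval are assumptions to adjust):

# poll the control plane endpoint once per second and log when it stops answering
while true; do
  if ! curl -sk --max-time 2 -o /dev/null \
      https://10.10.0.1:6443/api/v1/namespaces/kube-public/configmaps/cluster-info; then
    echo "$(date +%T) endpoint unavailable"
  fi
  sleep 1
done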

What you expected to happen?

I would have expected the config upload to succeed, either because kubeadm waits for a healthy control plane or because the API server has no problems in the first place.

How to reproduce it (as minimally and precisely as possible)?

I'm setting this up in a Vagrant environment, but I guess it's no different from what is described on the https://kubernetes.io/docs/setup/independent/high-availability/ page. Here is my kubeadm config for the first controller:

apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
bootstrapTokens:
- ttl: 1s
nodeRegistration:
  name: controller-1
  kubeletExtraArgs:
    node-ip: 10.10.0.11
    hostname-override: controller-1
localAPIEndpoint:
  advertiseAddress: 10.10.0.11
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: 1.13.0
clusterName: local
useHyperKubeImage: true
apiServer:
  certSANs:
  - "10.10.0.1"
  extraArgs:
    oidc-ca-file: /etc/kubernetes/pki/front-proxy-ca.crt
    oidc-issuer-url: https://keycloak.k8s.local/auth/realms/Kubernetes
    oidc-client-id: kubernetes
    oidc-username-claim: preferred_username
    oidc-username-prefix: user-
    oidc-groups-claim: groups
    oidc-groups-prefix: group-
    advertise-address: 10.10.0.11
    etcd-servers: "https://10.10.0.11:2379,https://10.10.0.12:2379,https://10.10.0.13:2379"
controlPlaneEndpoint: "10.10.0.1:6443"
networking:
  podSubnet: "10.200.0.0/16"

Anything else we need to know?

These are some of the api server logs when the etcd join happens:

E1213 19:03:31.444542       1 status.go:64] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
I1213 19:03:31.444774       1 trace.go:76] Trace[753985076]: "Create /api/v1/namespaces/kube-system/pods" (started: 2018-12-13 19:03:24.441828372 +0000 UTC m=+114.693461957) (total time: 7.002869639s):
Trace[753985076]: [7.002869639s] [7.002606089s] END
I1213 19:03:33.109321       1 trace.go:76] Trace[1420167129]: "GuaranteedUpdate etcd3: *core.Node" (started: 2018-12-13 19:03:31.709712754 +0000 UTC m=+121.961346387) (total time: 1.399576303s):
Trace[1420167129]: [1.399371901s] [1.398742251s] Transaction committed
I1213 19:03:33.109473       1 trace.go:76] Trace[769327697]: "Patch /api/v1/nodes/controller-1/status" (started: 2018-12-13 19:03:31.709618011 +0000 UTC m=+121.961251600) (total time: 1.399840945s):
Trace[769327697]: [1.399731089s] [1.39927972s] Object stored in database
I1213 19:03:33.112016       1 trace.go:76] Trace[2022830437]: "Create /api/v1/namespaces/default/events" (started: 2018-12-13 19:03:26.478524582 +0000 UTC m=+116.730158162) (total time: 6.633469874s):
Trace[2022830437]: [6.633398498s] [6.633329302s] Object stored in database
I1213 19:03:33.112605       1 trace.go:76] Trace[1385314816]: "Create /api/v1/namespaces/kube-system/events" (started: 2018-12-13 19:03:28.97488789 +0000 UTC m=+119.226521475) (total time: 4.137566155s):
Trace[1385314816]: [4.137467107s] [4.137345776s] Object stored in database
E1213 19:03:35.352793       1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I1213 19:03:35.353023       1 trace.go:76] Trace[1661503768]: "Get /api/v1/namespaces/kube-system/endpoints/kube-controller-manager" (started: 2018-12-13 19:03:25.365882522 +0000 UTC m=+115.617516103) (total time: 9.987128454s):
Trace[1661503768]: [9.987128454s] [9.987104356s] END
E1213 19:03:35.886799       1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I1213 19:03:35.887162       1 trace.go:76] Trace[853992113]: "Get /api/v1/namespaces/kube-system/endpoints/kube-controller-manager" (started: 2018-12-13 19:03:25.887702792 +0000 UTC m=+116.139336376) (total time: 9.999444244s):
Trace[853992113]: [9.999444244s] [9.999421457s] END
E1213 19:03:35.932627       1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
E1213 19:03:35.933601       1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}

This is my haproxy config:

defaults
  timeout connect 5000ms
  timeout check 5000ms
  timeout server 30000ms
  timeout client 30000

global
  tune.ssl.default-dh-param 2048

listen stats
  bind :9000
  mode http
  stats enable
  stats hide-version
  stats realm Haproxy\ Statistics
  stats uri /stats

listen apiserver
  bind :6443
  mode tcp
  balance roundrobin
  option httpchk GET /healthz
  http-check expect string ok

  server apiserver1 10.10.0.11:6443 check check-ssl verify none
  server apiserver2 10.10.0.12:6443 check check-ssl verify none
  server apiserver3 10.10.0.13:6443 check check-ssl verify none

listen ingress
  bind :80
  mode http
  balance roundrobin

  server worker1 10.10.0.21:30080 check
  server worker2 10.10.0.22:30080 check
  server worker3 10.10.0.23:30080 check

listen ingress-443
  bind :443 ssl crt /usr/local/etc/haproxy/local-ssl.pem
  mode http
  balance roundrobin

  server worker1 10.10.0.21:30080 check
  server worker2 10.10.0.21:30080 check
  server worker3 10.10.0.23:30080 check
@fabriziopandini
Member

fabriziopandini commented Dec 13, 2018

@iverberk Is there a reason for adding the following extra args:

etcd-servers: "https://10.10.0.11:2379,https://10.10.0.12:2379,https://10.10.0.13:2379"

Could you try without this setting?

@iverberk
Author

No, that is a left-over from the 1.12 configuration. I will try to remove that and update the issue.

@iverberk
Author

sigh I guess sometimes you need someone else to tell you the obvious... that was the culprit, sorry for the hassle. This did work well in 1.12 and, for some reason that I can't remember anymore, this was a necessary configuration parameter to make it work.

@iverberk
Author

I thought this was solved, but the problem still remains, even with the settings removed. Sometimes it works, though. I'm not sure what the exact reason is, but it's most likely some kind of race condition. I've created a test repository to isolate the problem. In this repo: https://github.com/iverberk/kubeadm-cp-test you can find a test setup with Vagrant and Docker that shows the problem when joining the second controller to the first controller. Hopefully this will illustrate the problem and make way for a solution.

@iverberk iverberk reopened this Dec 18, 2018
@neolit123 neolit123 added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. area/HA priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Dec 18, 2018
@iverberk
Author

Ok, new information: bootstrapping is successful if I pre-pull the hyperkube image...I tested this with my own Ansible environment but will update the test repo as well to see if it works.
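
For reference, the pre-pull step described here is just pulling the image on the joining controller before running kubeadm join. A minimal sketch (the exact image name and tag are assumptions, derived from kubernetesVersion: 1.13.0 with useHyperKubeImage: true; verify them for your setup):

# pull the hyperkube image ahead of time so the join doesn't stall on the pull
docker pull k8s.gcr.io/hyperkube:v1.13.0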

@iverberk
Author

I can't quite reproduce the same result in the test repo. I stumbled upon this because after the first installation I would reset the kubeadm installation and run it again. The second time it would succeed. I guess one of the differences is that the image is already there at that point. It is still a weird issue and I'd like to know how you test the bootstrapping scenario and why you never see this behaviour. If this is a Vagrant thing we should be able to pinpoint it.

@fabriziopandini
Member

@iverberk could you kindly retest now that kubernetes/kubernetes#72030 has merged?

@wafflespeanut

wafflespeanut commented Jan 6, 2019

I'm hitting the same issue regardless of pre-pulling the images. The first master bootstraps successfully whereas the second master fails with unexpected EOF when uploading some configuration. The weirdest part is that when I try adding another master node, it bootstraps successfully!

waffles@kube-master-1:~$ kubectl get nodes
NAME            STATUS   ROLES    AGE   VERSION
kube-master-1   Ready    master   28m   v1.13.1
kube-master-2   Ready    <none>   18m   v1.13.1
kube-master-3   Ready    master   86s   v1.13.1
kube-worker-1   Ready    <none>   26m   v1.13.1
kube-worker-2   Ready    <none>   25m   v1.13.1
kube-worker-3   Ready    <none>   25m   v1.13.1

Apparently, the second master isn't a master, but the third master has no trouble. The images were pre-pulled in all machines, so I guess pre-pulling doesn't affect anything.

My kubeadm-config.yaml was the same as the one in the docs (the only difference is the pod subnet, because I was using Flannel):

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
networking:
  podSubnet: 10.244.0.0/16
apiServer:
  certSANs:
  - "PUBLIC_IP"
controlPlaneEndpoint: "PUBLIC_IP:PORT"

@iverberk Could you confirm this by adding another master after the second one fails?

@iverberk
Author

iverberk commented Jan 6, 2019 via email

@timothysc timothysc added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Jan 7, 2019
@timothysc timothysc added this to the v1.14 milestone Jan 7, 2019
@fmehrdad

I have the exact same problem.

@fabriziopandini
Member

@iverberk @fmehrdad @wafflespeanut
IMO the EOF error isn't related to image pre-pull at all.
It was related to a race condition fixed by kubernetes/kubernetes#72030 on master and then cherry-picked into v1.13.2.

Could you kindly repeat the test against one of the above versions?
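
For anyone retesting, a sketch of bumping kubeadm to v1.13.2 on Ubuntu, assuming the Kubernetes apt repository is already configured and the package is on hold (as in the provisioning logs below):

apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.13.2-00
apt-mark hold kubeadm
kubeadm version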

@iverberk
Author

iverberk commented Jan 12, 2019

I just retested with the repo that I created to reproduce this issue. This is the result of adding the second controller:

controller-2: Setting up docker-ce-cli (5:18.09.1~3-0~ubuntu-bionic) ...
    controller-2: Setting up kubeadm (1.13.2-00) ...
    controller-2: Setting up pigz (2.4-1) ...
    controller-2: Setting up docker-ce (5:18.09.1~3-0~ubuntu-bionic) ...
    controller-2: update-alternatives: using /usr/bin/dockerd-ce to provide /usr/bin/dockerd (dockerd) in auto mode
    controller-2: Created symlink /etc/systemd/system/multi-user.target.wants/docker.service → /lib/systemd/system/docker.service.
    controller-2: Created symlink /etc/systemd/system/sockets.target.wants/docker.socket → /lib/systemd/system/docker.socket.
    controller-2: Processing triggers for ureadahead (0.100.0-20) ...
    controller-2: Processing triggers for libc-bin (2.27-3ubuntu1) ...
    controller-2: Processing triggers for systemd (237-3ubuntu10.4) ...
    controller-2: kubelet set on hold.
    controller-2: kubeadm set on hold.
    controller-2: kubectl set on hold.
    controller-2: [preflight] Running pre-flight checks
    controller-2:       [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.1. Latest validated version: 18.06
    controller-2: [discovery] Trying to connect to API Server "10.11.0.1:6443"
    controller-2: [discovery] Created cluster-info discovery client, requesting info from "https://10.11.0.1:6443"
    controller-2: [discovery] Cluster info signature and contents are valid and no TLS pinning was specified, will use API Server "10.11.0.1:6443"
    controller-2: [discovery] Successfully established connection with API Server "10.11.0.1:6443"
    controller-2: [join] Reading configuration from the cluster...
    controller-2: [join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
    controller-2: [join] Running pre-flight checks before initializing the new control plane instance
    controller-2:       [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.1. Latest validated version: 18.06
    controller-2: [certs] Using the existing "front-proxy-client" certificate and key
    controller-2: [certs] Using the existing "etcd/peer" certificate and key
    controller-2: [certs] Using the existing "etcd/healthcheck-client" certificate and key
    controller-2: [certs] Using the existing "etcd/server" certificate and key
    controller-2: [certs] Using the existing "apiserver-etcd-client" certificate and key
    controller-2: [certs] Using the existing "apiserver" certificate and key
    controller-2: [certs] Using the existing "apiserver-kubelet-client" certificate and key
    controller-2: [certs] valid certificates and keys now exist in "/etc/kubernetes/pki"
    controller-2: [certs] Using the existing "sa" key
    controller-2: [kubeconfig] Writing "admin.conf" kubeconfig file
    controller-2: [kubeconfig] Writing "controller-manager.conf" kubeconfig file
    controller-2: [kubeconfig] Writing "scheduler.conf" kubeconfig file
    controller-2: [etcd] Checking Etcd cluster health
    controller-2: [kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.13" ConfigMap in the kube-system namespace
    controller-2: [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    controller-2: [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    controller-2: [kubelet-start] Activating the kubelet service
    controller-2: [tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
    controller-2: [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "controller-2" as an annotation
    controller-2: [etcd] Announced new etcd member joining to the existing etcd cluster
    controller-2: [etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
    controller-2: [uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
    controller-2: error uploading configuration: Get https://10.11.0.1:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: unexpected EOF
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.

So unless my repo is not following the correct procedure, or the version of kubeadm that is used (1.13.2) is not correct, this is still not fixed.

@fabriziopandini would it be possible for you to evaluate my repo and assess whether it is correct and representative of a vanilla kubeadm HA bootstrap flow?

@fabriziopandini
Member

@ereslibre could you kindly check again the EOF problem on your vagrant setup?

@ereslibre
Contributor

@ereslibre could you kindly check again the EOF problem on your vagrant setup?

I will have a look at it tonight, the race condition fixed by kubernetes/kubernetes#72030 was slightly different, I never saw this one, but I'm happy to try to reproduce the issue and try to find the root cause. I will report back.

@masantiago

I have experienced the same behaviour as you guys. The weirdest thing is to find that the first join (master-2) fails, but the second one is successful (master-3).

vagrant@k8-master1:~$ kubectl get nodes
NAME         STATUS   ROLES    AGE     VERSION
k8-master1   Ready    master   10m     v1.13.2
k8-master2   Ready    <none>   8m36s   v1.13.2
k8-master3   Ready    master   5m39s   v1.13.2

master-2 joining process yields:

[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
error uploading configuration: Get https://172.168.33.20:16443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: unexpected EOF

@ereslibre
Contributor

With https://github.com/iverberk/kubeadm-cp-test it's 100% reproducible. I didn't hit it with my project https://github.com/ereslibre/kubernetes-cluster-vagrant though.

I am still in the process of checking what's wrong, but some observations: upon joining the second master, some processes on the first master crash: namely, the controller-manager, the scheduler and etcd afterwards.

Scheduler:

E0113 17:56:42.710557       1 server.go:261] lost master
lost lease

Controller manager:

I0113 17:56:43.379606       1 leaderelection.go:249] failed to renew lease kube-system/kube-controller-manager: failed to tryAcquireOrRenew context deadline exceeded
F0113 17:56:43.379824       1 controllermanager.go:254] leaderelection lost

etcd keeps restarting on the first master too. etcd on the second master keeps failing to start:

2019-01-13 18:03:14.092724 I | etcdmain: rejected connection from "10.11.0.11:50946" (error "remote error: tls: bad certificate", ServerName "")
2019-01-13 18:03:14.107213 I | etcdmain: rejected connection from "10.11.0.11:50948" (error "remote error: tls: bad certificate", ServerName "")
2019-01-13 18:03:14.163532 W | etcdserver: could not get cluster response from https://10.11.0.11:2380: Get https://10.11.0.11:2380/members: EOF
2019-01-13 18:03:14.164947 C | etcdmain: cannot fetch cluster info from peer urls: could not retrieve cluster information from the given urls

@masantiago are you experiencing similar errors in your setup?

@masantiago

Yes, indeed. In fact, I only ended up with the final status above after several restarts of the Weave pod. It seems to be an unstable situation, confirmed when I shut down master-3: I have not been able to access the cluster since then. It always responds with an unexpected EOF.

Do not hesitate to ask me any trace you require.

I will have a look at your project @ereslibre. Did you also try with a 1.13.x version?

@ereslibre
Contributor

I will have a look at your project @ereslibre. Did you also try with a 1.13.x version?

kubernetes-cluster-vagrant is merely a project to make it easier to work on Kubernetes itself (it will soon be deprecated completely in favour of kind). I didn't focus on deploying existing released versions, but it shouldn't be hard to extend it to support that use case.

@iverberk I am changing the code of your project a bit to confirm some things: iverberk/kubeadm-cp-test@master...ereslibre:master

The main change I did was to avoid copying too many certificates and keys, because some certificates won't be valid if you copy all of them directly (certificate SANs use the detected IP address on each machine, and some certificates just cannot be reused on all machines). After this change I no longer see crashes when growing etcd.

So, what I can see at this point (with the changes applied) is that etcd takes a bit longer to start on the second controller, and since we are automatically using the stacked etcd cluster, the uploadconfig phase times out. I don't consider this a race condition (if the theory proves to be right), but rather a timeout that is too low, taking into account that some images need to be pulled on the new machine.

I think I'm not getting this problem on my project because in my case I create a base box with all the dependencies pulled, used for all machines. This means that I don't have to wait for image pulls on kubernetes-cluster-vagrant.

I will report back when I have more information.

@ereslibre
Contributor

ereslibre commented Jan 13, 2019

I think my theory stands. So, what I did was to use: iverberk/kubeadm-cp-test@master...ereslibre:master. It's very important to only copy the certificates that need copying, and not the rest, because if certain certificates exist they won't be generated and their SANs won't include the proper IP addresses on the new machines.

With the changes I previously linked I ran:

# ./create-controllers.sh
# vagrant provision controller-2

And the node joined just fine, without any timeout. This is because I added a docker pull of the images before kubeadm join is called (and only copied the certificates and keys that needed copying). I will create a PR with a fix for the timeout issue.
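
One way to do that pre-pull on the joining controller (a sketch, not necessarily exactly what the linked diff does; the config file path is an assumption, and kubeadm config images pull reads the ClusterConfiguration to decide which images to fetch):

# on the joining controller, before kubeadm join
kubeadm config images pull --config /path/to/kubeadm-config.yaml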

@MalloZup

OK, hopefully this can be fixed then with #1341

@ereslibre
Contributor

ereslibre commented Jan 14, 2019

So, two things after discussing with @fabriziopandini:

  1. Copying the whole /etc/kubernetes/pki from one machine to another will lead to this problem.

    • When we do a kubeadm join we are only checking if the certificates are present, not checking if the SANs match what we expect. @fabriziopandini is +1 to not only check for the presence of the certificate, but to also check its SANs, and if they don't match what we expect, we regenerate them. Priority 0 is etcd, then priority 1 is the apiserver.
  2. The lack of image pre-pulling is addressed in issue #1341 (kubeadm join controlplane not pulling images and fails), with a PR in the works here: kubernetes/kubernetes#72870 (Kubeadm/HA: pull images during join for control-plane).

As for 1., even though we name the explicit certificates that need copying in the documentation (https://kubernetes.io/docs/setup/independent/high-availability/#steps-for-the-rest-of-the-control-plane-nodes), I think we can expect more people to try to copy /etc/kubernetes/pki directly between machines, basically because it's handier. If we address the issue of SAN checking when doing a kubeadm join this wouldn't be a problem, because the certificates that don't match what we expect would simply be recreated.
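
For reference, a sketch of copying only the files that the HA documentation lists, from the first control plane node to a joining one (the hostname and root SSH access are assumptions; everything else under /etc/kubernetes/pki should be left to be generated on the joining node):

NODE=controller-2
ssh "${NODE}" mkdir -p /etc/kubernetes/pki/etcd
scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key \
    /etc/kubernetes/pki/sa.key /etc/kubernetes/pki/sa.pub \
    /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/front-proxy-ca.key \
    "${NODE}":/etc/kubernetes/pki/
scp /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/etcd/ca.key "${NODE}":/etc/kubernetes/pki/etcd/
scp /etc/kubernetes/admin.conf "${NODE}":/etc/kubernetes/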

I see this as a temporary solution until we have completely addressed the automatic transmission of secrets between masters when creating an HA cluster with kubeadm. Until we have a proper solution this would make it easier to copy things around between machines.

@fabriziopandini
Member

@ereslibre thanks for the wrap up

even though we name the explicit certificates that need copying in the documentation ...

What about adding an explicit warning in the document: don't copy all the certs!

I see this as a temporary solution

Unfortunately automatic transmission of certs will be optional, so this fix is necessary for v1.14 too; nevertheless, let's check if this can be fixed with a small change eligible for backport to v1.13.

if it doesn't match what we expect, we regenerate them

It might be better to error out with a clearer error message (instead of silently changing certs). Wdyt?

@ereslibre
Contributor

It might be better to error out with a clearer error message (instead of silently changing certs). Wdyt?

I would also agree with this solution and feel it's a better one, because by automatically fixing the certificates we would be "promoting" the bad habit of copying everything when not everything is needed, so I'm good with erroring out and panicking the join in that case. We don't try to be smart, we just check if a certificate exists and if it doesn't match what we expect we abort.

@fmehrdad

FYI I am only copying the files listed in https://kubernetes.io/docs/setup/independent/high-availability/ and still have this problem.

@ereslibre
Contributor

ereslibre commented Jan 14, 2019

@fmehrdad Can you double check whether pre-pulling the images on the new node before calling kubeadm join helps? If that's the case, #1341 is the issue.

@fmehrdad

fmehrdad commented Jan 14, 2019 via email

@ereslibre
Contributor

I have been pre-pulling the images using "kubeadm config images pull" ahead of time. It did not help.
I also tried 1.13.2 with no success.

I need more information in order to know what's going on in your setup @fmehrdad. This issue reported by @iverberk comes with a repository that shows two different issues.

  1. etcd failing to grow; this is going to be fixed by a PR that checks certificates are correct if they exist.
  2. Pre-pulling does not happen (#1341: kubeadm join controlplane not pulling images and fails).

Both problems cause the same visible error, but the issues have a different nature. Can you please paste the configurations you are using to deploy the cluster and how your setup is done (HA...)?

@masantiago

masantiago commented Jan 14, 2019

I've just tested with pre-pulling on the second master, and the same behaviour remains, as for @fmehrdad. My configuration is as follows:

  • cni_version: 0.6.0-00
  • kubelet_version: 1.13.2-00
  • kubeadm_version: 1.13.2-00
  • docker_version: 18.06.0ce3-0~ubuntu

Vagrantfile like:

BOX = "ubuntu/xenial64"
config.vm.define "k8-master1" do |app|
  app.vm.box = BOX
  app.vm.network "private_network", ip: "172.168.33.10"
  app.vm.hostname = "k8-master1"
...
config.vm.define "k8-master2" do |app|
  app.vm.box = BOX
  app.vm.network "private_network", ip: "172.168.33.11"
  app.vm.hostname = "k8-master2"
...
config.vm.define "k8-master3" do |app|
  app.vm.box = BOX
  app.vm.network "private_network", ip: "172.168.33.12"
  app.vm.hostname = "k8-master3"

kubeadm-config.yaml

apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "172.168.33.10"
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: 1.13.2
apiServer:
  certSANs:
  - "172.168.33.20"
controlPlaneEndpoint: "172.168.33.20:16443"   

where 172.168.33.20 is the VIP for the three masters, using keepalived and nginx load balancing.

master1

sudo kubeadm init --config=kubeadm-config.yaml
...
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

Copying exactly the certs indicated in https://kubernetes.io/docs/setup/independent/high-availability/#stacked-control-plane-and-etcd-nodes

master2

sudo kubeadm config images pull
sudo kubeadm join 172.168.33.20:16443 --token n44hpu.7goanq56edi9v2dl --discovery-token-ca-cert-hash sha256:b40c6a97c2b9c984f471b46c7bf1c40f90a826eec5996d49a63ce8bf19b67608 --experimental-control-plane --apiserver-advertise-address 172.168.33.11

master3

sudo kubeadm config images pull
sudo kubeadm join 172.168.33.20:16443 --token n44hpu.7goanq56edi9v2dl --discovery-token-ca-cert-hash sha256:b40c6a97c2b9c984f471b46c7bf1c40f90a826eec5996d49a63ce8bf19b67608 --experimental-control-plane --apiserver-advertise-address 172.168.33.12

Result:

vagrant@k8-master1:~$ kubectl get nodes
NAME         STATUS   ROLES    AGE   VERSION
k8-master1   Ready    master   17m   v1.13.2
k8-master2   Ready    <none>   13m   v1.13.2
k8-master3   Ready    master   11m   v1.13.2

@ereslibre
Contributor

I can see something that is probably related. When joining the second master to the cluster, the https://172.28.128.25:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config endpoint takes far longer to answer than on the third.

Master 2 (~14 seconds):

[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I0115 12:52:38.633739    2027 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.14.0 (linux/amd64) kubernetes/1b28775" 'https://172.28.128.25:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config'
I0115 12:52:52.960533    2027 round_trippers.go:438] GET https://172.28.128.25:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config 200 OK in 14325 milliseconds
I0115 12:52:52.960568    2027 round_trippers.go:444] Response Headers:
I0115 12:52:52.960576    2027 round_trippers.go:447]     Content-Type: application/json
I0115 12:52:52.960593    2027 round_trippers.go:447]     Content-Length: 1149
I0115 12:52:52.960604    2027 round_trippers.go:447]     Date: Tue, 15 Jan 2019 12:52:54 GMT

Master 3 (~40 milliseconds):

[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I0115 12:54:02.519284    2118 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.14.0 (linux/amd64) kubernetes/1b28775" 'https://172.28.128.25:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config'
I0115 12:54:02.557663    2118 round_trippers.go:438] GET https://172.28.128.25:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config 200 OK in 38 milliseconds
I0115 12:54:02.557732    2118 round_trippers.go:444] Response Headers:
I0115 12:54:02.557741    2118 round_trippers.go:447]     Content-Type: application/json
I0115 12:54:02.557747    2118 round_trippers.go:447]     Content-Length: 1218
I0115 12:54:02.557753    2118 round_trippers.go:447]     Date: Tue, 15 Jan 2019 12:54:02 GMT

So I can confirm that the first control-plane join takes a bit longer. It seems we have yet another cause for this issue, and this would match @masantiago's and @fmehrdad's descriptions.

I did the kubeadm join with -v10 in order to find which request was blocking. I'll keep digging for the root cause of this, but we have 3 different issues causing the same visible problem; maybe it's time to split this issue :)
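
For reference, the verbose join invocation looks roughly like this (token, hash and address are placeholders; the flags match the v1.13 commands shown earlier in this thread):

kubeadm join <LB_IP>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --experimental-control-plane -v=10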

@fmehrdad

My problem was not related to k8s. My nginx-lb config had a very short timeout.

I changed my proxy_timeout from 3s to 24h.

Here is my config:

user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
  worker_connections 1024;
}

http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;

  log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for"';

  access_log /var/log/nginx/access.log main;

  sendfile on;
  #tcp_nopush on;

  keepalive_timeout 65;

  #gzip on;

  include /etc/nginx/conf.d/*.conf;
}

stream {
  upstream apiserver {
    #server IP1:6443 weight=5 max_fails=9 fail_timeout=30s;
    server IP2:6443 weight=5 max_fails=9 fail_timeout=30s;
    #server IP3:6443 weight=5 max_fails=9 fail_timeout=30s;
  }

  server {
    listen 16443;
    proxy_connect_timeout 1s;
    proxy_timeout 24h;
    proxy_pass apiserver;
  }

  log_format proxy '$remote_addr [$time_local] '
                   '$protocol $status $bytes_sent $bytes_received '
                   '$session_time "$upstream_addr" '
                   '"$upstream_bytes_sent" "$upstream_bytes_received" "$upstream_connect_time"';
  access_log /var/log/nginx/access.log proxy;
}

@ereslibre
Contributor

/assign

@k8s-ci-robot
Contributor

@ereslibre: GitHub didn't allow me to assign the following users: ereslibre.

Note that only kubernetes members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rosti

rosti commented Jan 16, 2019

@ereslibre will be working on this one.

/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Jan 16, 2019
@masantiago

@fmehrdad Yesterday I ran a similar test but removed the LB (nginx) entirely. I managed to get all three masters active, but:

  • They correctly scheduled three pods (the masters were untainted; I do not have any workers).
  • When I started shutting down masters, the pods were not rescheduled to the remaining masters, so the situation was not stable.

Can you check whether this is also the case for you?

@masantiago

I just tested with your nginx config, @fmehrdad, and got the same unexpected behaviour when shutting down masters. The pod replicas are not rescheduled and, moreover, when only master 1 is left, I get:

>>kubectl get pods -o wide
Unable to connect to the server: EOF

It seems that the HA of either etcd or control plane is not working properly.

I would really appreciate your feedback on this case.

@ereslibre
Contributor

It seems that the HA of either etcd or control plane is not working properly.

If you have grown your cluster to 3 masters, etcd was also grown to 3. As per the etcd admin guide, in a cluster of 3 the fault tolerance is 1: if you shut down 2 out of 3 etcd instances, your etcd cluster will be unavailable.
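
If you want to verify this from a surviving control plane node, a sketch using etcdctl with the client certificates kubeadm generates (the paths are the kubeadm defaults seen earlier in this thread; the etcdctl binary itself is an assumption, it is not installed by kubeadm):

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
    member list
# "endpoint health" instead of "member list" reports per-member health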

Please, from now on let's keep this issue for the certificate problem, since the reporter provided a repository that exhibited it, and we are keeping this issue open because of that. This issue splits into 3 different cases:

  1. Bad etcd certificates: this issue.
  2. Not pre-pulling images when joining a control plane: #1341 (kubeadm join controlplane not pulling images and fails; fix already merged in master).
  3. Not explicitly waiting for etcd to be healthy when we grow the cluster: #1353 (kubeadm join does not explicitly wait for etcd to have grown when joining secondary control plane).

For any related issue please refer to the explicit issues linked above; otherwise please open a new bug report, since this one is already mixing very different things. Thank you!

@iverberk
Author

@ereslibre and others, just wanted to say a big thank you for investigating this! It's great to see such commitment on making kubeadm a great tool to use.

@ReSearchITEng

Hello,
I have something very similar in the logs, and I can't figure out what is causing these messages in the apiserver logs.

Using k8s v1.12.1 in HA mode.
We have a 3-master cluster up and running. The difference from the above is that we use keepalived to move the VIP from one master to another (therefore no haproxy in front).

E0403 06:41:59.216530 1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"The order in patch list:\n[map[address: CLUSTER_VIP_ADDR type:ExternalIP] map[address: ACTIVE_MASTER_IP_ADDR type:ExternalIP] map[address:CLUSTER_VIP_ADDR type:InternalIP] map[address:ACTIVE_MASTER_IP_ADDR type:InternalIP]]\n doesn't match $setElementOrder list:\n[map[type:ExternalIP] map[type:InternalIP] map[type:ExternalIP] map[type:InternalIP] map[type:Hostname]]\n"}

ACTIVE_MASTER_IP_ADDR -> is the current master node address where the keepalived is currently in master state (the other 2 are in slave/backup mode).
CLUSTER_VIP_ADDR -> is the VIP address which keepalived moves based on the health of the apiserver.

How to reproduce:
The entire setup is done with the project we have maintained here for quite a while: https://github.com/ReSearchITEng/kubeadm-playbook/
