
CABPK creates multiple machines with kubeadm init #3072

Closed
kanwar-saad opened this issue May 17, 2020 · 11 comments
Labels
area/bootstrap Issues or PRs related to bootstrap providers kind/support Categorizes issue or PR as a support question. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done.

Comments

@kanwar-saad

kanwar-saad commented May 17, 2020

What steps did you take and what happened:
[A clear and concise description on how to REPRODUCE the bug.]
I am creating control plane machines directly, without the KubeadmControlPlane controller. If I create multiple control plane machines back to back, CABPK starts provisioning all of them at once, and all machines get kubeadm init in their user data instead of only the first master.
When creating the KubeadmConfig objects I set the kubeadm init config only for the first master and the join config fields for the other two masters.
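For context, the KubeadmConfig objects look roughly like the sketch below (a minimal sketch against the v1alpha3 API, with illustrative values, not my exact manifests): only the first config carries initConfiguration/clusterConfiguration, the others carry joinConfiguration. The Machine objects referencing these via spec.bootstrap.configRef are omitted here.

cat <<EOF | kubectl apply -f -
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfig
metadata:
  name: ccd-ovb-60-cp-m0        # first master: should render "kubeadm init"
  namespace: ccd-ovb-60-capi
spec:
  clusterConfiguration: {}
  initConfiguration: {}
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfig
metadata:
  name: ccd-ovb-60-cp-m1        # additional master: should render "kubeadm join"
  namespace: ccd-ovb-60-capi
spec:
  joinConfiguration:
    controlPlane: {}
EOF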

bash-4.2$ kubectl get machine -n ccd-ovb-60-capi    
NAME                  PROVIDERID   PHASE
ccd-ovb-60-cp-m0                   Provisioning
ccd-ovb-60-cp-m1                   Provisioning
ccd-ovb-60-cp-m2                   Provisioning
ccd-ovb-60-pool1-m0                Pending
ccd-ovb-60-pool1-m1                Pending

What did you expect to happen:

  • Only the first master ccd-ovb-60-cp-m0 should have kubeadm init in its user data; the other two masters should have kubeadm join (see the check sketched after this list).
  • Not all control plane machines should be provisioned at once.
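A quick way to confirm what each machine ended up with is to decode the rendered bootstrap secret. This is a hedged sketch: it assumes the CABPK v0.3.x convention, visible in the log below, that the bootstrap data secret has the same name as the KubeadmConfig and stores the cloud-init under the value key; the grep is just illustrative.

# inspect the rendered user data for one of the additional masters (illustrative name)
kubectl get secret ccd-ovb-60-cp-m1 -n ccd-ovb-60-capi -o jsonpath='{.data.value}' \
  | base64 -d | grep -E 'kubeadm (init|join)'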

Anything else you would like to add:
I tried adding a 4-second delay between the creation of the machines, but the result is still the same.

Environment:

  • Cluster-api version: v0.3.5
  • Minikube/KIND version:
  • Kubernetes version: (use kubectl version): v1.17.3
  • OS (e.g. from /etc/os-release): SLES

/kind bug

capi_ha.log

/area bootstrap
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels]

@k8s-ci-robot
Contributor

@kanwar-saad: The label(s) area/capi_ha.log (https://github.com/kubernetes-sigs/cluster-api/files/4639730/capi_ha.log) cannot be applied, because the repository doesn't have them


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 17, 2020
@kanwar-saad
Author

/area bootstrap

@k8s-ci-robot k8s-ci-robot added the area/bootstrap Issues or PRs related to bootstrap providers label May 17, 2020
@fabriziopandini
Member

/remove-kind bug
/triage support

@kanwar-saad I have tried to reproduce this locally, but it is working just fine for me:

Only one master gets provisioned at first

kubectl get machines
NAME                   PROVIDERID   PHASE
test2-controlplane-0                Pending
test2-controlplane-1                Pending
test2-controlplane-2                Provisioning

After the first master is up, the 2nd and 3rd start provisioning

kubectl get machines
NAME                   PROVIDERID                              PHASE
test2-controlplane-0                                           Provisioning
test2-controlplane-1                                           Provisioning
test2-controlplane-2   docker:////test2-test2-controlplane-2   Running

In the end I get 3 masters up and running (NotReady is because I have not installed a CNI)

kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes
NAME                         STATUS     ROLES    AGE    VERSION
test2-test2-controlplane-0   NotReady   master   65s    v1.17.0
test2-test2-controlplane-1   NotReady   master   64s    v1.17.0
test2-test2-controlplane-2   NotReady   master   116s   v1.17.0

The only thing I notice in your logs is:

E0516 17:08:50.585863       1 kubeadmconfig_controller.go:374] controllers/KubeadmConfig "msg"="failed to store bootstrap data" "error"="failed to create bootstrap data secret for KubeadmConfig ccd-ovb-60-capi/ccd-ovb-60-cp-m0: secrets \"ccd-ovb-60-cp-m0\" already exists" "kind"="Machine" "kubeadmconfig"={"Namespace":"ccd-ovb-60-capi","Name":"ccd-ovb-60-cp-m0"} "name"="ccd-ovb-60-cp-m0" "version"="2830" 
...
E0516 17:09:01.751909       1 kubeadmconfig_controller.go:374] controllers/KubeadmConfig "msg"="failed to store bootstrap data" "error"="failed to create bootstrap data secret for KubeadmConfig ccd-ovb-60-capi/ccd-ovb-60-cp-m1: secrets \"ccd-ovb-60-cp-m1\" already exists" "kind"="Machine" "kubeadmconfig"={"Namespace":"ccd-ovb-60-capi","Name":"ccd-ovb-60-cp-m1"} "name"="ccd-ovb-60-cp-m1" "version"="2925" 
....
E0516 17:09:13.756520       1 kubeadmconfig_controller.go:374] controllers/KubeadmConfig "msg"="failed to store bootstrap data" "error"="failed to create bootstrap data secret for KubeadmConfig ccd-ovb-60-capi/ccd-ovb-60-cp-m2: secrets \"ccd-ovb-60-cp-m2\" already exists" "kind"="Machine" "kubeadmconfig"={"Namespace":"ccd-ovb-60-capi","Name":"ccd-ovb-60-cp-m2"} "name"="ccd-ovb-60-cp-m2" "version"="3008" 

And this makes me think that somehow you are not starting from a clean state. Could you run a test ensuring all the secrets are removed first?
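For example, something along these lines before re-creating the machines (a hedged sketch; the secret names simply mirror the ones in the log above):

# check for leftover bootstrap data secrets from a previous run
kubectl get secrets -n ccd-ovb-60-capi
# delete any stale ones before re-creating the machines (names are illustrative)
kubectl delete secret ccd-ovb-60-cp-m0 ccd-ovb-60-cp-m1 ccd-ovb-60-cp-m2 -n ccd-ovb-60-capi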

@k8s-ci-robot k8s-ci-robot added kind/support Categorizes issue or PR as a support question. and removed kind/bug Categorizes issue or PR as related to a bug. labels May 18, 2020
@vincepri
Member

This bug should have been fixed in v0.3.6, specifically in #3002

@vincepri
Member

/milestone Next
/priority awaiting-more-evidence

@k8s-ci-robot k8s-ci-robot added this to the Next milestone May 19, 2020
@k8s-ci-robot k8s-ci-robot added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label May 19, 2020
@fabriziopandini
Member

@vincepri this issue refers to a cluster without KCP, so I don't think #3002 is relevant here

@furkatgofurov7
Member

@fabriziopandini true, it refers to a cluster without KCP in this case.

@vincepri
Member

Apologies, I missed the "without" in the description.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 17, 2020
@fabriziopandini
Member

/close
given there are no updates about this issue for some time now
feel free to re-open if necessary

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
