Pivot failing following Getting Started guide in docs #809

Closed
aaroniscode opened this issue Jun 5, 2019 · 5 comments · Fixed by #813
Labels
kind/bug Categorizes issue or PR as related to a bug.


@aaroniscode
Contributor

/kind bug

What steps did you take and what happened:
Using the v0.3.0 templates, I tried to spin up a cluster by following the Getting Started guide.

clusterctl create cluster -v 4 \
  --bootstrap-type kind \
  --bootstrap-cluster-cleanup=false \
  --provider aws \
  -m ./aws/out/machines.yaml \
  -c ./aws/out/cluster.yaml \
  -p ./aws/out/provider-components.yaml \
  -a ./aws/out/addons.yaml

Output on the command line from clusterctl

I0605 12:34:28.834463   69576 clusterclient.go:965] Waiting for Cluster v1alpha resources to be listable...
I0605 12:34:28.917959   69576 pivot.go:93] Parsing list of cluster-api controllers from provider components
I0605 12:34:28.918019   69576 decoder.go:224] decoding stream as YAML
I0605 12:34:28.957690   69576 pivot.go:101] Scaling down controller aws-provider-system/aws-provider-controller-manager
I0605 12:34:29.028028   69576 pivot.go:101] Scaling down controller cluster-api-system/cluster-api-controller-manager
I0605 12:34:29.268288   69576 pivot.go:107] Retrieving list of MachineClasses to move
I0605 12:34:29.291255   69576 pivot.go:212] Preparing to copy MachineClasses: []
I0605 12:34:29.291292   69576 pivot.go:117] Retrieving list of Clusters to move
I0605 12:34:29.352697   69576 pivot.go:171] Preparing to move Clusters: [capa]
I0605 12:34:29.352742   69576 pivot.go:234] Moving Cluster default/capa
I0605 12:34:29.352753   69576 pivot.go:236] Ensuring namespace "default" exists on target cluster
I0605 12:34:29.814684   69576 pivot.go:247] Retrieving list of MachineDeployments to move for Cluster default/capa
I0605 12:34:29.850429   69576 pivot.go:287] Preparing to move MachineDeployments: []
I0605 12:34:29.850462   69576 pivot.go:256] Retrieving list of MachineSets not associated with a MachineDeployment to move for Cluster default/capa
I0605 12:34:29.898478   69576 pivot.go:331] Preparing to move MachineSets: []
I0605 12:34:29.898533   69576 pivot.go:265] Retrieving list of Machines not associated with a MachineSet to move for Cluster default/capa
I0605 12:34:29.927399   69576 pivot.go:374] Preparing to move Machines: [controlplane-0]
I0605 12:34:29.927428   69576 pivot.go:385] Moving Machine default/controlplane-0
I0605 12:34:30.423144   69576 clusterclient.go:986] Waiting for Machine controlplane-0 to become ready...
I0605 12:34:30.721315   69576 pivot.go:399] Successfully moved Machine default/controlplane-0
I0605 12:34:30.912357   69576 pivot.go:278] Successfully moved Cluster default/capa
I0605 12:34:30.912380   69576 pivot.go:127] Retrieving list of MachineDeployments not associated with a Cluster to move
I0605 12:34:30.921310   69576 pivot.go:287] Preparing to move MachineDeployments: []
I0605 12:34:30.921349   69576 pivot.go:136] Retrieving list of MachineSets not associated with a MachineDeployment or a Cluster to move
I0605 12:34:30.937735   69576 pivot.go:331] Preparing to move MachineSets: []
I0605 12:34:30.937778   69576 pivot.go:145] Retrieving list of Machines not associated with a MachineSet or a Cluster to move
I0605 12:34:31.124769   69576 pivot.go:374] Preparing to move Machines: [controlplane-0]
I0605 12:34:31.124831   69576 pivot.go:385] Moving Machine default/controlplane-0
F0605 12:34:33.172794   69576 create_cluster.go:61] unable to pivot cluster api stack to target cluster: unable to pivot cluster API objects: failed to move Machine default:controlplane-0: error copying Machine default/controlplane-0 to target cluster: error creating a machine object in namespace default: machines.cluster.k8s.io "controlplane-0" already exists

Last few lines of the logs from the CAPA controller on the KIND cluster

I0605 19:31:57.429471       1 securitygroups.go:48] [cluster-actuator]/cluster.k8s.io/v1alpha1/default/capa "level"=2 "msg"="Reconciling security groups"
I0605 19:31:57.767515       1 securitygroups.go:119] [cluster-actuator]/cluster.k8s.io/v1alpha1/default/capa "level"=2 "msg"="Revoked ingress rules from security group"  "revoked-ingress-rules"=[{"description":"IP-in-IP (calico)","protocol":"4","fromPort":0,"toPort":0,"cidrBlocks":null,"sourceSecurityGroupIds":["sg-020a010ee2a52dfc2","sg-026369e80e07cee84"]}] "security-group-id"="sg-026369e80e07cee84"
I0605 19:31:58.004553       1 securitygroups.go:128] [cluster-actuator]/cluster.k8s.io/v1alpha1/default/capa "level"=2 "msg"="Authorized ingress rules in security group"  "authorized-ingress-rules"=[{"description":"IP-in-IP (calico)","protocol":"4","fromPort":-1,"toPort":65535,"cidrBlocks":null,"sourceSecurityGroupIds":["sg-020a010ee2a52dfc2","sg-026369e80e07cee84"]}] "security-group-id"="sg-026369e80e07cee84"
I0605 19:31:58.230619       1 securitygroups.go:119] [cluster-actuator]/cluster.k8s.io/v1alpha1/default/capa "level"=2 "msg"="Revoked ingress rules from security group"  "revoked-ingress-rules"=[{"description":"IP-in-IP (calico)","protocol":"4","fromPort":0,"toPort":0,"cidrBlocks":null,"sourceSecurityGroupIds":["sg-020a010ee2a52dfc2","sg-026369e80e07cee84"]}] "security-group-id"="sg-020a010ee2a52dfc2"
I0605 19:31:58.469504       1 securitygroups.go:128] [cluster-actuator]/cluster.k8s.io/v1alpha1/default/capa "level"=2 "msg"="Authorized ingress rules in security group"  "authorized-ingress-rules"=[{"description":"IP-in-IP (calico)","protocol":"4","fromPort":-1,"toPort":65535,"cidrBlocks":null,"sourceSecurityGroupIds":["sg-020a010ee2a52dfc2","sg-026369e80e07cee84"]}] "security-group-id"="sg-020a010ee2a52dfc2"
I0605 19:31:58.475288       1 network.go:53] [cluster-actuator]/cluster.k8s.io/v1alpha1/default/capa "level"=2 "msg"="Reconcile network completed successfully"
I0605 19:31:58.475395       1 bastion.go:45] [cluster-actuator]/cluster.k8s.io/v1alpha1/default/capa "level"=2 "msg"="Reconciling bastion host"
I0605 19:31:58.655907       1 bastion.go:74] [cluster-actuator]/cluster.k8s.io/v1alpha1/default/capa "level"=2 "msg"="Reconcile bastion completed successfully"
I0605 19:31:58.656346       1 loadbalancer.go:38] [cluster-actuator]/cluster.k8s.io/v1alpha1/default/capa "level"=2 "msg"="Reconciling load balancers"
I0605 19:31:59.140865       1 loadbalancer.go:67] [cluster-actuator]/cluster.k8s.io/v1alpha1/default/capa "level"=2 "msg"="Reconcile load balancers completed successfully"
I0605 19:31:59.150282       1 scope.go:217] [cluster-actuator]/cluster.k8s.io/v1alpha1/default/capa "level"=1 "msg"="updating cluster status"
rpc error: code = Unknown desc = an error occurred when try to find container "eabc59aea5e8ac7f01e848e183a83830bc16d94192ad0abd9eff21a6bc10b29e": does not exist

What did you expect to happen:
Cluster to come up correctly.

Anything else you would like to add:
I am not using released clusterctl binaries. I built from source on the master branch.
Snippet from git log:

commit c85586c59121d5a9ac89c589c919a73722b5af3f (HEAD -> master, origin/master, origin/HEAD)
Author: Jason DeTiberus <[email protected]>
Date:   Tue Jun 4 17:43:53 2019 -0400

    Build and update images for k8s v1.14.2 (#805)

clusterctl version
Version Info: GitReleaseTag: "v0.3.0", MajorVersion: "0", MinorVersion:"3", GitReleaseCommit:"359991ec850866", GitTreeState:"clean"

Environment:

  • Cluster-api-provider-aws version: v0.3.0
  • Kubernetes version: (use kubectl version): v1.14.2 (on KIND)
  • OS (e.g. from /etc/os-release):

@k8s-ci-robot added the kind/bug label Jun 5, 2019

@aaroniscode
Contributor Author

A couple of updates:

  1. I was able to reproduce this with the clusterctl v0.3.0 binary from the GitHub release page, which rules out an issue with my custom build.

  2. This looks almost exactly like cluster-api#815 (Pivot fails: machines.cluster.k8s.io "controlplane-0" already exists), but a fix for that issue was merged.

@aaroniscode
Contributor Author

aaroniscode commented Jun 6, 2019

I think I found the root cause -- a race condition in clusterctl during the pivot phase. It appears that this was fixed in kubernetes-sigs/cluster-api#944.

The fix is in the cluster-api master branch, but CAPA is using the release-0.1 branch.

@detiber
Member

detiber commented Jun 6, 2019

I'm working on backporting the fix (and other recent fixes) to the release-0.1 branch now.

/assign

@detiber
Member

detiber commented Jun 6, 2019

Backport PR is here: kubernetes-sigs/cluster-api#986

After that merges, this repo will need a PR to update the vendored dependency.

@detiber
Member

detiber commented Jun 7, 2019

I'm waiting to create the PR to update the dependency until kubernetes-sigs/cluster-api#989 merges as well.
