When an eks-rolling-update job fails, the previous cluster state is not automatically recovered; instead, manual intervention is required:
```
iris-devops-rolling-node-update-manual-42b-hrvhd 2023-10-09 10:25:13,138 INFO InstanceId i-026bce300ffa7d8d0 is node ip-10-208-33-228.eu-central-1.compute.internal in kubernetes land
iris-devops-rolling-node-update-manual-42b-hrvhd 2023-10-09 10:25:13,138 INFO Draining worker node with kubectl drain ip-10-208-33-228.eu-central-1.compute.internal --ignore-daemonsets --delete-emptydir-data --timeout=300s...
iris-devops-rolling-node-update-manual-42b-hrvhd node/ip-10-208-33-228.eu-central-1.compute.internal already cordoned
iris-devops-rolling-node-update-manual-42b-hrvhd error: unable to drain node "ip-10-208-33-228.eu-central-1.compute.internal" due to error:cannot delete Pods declare no controller (use --force to override): gitlab-runner/runner-8e3ydbhn-project-1998-concurrent-8-ek1wq7qg, continuing command...
iris-devops-rolling-node-update-manual-42b-hrvhd There are pending nodes to be drained:
iris-devops-rolling-node-update-manual-42b-hrvhd ip-10-208-33-228.eu-central-1.compute.internal
iris-devops-rolling-node-update-manual-42b-hrvhd cannot delete Pods declare no controller (use --force to override): gitlab-runner/runner-8e3ydbhn-project-1998-concurrent-8-ek1wq7qg
iris-devops-rolling-node-update-manual-42b-hrvhd 2023-10-09 10:25:13,990 INFO Node not drained properly. Exiting
iris-devops-rolling-node-update-manual-42b-hrvhd 2023-10-09 10:25:13,990 ERROR ('Rolling update on ASG failed', 'ci-runner-kas-20230710121010942300000012')
iris-devops-rolling-node-update-manual-42b-hrvhd 2023-10-09 10:25:13,990 ERROR *** Rolling update of ASG has failed. Exiting ***
iris-devops-rolling-node-update-manual-42b-hrvhd 2023-10-09 10:25:13,990 ERROR AWS Auto Scaling Group processes will need resuming manually
iris-devops-rolling-node-update-manual-42b-hrvhd 2023-10-09 10:25:13,990 ERROR Kubernetes Cluster Autoscaler will need resuming manually
```
Most notably, the Kubernetes Cluster Autoscaler is left scaled down to 0 and the ASG scaling processes remain suspended. This is a problem because our workloads (especially CI) depend heavily on working auto-scaling.
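For context, this is roughly the manual recovery we have to perform after each failed run (a sketch only; the ASG name is taken from the log above, while the `cluster-autoscaler` deployment name/namespace and the assumption that credentials and kubectl context are already configured are ours, adjust to your setup):

```python
# Hypothetical recovery sketch: resume what eks-rolling-update suspended before failing.
import subprocess

import boto3

ASG_NAME = "ci-runner-kas-20230710121010942300000012"   # from the failure log
AUTOSCALER_DEPLOYMENT = "cluster-autoscaler"             # assumed default name
AUTOSCALER_NAMESPACE = "kube-system"                     # assumed default namespace
CORDONED_NODE = "ip-10-208-33-228.eu-central-1.compute.internal"  # from the log

# 1. Resume the suspended AWS Auto Scaling Group processes.
boto3.client("autoscaling").resume_processes(AutoScalingGroupName=ASG_NAME)

# 2. Scale the Kubernetes Cluster Autoscaler back up.
subprocess.run(
    ["kubectl", "scale", f"deployment/{AUTOSCALER_DEPLOYMENT}",
     "-n", AUTOSCALER_NAMESPACE, "--replicas=1"],
    check=True,
)

# 3. Uncordon the node that the failed drain left unschedulable.
subprocess.run(["kubectl", "uncordon", CORDONED_NODE], check=True)
```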
These exceptions should trigger an uncordon of the affected nodes:
https://github.com/deinstapel/eks-rolling-update/blob/master/eksrollup/lib/k8s.py#L195-L198
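Roughly what I have in mind (a sketch only, not the project's actual code; `drain_node_with_rollback` and `uncordon_node` are hypothetical stand-ins for the helpers in `k8s.py`):

```python
# Sketch: undo the cordon if the drain fails, so the node is left as it was found.
import subprocess


def uncordon_node(node_name: str) -> None:
    """Best-effort uncordon so a failed drain does not leave the node unschedulable."""
    subprocess.run(["kubectl", "uncordon", node_name], check=True)


def drain_node_with_rollback(node_name: str) -> None:
    try:
        subprocess.run(
            ["kubectl", "drain", node_name,
             "--ignore-daemonsets", "--delete-emptydir-data", "--timeout=300s"],
            check=True,
        )
    except subprocess.CalledProcessError:
        # Drain failed (e.g. unmanaged pods without --force): roll back the cordon
        # before propagating the error instead of exiting with the node cordoned.
        uncordon_node(node_name)
        raise
```

The same rollback path could also resume the ASG processes and scale the Cluster Autoscaler back up before exiting, which would remove the manual steps the log currently asks for.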
@martin31821 please look into it. Thx 🙂