MachinePool remains in WaitingForReplicasReady because CAPA does not reconcile node references after instance refresh #4618
This issue is currently awaiting triage. If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
In your opinion, is this resolved by the MachinePool Machines implementation in CAPA? If not, what is missing that would need to be added?
@cnmcavoy I think your PR is separate from this issue. CAPI reconciles based on …
Correct... I agree that this isn't solved by #4527. My understanding is that the solution requires a way to detect any change in the status of an ASG's instances and trigger a new reconcile of the AWSMachinePool. One approach would be to build on top of the work in #4527 and have the AWSMachines enqueue their AWSMachinePool when their status changes. Alternatively, another approach would be to use AWS events and set up the resources to receive them. I believe there is a way to have AWS send something when the ASG changes.
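A minimal sketch of that first idea with controller-runtime (v0.15+ signatures), assuming AWSMachines carry their pool's name in CAPI's `cluster.x-k8s.io/pool-name` label; the reconciler stub and wiring are illustrative, not CAPA's actual code:

```go
package controllers

import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	infrav1 "sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2"
	expinfrav1 "sigs.k8s.io/cluster-api-provider-aws/v2/exp/api/v1beta2"
)

type AWSMachinePoolReconciler struct {
	client.Client
}

func (r *AWSMachinePoolReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... existing AWSMachinePool reconciliation ...
	return ctrl.Result{}, nil
}

// SetupWithManager re-runs the pool reconciler whenever one of its
// AWSMachines changes, so a replaced instance is noticed without
// waiting for an unrelated event on the pool object itself.
func (r *AWSMachinePoolReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&expinfrav1.AWSMachinePool{}).
		Watches(
			&infrav1.AWSMachine{},
			handler.EnqueueRequestsFromMapFunc(func(ctx context.Context, o client.Object) []reconcile.Request {
				// Assumed: the AWSMachine carries its pool's name in this label.
				pool, ok := o.GetLabels()["cluster.x-k8s.io/pool-name"]
				if !ok {
					return nil
				}
				return []reconcile.Request{{NamespacedName: types.NamespacedName{
					Namespace: o.GetNamespace(),
					Name:      pool,
				}}}
			}),
		).
		Complete(r)
}
```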
A bulletproof solution would be to reconcile every 1-5 minutes (configurable?!).

There's Amazon EventBridge, but it can mainly perform actions in other AWS services, so I'm not sure if it could trigger a call to a controller webhook in order for it to reconcile.

I like the idea of observing the … status changes. If we don't have a clear idea, should we first fix the low-hanging fruit and use a regular reconciliation interval (…)?
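For that low-hanging fruit, a sketch of an unconditional requeue, with the interval wired to a hypothetical flag (the flag name and default are invented here; CAPA does not currently expose this):

```go
package controllers

import (
	"context"
	"flag"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// Hypothetical flag controlling how often the ASG is re-read even
// when no watch event arrived.
var asgSyncPeriod = flag.Duration("awsmachinepool-sync-period", 3*time.Minute,
	"How often to re-read ASG instances without a triggering watch event")

type AWSMachinePoolReconciler struct{}

func (r *AWSMachinePoolReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... reconcile the ASG and sync .Status.Instances as today ...

	// Returning RequeueAfter instead of an empty Result guarantees the
	// instance list is re-read periodically, so an instance refresh that
	// happens entirely on the AWS side is eventually observed.
	return ctrl.Result{RequeueAfter: *asgSyncPeriod}, nil
}
```

A periodic requeue bounds the staleness regardless of event delivery; EventBridge or status watches could then shorten the latency on top of it.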
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
#5174 may fix this, given that it ensures the instances/nodes list is updated regularly.
/remove-lifecycle rotten
/kind bug
What steps did you take and what happened:
Related to kubernetes-sigs/cluster-api#8858, #4071
CAPA's `AWSMachinePool` reconciler unconditionally returns `return ctrl.Result{}, r.reconcileNormal(ctx, machinePoolScope, infraScope, infraScope)`, i.e. it does not schedule reconciliation of the ASG's EC2 instances into `.Status.Instances` at regular intervals.

I made a change where CAPA triggers an instance refresh (e.g. a change of AMI IDs), rolling out new EC2 instances. The parent `MachinePool` object remained in a non-ready state with reason `WaitingForReplicasReady`, with CAPI continuously logging `NodeRefs != ReadyReplicas` messages. Only the next, random reconciliation of my `AWSMachinePool` object resolved this by checking which instances exist in the ASG.

What did you expect to happen:
CAPA should reconcile regularly in order to check the ASG for a changed set of instances, particularly when a change is expected because CAPA itself triggered an instance refresh.
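To make that expectation concrete, here is a self-contained sketch of the ASG lookup such a periodic reconcile would repeat, written against aws-sdk-go v1 rather than CAPA's actual AWS service layer (the function name is illustrative):

```go
package controllers

import (
	"context"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

// listASGInstanceIDs reads the current set of instance IDs in an ASG.
// A reconciler could diff this against .Status.Instances to decide
// whether provider IDs and node references need to be refreshed.
func listASGInstanceIDs(ctx context.Context, asgName string) ([]string, error) {
	sess, err := session.NewSession()
	if err != nil {
		return nil, err
	}
	svc := autoscaling.New(sess)
	out, err := svc.DescribeAutoScalingGroupsWithContext(ctx, &autoscaling.DescribeAutoScalingGroupsInput{
		AutoScalingGroupNames: []*string{aws.String(asgName)},
	})
	if err != nil {
		return nil, err
	}
	var ids []string
	for _, g := range out.AutoScalingGroups {
		for _, inst := range g.Instances {
			ids = append(ids, aws.StringValue(inst.InstanceId))
		}
	}
	return ids, nil
}
```

After an instance refresh replaces EC2 instances, only a call like this reveals the new instance set; without a scheduled requeue, nothing prompts the controller to make it.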
Environment:
- Kubernetes version (use `kubectl version`): v1.24.14
- OS (e.g. from `/etc/os-release`): Flatcar Linux