Turn automatic-recovery off by default
#94
The initial discussion can be found here: https://sap-ti.slack.com/archives/CBVQLMS6N/p1659345327978849 It still needs to be tested whether this scenario can actually happen.
Another question is:
@D063648, do you still find this feature relevant?
Grooming decision: Check whether an auto-recovery is in progress (research is needed to see whether this can be detected). If so, reconsider the health timeout so that the instance has time to recover.
This may not be required if we introduce gardener/machine-controller-manager#755.
What would you like to be added:
By default, MCM should turn off automatic recovery (https://aws.amazon.com/about-aws/whats-new/2022/03/amazon-ec2-default-automatic-recovery/) for an instance.
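As a rough sketch of what turning it off means at the API level: AWS exposes the setting through the EC2 ModifyInstanceMaintenanceOptions action, whose AutoRecovery parameter accepts disabled or default. The Go snippet below only assembles the Query API parameters for that call. The helper name disableAutoRecoveryParams and the example instance ID are hypothetical, and signing and sending the request (which MCM would do through the AWS SDK) are left out.

```go
package main

import (
	"fmt"
	"net/url"
)

// disableAutoRecoveryParams builds the EC2 Query API parameters for
// ModifyInstanceMaintenanceOptions on an existing instance.
// "disabled" turns automatic recovery off; "default" restores AWS's default.
// Hypothetical helper: SigV4 signing and the HTTP request are omitted.
func disableAutoRecoveryParams(instanceID string) url.Values {
	v := url.Values{}
	v.Set("Action", "ModifyInstanceMaintenanceOptions")
	v.Set("Version", "2016-11-15") // EC2 API version
	v.Set("InstanceId", instanceID)
	v.Set("AutoRecovery", "disabled")
	return v
}

func main() {
	// Encode sorts the keys, giving a stable query string.
	fmt.Println(disableAutoRecoveryParams("i-0123456789abcdef0").Encode())
}
```

For new machines, MCM would more likely set this at launch time. To our understanding, RunInstances accepts the same AutoRecovery setting under its MaintenanceOptions parameter, which avoids a window in which the instance runs with recovery enabled; check the EC2 API reference before relying on the exact parameter names.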
Automatic recovery is a feature offered by AWS that recovers the instance on a new host in case of a host failure, keeping the same instance ID and attached volumes.
Why is this needed:
Currently, MCM has its own health-check mechanism: it terminates a machine if it stays unhealthy (kubelet not responding, or other conditions) for longer than healthTimeout (10 minutes by default). This means two health-check actions could race against each other. If AWS automatic recovery completes before the MCM health check fires, everything is fine.
But if it takes longer (for example, the instance is still being moved from one host to another and its volumes are still detaching), MCM's own recovery kicks in: it deletes the instance on the new host and starts an entirely new instance, which detaches the volumes again and makes the recovery even longer. This is undesirable.