Turn `automatic-recovery` off by default #94

himanshu-kun · 2022-08-12T05:14:55Z

What would you like to be added:
By default, MCM should turn off automatic recovery (https://aws.amazon.com/about-aws/whats-new/2022/03/amazon-ec2-default-automatic-recovery/) for an instance. Automatic recovery is an AWS feature that, in case of a host failure, recovers the instance on a new host with the same instance ID and attached volumes.

Why is this needed:
Currently MCM itself has a health-check mechanism that terminates a machine if it has been unhealthy (kubelet not responding or some other condition) for longer than healthTimeout (10 min by default). This means two health-check actions could race against each other.
If AWS auto-recovery completes before the MCM health check acts, it's fine.
But if it takes longer (the instance is still being moved from one host to another and its volumes are detaching), MCM's recovery kicks in and deletes the instance on the new host to start an entirely new instance. This detaches the volumes again and leads to a longer recovery, which is undesirable.


himanshu-kun · 2022-08-12T05:17:12Z

The initial discussion can be found here: https://sap-ti.slack.com/archives/CBVQLMS6N/p1659345327978849

It still needs to be tested whether this scenario can actually happen.

himanshu-kun · 2022-08-12T05:19:48Z

Another question is whether the parameter should be configurable by the user. MCM's health-based recovery is currently sufficient, but there could be scenarios where a customer would want to rely on the AWS recovery method instead.
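If the setting were exposed to users, one option is an optional field that MCM defaults to `disabled` when left unset. EC2 accepts `default` or `disabled` for an instance's auto-recovery maintenance option; the struct, field, and function names below (`ProviderSpec`, `AutoRecovery`, `resolveAutoRecovery`) are hypothetical and not part of the MCM AWS provider API. A minimal sketch:

```go
package main

import (
	"errors"
	"fmt"
)

// Values EC2 accepts for an instance's auto-recovery maintenance option.
const (
	AutoRecoveryDefault  = "default"
	AutoRecoveryDisabled = "disabled"
)

// ProviderSpec is a hypothetical, trimmed-down machine class spec.
// AutoRecovery is nil when the user did not set it.
type ProviderSpec struct {
	AutoRecovery *string
}

// resolveAutoRecovery returns the value that would be sent to EC2 at launch:
// "disabled" unless the user explicitly opted back in to AWS auto-recovery.
func resolveAutoRecovery(spec ProviderSpec) (string, error) {
	if spec.AutoRecovery == nil {
		return AutoRecoveryDisabled, nil
	}
	switch *spec.AutoRecovery {
	case AutoRecoveryDefault, AutoRecoveryDisabled:
		return *spec.AutoRecovery, nil
	default:
		return "", errors.New(`autoRecovery must be "default" or "disabled"`)
	}
}

func main() {
	optIn := AutoRecoveryDefault
	v1, _ := resolveAutoRecovery(ProviderSpec{})
	v2, _ := resolveAutoRecovery(ProviderSpec{AutoRecovery: &optIn})
	fmt.Println(v1, v2) // disabled default
}
```

The resolved value would then go into the launch request's maintenance options (`MaintenanceOptions.AutoRecovery` in EC2 `RunInstances`). Using a pointer keeps "unset" distinguishable from an explicit choice, so the MCM default can change later without overriding users who opted in.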

himanshu-kun · 2023-02-27T09:06:49Z

@D063648 do you still find this feature relevant?

rishabh-11 · 2024-07-03T06:35:04Z

Grooming decision:

Check whether auto-recovery is in progress (research is needed to see if this can be detected). If yes, revisit the health timeout to give the instance time to recover.

ashwani2k · 2024-10-16T05:35:49Z

This may not be required if we introduce gardener/machine-controller-manager#755.
@rishabh-11 to decide whether we need to do this here or can close this in favor of "React faster if VM instance is gone (i.e. don’t wait until full machineHealthTimeout/machineCreationTimeout lapses)" #755.

himanshu-kun added the kind/enhancement Enhancement, improvement, extension label Aug 12, 2022

himanshu-kun changed the title ~~Turn automatic-recovery off~~ Turn automatic-recovery off by default Aug 12, 2022

himanshu-kun added kind/discussion Discussion (engaging others in deciding about multiple options) kind/test Test area/performance Performance (across all domains, such as control plane, networking, storage, etc.) related priority/2 Priority (lower number equals higher priority) labels Aug 12, 2022

gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Feb 8, 2023

elankath mentioned this issue Feb 27, 2023 in Improve Monitoring/Alerting/Metrics gardener/machine-controller-manager#211 (open, 7 tasks)

gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Nov 6, 2023

rishabh-11 assigned unmarshall Jul 3, 2024

rishabh-11 removed lifecycle/rotten Nobody worked on this for 12 months (final aging stage) needs/planning Needs (more) planning with other MCM maintainers labels Jul 3, 2024
