Skip to content

Commit

Permalink
AZ Rebalance Recommendation Document Fix (#755)
Browse files Browse the repository at this point in the history
  • Loading branch information
LikithaVemulapalli authored Jan 27, 2023
1 parent 57b3a3d commit c27b6e7
Showing 1 changed file with 5 additions and 2 deletions.
7 changes: 5 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@

## Project Summary

This project ensures that the Kubernetes control plane responds appropriately to events that can cause your EC2 instance to become unavailable, such as [EC2 maintenance events](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-instances-status-check_sched.html), [EC2 Spot interruptions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html), [ASG Scale-In](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroupLifecycle.html#as-lifecycle-scale-in), [ASG AZ Rebalance](https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-benefits.html#AutoScalingBehavior.InstanceUsage), and EC2 Instance Termination via the API or Console. If not handled, your application code may not stop gracefully, take longer to recover full availability, or accidentally schedule work to nodes that are going down.
This project ensures that the Kubernetes control plane responds appropriately to events that can cause your EC2 instance to become unavailable, such as [EC2 maintenance events](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-instances-status-check_sched.html), [EC2 Spot interruptions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html), [ASG Scale-In](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroupLifecycle.html#as-lifecycle-scale-in), ASG AZ Rebalance, and EC2 Instance Termination via the API or Console. If not handled, your application code may not stop gracefully, take longer to recover full availability, or accidentally schedule work to nodes that are going down.

The aws-node-termination-handler (NTH) can operate in two different modes: Instance Metadata Service (IMDS) or the Queue Processor.

Expand Down Expand Up @@ -65,7 +65,7 @@ Must be deployed as a Kubernetes **Deployment**. Also requires some **additional
- Instance Rebalance Recommendations
- ASG Termination Lifecycle Hooks to handle the following:
- [ASG Scale-In](https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html)
- [Availability Zone Rebalance](https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-capacity-rebalancing.html)
- [Availability Zone Rebalance](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-instance-termination.html#:~:text=are%20replaced%20first.-,Availability%20Zone%20rebalancing,-Amazon%20EC2%20Auto)
- [Unhealthy Instances](https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-health-checks.html), and more
- [Instance State Change events](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-instance-state-changes.html)

Expand All @@ -75,6 +75,7 @@ Must be deployed as a Kubernetes **Deployment**. Also requires some **additional
| Spot Instance Termination Notifications (ITN) |||
| Scheduled Events |||
| Instance Rebalance Recommendation |||
| AZ Rebalance Recommendation |||
| ASG Termination Lifecycle Hooks |||
| Instance State Change Events |||

Expand All @@ -90,6 +91,8 @@ IMDS Processor Mode allows for a fine-grained configuration of IMDS paths that a

By default, IMDS mode will only Cordon in response to a Rebalance Recommendation event (all other events are Cordoned and Drained). Cordon is the default for a rebalance event because it's not known if an ASG is being utilized and if that ASG is configured to replace the instance on a rebalance event. If you are using an ASG w/ rebalance recommendations enabled, then you can set the `enableRebalanceDraining` flag to true to perform a Cordon and Drain when a rebalance event is received.

Rebalance Recommendation is an early indicator to notify the Spot Instances that they can be interrupted soon. Node Termination Handler supports AZ Rebalance Recommendation only in Queue Processor mode using ASG Lifecycle Hooks. For AZ rebalances the instances are just terminated, using Lifecycle Hooks and EventBridge rule for `EC2 Instance-terminate Lifecycle Action` we can handle OD Instances.

The `enableSqsTerminationDraining` must be set to false for these configuration values to be considered.

The Queue Processor Mode does not allow for fine-grained configuration of which events are handled through helm configuration keys. Instead, you can modify your Amazon EventBridge rules to not send certain types of events to the SQS Queue so that NTH does not process those events. All events when operating in Queue Processor mode are Cordoned and Drained unless the `cordon-only` flag is set to true.
Expand Down

0 comments on commit c27b6e7

Please sign in to comment.