
fix queue-processor rbn draining by default #478

Merged
merged 4 commits on Aug 17, 2021

Conversation

bwagner5
Contributor

Issue #, if available:
N/A

Description of changes:

  • Queue-Processor mode was only cordoning nodes when a Rebalance Recommendation was received. The default behavior in queue-processor mode should be to cordon and drain.
  • Properly formatted the function (sorry for the noisy diff); the only functional change is in the if condition on line 331.
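The gist of the change can be sketched as a decision function. All names below are illustrative, not the actual node-termination-handler source:

```go
package main

import "fmt"

// shouldCordonOnly is a hypothetical sketch of the decision this PR changes.
// Before the fix, a rebalance recommendation with rebalance draining disabled
// resulted in cordon-only even in queue-processor mode; after the fix,
// queue-processor mode always cordons and drains.
func shouldCordonOnly(isRebalanceRecommendation, enableRebalanceDraining, queueProcessorMode bool) bool {
	return isRebalanceRecommendation && !enableRebalanceDraining && !queueProcessorMode
}

func main() {
	fmt.Println(shouldCordonOnly(true, false, true))  // queue-processor mode: cordon and drain
	fmt.Println(shouldCordonOnly(true, false, false)) // IMDS mode, draining disabled: cordon only
}
```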

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@bwagner5 bwagner5 requested a review from haugenj August 16, 2021 16:55
Contributor

@brycahta brycahta left a comment


Update CLI help text to align with README

@@ -80,9 +80,11 @@ IMDS Processor Mode allows for a fine-grained configuration of IMDS paths that a
- `enableRebalanceMonitoring`
- `enableScheduledEventDraining`

By default, IMDS mode will only Cordon in response to a Rebalance Recommendation event (all other events are Cordoned and Drained). Cordon is the default for a rebalance event because it's not known whether an ASG is in use, or whether that ASG is configured to replace the instance on a rebalance event. If you are using an ASG with rebalance recommendations enabled, you can set the `enableRebalanceDraining` flag to true to perform a Cordon and Drain when a rebalance event is received.
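As a sketch, enabling drain-on-rebalance in IMDS mode might look like this in Helm values, assuming the chart exposes keys matching the flag names above (unverified; check the chart's values.yaml):

```yaml
# Hypothetical Helm values fragment; key names assume the chart mirrors
# the flags described in the README text above.
enableRebalanceMonitoring: true
enableRebalanceDraining: true      # cordon AND drain on a rebalance recommendation
enableScheduledEventDraining: true
```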
Contributor


very clear 👍

I think adding another row for this "configuration granularity" would also be helpful in the feature table. Not required for this PR so sending 🚀

Contributor Author


I went back and forth on adding to the feature table. I think it actually makes sense to leave it off. The same configuration granularity is available in queue-processor mode; you just get it via the EventBridge rules rather than command line flags to NTH. For example, you could omit the EventBridge rule for rebalance recommendations, and turn rebalance recs off on your ASG, to ignore those events. The cordon-only action isn't possible on its own, but if cordon-only logic is desired for a specific purpose, it would be fairly easy to set up with lifecycle hook timeouts and an increased pod termination grace period. It's probably never optimal to cordon-only without another system taking further action at some point; we only do it for safety, so that you don't drain workloads that are not going to be backfilled.
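For the cordon-only workaround described above, the pod-side half is just a longer termination grace period; an illustrative fragment (the 3600s value is an example, not a recommendation):

```yaml
# Illustrative pod spec fragment: a long grace period gives an external
# system time to act after the node is cordoned, before pods are
# force-killed. Pair this with an ASG lifecycle hook heartbeat timeout.
spec:
  terminationGracePeriodSeconds: 3600
```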

Contributor


Ah, so just change the row name to "helm configuration granularity"; but actually these are fair points 👍 I'm good leaving it off

brycahta
brycahta previously approved these changes Aug 17, 2021
Contributor

@brycahta brycahta left a comment


LGTM

@@ -153,7 +153,7 @@ GET_ATTRS_SQS_CMD="awslocal sqs get-queue-attributes --queue-url ${queue_url} --

cordoned=0
tainted=0
not_evicted=0
evicted=0
Contributor


👍

Contributor

@haugenj haugenj left a comment


👍

3 participants