Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added uniformAcrossAzUpdate strategy #27

Merged
merged 11 commits into from
Dec 7, 2019
Merged

Added uniformAcrossAzUpdate strategy #27

merged 11 commits into from
Dec 7, 2019

Conversation

narayanan
Copy link
Contributor

@narayanan narayanan commented Nov 29, 2019

  • Externalized the node selectors behind an interface

  • Refactored the restack logic to leverage node selectors to get a list of instances to restack on each iteration

  • Added a strategy for RandomUpdate

  • Added a new strategy to perform updates across AZ in same proportion. This new strategy picks either same number of nodes or same percentage of nodes from each AZ for updating in each iteration.

  • Added unit tests - 80% coverage for controllers

go test ./api/... ./controllers/... -coverprofile cover.out
ok      github.com/keikoproj/upgrade-manager/api/v1alpha1       5.145s  coverage: 1.4% of statements
ok      github.com/keikoproj/upgrade-manager/controllers        16.179s coverage: 80.9% of statements
  • Tested the strategy by performing Random & rolling update in a k8s cluster

  • Sample RollingUpgrade object

apiVersion: upgrademgr.keikoproj.io/v1alpha1
kind: RollingUpgrade
metadata:
  name: rollingupgrade-nodes
  namespace: kube-system
spec:
  asgName: nodes
  nodeIntervalSeconds: 300
  postDrain: {}
  postDrainDelaySeconds: 90
  postTerminate: {}
  preDrain: {}
  region: us-west-2
  strategy:
    drainTimeout: 180
    maxUnavailable: 100%
    type: uniformAcrossAzUpdate

@claassistantio
Copy link

claassistantio commented Nov 29, 2019

CLA assistant check
All committers have signed the CLA.

@shrinandj
Copy link
Collaborator

@narayanan: This is awesome! Thanks a lot for implementing this feature. Certainly makes upgrades more flexible.

- `strategy.type`: This field is optional and currently only random update strategy is supported. Refer to [random_update_strategy.yaml](examples/random_update_strategy.yaml) for sample custom resource definition.
- `strategy.type`: This field is optional and currently two strategies are supported
- `randomUpdate` - Default is type is not specified. Picks nodes randomly for updating. Refer to [random_update_strategy.yaml](examples/random_update_strategy.yaml) for sample custom resource definition.
- `uniformAcrossAzUpdate` - Picks same number of nodes or same percentage of nodes from each AZ for update. Refer to [uniform_across_az_update_strategy.yaml](examples/uniform_across_az_update_strategy.yaml) for sample custom resource definition.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the behavior when the strategy.type is uniformAcrossAzUpdate and maxUnavailable is say 3? Does it mean that there will be 3 nodes per az that will be taken down at a time or will it be 1 node per az (assuming there are 3 zones).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 nodes per AZ will be taken down. So if there are 3 AZ's, total a max of 9 nodes will be restacked on each iteration

func getMaxUnavailable(strategy upgrademgrv1alpha1.UpdateStrategy, totalNodes int) int {
// Below are the constants set in intstr package
// const (
// Int Type = iota // The IntOrString holds an int.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these comments necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were there already, I moved the method from rolling_upgrade_controller,go to helpers.go as rolling_upgrade_controller,go grew to 1000+ lines.

I can remove them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

// maxUnavailable has to be atleast 1 when there are nodes in the ASG
if totalNodes > 0 && maxUnavailable < 1 {
maxUnavailable = 1
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth logging what the computed value of maxUnavailable is. This would help debugging.

Copy link
Contributor Author

@narayanan narayanan Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done... added log messages from where this method is getting called, decided add the logging at the calling place as callers have more relevant context and can be logged with more relevant details.

@@ -0,0 +1,174 @@
package controllers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job on writing these tests! Fabulous!!

@shrinandj shrinandj merged commit 96e2f50 into keikoproj:master Dec 7, 2019
@shrinandj
Copy link
Collaborator

Thanks for your contributions @narayanan!

kianjones4 added a commit to kianjones4/upgrade-manager that referenced this pull request Dec 9, 2019
* check error message instead of code for instance not found

* fix test

* Fixes for autoscaling API changes (keikoproj#26)

* Add semaphore.yml

* build

* build

* squashed commits

* remove badge

* readme

* test

* pr

* fix kafka issue

* update to fix potential throttling issue

* Mod fix (#7)

* revert go sum

* fix mod build

* add go mod files for logger

* Fix terminate error (#8)

* check error message instead of code for instance not found

* fix test

* add build badge

* fix image

* Release v0.2 (keikoproj#28)

* Bump version to 0.3-dev. (keikoproj#29)

* Fix keikoproj#30 and release v0.3 (keikoproj#31)

Testing Done:

- Verified that the new docker image can be built with aws-sdk v1.25.0
- Verified that rolling upgrade actually completed.

* Bump version to 0.4-dev. (keikoproj#32)

* Added uniformAcrossAzUpdate strategy (keikoproj#27)

* Modified ClusterState to capture AZ details of a node

* - Added UniformAcrossAzUpdate strategy
- Added Unit tests (More to come)

* Added Unit tests

* Updated validation

* Added uniformAcrossAzUpdate update strategy sample

* Updated README

* Externalize node selectors behind interface

* Removed unnecessary comments

* Log maxUnavailable value
shrinandj pushed a commit that referenced this pull request Dec 19, 2019
* Add semaphore.yml

* build

* build

* squashed commits

* remove badge

* readme

* test

* pr

* fix kafka issue

* update to fix potential throttling issue

* Mod fix (#7)

* revert go sum

* fix mod build

* add go mod files for logger

* Fix terminate error (#8)

* check error message instead of code for instance not found

* fix test

* Add badge (#9)

* check error message instead of code for instance not found

* fix test

* Fixes for autoscaling API changes (#26)

* Add semaphore.yml

* build

* build

* squashed commits

* remove badge

* readme

* test

* pr

* fix kafka issue

* update to fix potential throttling issue

* Mod fix (#7)

* revert go sum

* fix mod build

* add go mod files for logger

* Fix terminate error (#8)

* check error message instead of code for instance not found

* fix test

* add build badge

* fix image

* Release v0.2 (#28)

* Bump version to 0.3-dev. (#29)

* Fix #30 and release v0.3 (#31)

Testing Done:

- Verified that the new docker image can be built with aws-sdk v1.25.0
- Verified that rolling upgrade actually completed.

* Bump version to 0.4-dev. (#32)

* Added uniformAcrossAzUpdate strategy (#27)

* Modified ClusterState to capture AZ details of a node

* - Added UniformAcrossAzUpdate strategy
- Added Unit tests (More to come)

* Added Unit tests

* Updated validation

* Added uniformAcrossAzUpdate update strategy sample

* Updated README

* Externalize node selectors behind interface

* Removed unnecessary comments

* Log maxUnavailable value

* Add badge (#10)

* check error message instead of code for instance not found

* fix test

* add build badge

* fix image

* update readme

* readme

* Add badge (#11)

* check error message instead of code for instance not found

* fix test

* add build badge

* fix image

* update readme

* fix mod
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants