
Added uniformAcrossAzUpdate strategy #27

Merged
merged 11 commits into from
Dec 7, 2019
4 changes: 3 additions & 1 deletion README.md
@@ -19,7 +19,9 @@ RollingUpgrade provides a Kubernetes native mechanism for doing rolling-updates
- `postDrain.postWaitScript`: The script to run after the node is drained and the waitSeconds have passed. This can be used for ensuring that the drained pods actually were able to start elsewhere.
- `nodeIntervalSeconds`: The amount of time in seconds to wait after each node in the ASG is terminated.
- `postTerminate.script`: Optional bash script to execute after the node has terminated.
- `strategy.type`: This field is optional and currently only random update strategy is supported. Refer to [random_update_strategy.yaml](examples/random_update_strategy.yaml) for sample custom resource definition.
- `strategy.type`: This field is optional and currently two strategies are supported:
- `randomUpdate` - Default if type is not specified. Picks nodes randomly for updating. Refer to [random_update_strategy.yaml](examples/random_update_strategy.yaml) for a sample custom resource definition.
- `uniformAcrossAzUpdate` - Picks the same number of nodes (or the same percentage of nodes) from each AZ for update. Refer to [uniform_across_az_update_strategy.yaml](examples/uniform_across_az_update_strategy.yaml) for a sample custom resource definition.
Collaborator
What is the behavior when the strategy.type is uniformAcrossAzUpdate and maxUnavailable is say 3? Does it mean that there will be 3 nodes per az that will be taken down at a time or will it be 1 node per az (assuming there are 3 zones).

Contributor Author

3 nodes per AZ will be taken down. So with 3 AZs, a maximum of 9 nodes in total will be restacked on each iteration.

- `strategy.maxUnavailable`: Optional field. The number of nodes that can be unavailable during a rolling upgrade; can be specified as a number of nodes or as a percentage of the total number of nodes. Default is "1".
- `strategy.drainTimeout`: Optional field. The node will be terminated after the drain timeout (specified in seconds) even if `kubectl drain` has not completed. Default is -1.
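For orientation, here is a sketch of what a custom resource using the new strategy could look like. Field values are illustrative and the `apiVersion` group is assumed from the project's CRD; the example files referenced in the README are the authoritative layout:

```yaml
# Illustrative only; see examples/uniform_across_az_update_strategy.yaml in the repo.
apiVersion: upgrademgr.keikoproj.io/v1alpha1
kind: RollingUpgrade
metadata:
  name: my-rolling-upgrade
spec:
  asgName: my-asg
  nodeIntervalSeconds: 300
  strategy:
    type: uniformAcrossAzUpdate
    # Per the review discussion: this count applies per AZ, so with 3 AZs
    # up to 3 * maxUnavailable nodes may be restacked each iteration.
    maxUnavailable: "25%"
    drainTimeout: 120
```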

3 changes: 3 additions & 0 deletions api/v1alpha1/rollingupgrade_types.go
@@ -95,6 +95,9 @@ const (
// RandomUpdate strategy treats all the Azs as a single unit and picks random nodes for update
RandomUpdateStrategy UpdateStrategyType = "randomUpdate"

// UniformAcrossAzUpdate strategy picks the same number of nodes from each AZ for update
UniformAcrossAzUpdateStrategy UpdateStrategyType = "uniformAcrossAzUpdate"

// Other update strategies such as rolling update by AZ or rolling update with a predefined instance list
// can be implemented in future by adding more update strategy types
)
72 changes: 72 additions & 0 deletions controllers/helpers.go
@@ -0,0 +1,72 @@
package controllers

import (
"github.com/aws/aws-sdk-go/service/autoscaling"
upgrademgrv1alpha1 "github.com/keikoproj/upgrade-manager/api/v1alpha1"
"log"
"strconv"
"strings"
)

// getMaxUnavailable calculates and returns the maximum number of unavailable nodes,
// given an update strategy and the total number of nodes
func getMaxUnavailable(strategy upgrademgrv1alpha1.UpdateStrategy, totalNodes int) int {
maxUnavailable := 1
if strategy.MaxUnavailable.Type == 0 {
maxUnavailable = int(strategy.MaxUnavailable.IntVal)
} else if strategy.MaxUnavailable.Type == 1 {
strValue := strategy.MaxUnavailable.StrVal
intValue, _ := strconv.Atoi(strings.Trim(strValue, "%"))
maxUnavailable = int(float32(intValue) / float32(100) * float32(totalNodes))
}
// setting maxUnavailable to total number of nodes when maxUnavailable is greater than total node count
if totalNodes < maxUnavailable {
log.Printf("Reducing maxUnavailable count from %d to %d as total nodes count is %d",
maxUnavailable, totalNodes, totalNodes)
maxUnavailable = totalNodes
}
// maxUnavailable has to be at least 1 when there are nodes in the ASG
if totalNodes > 0 && maxUnavailable < 1 {
maxUnavailable = 1
}
Collaborator
It might be worth logging what the computed value of maxUnavailable is. This would help debugging.

Contributor Author
@narayanan Dec 5, 2019

Done. Added log messages at the call sites instead, since callers have more context and can log more relevant details.

return maxUnavailable
}

// getNextAvailableInstances checks the cluster state store for the instance state
// and returns the next set of instances available for update
func getNextAvailableInstances(
asgName string,
numberOfInstances int,
instances []*autoscaling.Instance,
state ClusterState) []*autoscaling.Instance {
return getNextSetOfAvailableInstancesInAz(asgName, "", numberOfInstances, instances, state)
}

// getNextSetOfAvailableInstancesInAz checks the cluster state store for the instance state
// and returns the next set of instances available for update in the given AZ
func getNextSetOfAvailableInstancesInAz(
asgName string,
azName string,
numberOfInstances int,
instances []*autoscaling.Instance,
state ClusterState,
) []*autoscaling.Instance {

var instancesForUpdate []*autoscaling.Instance
for instancesFound := 0; instancesFound < numberOfInstances; {
instanceId := state.getNextAvailableInstanceIdInAz(asgName, azName)
if len(instanceId) == 0 {
// All instances are updated; no more instances to update in this AZ
break
}

// check if the instance picked is part of ASG
for _, instance := range instances {
if *instance.InstanceId == instanceId {
instancesForUpdate = append(instancesForUpdate, instance)
instancesFound++
}
}
}
return instancesForUpdate
}
174 changes: 174 additions & 0 deletions controllers/helpers_test.go
@@ -0,0 +1,174 @@
package controllers
Collaborator

Great job on writing these tests! Fabulous!!


import (
"encoding/json"
"fmt"
"github.com/aws/aws-sdk-go/service/autoscaling"
upgrademgrv1alpha1 "github.com/keikoproj/upgrade-manager/api/v1alpha1"
"github.com/onsi/gomega"
"testing"
)

func TestGetMaxUnavailableWithPercentageValue(t *testing.T) {
g := gomega.NewGomegaWithT(t)

jsonString := "{ \"type\": \"randomUpdate\", \"maxUnavailable\": \"75%\", \"drainTimeout\": 15 }"
strategy := upgrademgrv1alpha1.UpdateStrategy{}
err := json.Unmarshal([]byte(jsonString), &strategy)
if err != nil {
fmt.Printf("Error occurred while unmarshalling strategy object, error: %s", err.Error())
}

g.Expect(getMaxUnavailable(strategy, 200)).To(gomega.Equal(150))
}

func TestGetMaxUnavailableWithPercentageValue67(t *testing.T) {
g := gomega.NewGomegaWithT(t)

jsonString := "{ \"type\": \"randomUpdate\", \"maxUnavailable\": \"67%\", \"drainTimeout\": 15 }"
strategy := upgrademgrv1alpha1.UpdateStrategy{}
err := json.Unmarshal([]byte(jsonString), &strategy)
if err != nil {
fmt.Printf("Error occurred while unmarshalling strategy object, error: %s", err.Error())
}

g.Expect(getMaxUnavailable(strategy, 3)).To(gomega.Equal(2))
}

func TestGetMaxUnavailableWithPercentageAndSingleInstance(t *testing.T) {
g := gomega.NewGomegaWithT(t)

totalNodes := 1
jsonString := "{ \"type\": \"randomUpdate\", \"maxUnavailable\": \"67%\", \"drainTimeout\": 15 }"
strategy := upgrademgrv1alpha1.UpdateStrategy{}
err := json.Unmarshal([]byte(jsonString), &strategy)
if err != nil {
fmt.Printf("Error occurred while unmarshalling strategy object, error: %s", err.Error())
}

g.Expect(getMaxUnavailable(strategy, totalNodes)).To(gomega.Equal(1))
}

func TestGetMaxUnavailableWithPercentageNonIntResult(t *testing.T) {
g := gomega.NewGomegaWithT(t)

jsonString := "{ \"type\": \"randomUpdate\", \"maxUnavailable\": \"37%\", \"drainTimeout\": 15 }"
strategy := upgrademgrv1alpha1.UpdateStrategy{}
err := json.Unmarshal([]byte(jsonString), &strategy)
if err != nil {
fmt.Printf("Error occurred while unmarshalling strategy object, error: %s", err.Error())
}

g.Expect(getMaxUnavailable(strategy, 50)).To(gomega.Equal(18))
}

func TestGetMaxUnavailableWithIntValue(t *testing.T) {
g := gomega.NewGomegaWithT(t)

jsonString := "{ \"type\": \"randomUpdate\", \"maxUnavailable\": 75, \"drainTimeout\": 15 }"
strategy := upgrademgrv1alpha1.UpdateStrategy{}
err := json.Unmarshal([]byte(jsonString), &strategy)
if err != nil {
fmt.Printf("Error occurred while unmarshalling strategy object, error: %s", err.Error())
}

g.Expect(getMaxUnavailable(strategy, 200)).To(gomega.Equal(75))
}

func TestGetNextAvailableInstance(t *testing.T) {
g := gomega.NewGomegaWithT(t)

mockAsgName := "some-asg"
mockInstanceName1 := "foo1"
mockInstanceName2 := "bar1"
az := "az-1"
instance1 := autoscaling.Instance{InstanceId: &mockInstanceName1, AvailabilityZone: &az}
instance2 := autoscaling.Instance{InstanceId: &mockInstanceName2, AvailabilityZone: &az}

instancesList := []*autoscaling.Instance{}
instancesList = append(instancesList, &instance1, &instance2)
rcRollingUpgrade := &RollingUpgradeReconciler{ClusterState: clusterState}
rcRollingUpgrade.ClusterState.initializeAsg(mockAsgName, instancesList)
available := getNextAvailableInstances(mockAsgName, 1, instancesList, rcRollingUpgrade.ClusterState)

g.Expect(1).Should(gomega.Equal(len(available)))
g.Expect(rcRollingUpgrade.ClusterState.deleteEntryOfAsg(mockAsgName)).To(gomega.BeTrue())

}

func TestGetNextAvailableInstanceNoInstanceFound(t *testing.T) {
g := gomega.NewGomegaWithT(t)

mockAsgName := "some-asg"
mockInstanceName1 := "foo1"
mockInstanceName2 := "bar1"
az := "az-1"
instance1 := autoscaling.Instance{InstanceId: &mockInstanceName1, AvailabilityZone: &az}
instance2 := autoscaling.Instance{InstanceId: &mockInstanceName2, AvailabilityZone: &az}

instancesList := []*autoscaling.Instance{}
instancesList = append(instancesList, &instance1, &instance2)
rcRollingUpgrade := &RollingUpgradeReconciler{ClusterState: clusterState}
rcRollingUpgrade.ClusterState.initializeAsg(mockAsgName, instancesList)
available := getNextAvailableInstances("asg2", 1, instancesList, rcRollingUpgrade.ClusterState)

g.Expect(0).Should(gomega.Equal(len(available)))
g.Expect(rcRollingUpgrade.ClusterState.deleteEntryOfAsg(mockAsgName)).To(gomega.BeTrue())

}

func TestGetNextAvailableInstanceInAz(t *testing.T) {
g := gomega.NewGomegaWithT(t)

mockAsgName := "some-asg"
mockInstanceName1 := "foo1"
mockInstanceName2 := "bar1"
az := "az-1"
az2 := "az-2"
instance1 := autoscaling.Instance{InstanceId: &mockInstanceName1, AvailabilityZone: &az}
instance2 := autoscaling.Instance{InstanceId: &mockInstanceName2, AvailabilityZone: &az2}

instancesList := []*autoscaling.Instance{}
instancesList = append(instancesList, &instance1, &instance2)
rcRollingUpgrade := &RollingUpgradeReconciler{ClusterState: clusterState}
rcRollingUpgrade.ClusterState.initializeAsg(mockAsgName, instancesList)

instances := getNextSetOfAvailableInstancesInAz(mockAsgName, az, 1, instancesList, rcRollingUpgrade.ClusterState)
g.Expect(1).Should(gomega.Equal(len(instances)))
g.Expect(mockInstanceName1).Should(gomega.Equal(*instances[0].InstanceId))

instances = getNextSetOfAvailableInstancesInAz(mockAsgName, az2, 1, instancesList, rcRollingUpgrade.ClusterState)
g.Expect(1).Should(gomega.Equal(len(instances)))
g.Expect(mockInstanceName2).Should(gomega.Equal(*instances[0].InstanceId))

instances = getNextSetOfAvailableInstancesInAz(mockAsgName, "az3", 1, instancesList, rcRollingUpgrade.ClusterState)
g.Expect(0).Should(gomega.Equal(len(instances)))

g.Expect(rcRollingUpgrade.ClusterState.deleteEntryOfAsg(mockAsgName)).To(gomega.BeTrue())

}

func TestGetNextAvailableInstanceInAzGetMultipleInstances(t *testing.T) {
g := gomega.NewGomegaWithT(t)

mockAsgName := "some-asg"
mockInstanceName1 := "foo1"
mockInstanceName2 := "bar1"
az := "az-1"
instance1 := autoscaling.Instance{InstanceId: &mockInstanceName1, AvailabilityZone: &az}
instance2 := autoscaling.Instance{InstanceId: &mockInstanceName2, AvailabilityZone: &az}

instancesList := []*autoscaling.Instance{}
instancesList = append(instancesList, &instance1, &instance2)
rcRollingUpgrade := &RollingUpgradeReconciler{ClusterState: clusterState}
rcRollingUpgrade.ClusterState.initializeAsg(mockAsgName, instancesList)

instances := getNextSetOfAvailableInstancesInAz(mockAsgName, az, 3, instancesList, rcRollingUpgrade.ClusterState)

// Even though the request is for 3 instances, only 2 should be returned as there are only 2 nodes in the ASG
g.Expect(2).Should(gomega.Equal(len(instances)))
instanceIds := []string{*instances[0].InstanceId, *instances[1].InstanceId}
g.Expect(instanceIds).Should(gomega.ConsistOf(mockInstanceName1, mockInstanceName2))

g.Expect(rcRollingUpgrade.ClusterState.deleteEntryOfAsg(mockAsgName)).To(gomega.BeTrue())
}
22 changes: 22 additions & 0 deletions controllers/node_selector.go
@@ -0,0 +1,22 @@
package controllers

import (
"github.com/aws/aws-sdk-go/service/autoscaling"
upgrademgrv1alpha1 "github.com/keikoproj/upgrade-manager/api/v1alpha1"
)

type NodeSelector interface {
SelectNodesForRestack(state ClusterState) []*autoscaling.Instance
}

func getNodeSelector(
asg *autoscaling.Group,
ruObj *upgrademgrv1alpha1.RollingUpgrade,
) NodeSelector {
switch ruObj.Spec.Strategy.Type {
case upgrademgrv1alpha1.UniformAcrossAzUpdateStrategy:
return NewUniformAcrossAzNodeSelector(asg, ruObj)
default:
return NewRandomNodeSelector(asg, ruObj)
}
}
88 changes: 88 additions & 0 deletions controllers/node_selector_test.go
@@ -0,0 +1,88 @@
package controllers

import (
"github.com/aws/aws-sdk-go/service/autoscaling"
upgrademgrv1alpha1 "github.com/keikoproj/upgrade-manager/api/v1alpha1"
"github.com/onsi/gomega"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"testing"
)

func TestGetRandomNodeSelector(t *testing.T) {
g := gomega.NewGomegaWithT(t)

someAsg := "some-asg"
mockID := "some-id"
someLaunchConfig := "some-launch-config"
diffLaunchConfig := "different-launch-config"
az := "az-1"
mockInstance := autoscaling.Instance{InstanceId: &mockID, LaunchConfigurationName: &diffLaunchConfig, AvailabilityZone: &az}
mockAsg := autoscaling.Group{AutoScalingGroupName: &someAsg,
LaunchConfigurationName: &someLaunchConfig,
Instances: []*autoscaling.Instance{&mockInstance}}

ruObj := &upgrademgrv1alpha1.RollingUpgrade{
ObjectMeta: metav1.ObjectMeta{Name: "foo", Namespace: "default"},
Spec: upgrademgrv1alpha1.RollingUpgradeSpec{AsgName: someAsg},
}

nodeSelector := getNodeSelector(&mockAsg, ruObj)

g.Expect(nodeSelector).Should(gomega.BeAssignableToTypeOf(&RandomNodeSelector{}))
}

func TestGetUniformAcrossAzNodeSelector(t *testing.T) {
g := gomega.NewGomegaWithT(t)

someAsg := "some-asg"
mockID := "some-id"
someLaunchConfig := "some-launch-config"
diffLaunchConfig := "different-launch-config"
az := "az-1"
mockInstance := autoscaling.Instance{InstanceId: &mockID, LaunchConfigurationName: &diffLaunchConfig, AvailabilityZone: &az}
mockAsg := autoscaling.Group{AutoScalingGroupName: &someAsg,
LaunchConfigurationName: &someLaunchConfig,
Instances: []*autoscaling.Instance{&mockInstance}}

ruObj := &upgrademgrv1alpha1.RollingUpgrade{
ObjectMeta: metav1.ObjectMeta{Name: "foo", Namespace: "default"},
Spec: upgrademgrv1alpha1.RollingUpgradeSpec{
AsgName: someAsg,
Strategy: upgrademgrv1alpha1.UpdateStrategy{
Type: upgrademgrv1alpha1.UniformAcrossAzUpdateStrategy,
},
},
}

nodeSelector := getNodeSelector(&mockAsg, ruObj)

g.Expect(nodeSelector).Should(gomega.BeAssignableToTypeOf(&UniformAcrossAzNodeSelector{}))
}

func TestGetNodeSelectorWithInvalidStrategy(t *testing.T) {
g := gomega.NewGomegaWithT(t)

someAsg := "some-asg"
mockID := "some-id"
someLaunchConfig := "some-launch-config"
diffLaunchConfig := "different-launch-config"
az := "az-1"
mockInstance := autoscaling.Instance{InstanceId: &mockID, LaunchConfigurationName: &diffLaunchConfig, AvailabilityZone: &az}
mockAsg := autoscaling.Group{AutoScalingGroupName: &someAsg,
LaunchConfigurationName: &someLaunchConfig,
Instances: []*autoscaling.Instance{&mockInstance}}

ruObj := &upgrademgrv1alpha1.RollingUpgrade{
ObjectMeta: metav1.ObjectMeta{Name: "foo", Namespace: "default"},
Spec: upgrademgrv1alpha1.RollingUpgradeSpec{
AsgName: someAsg,
Strategy: upgrademgrv1alpha1.UpdateStrategy{
Type: "invalid",
},
},
}

nodeSelector := getNodeSelector(&mockAsg, ruObj)

g.Expect(nodeSelector).Should(gomega.BeAssignableToTypeOf(&RandomNodeSelector{}))
}