Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Wait method to select undisruptable pods using annotation #89

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 10 additions & 7 deletions docs/cycling/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,11 +35,11 @@ The CycleNodeRequest CRD handles a request to cycle nodes belonging to a specifi
5. In the **ScalingUp** phase, wait for the cloud provider to bring up the new nodes and then wait for the new nodes to be **Ready** in the Kubernetes API. Wait for the configured health checks on the node succeed. Transition the object to **CordoningNode**.

6. In the **CordoningNode** phase, cordon the selected nodes in the Kubernetes API then perform the pre-termination checks. Transition the object to **WaitingTermination**.

7. In the **WaitingTermination** phase, create a CycleNodeStatus CRD for every node that was cordoned. Each of these CycleNodeStatuses handles the termination of an individual node. The controller will wait for a number of them to enter the **Successful** or **Failed** phase before moving on.

If any of them have **Failed** then the CycleNodeRequest will move to **Failed** and will not add any more nodes for cycling. If they are all **Successful** then the CycleNodeRequest will move back to **Initialised** to cycle more nodes.

### CycleNodeStatus

The CycleNodeStatus CRD handles the draining of pods from, and termination of, an individual node. These should only be created by the controller.
Expand All @@ -51,7 +51,7 @@ The CycleNodeStatus CRD handles the draining of pods from, and termination of, a
1. In the **Pending** phase, validate that the node still exists and store information about the node.
Transition the object to **WaitingPods** if the Method is set to "Wait", otherwise transition to
**RemovingLabelsFromPods**.

1. In the **WaitingPods** phase, wait for all pods that are not ignored by the `waitRules` to be removed from the node. Will wait for a long time before finally giving up if pods still remain. Transition the object to **Failed** if it times out waiting, or to **RemovingLabelsFromPods** once there are no pods left.

1. In the **RemovingLabelsFromPods** phase, remove any labels that are defined in the `labelsToRemove` option from any pod that is running on the target node. This is useful when you want to "detach" a pod from a service before draining it from a node to prevent requests in progress to the pod from being interrupted. Transition the object to **DrainingPods**.
Expand Down Expand Up @@ -113,7 +113,8 @@ spec:

cycleNodeSettings:
# Method can be "Wait" or "Drain", defaults to "Drain" if not provided
# "Wait" will wait for pods on the node to complete, while "Drain" will forcefully drain them off the node
# "Wait" will wait for pods with the "cyclops.atlassian.com/do-not-disrupt=true"
# annotation on the node to complete, while "Drain" will forcefully drain them off the node
method: "Wait|Drain"

# Optional field - use this to scale up by `concurrency` nodes at a time. The default is the current number
Expand All @@ -128,18 +129,20 @@ spec:
# if you want to remove them from existing services before draining the nodes
labelsToRemove:
- <labelKey>

# Optional field - only used if method=Wait
# ignorePodsLabels is a map of label names to a list of label values, where any value for the given
# label name will cause a pod to not be waited for
# Takes precendence over selecting pods with the "cyclops.atlassian.com/do-not-disrupt=true" annotation.
ignorePodsLabels:
# This example ignores all pods where labelName=value1 or labelName=value2
labelName:
- "value1"
- "value2"

# Optional field - only used if method=Wait
# ignoreNamespaces is a list of namespaces from which to ignore pods when waiting for pods on a node to finish
# Takes precendence over selecting pods with the "cyclops.atlassian.com/do-not-disrupt=true" annotation.
ignoreNamespaces:
- "kube-system"
```
Expand Down
24 changes: 20 additions & 4 deletions docs/cycling/examples/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -166,18 +166,34 @@ spec:
- stickypod
```

This example shows the usage of the `Wait` method which opposed to `Drain` which attempts to remove pods from the node before terminating, will wait for all pods to leave the node naturally by themselves. This is useful for situations where you cannot forcefully remove pods, such as high churn jobs which need to be run to completion.
This example shows the usage of the `Wait` method which as opposed to `Drain`, which attempts to remove pods from the node before terminating, will wait for pods with the `cyclops.atlassian.com/do-not-disrupt=true` annotation to leave the node naturally by themselves. This is useful for situations where you cannot forcefully remove pods, such as high churn jobs which need to be run to completion.

```yaml
# Pod example
apiVersion: v1
kind: Pod
metadata:
name: do-not-disrupt
annotations:
cyclops.atlassian.com/do-not-disrupt: "true"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is an annotation appropriate for this? Label would allow both cyclops and cluster users to select un-disruptable pods with a labelSelector, potentially useful in various scenarios.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm modeling this around how karpenter does it: https://karpenter.sh/docs/concepts/disruption/#pod-level-controls

spec:
containers:
- name: sleep
image: alpine
command: ["sleep", "3600"]
```

Cyclops provides an option to ignore pods with specific labels in order to support nodes that may run pods that will never exit themselves. In this example, the pod with label `name=stickypod` would be ignore when waiting for all other pods to terminate. The node will be terminated while `name=stickypod` is running, and all others have finished.

```yaml
cycleSettings:
method: "Wait"
ignorePodsLabels:
name:
- stickypod
- stickypod
```

Cyclops provides an option to ignore pods with specific labels in order to support nodes that may run pods that will never exit themselves. In this example, the pod with label `name=stickypod` would be ignore when waiting for all other pods to terminate. The node will be terminated while `name=stickypod` is running, and all others have finished.

The label selector to ignore pods will take precedence over selecting pods with the `cyclops.atlassian.com/do-not-disrupt=true` annotation, which means a pod with both the annotation and the label will be ignored.

## Example 5 - Concurrency within multiple cloud provider node groups

Expand Down
8 changes: 4 additions & 4 deletions pkg/controller/cyclenodestatus/transitioner/pod.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,8 @@ package transitioner
import (
"fmt"

corev1 "k8s.io/api/core/v1"
"github.com/atlassian-labs/cyclops/pkg/k8s"
corev1 "k8s.io/api/core/v1"
)

func (t *CycleNodeStatusTransitioner) removeLabelsFromPods() (finished bool, err error) {
Expand Down Expand Up @@ -53,14 +53,14 @@ func (t *CycleNodeStatusTransitioner) removeLabelsFromPods() (finished bool, err

// podsFinished returns true if all relevant pods on the node are finished.
func (t *CycleNodeStatusTransitioner) podsFinished() (bool, error) {
// Get drainable pods
drainablePods, err := t.rm.GetDrainablePodsOnNode(t.cycleNodeStatus.Status.CurrentNode.Name)
// Get undisruptable pods
undisruptablePods, err := t.rm.GetUndisruptablePods(t.cycleNodeStatus.Status.CurrentNode.Name)
if err != nil {
return false, err
}

waitingOnPods, err := getRunningPods(
drainablePods,
undisruptablePods,
t.cycleNodeStatus.Spec.CycleSettings.IgnoreNamespaces,
t.cycleNodeStatus.Spec.CycleSettings.IgnorePodsLabels,
)
Expand Down
24 changes: 23 additions & 1 deletion pkg/controller/pod.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,10 @@ package controller
import (
"context"

"github.com/atlassian-labs/cyclops/pkg/k8s"
v1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/fields"
"sigs.k8s.io/controller-runtime/pkg/client"
"github.com/atlassian-labs/cyclops/pkg/k8s"
)

// GetPodsOnNode gets a list of the pods running on the given node, optionally filtered by the given label selector.
Expand All @@ -24,6 +24,28 @@ func (rm *ResourceManager) GetPodsOnNode(nodeName string) (pods []v1.Pod, err er
return podList.Items, nil
}

func getUndisruptablePods(pods []v1.Pod) []v1.Pod {
filteredPods := make([]v1.Pod, 0)

for _, pod := range pods {
if k8s.PodCannotBeDisrupted(&pod) && pod.Status.Phase == v1.PodRunning {
filteredPods = append(filteredPods, pod)
}
}

return filteredPods
}

// GetUndisruptablePods gets a list of pods on a named node that cannot evicted or deleted from the node.
func (rm *ResourceManager) GetUndisruptablePods(nodeName string) (pods []v1.Pod, err error) {
allPods, err := rm.GetPodsOnNode(nodeName)
if err != nil {
return pods, err
}

return getUndisruptablePods(allPods), nil
}

// GetDrainablePodsOnNode gets a list of pods on a named node that we can evict or delete from the node.
func (rm *ResourceManager) GetDrainablePodsOnNode(nodeName string) (pods []v1.Pod, err error) {
allPods, err := rm.GetPodsOnNode(nodeName)
Expand Down
89 changes: 89 additions & 0 deletions pkg/controller/pod_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
package controller

import (
"testing"

"github.com/stretchr/testify/assert"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func TestGetUndisruptablePods(t *testing.T) {
pod1 := corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Name: "pod-1",
Annotations: map[string]string{
"cyclops.atlassian.com/do-not-disrupt": "true",
},
},
Status: corev1.PodStatus{
Phase: corev1.PodRunning,
},
}

pod2 := corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Name: "pod-2",
Annotations: map[string]string{
"cyclops.atlassian.com/do-not-disrupt": "true",
},
},
Status: corev1.PodStatus{
Phase: corev1.PodSucceeded,
},
}

pod3 := corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Name: "pod-3",
Annotations: map[string]string{
"cyclops.atlassian.com/do-not-disrupt": "false",
},
},
Status: corev1.PodStatus{
Phase: corev1.PodRunning,
},
}

pod4 := corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Name: "pod-4",
},
Status: corev1.PodStatus{
Phase: corev1.PodRunning,
},
}

tests := []struct {
name string
pods []corev1.Pod
want []corev1.Pod
}{
{
"test with no pods with annotation",
[]corev1.Pod{pod4},
[]corev1.Pod{},
},
{
"test with 1 pod with annotation",
[]corev1.Pod{pod1},
[]corev1.Pod{pod1},
},
{
"test succeeded pod with annotation",
[]corev1.Pod{pod2},
[]corev1.Pod{},
},
{
"test with 1 pod without correct annotation value",
[]corev1.Pod{pod3},
[]corev1.Pod{},
},
}

for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
assert.ElementsMatch(t, tc.want, getUndisruptablePods(tc.pods))
})
}
}
17 changes: 16 additions & 1 deletion pkg/k8s/pod.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,9 @@ import (
)

const (
podConditionTypeForUnhealthy = v1.PodReady
podConditionTypeForUnhealthy = v1.PodReady
doNotDisruptAnnotation = "cyclops.atlassian.com/do-not-disrupt"
doNotDisruptAnnotationRequiredValue = "true"
)

var log = logf.Log.WithName("k8s.pod.go")
Expand Down Expand Up @@ -84,6 +86,19 @@ func PodIsDaemonSet(pod *v1.Pod) bool {
return true
}
}

return false
}

// PodCannotBeDisrupted returns true if the pod cannot be forcibly drained from
// a node.
func PodCannotBeDisrupted(pod *v1.Pod) bool {
for annotationName, annotationValue := range pod.ObjectMeta.Annotations {
if annotationName == doNotDisruptAnnotation && annotationValue == doNotDisruptAnnotationRequiredValue {
return true
}
}

return false
}

Expand Down
Loading