Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Add proposal for adding progressDeadlineSeconds in CloneSet #1520

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ jobs:
- name: Checkout Actions Repository
uses: actions/checkout@v4
- name: Check spelling with custom config file
uses: crate-ci/typos@v1.22.9
uses: crate-ci/typos@v1.23.1
with:
config: ./typos.toml

Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/scorecard.yml
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ jobs:
# Upload the results as artifacts (optional). Commenting out will disable uploads of run results in SARIF
# format to the repository Actions tab.
- name: "Upload artifact"
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808 # v4.3.3
uses: actions/upload-artifact@0b2256b8c012f0828dc542b3febcab082c67f72b # v4.3.4
with:
name: SARIF file
path: results.sarif
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)
[![Go Report Card](https://goreportcard.com/badge/github.com/openkruise/kruise)](https://goreportcard.com/report/github.com/openkruise/kruise)
[![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/2908/badge)](https://bestpractices.coreinfrastructure.org/en/projects/2908)
[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/openkruise/kruise/badge)](https://api.securityscorecards.dev/projects/github.com/openkruise/kruise)
[![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/openkruise/kruise/badge)](https://scorecard.dev/viewer/?uri=github.com/openkruise/kruise)
[![CircleCI](https://circleci.com/gh/openkruise/kruise.svg?style=svg)](https://circleci.com/gh/openkruise/kruise)
[![codecov](https://codecov.io/gh/openkruise/kruise/branch/master/graph/badge.svg)](https://codecov.io/gh/openkruise/kruise)
[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg)](./CODE_OF_CONDUCT.md)
Expand Down
5 changes: 3 additions & 2 deletions apis/apps/v1alpha1/daemonset_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,12 @@ limitations under the License.
package v1alpha1

import (
appspub "github.com/openkruise/kruise/apis/apps/pub"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/intstr"

appspub "github.com/openkruise/kruise/apis/apps/pub"
)

// DaemonSetUpdateStrategy is a struct used to control the update strategy for a DaemonSet.
Expand Down Expand Up @@ -91,7 +92,7 @@ type RollingUpdateDaemonSet struct {
// pod is available (Ready for at least minReadySeconds) the old DaemonSet pod
// on that node is marked deleted. If the old pod becomes unavailable for any
// reason (Ready transitions to false, is evicted, or is drained) an updated
// pod is immediatedly created on that node without considering surge limits.
// pod is immediately created on that node without considering surge limits.
// Allowing surge implies the possibility that the resources consumed by the
// daemonset on any given node can double if the readiness check fails, and
// so resource intensive daemonsets should take into account that they may
Expand Down
2 changes: 1 addition & 1 deletion config/crd/bases/apps.kruise.io_daemonsets.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -253,7 +253,7 @@ spec:
pod is available (Ready for at least minReadySeconds) the old DaemonSet pod
on that node is marked deleted. If the old pod becomes unavailable for any
reason (Ready transitions to false, is evicted, or is drained) an updated
pod is immediatedly created on that node without considering surge limits.
pod is immediately created on that node without considering surge limits.
Allowing surge implies the possibility that the resources consumed by the
daemonset on any given node can double if the readiness check fails, and
so resource intensive daemonsets should take into account that they may
Expand Down
161 changes: 161 additions & 0 deletions docs/proposals/20240309-cloneset-support-progressDeadlineSeconds.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,161 @@
---
title: CloneSet
authors:
- "@hantmac"
reviewers:
- "@Fei-Guo"
- "@furykerry"
- "@FillZpp"
creation-date: 2024-03-10
last-updated: 2024-03-10
status: implementable
---

# Support progressDeadlineSeconds in CloneSet
Table of Contents
=================

- [Support progressDeadlineSeconds in CloneSet](#Support progressDeadlineSeconds in CloneSet)
- [Table of Contents](#table-of-contents)
- [Motivation](#motivation)
- [Proposal](#proposal)
- [1. add .spec.progressDeadlineSeconds field](#1add-.spec.progressDeadlineSeconds-field)
- [2. The behavior of progressDeadlineSeconds](#2the-behavior-of-progressDeadlineSeconds)
- [3. handle the logic](#2handle-the-logic)

## Motivation

`.spec.progressDeadlineSeconds` is an optional field in Deployment that specifies the number of seconds one wants to wait for their Deployment to progress before the system reports back that the Deployment has failed progressing.
Once the deadline has been exceeded, the Deployment controller adds a DeploymentCondition with the following attributes to the Deployment's `.status.conditions`:
```
type: Progressing
status: "False"
reason: ProgressDeadlineExceeded
```

This is useful for users to control the progress of the deployment.
So we should add support for `progressDeadlineSeconds` in CloneSet as well.

## Proposal
Firstly, add the `progressDeadlineSeconds` field to the CloneSetSpec.
Then add the handle logic in cloneSet controller to handle the `progressDeadlineSeconds` field.

### 1. add .spec.progressDeadlineSeconds field
The default value of `progressDeadlineSeconds` is 600 seconds according to the [official document](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#progress-deadline-seconds).
```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
name: cloneset-example
spec:
replicas: 3
progressDeadlineSeconds: 600
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.14.2
```

### 2. The behavior of progressDeadlineSeconds
However, the behavior of `progressDeadlineSeconds` in CloneSet might differ from its behavior in Deployment due to the support of partition in CloneSet. If a CloneSet is paused due to partition, it's debatable whether the paused time should be included in the progress deadline.
Here are two possible interpretations of `progressDeadlineSeconds` in the context of CloneSet:
1. `progressDeadlineSeconds` could be redefined as the time taken for the CloneSet to reach completion or the "paused" state due to partition. In this case, the time during which the CloneSet is paused would NOT be included in the progress deadline.
2. Secondly, `progressDeadlineSeconds` could only be supported if the partition is not set. This means that if a partition is set, the `progressDeadlineSeconds` field would not be applicable or has no effect.

After the discussion of the community, we choose the first interpretation.So we should re-set the progressDeadlineSeconds when the CloneSet reach completion OR "the paused state".

### 3. handle the logic
In cloneset controller, we should add the logic to handle the `progressDeadlineSeconds` field. We firstly add a timer to check the progress of the CloneSet.
If the progress exceeds the `progressDeadlineSeconds`, we should add a CloneSetCondition to the CloneSet's `.status.conditions`:
```go
// add a timer to check the progress of the CloneSet
if cloneSet.Spec.ProgressDeadlineSeconds != nil {
// handle the logic
starttime := time.Now()
furykerry marked this conversation as resolved.
Show resolved Hide resolved
...
if time.Now().After(starttime.Add(time.Duration(*cloneSet.Spec.ProgressDeadlineSeconds) * time.Second)) {
newStatus.Conditions = append(newStatus.Conditions, appsv1alpha1.CloneSetCondition{
Type: appsv1alpha1.CloneSetProgressing,
Status: corev1.ConditionFalse,
Reason: appsv1alpha1.CloneSetProgressDeadlineExceeded,
})
}
}
```

When the CloneSet reaches the "paused" state, we should reset the timer to avoid the progress deadline being exceeded.
And we check the progress of the CloneSet in the `syncCloneSetStatus` function. If the progress exceeds the `progressDeadlineSeconds`, we should add a CloneSetCondition to the CloneSet's `.status.conditions`:

```go
const (
CloneSetProgressDeadlineExceeded CloneSetConditionReason = "ProgressDeadlineExceeded"
CloneSetConditionTypeProgressing CloneSetConditionType = "Progressing"
)
```

```go
func (c *CloneSetController) syncCloneSetStatus(cloneSet *appsv1alpha1.CloneSet, newStatus *appsv1alpha1.CloneSetStatus) error {
...
if cloneSet.Spec.ProgressDeadlineSeconds != nil {
// handle the logic
if time.Now().After(starttime.Add(time.Duration(*cloneSet.Spec.ProgressDeadlineSeconds) * time.Second)) {
newStatus.Conditions = append(newStatus.Conditions, appsv1alpha1.CloneSetCondition{
Type: appsv1alpha1.CloneSetProgressing,
Status: corev1.ConditionFalse,
Reason: appsv1alpha1.CloneSetProgressDeadlineExceeded,
})
}
}
...
}
```

When the CloneSet reaches the "paused" state, we should reset the timer to avoid the progress deadline being exceeded.
```go
func (c *CloneSetController) syncCloneSetStatus(cloneSet *appsv1alpha1.CloneSet, newStatus *appsv1alpha1.CloneSetStatus) error {
...
// reset the starttime when the CloneSet reaches the "paused" state or complete state
if cloneSet.Status.UpdatedReadyReplicas == cloneSet.Status.Replicas || replicas - updatedReplicas = partition {
starttime = time.Now()
}

if cloneSet.Spec.Paused {
starttime = time.Now()
}
...
}

...
}
```

And we can save the starttime in the `LastUpdateTime` in the CloneSet's `.status.conditions`:
```
status:
conditions:
- lastTransitionTime: "2021-11-26T20:52:12Z"
lastUpdateTime: "2021-11-26T20:52:12Z"
message: CloneSet has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2021-11-26T20:52:12Z"
lastUpdateTime: "2021-11-26T20:52:12Z"
message: 'progress deadline exceeded'
reason: ProgressDeadlineExceeded
status: "False"
type: Progressing
```

## Implementation History

- [ ] 06/07/2024: Proposal submission


Loading