Commit

add proposal for support progressDeadlineSeconds in CloneSet
Signed-off-by: hantmac <[email protected]>

more docs

Signed-off-by: hantmac <[email protected]>

fix mdl ci

Signed-off-by: hantmac <[email protected]>

complete the proposal

Signed-off-by: hantmac <[email protected]>

fix

Signed-off-by: hantmac <[email protected]>

update

Signed-off-by: hantmac <[email protected]>

fix typo

Signed-off-by: hantmac <[email protected]>

Bump crate-ci/typos from 1.22.9 to 1.23.1 (#1658)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.22.9 to 1.23.1.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.22.9...v1.23.1)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bump actions/upload-artifact from 4.3.3 to 4.3.4 (#1659)

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.3 to 4.3.4.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@6546280...0b2256b)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

changed the scorecard badge link to the standard format and updated the domain (#1657)

Signed-off-by: harshitasao <[email protected]>

fix typo

Signed-off-by: hantmac <[email protected]>
hantmac committed Jul 18, 2024
1 parent 8ae13b1 commit fa87578
Showing 6 changed files with 168 additions and 6 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -31,7 +31,7 @@ jobs:
  - name: Checkout Actions Repository
  uses: actions/checkout@v4
  - name: Check spelling with custom config file
- uses: crate-ci/typos@v1.22.9
+ uses: crate-ci/typos@v1.23.1
  with:
  config: ./typos.toml

2 changes: 1 addition & 1 deletion .github/workflows/scorecard.yml
@@ -59,7 +59,7 @@ jobs:
  # Upload the results as artifacts (optional). Commenting out will disable uploads of run results in SARIF
  # format to the repository Actions tab.
  - name: "Upload artifact"
- uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808 # v4.3.3
+ uses: actions/upload-artifact@0b2256b8c012f0828dc542b3febcab082c67f72b # v4.3.4
  with:
  name: SARIF file
  path: results.sarif
2 changes: 1 addition & 1 deletion README.md
@@ -3,7 +3,7 @@
  [![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)
  [![Go Report Card](https://goreportcard.com/badge/github.com/openkruise/kruise)](https://goreportcard.com/report/github.com/openkruise/kruise)
  [![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/2908/badge)](https://bestpractices.coreinfrastructure.org/en/projects/2908)
- [![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/openkruise/kruise/badge)](https://api.securityscorecards.dev/projects/github.com/openkruise/kruise)
+ [![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/openkruise/kruise/badge)](https://scorecard.dev/viewer/?uri=github.com/openkruise/kruise)
  [![CircleCI](https://circleci.com/gh/openkruise/kruise.svg?style=svg)](https://circleci.com/gh/openkruise/kruise)
  [![codecov](https://codecov.io/gh/openkruise/kruise/branch/master/graph/badge.svg)](https://codecov.io/gh/openkruise/kruise)
  [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg)](./CODE_OF_CONDUCT.md)
5 changes: 3 additions & 2 deletions apis/apps/v1alpha1/daemonset_types.go
@@ -17,11 +17,12 @@ limitations under the License.
  package v1alpha1

  import (
- appspub "github.com/openkruise/kruise/apis/apps/pub"
  appsv1 "k8s.io/api/apps/v1"
  corev1 "k8s.io/api/core/v1"
  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  "k8s.io/apimachinery/pkg/util/intstr"
+
+ appspub "github.com/openkruise/kruise/apis/apps/pub"
  )

  // DaemonSetUpdateStrategy is a struct used to control the update strategy for a DaemonSet.
@@ -91,7 +92,7 @@ type RollingUpdateDaemonSet struct {
  // pod is available (Ready for at least minReadySeconds) the old DaemonSet pod
  // on that node is marked deleted. If the old pod becomes unavailable for any
  // reason (Ready transitions to false, is evicted, or is drained) an updated
- // pod is immediatedly created on that node without considering surge limits.
+ // pod is immediately created on that node without considering surge limits.
  // Allowing surge implies the possibility that the resources consumed by the
  // daemonset on any given node can double if the readiness check fails, and
  // so resource intensive daemonsets should take into account that they may
2 changes: 1 addition & 1 deletion config/crd/bases/apps.kruise.io_daemonsets.yaml
@@ -253,7 +253,7 @@ spec:
  pod is available (Ready for at least minReadySeconds) the old DaemonSet pod
  on that node is marked deleted. If the old pod becomes unavailable for any
  reason (Ready transitions to false, is evicted, or is drained) an updated
- pod is immediatedly created on that node without considering surge limits.
+ pod is immediately created on that node without considering surge limits.
  Allowing surge implies the possibility that the resources consumed by the
  daemonset on any given node can double if the readiness check fails, and
  so resource intensive daemonsets should take into account that they may
161 changes: 161 additions & 0 deletions docs/proposals/20240309-cloneset-support-progressDeadlineSeconds.md
@@ -0,0 +1,161 @@
---
title: CloneSet
authors:
- "@hantmac"
reviewers:
- "@Fei-Guo"
- "@furykerry"
- "@FillZpp"
creation-date: 2024-03-10
last-updated: 2024-03-10
status: implementable
---

# Support progressDeadlineSeconds in CloneSet
Table of Contents
=================

- [Support progressDeadlineSeconds in CloneSet](#support-progressdeadlineseconds-in-cloneset)
  - [Table of Contents](#table-of-contents)
  - [Motivation](#motivation)
  - [Proposal](#proposal)
    - [1. add .spec.progressDeadlineSeconds field](#1-add-specprogressdeadlineseconds-field)
    - [2. The behavior of progressDeadlineSeconds](#2-the-behavior-of-progressdeadlineseconds)
    - [3. handle the logic](#3-handle-the-logic)
  - [Implementation History](#implementation-history)

## Motivation

`.spec.progressDeadlineSeconds` is an optional field in Deployment that specifies the number of seconds one wants to wait for their Deployment to progress before the system reports back that the Deployment has failed progressing.
Once the deadline has been exceeded, the Deployment controller adds a DeploymentCondition with the following attributes to the Deployment's `.status.conditions`:
```
type: Progressing
status: "False"
reason: ProgressDeadlineExceeded
```

This lets users detect a stalled rollout instead of waiting on it indefinitely.
CloneSet should therefore support `progressDeadlineSeconds` as well.

## Proposal
First, add the `progressDeadlineSeconds` field to `CloneSetSpec`.
Then add logic to the CloneSet controller that handles the field.

### 1. add .spec.progressDeadlineSeconds field
Following the Deployment [official document](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#progress-deadline-seconds), the default value of `progressDeadlineSeconds` is 600 seconds.
```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: cloneset-example
spec:
  replicas: 3
  progressDeadlineSeconds: 600
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
```
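
As a rough sketch of how the field might be declared and defaulted, the snippet below uses a trimmed-down `CloneSetSpec` stand-in. The `*int32` type mirrors Deployment; the helper `setDefaultProgressDeadline` and the decision to default the field at all are assumptions for illustration, not part of the existing API.

```go
package main

import "fmt"

// CloneSetSpec is a trimmed-down, hypothetical stand-in for the real
// CloneSetSpec; only the fields relevant to this proposal are shown.
type CloneSetSpec struct {
	Replicas *int32 `json:"replicas,omitempty"`
	// ProgressDeadlineSeconds is the maximum time in seconds for a rollout
	// to make progress before it is reported as failed.
	ProgressDeadlineSeconds *int32 `json:"progressDeadlineSeconds,omitempty"`
}

// defaultProgressDeadlineSeconds mirrors the Deployment default of 600s.
const defaultProgressDeadlineSeconds int32 = 600

// setDefaultProgressDeadline fills in the default only when the field is
// unset. Defaulting the field for CloneSet is an assumption of this sketch.
func setDefaultProgressDeadline(spec *CloneSetSpec) {
	if spec.ProgressDeadlineSeconds == nil {
		v := defaultProgressDeadlineSeconds
		spec.ProgressDeadlineSeconds = &v
	}
}

func main() {
	spec := CloneSetSpec{}
	setDefaultProgressDeadline(&spec)
	fmt.Println(*spec.ProgressDeadlineSeconds)
}
```

A pointer type keeps "unset" distinguishable from an explicit value, which matches the `!= nil` checks used in the controller snippets of this proposal.
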
### 2. The behavior of progressDeadlineSeconds
However, `progressDeadlineSeconds` may behave differently in CloneSet than in Deployment because CloneSet supports partition. If a rollout stops at the partition, it is debatable whether the time spent stopped should count toward the progress deadline.
There are two possible interpretations of `progressDeadlineSeconds` in the context of CloneSet:
1. Redefine `progressDeadlineSeconds` as the time allowed for the CloneSet to reach completion or the "paused" state caused by partition. The time during which the CloneSet stays paused is NOT counted toward the progress deadline.
2. Support `progressDeadlineSeconds` only when no partition is set. If a partition is set, the `progressDeadlineSeconds` field has no effect.

After discussion in the community, we chose the first interpretation. The progress deadline timer is therefore reset whenever the CloneSet reaches completion or the "paused" state.
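
The chosen interpretation can be sketched as a small predicate. `rolloutState`, `reachedTarget`, and `shouldResetDeadline` are hypothetical names. The sketch assumes the partition has already been resolved to an integer, and it additionally requires the updated pods to be ready, which goes slightly beyond the wording above.

```go
package main

import "fmt"

// rolloutState is a hypothetical snapshot of the counters the controller
// would read from the CloneSet spec and status.
type rolloutState struct {
	Replicas             int32 // desired replicas
	UpdatedReplicas      int32 // pods already on the new revision
	UpdatedReadyReplicas int32 // updated pods that are ready
	Partition            int32 // pods to keep on the old revision (assumed resolved to an integer)
	Paused               bool  // value of .spec.paused
}

// reachedTarget reports whether every pod that the partition allows to be
// updated has been updated and is ready. With Partition == 0 this is the
// "complete" state; otherwise it is the "paused by partition" state.
func reachedTarget(s rolloutState) bool {
	target := s.Replicas - s.Partition
	if target < 0 {
		target = 0
	}
	return s.UpdatedReplicas >= target && s.UpdatedReadyReplicas >= target
}

// shouldResetDeadline reports whether the progress timer should restart.
func shouldResetDeadline(s rolloutState) bool {
	return s.Paused || reachedTarget(s)
}

func main() {
	s := rolloutState{Replicas: 5, Partition: 2, UpdatedReplicas: 3, UpdatedReadyReplicas: 3}
	fmt.Println(shouldResetDeadline(s))
}
```
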

### 3. handle the logic
In the CloneSet controller, we add logic to handle the `progressDeadlineSeconds` field. First, a timer tracks the progress of the CloneSet.
If the rollout takes longer than `progressDeadlineSeconds`, the controller adds a CloneSetCondition to the CloneSet's `.status.conditions`:
```go
// add a timer to check the progress of the CloneSet
if cloneSet.Spec.ProgressDeadlineSeconds != nil {
	// starttime marks the point from which progress is measured
	starttime := time.Now()
	...
	// report a failed rollout once the deadline has passed since starttime
	if time.Now().After(starttime.Add(time.Duration(*cloneSet.Spec.ProgressDeadlineSeconds) * time.Second)) {
		newStatus.Conditions = append(newStatus.Conditions, appsv1alpha1.CloneSetCondition{
			Type:   appsv1alpha1.CloneSetConditionTypeProgressing,
			Status: corev1.ConditionFalse,
			Reason: appsv1alpha1.CloneSetProgressDeadlineExceeded,
		})
	}
}
```

The check runs in the `syncCloneSetStatus` function. It relies on a new condition type and a new condition reason:

```go
const (
	CloneSetProgressDeadlineExceeded CloneSetConditionReason = "ProgressDeadlineExceeded"
	CloneSetConditionTypeProgressing CloneSetConditionType   = "Progressing"
)
```

```go
func (c *CloneSetController) syncCloneSetStatus(cloneSet *appsv1alpha1.CloneSet, newStatus *appsv1alpha1.CloneSetStatus) error {
	...
	if cloneSet.Spec.ProgressDeadlineSeconds != nil {
		// report a failed rollout once the deadline has passed since starttime
		if time.Now().After(starttime.Add(time.Duration(*cloneSet.Spec.ProgressDeadlineSeconds) * time.Second)) {
			newStatus.Conditions = append(newStatus.Conditions, appsv1alpha1.CloneSetCondition{
				Type:   appsv1alpha1.CloneSetConditionTypeProgressing,
				Status: corev1.ConditionFalse,
				Reason: appsv1alpha1.CloneSetProgressDeadlineExceeded,
			})
		}
	}
	...
}
```

When the CloneSet reaches the "paused" state, we should reset the timer to avoid the progress deadline being exceeded.
```go
func (c *CloneSetController) syncCloneSetStatus(cloneSet *appsv1alpha1.CloneSet, newStatus *appsv1alpha1.CloneSetStatus) error {
	...
	// reset the starttime when the CloneSet reaches the "paused" state or complete state
	if cloneSet.Status.UpdatedReadyReplicas == cloneSet.Status.Replicas || replicas-updatedReplicas == partition {
		starttime = time.Now()
	}
	if cloneSet.Spec.Paused {
		starttime = time.Now()
	}
	...
}
```

The start time can be persisted in the `lastUpdateTime` of a condition in the CloneSet's `.status.conditions`:
```
status:
  conditions:
  - lastTransitionTime: "2021-11-26T20:52:12Z"
    lastUpdateTime: "2021-11-26T20:52:12Z"
    message: CloneSet has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-11-26T20:52:12Z"
    lastUpdateTime: "2021-11-26T20:52:12Z"
    message: 'progress deadline exceeded'
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
```

## Implementation History

- [ ] 06/07/2024: Proposal submission

