feat(lifecycle-operator): adapt WorkloadVersionReconciler logic to use ObservabilityTimeout for workload deployment (keptn#3160)

Signed-off-by: odubajDT <[email protected]>
Signed-off-by: vickysomtee <[email protected]>
odubajDT authored and Vickysomtee committed Apr 22, 2024
1 parent b53a316 commit b9e2cd3
Showing 16 changed files with 253 additions and 4 deletions.
1 change: 1 addition & 0 deletions Makefile
@@ -34,6 +34,7 @@ integration-test:
chainsaw test --test-dir ./test/chainsaw/testanalysis/
chainsaw test --test-dir ./test/chainsaw/testcertificate/
chainsaw test --test-dir ./test/chainsaw/non-blocking-deployment/
chainsaw test --test-dir ./test/chainsaw/timeout-failure-deployment/

.PHONY: integration-test-local #these tests should run on a real cluster!
integration-test-local:
11 changes: 11 additions & 0 deletions docs/docs/components/lifecycle-operator/deployment-flow.md
@@ -122,6 +122,17 @@ If any of these activities fail,
the `KeptnApp` issues the `AppDeployErrored` event
and terminates the deployment.

> **Note**
> By default, Keptn observes the state of Kubernetes workloads
> for 5 minutes.
> After this timeout is exceeded, the deployment phase is considered
> `Failed` from Keptn's viewpoint, and Keptn does not proceed
> with the post-deployment phases (tasks, evaluations, or the promotion phase).
> This timeout can be changed for the whole cluster via the
> `observabilityTimeout` field in the
> [KeptnConfig](../../reference/crd-reference/config.md)
> resource.
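
A minimal `KeptnConfig` sketch that raises this timeout; the `apiVersion` and metadata values shown are assumptions for illustration, so verify them against the KeptnConfig reference before applying:

```yaml
# Sketch only: apiVersion and metadata are illustrative assumptions;
# check the KeptnConfig CRD reference for the exact schema.
apiVersion: options.keptn.sh/v1alpha1
kind: KeptnConfig
metadata:
  name: keptn-config
  namespace: keptn-system
spec:
  # Extend the workload observability window beyond the 5m default.
  observabilityTimeout: 15m
```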

```shell
AppDeploy
AppDeployStarted
11 changes: 11 additions & 0 deletions docs/docs/components/lifecycle-operator/keptn-apps.md
@@ -28,6 +28,17 @@ The `KeptnWorkload` resources are created automatically
and without delay by the mutating webhook
as soon as the workload manifest is applied.

> **Note**
> By default, Keptn observes the state of Kubernetes workloads
> for 5 minutes.
> After this timeout is exceeded, the deployment phase is considered
> `Failed` from Keptn's viewpoint, and Keptn does not proceed
> with the post-deployment phases (tasks, evaluations, or the promotion phase).
> This timeout can be changed for the whole cluster via the
> `observabilityTimeout` field in the
> [KeptnConfig](../../reference/crd-reference/config.md)
> resource.

## Keptn Applications

A [KeptnApp](../../reference/crd-reference/app.md)
1 change: 1 addition & 0 deletions docs/docs/getting-started/observability.md
@@ -74,6 +74,7 @@ metadata:
spec:
OTelCollectorUrl: 'jaeger-collector.keptn-system.svc.cluster.local:4317'
keptnAppCreationRequestTimeoutSeconds: 30
observabilityTimeout: 5m
```
Apply the file and wait for Keptn to pick up the new configuration:
14 changes: 14 additions & 0 deletions docs/docs/guides/otel.md
@@ -161,6 +161,20 @@ kubectl port-forward deployment/metrics-operator 9999 -n keptn-system

You can access the metrics from your browser at: `http://localhost:9999`

## Define a timeout for workload observability

There are situations in which the deployment of an application fails,
for example because the container image cannot be found.
By default, Keptn observes the state of Kubernetes workloads
for 5 minutes.
After this timeout is exceeded, the deployment phase is considered
`Failed` from Keptn's viewpoint, and Keptn does not proceed
with the post-deployment phases (tasks, evaluations, or the promotion phase).
This timeout can be changed for the whole cluster via the
`observabilityTimeout` field in the
[KeptnConfig](../reference/crd-reference/config.md)
resource.
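
For instance, building on the `KeptnConfig` snippet from the Getting Started guide, the timeout could be raised like this (a sketch; the `apiVersion` and metadata values are assumptions, so verify them against the KeptnConfig reference):

```yaml
# Illustrative sketch; verify apiVersion and metadata against your install.
apiVersion: options.keptn.sh/v1alpha1
kind: KeptnConfig
metadata:
  name: keptn-config
  namespace: keptn-system
spec:
  OTelCollectorUrl: 'jaeger-collector.keptn-system.svc.cluster.local:4317'
  # Consider the deployment phase Failed after 10 minutes (default: 5m).
  observabilityTimeout: 10m
```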

## Advanced tracing configurations in Keptn: Linking traces

In Keptn you can connect multiple traces, for instance to connect deployments
@@ -521,3 +521,11 @@ func (w KeptnWorkloadVersion) GetEventAnnotations() map[string]string {
"workloadVersionName": w.Name,
}
}

func (w *KeptnWorkloadVersion) SetDeploymentStartTime() {
w.Status.DeploymentStartTime = metav1.NewTime(time.Now().UTC())
}

func (w *KeptnWorkloadVersion) IsDeploymentStartTimeSet() bool {
return !w.Status.DeploymentStartTime.IsZero()
}
@@ -113,12 +113,15 @@ func TestKeptnWorkloadVersion(t *testing.T) {

require.False(t, workload.IsEndTimeSet())
require.False(t, workload.IsStartTimeSet())
require.False(t, workload.IsDeploymentStartTimeSet())

workload.SetStartTime()
workload.SetEndTime()
workload.SetDeploymentStartTime()

require.True(t, workload.IsEndTimeSet())
require.True(t, workload.IsStartTimeSet())
require.True(t, workload.IsDeploymentStartTimeSet())

require.Equal(t, []attribute.KeyValue{
common.AppName.String("appname"),
@@ -53,6 +53,7 @@ func TestKeptnWorkloadVersionReconciler_reconcileDeployment_FailedReplicaSet(t *
keptnState, err := r.reconcileDeployment(context.TODO(), workloadVersion)
require.Nil(t, err)
require.Equal(t, apicommon.StateProgressing, keptnState)
require.False(t, workloadVersion.Status.DeploymentStartTime.IsZero())
}

func TestKeptnWorkloadVersionReconciler_reconcileDeployment_UnavailableReplicaSet(t *testing.T) {
@@ -71,6 +72,51 @@ func TestKeptnWorkloadVersionReconciler_reconcileDeployment_UnavailableReplicaSe
keptnState, err := r.reconcileDeployment(context.TODO(), workloadVersion)
require.NotNil(t, err)
require.Equal(t, apicommon.StateUnknown, keptnState)
require.True(t, workloadVersion.Status.DeploymentStartTime.IsZero())
}

func TestKeptnWorkloadVersionReconciler_reconcileDeployment_WorkloadDeploymentTimedOut(t *testing.T) {

rep := int32(1)
replicaset := makeReplicaSet("myrep", "default", &rep, 0)
workloadVersion := makeWorkloadVersionWithRef(replicaset.ObjectMeta, "ReplicaSet")

fakeClient := testcommon.NewTestClient(replicaset, workloadVersion)

fakeRecorder := record.NewFakeRecorder(100)

r := &KeptnWorkloadVersionReconciler{
Client: fakeClient,
Config: config.Instance(),
EventSender: eventsender.NewK8sSender(fakeRecorder),
}

r.Config.SetObservabilityTimeout(metav1.Duration{
Duration: 5 * time.Second,
})

keptnState, err := r.reconcileDeployment(context.TODO(), workloadVersion)
require.Nil(t, err)
require.Equal(t, apicommon.StateProgressing, keptnState)
require.False(t, workloadVersion.Status.DeploymentStartTime.IsZero())

// move the start time backwards so the configured timeout is already exceeded
workloadVersion.Status.DeploymentStartTime = metav1.Time{
Time: workloadVersion.Status.DeploymentStartTime.Add(-10 * time.Second),
}

err = r.Client.Status().Update(context.TODO(), workloadVersion)
require.Nil(t, err)

keptnState, err = r.reconcileDeployment(context.TODO(), workloadVersion)
require.Nil(t, err)
require.Equal(t, apicommon.StateFailed, keptnState)
require.False(t, workloadVersion.Status.DeploymentStartTime.IsZero())

event := <-fakeRecorder.Events
require.Contains(t, event, workloadVersion.GetName(), "wrong workloadVersion")
require.Contains(t, event, workloadVersion.GetNamespace(), "wrong namespace")
require.Contains(t, event, "has reached timeout", "wrong message")
}

func TestKeptnWorkloadVersionReconciler_reconcileDeployment_FailedStatefulSet(t *testing.T) {
@@ -87,6 +133,7 @@ func TestKeptnWorkloadVersionReconciler_reconcileDeployment_FailedStatefulSet(t
keptnState, err := r.reconcileDeployment(context.TODO(), workloadVersion)
require.Nil(t, err)
require.Equal(t, apicommon.StateProgressing, keptnState)
require.False(t, workloadVersion.Status.DeploymentStartTime.IsZero())
}

func TestKeptnWorkloadVersionReconciler_reconcileDeployment_UnavailableStatefulSet(t *testing.T) {
@@ -105,6 +152,7 @@ func TestKeptnWorkloadVersionReconciler_reconcileDeployment_UnavailableStatefulS
keptnState, err := r.reconcileDeployment(context.TODO(), workloadVersion)
require.NotNil(t, err)
require.Equal(t, apicommon.StateUnknown, keptnState)
require.True(t, workloadVersion.Status.DeploymentStartTime.IsZero())
}

func TestKeptnWorkloadVersionReconciler_reconcileDeployment_FailedDaemonSet(t *testing.T) {
@@ -121,6 +169,7 @@ func TestKeptnWorkloadVersionReconciler_reconcileDeployment_FailedDaemonSet(t *t
keptnState, err := r.reconcileDeployment(context.TODO(), workloadVersion)
require.Nil(t, err)
require.Equal(t, apicommon.StateProgressing, keptnState)
require.False(t, workloadVersion.Status.DeploymentStartTime.IsZero())
}

func TestKeptnWorkloadVersionReconciler_reconcileDeployment_UnavailableDaemonSet(t *testing.T) {
@@ -137,6 +186,7 @@ func TestKeptnWorkloadVersionReconciler_reconcileDeployment_UnavailableDaemonSet
keptnState, err := r.reconcileDeployment(context.TODO(), workloadVersion)
require.NotNil(t, err)
require.Equal(t, apicommon.StateUnknown, keptnState)
require.True(t, workloadVersion.Status.DeploymentStartTime.IsZero())
}

func TestKeptnWorkloadVersionReconciler_reconcileDeployment_ReadyReplicaSet(t *testing.T) {
@@ -154,6 +204,7 @@ func TestKeptnWorkloadVersionReconciler_reconcileDeployment_ReadyReplicaSet(t *t
keptnState, err := r.reconcileDeployment(context.TODO(), workloadVersion)
require.Nil(t, err)
require.Equal(t, apicommon.StateSucceeded, keptnState)
require.False(t, workloadVersion.Status.DeploymentStartTime.IsZero())
}

func TestKeptnWorkloadVersionReconciler_reconcileDeployment_ReadyStatefulSet(t *testing.T) {
@@ -171,6 +222,7 @@ func TestKeptnWorkloadVersionReconciler_reconcileDeployment_ReadyStatefulSet(t *
keptnState, err := r.reconcileDeployment(context.TODO(), workloadVersion)
require.Nil(t, err)
require.Equal(t, apicommon.StateSucceeded, keptnState)
require.False(t, workloadVersion.Status.DeploymentStartTime.IsZero())
}

func TestKeptnWorkloadVersionReconciler_reconcileDeployment_ReadyDaemonSet(t *testing.T) {
@@ -187,6 +239,7 @@ func TestKeptnWorkloadVersionReconciler_reconcileDeployment_ReadyDaemonSet(t *te
keptnState, err := r.reconcileDeployment(context.TODO(), workloadVersion)
require.Nil(t, err)
require.Equal(t, apicommon.StateSucceeded, keptnState)
require.False(t, workloadVersion.Status.DeploymentStartTime.IsZero())
}

func TestKeptnWorkloadVersionReconciler_reconcileDeployment_UnsupportedReferenceKind(t *testing.T) {
@@ -200,6 +253,7 @@ func TestKeptnWorkloadVersionReconciler_reconcileDeployment_UnsupportedReference
keptnState, err := r.reconcileDeployment(context.TODO(), workloadVersion)
require.ErrorIs(t, err, controllererrors.ErrUnsupportedWorkloadVersionResourceReference)
require.Equal(t, apicommon.StateUnknown, keptnState)
require.True(t, workloadVersion.Status.DeploymentStartTime.IsZero())
}

func makeReplicaSet(name string, namespace string, wanted *int32, available int32) *appsv1.ReplicaSet {
@@ -2,6 +2,7 @@ package keptnworkloadversion

import (
"context"
"time"

argov1alpha1 "github.com/argoproj/argo-rollouts/pkg/apis/rollouts/v1alpha1"
klcv1beta1 "github.com/keptn/lifecycle-toolkit/lifecycle-operator/apis/lifecycle/v1beta1"
@@ -15,6 +16,16 @@ func (r *KeptnWorkloadVersionReconciler) reconcileDeployment(ctx context.Context
var isRunning bool
var err error

if r.isDeploymentTimedOut(workloadVersion) {
workloadVersion.Status.DeploymentStatus = apicommon.StateFailed
err = r.Client.Status().Update(ctx, workloadVersion)
if err != nil {
return apicommon.StateUnknown, err
}
r.EventSender.Emit(apicommon.PhaseWorkloadDeployment, "Warning", workloadVersion, apicommon.PhaseStateFinished, "has reached timeout", workloadVersion.GetVersion())
return workloadVersion.Status.DeploymentStatus, nil
}

switch workloadVersion.Spec.ResourceReference.Kind {
case "ReplicaSet":
isRunning, err = r.isReplicaSetRunning(ctx, workloadVersion.Spec.ResourceReference, workloadVersion.Namespace)
@@ -29,10 +40,14 @@ func (r *KeptnWorkloadVersionReconciler) reconcileDeployment(ctx context.Context
if err != nil {
return apicommon.StateUnknown, err
}

if !workloadVersion.IsDeploymentStartTimeSet() {
workloadVersion.SetDeploymentStartTime()
workloadVersion.Status.DeploymentStatus = apicommon.StateProgressing
}

if isRunning {
workloadVersion.Status.DeploymentStatus = apicommon.StateSucceeded
} else {
workloadVersion.Status.DeploymentStatus = apicommon.StateProgressing
}

err = r.Client.Status().Update(ctx, workloadVersion)
@@ -42,6 +57,16 @@ func (r *KeptnWorkloadVersionReconciler) reconcileDeployment(ctx context.Context
return workloadVersion.Status.DeploymentStatus, nil
}

func (r *KeptnWorkloadVersionReconciler) isDeploymentTimedOut(workloadVersion *klcv1beta1.KeptnWorkloadVersion) bool {
if !workloadVersion.IsDeploymentStartTimeSet() {
return false
}

deploymentDeadline := workloadVersion.Status.DeploymentStartTime.Add(r.Config.GetObservabilityTimeout().Duration)
currentTime := time.Now().UTC()
return currentTime.After(deploymentDeadline)
}

func (r *KeptnWorkloadVersionReconciler) isReplicaSetRunning(ctx context.Context, resource klcv1beta1.ResourceReference, namespace string) (bool, error) {
rep := appsv1.ReplicaSet{}
err := r.Client.Get(ctx, types.NamespacedName{Name: resource.Name, Namespace: namespace}, &rep)
File renamed without changes.
4 changes: 2 additions & 2 deletions test/chainsaw/non-blocking-deployment/chainsaw-test.yaml
@@ -16,7 +16,7 @@ spec:
- name: step-01
try:
- script:
content: ./verify-keptnconfig.sh
content: ./../common/verify-keptnconfig.sh
- sleep:
duration: 30s
- name: step-02
@@ -32,7 +32,7 @@ spec:
- name: step-04
try:
- script:
content: ./verify-keptnconfig.sh
content: ./../common/verify-keptnconfig.sh
- sleep:
duration: 30s
- name: step-05
50 changes: 50 additions & 0 deletions test/chainsaw/timeout-failure-deployment/00-assert.yaml
@@ -0,0 +1,50 @@
apiVersion: lifecycle.keptn.sh/v1beta1
kind: KeptnAppVersion
metadata:
name: podtato-head-0.1.0-6b86b273
spec:
appName: podtato-head
revision: 1
version: 0.1.0
workloads:
- name: podtato-head-entry
version: 0.1.0
status:
currentPhase: AppDeploy
postDeploymentEvaluationStatus: Deprecated
postDeploymentStatus: Deprecated
preDeploymentEvaluationStatus: Succeeded
preDeploymentStatus: Succeeded
promotionStatus: Deprecated
status: Failed
workloadOverallStatus: Failed
---
apiVersion: lifecycle.keptn.sh/v1beta1
kind: KeptnWorkloadVersion
metadata:
generation: 1
name: podtato-head-podtato-head-entry-0.1.0
spec:
app: podtato-head
version: 0.1.0
workloadName: podtato-head-podtato-head-entry
status:
currentPhase: WorkloadDeploy
deploymentStatus: Failed
postDeploymentEvaluationStatus: Deprecated
postDeploymentStatus: Deprecated
preDeploymentEvaluationStatus: Succeeded
preDeploymentStatus: Succeeded
status: Failed
---
apiVersion: v1
kind: Pod
metadata:
annotations:
keptn.sh/app: podtato-head
keptn.sh/version: 0.1.0
keptn.sh/workload: podtato-head-entry
labels:
component: podtato-head-entry
status:
phase: Pending
28 changes: 28 additions & 0 deletions test/chainsaw/timeout-failure-deployment/00-install.yaml
@@ -0,0 +1,28 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: podtato-head-entry
labels:
app: podtato-head
spec:
selector:
matchLabels:
component: podtato-head-entry
template:
metadata:
labels:
component: podtato-head-entry
annotations:
keptn.sh/app: podtato-head
keptn.sh/workload: podtato-head-entry
keptn.sh/version: 0.1.0
spec:
containers:
- name: server
image: ghcr.io/podtato-head/entry:non-existing
imagePullPolicy: Always
ports:
- containerPort: 9000
env:
- name: PODTATO_PORT
value: "9000"
