Make phase condition reasons part of the API
TaskRuns and PipelineRuns use the "Reason" field to complement the
value of the "Succeeded" condition. Those values are not part of
the API and, in the case of TaskRuns, are even owned by the
underlying resource (pod). This makes it difficult to rely on them
to understand what the state of the resource is.

In case of corev1.ConditionTrue, the reason can be used to
distinguish between:
- Successful
- Successful, some parts were skipped (pipelinerun only)

In case of corev1.ConditionFalse, the reason can be used to
distinguish between:
- Failed
- Failed because of timeout
- Failed because it was cancelled by the user

In case of corev1.ConditionUnknown, the reason can be used to
distinguish between:
- Just started reconciling
- Validation done, running (or still running)
- Cancellation requested

This is implemented through the following changes:
- Bubble up reasons for taskrun and pipelinerun to the
  v1beta1 API, except for reasons that are defined by the
  underlying resource
- Enforce the start reason to be set during condition init

This allows for an additional change in the eventing module: the
condition before and after can be used to decide whether to send
an event at all. If they are different, the after condition now
contains enough information to send the event.

The cloudevent module is extended with the ability to send the
correct event based on both status and reason.
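
The eventing decision described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `Condition` is a simplified stand-in for `apis.Condition`, and `eventFor` and the event kind strings it returns are hypothetical names chosen for the example.

```go
package main

import "fmt"

// ConditionStatus mirrors the three values of corev1.ConditionStatus.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// Condition is a simplified, hypothetical stand-in for apis.Condition.
type Condition struct {
	Status  ConditionStatus
	Reason  string
	Message string
}

// eventFor compares the condition before and after a reconcile. It returns
// an illustrative event kind and true when an event should be sent, or
// false when the condition did not change.
func eventFor(before, after *Condition) (string, bool) {
	if after == nil {
		return "", false
	}
	if before != nil && *before == *after {
		// Nothing changed, so there is nothing to report.
		return "", false
	}
	switch after.Status {
	case ConditionTrue:
		return "successful", true
	case ConditionFalse:
		return "failed", true
	case ConditionUnknown:
		// The reason distinguishes the different in-progress states.
		switch after.Reason {
		case "Started":
			return "started", true
		case "Running":
			return "running", true
		}
		return "unknown", true
	}
	return "", false
}

func main() {
	started := &Condition{Status: ConditionUnknown, Reason: "Started"}
	running := &Condition{Status: ConditionUnknown, Reason: "Running"}

	kind, send := eventFor(started, running)
	fmt.Println(kind, send)

	_, send = eventFor(running, running)
	fmt.Println(send)
}
```

Because the "after" condition carries both status and reason, the sketch needs no extra state to pick the event kind.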
afrittoli committed May 27, 2020
1 parent 705c84c commit 232dea7
Showing 18 changed files with 294 additions and 120 deletions.
65 changes: 65 additions & 0 deletions docs/pipelineruns.md
@@ -18,6 +18,7 @@ weight: 4
- [Specifying `Workspaces`](#specifying-workspaces)
- [Specifying `LimitRange` values](#specifying-limitrange-values)
- [Configuring a failure timeout](#configuring-a-failure-timeout)
- [Monitoring execution status](#monitoring-execution-status)
- [Cancelling a `PipelineRun`](#cancelling-a-pipelinerun)
- [Events](events.md#pipelineruns)

@@ -337,6 +338,70 @@ The `timeout` value is a `duration` conforming to Go's
values are `1h30m`, `1h`, `1m`, and `60s`. If you set the global timeout to 0, all `PipelineRuns`
that do not have an individual timeout set will fail immediately upon encountering an error.

## Monitoring execution status

As your `PipelineRun` executes, its `status` field accumulates information on the execution of each `TaskRun`
as well as the `PipelineRun` as a whole. This information includes the name of the pipeline `Task` associated
with a `TaskRun`, the complete [status of the `TaskRun`](taskruns.md#monitoring-execution-status), and details
about `Conditions` that may be associated with a `TaskRun`.

The following example shows an extract from the `status` field of a `PipelineRun` that has executed successfully:

```yaml
completionTime: "2020-05-04T02:19:14Z"
conditions:
- lastTransitionTime: "2020-05-04T02:19:14Z"
  message: 'Tasks Completed: 4, Skipped: 0'
  reason: Succeeded
  status: "True"
  type: Succeeded
startTime: "2020-05-04T02:00:11Z"
taskRuns:
  triggers-release-nightly-frwmw-build-ng2qk:
    pipelineTaskName: build
    status:
      completionTime: "2020-05-04T02:10:49Z"
      conditions:
      - lastTransitionTime: "2020-05-04T02:10:49Z"
        message: All Steps have completed executing
        reason: Succeeded
        status: "True"
        type: Succeeded
      podName: triggers-release-nightly-frwmw-build-ng2qk-pod-8vj99
      resourcesResult:
      - key: commit
        resourceRef:
          name: git-source-triggers-frwmw
        value: 9ab5a1234166a89db352afa28f499d596ebb48db
      startTime: "2020-05-04T02:05:07Z"
      steps:
      - container: step-build
        imageID: docker-pullable://golang@sha256:a90f2671330831830e229c3554ce118009681ef88af659cd98bfafd13d5594f9
        name: build
        terminated:
          containerID: docker://6b6471f501f59dbb7849f5cdde200f4eeb64302b862a27af68821a7fb2c25860
          exitCode: 0
          finishedAt: "2020-05-04T02:10:45Z"
          reason: Completed
          startedAt: "2020-05-04T02:06:24Z"
```

The following table shows how to read the overall status of a `PipelineRun`:

`status`|`reason`|`completionTime` is set|PipelineRun status
:-------|:-------|:---------------------:|--------------:
Unknown|Started|No|The `PipelineRun` has just been picked up by the controller.
Unknown|Running|No|The `PipelineRun` has been validated and has started to perform its work.
Unknown|PipelineRunCancelled|No|The user requested the `PipelineRun` to be cancelled. Cancellation has not been done yet.
True|Succeeded|Yes|The `PipelineRun` completed successfully.
False|Failed|Yes|The `PipelineRun` failed because one of the `TaskRuns` failed.
False|\[Error message\]|Yes|The `PipelineRun` failed with a permanent error (usually validation).
False|PipelineRunCancelled|Yes|The `PipelineRun` was cancelled successfully.
False|PipelineRunTimeout|Yes|The `PipelineRun` timed out.
False|\[Error message\]|No|The `PipelineRun` encountered an error, but it's still running.

When a `PipelineRun` changes status, [events](events.md#pipelineruns) are triggered accordingly.
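
As a rough illustration of how a client might read the table above, the following Go sketch maps the three columns to a short summary. The function name `describePipelineRun` and the summary strings are hypothetical; the reason values are the ones listed in the table, and any other reason is treated as an error message.

```go
package main

import "fmt"

// describePipelineRun summarizes a PipelineRun from the status and reason of
// its "Succeeded" condition and from whether completionTime is set.
// It is an illustrative helper, not part of the Tekton API.
func describePipelineRun(status, reason string, completed bool) string {
	switch status {
	case "Unknown":
		switch reason {
		case "Started":
			return "picked up by the controller"
		case "Running":
			return "validated and running"
		case "PipelineRunCancelled":
			return "cancellation requested"
		}
		return "in progress: " + reason
	case "True":
		return "succeeded"
	case "False":
		if !completed {
			// An error was recorded, but the run has not finished yet.
			return "error encountered, still running: " + reason
		}
		switch reason {
		case "Failed":
			return "failed: a TaskRun failed"
		case "PipelineRunCancelled":
			return "cancelled"
		case "PipelineRunTimeout":
			return "timed out"
		}
		return "failed with a permanent error: " + reason
	}
	return "unrecognized status: " + status
}

func main() {
	fmt.Println(describePipelineRun("Unknown", "Running", false))
	fmt.Println(describePipelineRun("False", "PipelineRunTimeout", true))
}
```

Note that `status` alone is not enough: a `False` status without a `completionTime` means the `PipelineRun` is still running.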

## Cancelling a `PipelineRun`

To cancel a `PipelineRun` that's currently executing, update its definition
20 changes: 18 additions & 2 deletions docs/taskruns.md
@@ -155,7 +155,7 @@ point for the `Pod` in which the container images specified in your `Task` will
customize the `Pod` configuration specifically for that `TaskRun`.

In the following example, the `Task` specifies a `volumeMount` (`my-cache`) object, also provided by the `TaskRun`,
using a `PersistentVolumeClaim` volume. A specific scheduler is also configured in the `SchedulerName` field.
The `Pod` executes with regular (non-root) user permissions.

```yaml
@@ -281,7 +281,7 @@ For more information, see [`ServiceAccount`](auth.md).
## Monitoring execution status

As your `TaskRun` executes, its `status` field accumulates information on the execution of each `Step`
as well as the `TaskRun` as a whole. This information includes start and stop times, exit codes, the
fully-qualified name of the container image, and the corresponding digest.

**Note:** If any `Pods` have been [`OOMKilled`](https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/)
@@ -311,6 +311,22 @@ steps:
startedAt: "2019-08-12T18:22:54Z"
```

The following table shows how to read the overall status of a `TaskRun`:

`status`|`reason`|`completionTime` is set|TaskRun status
:-------|:-------|:---------------------:|--------------:
Unknown|Started|No|The TaskRun has just been picked up by the controller.
Unknown|Running|No|The TaskRun has been validated and has started to perform its work.
Unknown|TaskRunCancelled|No|The user requested the TaskRun to be cancelled. Cancellation has not been done yet.
True|Succeeded|Yes|The TaskRun completed successfully.
False|Failed|Yes|The TaskRun failed because one of the steps failed.
False|\[Error message\]|Yes|The TaskRun failed with a permanent error (usually validation).
False|TaskRunCancelled|Yes|The TaskRun was cancelled successfully.
False|TaskRunTimeout|Yes|The TaskRun timed out.
False|\[Error message\]|No|The TaskRun encountered an error, but it's still running.

When a `TaskRun` changes status, [events](events.md#taskruns) are triggered accordingly.
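
One practical use of the table is deciding when to stop waiting for a `TaskRun`. The sketch below assumes a simplified `runState` struct holding only the three columns of the table; `runState`, `isFinished`, `isCancelling`, and `at` are hypothetical names, not Tekton types or functions.

```go
package main

import (
	"fmt"
	"time"
)

// runState holds the fields of a TaskRun status that the table refers to.
type runState struct {
	Status         string     // "True", "False" or "Unknown"
	Reason         string     // reason of the "Succeeded" condition
	CompletionTime *time.Time // nil until the TaskRun completes
}

// isFinished reports whether the TaskRun reached a terminal state. A "False"
// status is not sufficient on its own, because the table lists an error case
// in which the TaskRun is still running and completionTime is not set.
func isFinished(s runState) bool {
	return s.Status != "Unknown" && s.CompletionTime != nil
}

// isCancelling reports whether cancellation was requested but not yet done.
func isCancelling(s runState) bool {
	return s.Status == "Unknown" && s.Reason == "TaskRunCancelled"
}

// at parses an RFC 3339 timestamp and returns nil if it is not valid.
func at(ts string) *time.Time {
	t, err := time.Parse(time.RFC3339, ts)
	if err != nil {
		return nil
	}
	return &t
}

func main() {
	pending := runState{Status: "Unknown", Reason: "TaskRunCancelled"}
	done := runState{
		Status:         "False",
		Reason:         "TaskRunCancelled",
		CompletionTime: at("2020-05-04T02:10:49Z"),
	}
	fmt.Println(isCancelling(pending), isFinished(pending))
	fmt.Println(isCancelling(done), isFinished(done))
}
```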

### Monitoring `Steps`

If multiple `Steps` are defined in the `Task` invoked by the `TaskRun`, you can monitor their execution
33 changes: 32 additions & 1 deletion pkg/apis/pipeline/v1beta1/pipelinerun_types.go
@@ -220,6 +220,32 @@ type PipelineRunStatus struct {
PipelineRunStatusFields `json:",inline"`
}

// PipelineRunSucceededReason represents a reason for the pipeline run "Succeeded" condition
type PipelineRunSucceededReason string

const (
// PipelineRunSucceededReasonStarted is the reason set when the PipelineRun has just started
PipelineRunSucceededReasonStarted PipelineRunSucceededReason = "Started"
// PipelineRunSucceededReasonRunning is the reason set when the PipelineRun is running
PipelineRunSucceededReasonRunning PipelineRunSucceededReason = "Running"
// PipelineRunSucceededReasonSuccessful is the reason set when the PipelineRun completed successfully
PipelineRunSucceededReasonSuccessful PipelineRunSucceededReason = "Succeeded"
// PipelineRunSucceededReasonCompleted is the reason set when the PipelineRun completed successfully with one or more skipped Tasks
PipelineRunSucceededReasonCompleted PipelineRunSucceededReason = "Completed"
// PipelineRunSucceededReasonFailed is the reason set when the PipelineRun completed with a failure
PipelineRunSucceededReasonFailed PipelineRunSucceededReason = "Failed"
// PipelineRunSucceededReasonCancelled is the reason set when the PipelineRun is cancelled by the user
// This reason may be found with a corev1.ConditionFalse status, if the cancellation was processed successfully
// This reason may be found with a corev1.ConditionUnknown status, if the cancellation is being processed or failed
PipelineRunSucceededReasonCancelled PipelineRunSucceededReason = "Cancelled"
// PipelineRunSucceededReasonTimedOut is the reason set when the PipelineRun has timed out
PipelineRunSucceededReasonTimedOut PipelineRunSucceededReason = "PipelineRunTimeout"
)

func (t PipelineRunSucceededReason) String() string {
return string(t)
}

var pipelineRunCondSet = apis.NewBatchConditionSet()

// GetCondition returns the Condition matching the given type.
@@ -236,7 +262,12 @@ func (pr *PipelineRunStatus) InitializeConditions() {
if pr.StartTime.IsZero() {
pr.StartTime = &metav1.Time{Time: time.Now()}
}
pipelineRunCondSet.Manage(pr).InitializeConditions()
conditionManager := pipelineRunCondSet.Manage(pr)
conditionManager.InitializeConditions()
// Ensure the started reason is set for the "Succeeded" condition
initialCondition := conditionManager.GetCondition(apis.ConditionSucceeded)
initialCondition.Reason = PipelineRunSucceededReasonStarted.String()
conditionManager.SetCondition(*initialCondition)
}

// SetCondition sets the condition, unsetting previous conditions with the same
28 changes: 28 additions & 0 deletions pkg/apis/pipeline/v1beta1/run_interface.go
@@ -23,20 +23,48 @@ import (

// RunsToCompletionStatus is implemented by TaskRun.Status and PipelineRun.Status
type RunsToCompletionStatus interface {

// GetCondition returns the Condition for the specified ConditionType
GetCondition(t apis.ConditionType) *apis.Condition

// InitializeConditions is used to set up the initial conditions for the
// RunsToCompletion when it's initially started
InitializeConditions()

// SetCondition is used to set up a condition for the specified ConditionType
SetCondition(newCond *apis.Condition)
}

// RunsToCompletion is implemented by TaskRun and PipelineRun
type RunsToCompletion interface {

// GetTypeMeta returns the TypeMeta
GetTypeMeta() *metav1.TypeMeta

// GetObjectMeta returns the ObjectMeta
GetObjectMeta() *metav1.ObjectMeta

// GetOwnerReference returns the RunsToCompletion as owner reference for any related object
GetOwnerReference() metav1.OwnerReference

// GetStatus returns the status as RunsToCompletionStatus
GetStatus() RunsToCompletionStatus

// IsDone returns true once the reconcile work on the resource is complete
// except for postrun actions (stop timeout timer, emit events, record metrics)
IsDone() bool

// HasStarted returns true after the RunsToCompletion has been reconciled for
// the first time. It must be true after InitializeConditions has been invoked
// on the associated RunsToCompletionStatus
HasStarted() bool

// IsCancelled returns true if the user marked the RunsToCompletion for cancellation
IsCancelled() bool

// HasTimedOut returns true once the RunsToCompletion has passed its maximum run time
HasTimedOut() bool

// GetRunKey returns the RunsToCompletion key which is enqueued in the controller queue
GetRunKey() string
}
52 changes: 45 additions & 7 deletions pkg/apis/pipeline/v1beta1/taskrun_types.go
@@ -72,10 +72,6 @@ const (
// TaskRunSpecStatusCancelled indicates that the user wants to cancel the task,
// if not already cancelled or terminated
TaskRunSpecStatusCancelled = "TaskRunCancelled"

// TaskRunReasonCancelled indicates that the TaskRun has been cancelled
// because it was requested so by the user
TaskRunReasonCancelled = "TaskRunCancelled"
)

// TaskRunInputs holds the input values that this task was invoked with.
@@ -102,6 +98,43 @@ type TaskRunStatus struct {
TaskRunStatusFields `json:",inline"`
}

// TaskRunSucceededReason is an enum used to store all TaskRun reasons for
// the Succeeded condition that are controlled by the TaskRun itself. Failure
// reasons that emerge from underlying resources are not included here
type TaskRunSucceededReason string

const (
// TaskRunSucceededReasonStarted is the reason set when the TaskRun has just started
TaskRunSucceededReasonStarted TaskRunSucceededReason = "Started"
// TaskRunSucceededReasonRunning is the reason set when the TaskRun is running
TaskRunSucceededReasonRunning TaskRunSucceededReason = "Running"
// TaskRunSucceededReasonSuccessful is the reason set when the TaskRun completed successfully
TaskRunSucceededReasonSuccessful TaskRunSucceededReason = "Succeeded"
// TaskRunSucceededReasonFailed is the reason set when the TaskRun completed with a failure
TaskRunSucceededReasonFailed TaskRunSucceededReason = "Failed"
// TaskRunSucceededReasonCancelled is the reason set when the TaskRun is cancelled by the user
TaskRunSucceededReasonCancelled TaskRunSucceededReason = "TaskRunCancelled"
// TaskRunSucceededReasonTimedOut is the reason set when the TaskRun has timed out
TaskRunSucceededReasonTimedOut TaskRunSucceededReason = "TaskRunTimeout"
)

func (t TaskRunSucceededReason) String() string {
return string(t)
}
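
Since failure reasons from underlying resources are deliberately left out of this enum, a consumer may want to tell the two groups apart. The standalone sketch below repeats the type and constants from this diff and adds a hypothetical helper, `isControlledByTaskRun`, which is not part of the commit.

```go
package main

import "fmt"

// TaskRunSucceededReason and its constants are copied from the diff above.
type TaskRunSucceededReason string

const (
	TaskRunSucceededReasonStarted    TaskRunSucceededReason = "Started"
	TaskRunSucceededReasonRunning    TaskRunSucceededReason = "Running"
	TaskRunSucceededReasonSuccessful TaskRunSucceededReason = "Succeeded"
	TaskRunSucceededReasonFailed     TaskRunSucceededReason = "Failed"
	TaskRunSucceededReasonCancelled  TaskRunSucceededReason = "TaskRunCancelled"
	TaskRunSucceededReasonTimedOut   TaskRunSucceededReason = "TaskRunTimeout"
)

func (t TaskRunSucceededReason) String() string {
	return string(t)
}

// isControlledByTaskRun is a hypothetical helper. It reports whether a raw
// reason string is one of the reasons owned by the TaskRun itself, as opposed
// to a reason that emerged from an underlying resource such as the pod.
func isControlledByTaskRun(reason string) bool {
	switch TaskRunSucceededReason(reason) {
	case TaskRunSucceededReasonStarted,
		TaskRunSucceededReasonRunning,
		TaskRunSucceededReasonSuccessful,
		TaskRunSucceededReasonFailed,
		TaskRunSucceededReasonCancelled,
		TaskRunSucceededReasonTimedOut:
		return true
	}
	return false
}

func main() {
	fmt.Println(isControlledByTaskRun(TaskRunSucceededReasonTimedOut.String()))
	// ExceededResourceQuota is one of the reasons defined in pkg/pod/status.go.
	fmt.Println(isControlledByTaskRun("ExceededResourceQuota"))
}
```

A typed string keeps the values readable in the serialized status while letting the compiler catch an arbitrary string passed where a known reason is expected.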

// GetStartedReason returns the reason set to the "Succeeded" condition when
// InitializeConditions is invoked
func (trs *TaskRunStatus) GetStartedReason() string {
return TaskRunSucceededReasonStarted.String()
}

// GetRunningReason returns the reason set to the "Succeeded" condition when
// the RunsToCompletion starts running. This is used to indicate that the resource
// could be validated and is starting to perform its job.
func (trs *TaskRunStatus) GetRunningReason() string {
return TaskRunSucceededReasonRunning.String()
}

// MarkResourceNotConvertible adds a Warning-severity condition to the resource noting
// that it cannot be converted to a higher version.
func (trs *TaskRunStatus) MarkResourceNotConvertible(err *CannotConvertError) {
@@ -116,11 +149,11 @@ func (trs *TaskRunStatus) MarkResourceNotConvertible(err *CannotConvertError) {

// MarkResourceFailed sets the ConditionSucceeded condition to ConditionFalse
// based on an error that occurred and a reason
func (trs *TaskRunStatus) MarkResourceFailed(reason string, err error) {
func (trs *TaskRunStatus) MarkResourceFailed(reason TaskRunSucceededReason, err error) {
taskRunCondSet.Manage(trs).SetCondition(apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionFalse,
Reason: reason,
Reason: reason.String(),
Message: err.Error(),
})
}
@@ -211,7 +244,12 @@ func (trs *TaskRunStatus) InitializeConditions() {
if trs.StartTime.IsZero() {
trs.StartTime = &metav1.Time{Time: time.Now()}
}
taskRunCondSet.Manage(trs).InitializeConditions()
conditionManager := taskRunCondSet.Manage(trs)
conditionManager.InitializeConditions()
// Ensure the started reason is set for the "Succeeded" condition
initialCondition := conditionManager.GetCondition(apis.ConditionSucceeded)
initialCondition.Reason = TaskRunSucceededReasonStarted.String()
conditionManager.SetCondition(*initialCondition)
}
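
The effect of the new `InitializeConditions` body can be illustrated without the knative condition manager. In the sketch below, `condition` and `runStatus` are simplified, hypothetical stand-ins for `apis.Condition` and `TaskRunStatus`, and a plain map replaces the condition set.

```go
package main

import (
	"fmt"
	"time"
)

// condition is a simplified stand-in for apis.Condition.
type condition struct {
	Type   string
	Status string
	Reason string
}

// runStatus is a simplified stand-in for TaskRunStatus.
type runStatus struct {
	StartTime  *time.Time
	Conditions map[string]condition
}

// initializeConditions records the start time, makes sure the "Succeeded"
// condition exists with an Unknown status, and then enforces the "Started"
// reason on it. As in the diff, the reason is set unconditionally, so the
// method is meant to be called only when the run has not started yet.
func (s *runStatus) initializeConditions() {
	if s.StartTime == nil {
		now := time.Now()
		s.StartTime = &now
	}
	if s.Conditions == nil {
		s.Conditions = map[string]condition{}
	}
	c, ok := s.Conditions["Succeeded"]
	if !ok {
		c = condition{Type: "Succeeded", Status: "Unknown"}
	}
	c.Reason = "Started"
	s.Conditions["Succeeded"] = c
}

// hasStarted reports whether initializeConditions has already been invoked.
func (s *runStatus) hasStarted() bool {
	return s.StartTime != nil
}

func main() {
	s := &runStatus{}
	fmt.Println(s.hasStarted())
	s.initializeConditions()
	c := s.Conditions["Succeeded"]
	fmt.Println(s.hasStarted(), c.Status, c.Reason)
}
```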

// SetCondition sets the condition, unsetting previous conditions with the same
22 changes: 4 additions & 18 deletions pkg/pod/status.go
@@ -45,13 +45,6 @@ const (
// that taskrun failed runtime validation
ReasonFailedValidation = "TaskRunValidationFailed"

// ReasonRunning indicates that the reason for the inprogress status is that the TaskRun
// is just starting to be reconciled
ReasonRunning = "Running"

// ReasonTimedOut indicates that the TaskRun has taken longer than its configured timeout
ReasonTimedOut = "TaskRunTimeout"

// ReasonExceededResourceQuota indicates that the TaskRun failed to create a pod due to
// a ResourceQuota in the namespace
ReasonExceededResourceQuota = "ExceededResourceQuota"
@@ -68,13 +61,6 @@ const (
// is that the creation of the pod backing the TaskRun failed
ReasonPodCreationFailed = "PodCreationFailed"

// ReasonSucceeded indicates that the reason for the finished status is that all of the steps
// completed successfully
ReasonSucceeded = "Succeeded"

// ReasonFailed indicates that the reason for the failure status is unknown or that one of the steps failed
ReasonFailed = "Failed"

// timeFormat is RFC3339 with millisecond precision
timeFormat = "2006-01-02T15:04:05.000Z07:00"
)
@@ -114,7 +100,7 @@ func MakeTaskRunStatus(logger *zap.SugaredLogger, tr v1beta1.TaskRun, pod *corev
trs.SetCondition(&apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionUnknown,
Reason: ReasonRunning,
Reason: v1beta1.TaskRunSucceededReasonRunning.String(),
Message: "Not all Steps in the Task have finished executing",
})
}
@@ -197,14 +183,14 @@ func updateCompletedTaskRun(trs *v1beta1.TaskRunStatus, pod *corev1.Pod) {
trs.SetCondition(&apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionFalse,
Reason: ReasonFailed,
Reason: v1beta1.TaskRunSucceededReasonFailed.String(),
Message: msg,
})
} else {
trs.SetCondition(&apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionTrue,
Reason: ReasonSucceeded,
Reason: v1beta1.TaskRunSucceededReasonSuccessful.String(),
Message: "All Steps have completed executing",
})
}
@@ -219,7 +205,7 @@ func updateIncompleteTaskRun(trs *v1beta1.TaskRunStatus, pod *corev1.Pod) {
trs.SetCondition(&apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionUnknown,
Reason: ReasonRunning,
Reason: v1beta1.TaskRunSucceededReasonRunning.String(),
Message: "Not all Steps in the Task have finished executing",
})
case corev1.PodPending: