[TEP-0050] Add proposed design details
This commit proposes adding a new field "OnError" to the "PipelineTask" definition to allow users to define a task failure strategy.
QuanZhang-William committed Aug 18, 2022
1 parent 3294182 commit cfe56e8
Showing 2 changed files with 292 additions and 3 deletions.
293 changes: 291 additions & 2 deletions teps/0050-ignore-task-failures.md
@@ -1,11 +1,12 @@
---
status: proposed
status: implementable
title: 'Ignore Task Failures'
creation-date: '2021-02-05'
last-updated: '2021-02-19'
last-updated: '2022-08-18'
authors:
- '@pritidesai'
- '@skaegi'
- '@QuanZhang-William'
---

# TEP-0050: Ignore Task Failures
@@ -17,6 +18,14 @@ authors:
- [Non-Goals](#non-goals)
- [Requirements](#requirements)
- [Use Cases](#use-cases)
- [Proposal](#proposal)
  - [Ignored Failed Tasks with Retry](#ignored-failed-tasks-with-retry)
  - [Tasks with Resource Dependency](#tasks-with-resource-dependency)
    - [Task with ```OnError: stopAndFail``` refers to unavailable results](#task-with-onerror-stopandfail-refers-to-unavailable-results)
    - [Task with ```OnError: continue``` refers to unavailable results](#task-with-onerror-continue-refers-to-unavailable-results)
- [Alternatives](#alternatives)
  - [A bool flag](#a-bool-flag)
  - [A list of ignorable fail tasks in PipelineSpec](#a-list-of-ignorable-fail-tasks-in-pipelinespec)
- [References](#references)
<!-- /toc -->

@@ -187,6 +196,286 @@ control over the task definitions but may desire to ignore a failure and continu

![Jenkins Dashboard](images/0050-jenkins-dashboard-with-failure-stage.png)

## Proposal
We propose adding a new field ```OnError``` to the [PipelineTask](https://github.com/tektoncd/pipeline/blob/main/docs/pipelines.md#adding-tasks-to-the-pipeline) definition.

```go
type PipelineTask struct {
	Name string `json:"name,omitempty"`

	//...

	// OnError defines the termination behavior of a pipeline when the task fails.
	// It can be set to [ continue | stopAndFail ].
	OnError OnErrorType `json:"onError,omitempty"`
}
```

```go
type OnErrorType string

const (
	// StopAndFail indicates that the pipeline stops if the task fails
	StopAndFail OnErrorType = "stopAndFail"
	// Continue indicates that the rest of the pipeline keeps executing irrespective of the status of the task
	Continue OnErrorType = "continue"
)
```

A pipeline author can set the ```OnError``` field to configure the task failure strategy. If it is set to ```StopAndFail```, the pipeline stops and fails when the task fails. If it is set to ```Continue```, the task failure is ignored and the pipeline continues to execute the rest of the DAG.

```yaml
- name: task1
  onError: continue
  taskSpec:
    steps:
      - image: alpine
        name: exit-with-1
        script: |
          exit 1
```
This new field ```OnError``` will be implemented as an alpha feature and can be enabled by setting ```enable-api-fields``` to ```alpha```.

Setting ```OnError``` is optional; the default pipeline behaviour is ```StopAndFail```.
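
The defaulting rule can be sketched as follows. This is only an illustration under stated assumptions: the helper name `effectiveOnError` is hypothetical and is not part of the proposed API.

```go
package main

import "fmt"

// OnErrorType mirrors the type proposed above.
type OnErrorType string

const (
	StopAndFail OnErrorType = "stopAndFail"
	Continue    OnErrorType = "continue"
)

// effectiveOnError is a hypothetical helper: an unset OnError field
// falls back to the default behaviour, StopAndFail.
func effectiveOnError(v OnErrorType) OnErrorType {
	if v == "" {
		return StopAndFail
	}
	return v
}

func main() {
	fmt.Println(effectiveOnError(""))       // field omitted in the YAML
	fmt.Println(effectiveOnError(Continue)) // field set explicitly
}
```

Resolving the default in one place would keep later checks from having to treat the empty string as a special case.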

The task run information is available under ```pipelineRun.status.taskRuns```. Note that the original task run status is unchanged regardless of the value of ```OnError``` (i.e. a failed task with ```onError: continue``` is still marked as failed). The task is considered "successful" ONLY for the purposes of determining the status of the pipeline, which is represented in ```pipelineRun.status.conditions```.


To distinguish pipeline run messages with and without ignored task failures, we explicitly add the ignored task failure count to ```pipelineRun.status.conditions.message``` in the following format when the number of ignored task failures is greater than 0:

```
"Tasks Completed: A (Failed: B (C is ignored), Cancelled D), Skipped: E"
```
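
A minimal sketch of how such a message could be assembled is shown below. The `taskCounts` struct and `conditionMessage` function are hypothetical names. The sketch also assumes that "Completed" is the sum of succeeded, failed and cancelled tasks, that `Failed` includes the ignored failures, and that the message without ignored failures keeps the same shape minus the parenthesised count.

```go
package main

import "fmt"

// taskCounts is a hypothetical summary of task outcomes in a PipelineRun.
// Failed includes the failures that were ignored via onError: continue.
type taskCounts struct {
	Succeeded     int
	Failed        int
	IgnoredFailed int
	Cancelled     int
	Skipped       int
}

// conditionMessage builds the condition message and adds the ignored
// failure count only when at least one failure was ignored.
func conditionMessage(c taskCounts) string {
	completed := c.Succeeded + c.Failed + c.Cancelled // assumption, see lead-in
	if c.IgnoredFailed > 0 {
		return fmt.Sprintf("Tasks Completed: %d (Failed: %d (%d is ignored), Cancelled %d), Skipped: %d",
			completed, c.Failed, c.IgnoredFailed, c.Cancelled, c.Skipped)
	}
	return fmt.Sprintf("Tasks Completed: %d (Failed: %d, Cancelled %d), Skipped: %d",
		completed, c.Failed, c.Cancelled, c.Skipped)
}

func main() {
	// One task succeeded and one failed task was ignored.
	fmt.Println(conditionMessage(taskCounts{Succeeded: 1, Failed: 1, IgnoredFailed: 1}))
}
```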

Example Input:
```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: demo-pipeline-run
spec:
  pipelineSpec:
    tasks:
      - name: task1
        onError: continue
        taskSpec:
          steps:
            - image: alpine
              name: exit-with-1
              script: |
                exit 1
      - name: task2
        taskSpec:
          steps:
            - image: alpine
              name: exit-with-0
              script: |
                exit 0
```

Example Output:
```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
...
status:
  completionTime: "2022-08-15T17:26:15Z"
  conditions:
    - lastTransitionTime: "2022-08-15T17:26:15Z"
      message: "Tasks Completed: 2 (Failed: 1 (1 is ignored), Cancelled 0), Skipped: 0"
      reason: Succeeded
      status: "True" # The failed task is considered "successful" when determining the state of the pipelineRun
      type: Succeeded
  pipelineSpec:
    ...
  taskRuns:
    demo-pipeline-run-task1:
      pipelineTaskName: task1
      status:
        completionTime: "2022-08-15T17:26:13Z"
        conditions:
          - lastTransitionTime: "2022-08-15T17:26:13Z"
            message: ...
            reason: Failed
            status: "False" # The task is still marked as failed even though onError is set to continue
            type: Succeeded
        ...
    demo-pipeline-run-task2:
      pipelineTaskName: task2
      status:
        completionTime: "2022-08-15T17:26:15Z"
        conditions:
          - lastTransitionTime: "2022-08-15T17:26:15Z"
            message: All Steps have completed executing
            reason: Succeeded
            status: "True"
            type: Succeeded
        ...
```
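
The relationship between the per-task status and the pipeline-level condition in this example can be sketched as below. This is not the controller's implementation; `taskOutcome` and `pipelineSucceeded` are hypothetical names used only to illustrate that ignored failures are counted but do not fail the pipeline, while the task's own status is left untouched.

```go
package main

import "fmt"

type OnErrorType string

const (
	StopAndFail OnErrorType = "stopAndFail"
	Continue    OnErrorType = "continue"
)

// taskOutcome is a hypothetical, simplified view of one finished PipelineTask.
type taskOutcome struct {
	Name    string
	Failed  bool
	OnError OnErrorType
}

// pipelineSucceeded derives the pipeline-level result from task outcomes.
// It only reads the outcomes: a failed task stays failed, but a failure
// with OnError set to Continue is counted as ignored instead of failing
// the pipeline.
func pipelineSucceeded(tasks []taskOutcome) (succeeded bool, ignored int) {
	succeeded = true
	for _, t := range tasks {
		if !t.Failed {
			continue
		}
		if t.OnError == Continue {
			ignored++
			continue
		}
		succeeded = false
	}
	return succeeded, ignored
}

func main() {
	tasks := []taskOutcome{
		{Name: "task1", Failed: true, OnError: Continue},
		{Name: "task2", Failed: false},
	}
	ok, ignored := pipelineSucceeded(tasks)
	fmt.Println(ok, ignored)
}
```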

### Ignored Failed Tasks with Retry
Setting ```Retry``` together with ```OnError: continue``` on the same task is not allowed, as there is no point in retrying a task that is allowed to fail. Pipeline validation will be added accordingly.
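
A minimal sketch of what such a validation check might look like, assuming a trimmed-down stand-in for `PipelineTask` with an integer retry count. The type `pipelineTask`, the function `validateOnError` and the error wording are all hypothetical.

```go
package main

import "fmt"

type OnErrorType string

const (
	StopAndFail OnErrorType = "stopAndFail"
	Continue    OnErrorType = "continue"
)

// pipelineTask is a hypothetical, trimmed-down stand-in for PipelineTask.
type pipelineTask struct {
	Name    string
	Retries int
	OnError OnErrorType
}

// validateOnError rejects unknown OnError values and the combination of
// retries with OnError set to Continue.
func validateOnError(pt pipelineTask) error {
	switch pt.OnError {
	case "", StopAndFail:
		return nil
	case Continue:
		if pt.Retries > 0 {
			return fmt.Errorf("task %q: retries cannot be combined with onError: continue", pt.Name)
		}
		return nil
	default:
		return fmt.Errorf("task %q: invalid onError value %q", pt.Name, pt.OnError)
	}
}

func main() {
	fmt.Println(validateOnError(pipelineTask{Name: "task1", Retries: 2, OnError: Continue}))
	fmt.Println(validateOnError(pipelineTask{Name: "task2", OnError: Continue}))
}
```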

### Tasks with Resource Dependency
#### Task with ```OnError: stopAndFail``` refers to unavailable results
In the following example, the first task fails to produce a result ([details](https://github.com/tektoncd/pipeline/issues/3749)) that is going to be consumed by the second task with ```OnError: stopAndFail```. The second task will not be scheduled and the pipeline fails with reason ```InvalidTaskResultReference```.

Input
```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: test-case-nzrlj
spec:
  pipelineSpec:
    tasks:
      - name: generate-suffix
        onError: continue
        taskSpec:
          results:
            - name: suffix
          steps:
            - name: generate-suffix
              image: alpine
              script: |
                echo -n "suffix" > $(results.suffix.path)
                exit 1
      - name: concat
        onError: stopAndFail
        taskSpec:
          params:
            - name: arg
          steps:
            - name: concat
              image: alpine
              script: |
                echo "$(params.arg)"
        params:
          - name: arg
            value: "prefix:$(tasks.generate-suffix.results.suffix)"
```
Output
```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
...
status:
  completionTime: "2022-08-18T14:55:53Z"
  conditions:
    - lastTransitionTime: "2022-08-18T14:55:53Z"
      message: task "generate-suffix" referenced by result was not successful
      reason: InvalidTaskResultReference
      status: "False"
      type: Succeeded
  taskRuns:
    test-case-nzrlj-generate-suffix:
      pipelineTaskName: generate-suffix
      status:
        completionTime: "2022-08-18T14:55:53Z"
        conditions:
          - lastTransitionTime: "2022-08-18T14:55:53Z"
            message: ...
            reason: Failed
            status: "False"
            type: Succeeded
        ...
```

#### Task with ```OnError: continue``` refers to unavailable results
In the following example, the first task fails to produce a result ([details](https://github.com/tektoncd/pipeline/issues/3749)) that is going to be consumed by the second task with ```OnError: continue```. The second task is skipped with reason ```Results were missing``` and the pipeline succeeds.

Input
```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: test-case-nzrlj
spec:
  pipelineSpec:
    tasks:
      - name: generate-suffix
        onError: continue
        taskSpec:
          results:
            - name: suffix
          steps:
            - name: generate-suffix
              image: alpine
              script: |
                echo -n "suffix" > $(results.suffix.path)
                exit 1
      - name: concat
        onError: continue
        taskSpec:
          params:
            - name: arg
          steps:
            - name: concat
              image: alpine
              script: |
                echo "$(params.arg)"
        params:
          - name: arg
            value: "prefix:$(tasks.generate-suffix.results.suffix)"
```

Output
```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
...
status:
  completionTime: "2022-08-18T15:08:25Z"
  conditions:
    - lastTransitionTime: "2022-08-18T15:08:25Z"
      message: 'Tasks Completed: 1 (Failed: 1 (1 is ignored), Cancelled 0), Skipped: 1'
      reason: Completed
      status: "True"
      type: Succeeded
  skippedTasks:
    - name: concat
      reason: Results were missing
  startTime: "2022-08-18T15:08:19Z"
  taskRuns:
    test-case-nzrlk-generate-suffix:
      pipelineTaskName: generate-suffix
      status:
        completionTime: "2022-08-18T15:08:25Z"
        conditions:
          - lastTransitionTime: "2022-08-18T15:08:25Z"
            message: ...
            reason: Failed
            status: "False"
            type: Succeeded
        ...
```
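
The two resource-dependency cases in this section can be summarised as one decision on the consuming task's ```OnError``` value. The sketch below is illustrative only: `missingResultOutcome` and `onMissingResult` are hypothetical names, and only the reason strings are taken from the example outputs above.

```go
package main

import "fmt"

type OnErrorType string

const (
	StopAndFail OnErrorType = "stopAndFail"
	Continue    OnErrorType = "continue"
)

// missingResultOutcome is a hypothetical description of what happens to a
// task whose referenced result is unavailable.
type missingResultOutcome struct {
	SkipTask     bool   // the consuming task is skipped
	FailPipeline bool   // the pipeline fails
	Reason       string // reason reported in the PipelineRun status
}

// onMissingResult decides the outcome from the consuming task's OnError
// value; an unset value is treated like StopAndFail.
func onMissingResult(consumer OnErrorType) missingResultOutcome {
	if consumer == Continue {
		return missingResultOutcome{SkipTask: true, Reason: "Results were missing"}
	}
	return missingResultOutcome{FailPipeline: true, Reason: "InvalidTaskResultReference"}
}

func main() {
	fmt.Printf("%+v\n", onMissingResult(StopAndFail))
	fmt.Printf("%+v\n", onMissingResult(Continue))
}
```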

## Alternatives
### A bool flag
Use a boolean flag to indicate whether a task failure should be ignored.

### A list of ignorable fail tasks in PipelineSpec
Add a new field ```IgnoreFailureTasks``` in ```PipelineSpec``` that lists the tasks whose failure should not block the execution of the Pipeline.

```go
type PipelineSpec struct {
	Description string                         `json:"description,omitempty"`
	Resources   []PipelineDeclaredResource     `json:"resources,omitempty"`
	Tasks       []PipelineTask                 `json:"tasks,omitempty"`
	Params      []ParamSpec                    `json:"params,omitempty"`
	Workspaces  []PipelineWorkspaceDeclaration `json:"workspaces,omitempty"`
	Results     []PipelineResult               `json:"results,omitempty"`
	Finally     []PipelineTask                 `json:"finally,omitempty"`

	IgnoreFailureTasks []string `json:"ignoreFailureTasks,omitempty"`
}
```


## References

* [TEP-0040 Ignore Step Errors](https://github.com/tektoncd/community/pull/302)
2 changes: 1 addition & 1 deletion teps/README.md
@@ -209,7 +209,7 @@ This is the complete list of Tekton teps:
|[TEP-0047](0047-pipeline-task-display-name.md) | Pipeline Task Display Name | implementable | 2022-01-04 |
|[TEP-0048](0048-task-results-without-results.md) | Task Results without Results | implementable | 2022-08-09 |
|[TEP-0049](0049-aggregate-status-of-dag-tasks.md) | Aggregate Status of DAG Tasks | implemented | 2021-06-03 |
|[TEP-0050](0050-ignore-task-failures.md) | Ignore Task Failures | proposed | 2021-02-19 |
|[TEP-0050](0050-ignore-task-failures.md) | Ignore Task Failures | implementable | 2022-08-18 |
|[TEP-0051](0051-ppc64le-architecture-support.md) | ppc64le Support | proposed | 2021-01-28 |
|[TEP-0052](0052-tekton-results-automated-run-resource-cleanup.md) | Tekton Results: Automated Run Resource Cleanup | implementable | 2021-03-22 |
|[TEP-0053](0053-nested-triggers.md) | Nested Triggers | implementable | 2021-04-15 |