Design: Failure Strategy for TaskRuns in a PipelineRun #1684
Comments
Thanks for getting this started!!!

```yaml
- name: integration
  taskRef:
    name: run-integration-tests
  runAfter: uts
  errorStrategy: RunNextTasks # allow cleanup to occur
- name: cleanup
  taskRef:
    name: cleanup-integration-test-junk
  runAfter: integration
```

A concern about this is for these two use cases:
It seems like it doesn't work if there is one more Task in the Pipeline, e.g. (totally contrived?) but something like:

```yaml
- name: integration
  runAfter: uts
  errorStrategy: RunNextTasks # allow cleanup to occur
- name: integration2 # pretend there was another set of tests?
  runAfter: uts
  errorStrategy: RunNextTasks # allow cleanup to occur
- name: cleanup
  taskRef:
    name: cleanup-integration-test-junk
  runAfter: integration
```
A couple different ideas:
---

(I do think an

One workaround might be that integration2 is a conditional task that only runs if the previous step is successful? That being said, if this is a common pattern, I think a separate
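That workaround might look something like this. A rough sketch only: `check-integration-passed` is a hypothetical Condition resource (not a real one), and `errorStrategy` is still just the proposed field from this thread:

```yaml
# Sketch of the conditional-task workaround: integration2 is guarded by a
# hypothetical Condition that only passes when integration succeeded.
- name: integration
  taskRef:
    name: run-integration-tests
  runAfter: uts
  errorStrategy: RunNextTasks # allow cleanup to occur
- name: integration2
  conditions:
  - conditionRef: check-integration-passed # hypothetical guard Condition
  runAfter: uts
  taskRef:
    name: run-more-integration-tests
- name: cleanup
  taskRef:
    name: cleanup-integration-test-junk
  runAfter: integration
```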
---

I am biased towards (2). Defining these error strategies based on my understanding so far:

Also, here we have to be explicit and define what each strategy means. I have collected my thoughts here based on talking to @dibyom on Slack, the step PR from @sbwsg, comments from the issues themselves, and working group recordings.

---
@pritidesai Thanks for writing this up.... I think your examples for error strategies are for tasks defining their own behavior, not for the subsequent tasks. With that approach, how do we model the scenario that @bobcatfish mentioned above, i.e. we have 3 tasks running sequentially A -> B -> C in the happy case, but when A fails we want to jump to C? Also, another idea @sbwsg had was adding the errorStrategies in the

---
How about modeling the scenario that @bobcatfish mentioned above with:

---

woo, I like errorStrategies in

---
Another alternative to consider: Go's `defer` and `recover` keywords model quite similar behaviour to what we're discussing here. I can imagine a DeferredPipelineTask:

```yaml
# In this example, a "deferred" task is used to clean up environment after integration tests.
# Deferred tasks run regardless of outcome in prior tasks
spec:
  tasks:
  - name: integration-tests # can fail!
    taskRef:
      name: run-some-tests
  - name: cleanup-integration-environment
    deferred: true # will run regardless of failure in integration-tests. Will not run if integration-tests is never run (i.e. because a task prior to integration-tests failed)
    runAfter: integration-tests
    taskRef:
      name: delete-integration-namespaces
```

And a RecoveryPipelineTask:

```yaml
# In this example, a "recovery" task is used to handle errors during deployment to staging.
# Recovery tasks only execute if the task they runAfter fails
spec:
  tasks:
  - name: deploy-to-staging
    taskRef:
      name: deploy-to-k8s
  - name: rollback-staging
    recovery: true # will run only if deploy-to-staging fails
    runAfter: deploy-to-staging
    taskRef:
      name: rollback-deployment
```

Two further tweaks to this idea: First, a

Also worth keeping in mind that while a DeferredPipelineTask or RecoveryPipelineTask needs to be explicitly marked as such, I think they would also be allowed to be "roots" of their own trees. In other words, another task could be

So I think this would cover the following scenarios:
The deferred and recovery keys would need to be either-or in the YAML; I don't think you can support both `recovery: true` and `deferred: true` on the same task. What I most like about this approach is that:
---
Another phrasing of the above approach that @dibyom and I discussed would be to use keywords for defer / recover / skip (the default):

This ^ says that

Having thought about it for a couple of days, I'm still pretty sure we could describe all of the use cases we've talked about so far with just these three strategies.

---
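A strawperson of what those three keywords could look like side by side on one pipeline. The `strategy` field and its values are purely proposed in this thread, not part of the Tekton API:

```yaml
# Strawperson only: "strategy" is a proposed field, not a real Tekton field.
spec:
  tasks:
  - name: deploy
    taskRef:
      name: deploy-to-k8s
  - name: rollback
    runAfter: [deploy]
    strategy: recover # run only if deploy fails
    taskRef:
      name: rollback-deployment
  - name: cleanup
    runAfter: [deploy]
    strategy: defer # run whether deploy succeeds or fails
    taskRef:
      name: delete-test-namespaces
  - name: notify
    runAfter: [deploy]
    strategy: skip # the default: run only if deploy succeeds
    taskRef:
      name: send-ping
```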
thanks @sbwsg! Also,

---
Overall I like the idea of defining

---
I agree, the keywords don't make much sense in isolation. How about "AlwaysRun" (defer), "RunOnFail" (recover), and "RunOnSuccess" (Tekton's current behaviour)?
I think the analogy here with Go's `defer` breaks down; I somewhat regret drawing the comparison. In my mind the strategy only describes a single relationship between a task and its "parents" (those it declares with "runAfter" or "from"). In other words, given the following tasks:

```yaml
- name: Task A
- name: Task B
  runAfter:
  - Task A
  strategy: RunOnFail # Task B only executes if Task A errors out
- name: Task C
  runAfter:
  - Task B
  strategy: AlwaysRun
```

I expect the following behaviour:
So I think that's another reason why using the Go keywords probably doesn't make sense after all: they don't map perfectly onto Tekton's meanings. But AlwaysRun / RunOnFail / RunOnSuccess are maybe a bit clearer, especially when we consider them paired with

---
Hrm.

AlwaysRun:

```yaml
# This pipeline pings a URL when the pipeline finishes.
# This ping happens regardless of the pipeline's outcome.
apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: test-pipeline
spec:
  tasks:
  - name: ping-url-on-complete
    taskRef:
      name: send-ping
    strategy: AlwaysRun # AlwaysRun without a runAfter. Executes at end of pipeline.
  - name: uts
    taskRef:
      name: run-unit-tests
  - name: integration
    taskRef:
      name: run-integration-tests
    runAfter: uts
```

Deferred:

```yaml
# This pipeline pings a URL when the pipeline finishes.
# This ping happens regardless of the pipeline's outcome.
apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: test-pipeline
spec:
  tasks:
  - name: ping-url-on-complete
    taskRef:
      name: send-ping
    strategy: Deferred # Deferred without a runAfter. Executes at end of pipeline.
  - name: uts
    taskRef:
      name: run-unit-tests
  - name: integration
    taskRef:
      name: run-integration-tests
    runAfter: uts
```

---
Yes, I agree.

---
I kind of ran these strategies against the pipeline example you have and it looks something like this:
---
Adding one more use case: building a JavaScript application and/or a Java application depending on the runtime of an application. The strategy is used here to make sure the task and the PipelineRun don't report failure when a condition fails and the chain of tasks doesn't get executed.
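A sketch of that runtime-dependent fan-out. The Condition names here (`is-javascript-runtime`, `is-java-runtime`) and the referenced Tasks are hypothetical; the point is that whichever guard fails should mark its branch skipped rather than fail the whole PipelineRun:

```yaml
# Sketch: two build branches guarded by hypothetical runtime-detection Conditions.
# A failing guard should skip its branch, not fail the PipelineRun.
tasks:
- name: build-js-app
  conditions:
  - conditionRef: is-javascript-runtime # hypothetical
  taskRef:
    name: npm-build
- name: build-java-app
  conditions:
  - conditionRef: is-java-runtime # hypothetical
  taskRef:
    name: maven-build
```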
---
Another slightly different case: for things like updating GitHub status notifications it would be nice if we could do something like the following. Admittedly this is a bit repetitive, but passing the "success" or "failure" of a task might work with the "recover" strategy mentioned earlier, which would mean that after each task, somehow it'd use the success/failure of the previous task to update the GitHub status appropriately. Updating these kinds of statuses would be really useful if you want your pipeline to determine whether or not a commit can be merged (if you're not familiar with these, you can require specific contexts to be successful before a PR can be merged). This also adds a

The example below would trigger two parallel executions (lint and tests), which would report their status to GitHub.

```yaml
apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: pullrequest-pipeline
spec:
  runAfter:
    taskRef: cleanup-post-pullrequest
  tasks:
  - name: start-github-ci-status
    taskRef:
      name: update-github-status
    params:
    - name: STATUS
      value: pending
    - name: CONTEXT
      value: ci-tests
    - name: COMMIT_SHA
      value: $(inputs.params.commit_sha)
  - name: run-tests
    taskRef:
      name: golang-test
    errorStrategy:
      taskRef: update-commit-status
      params:
      - name: STATUS
        value: failed
      - name: CONTEXT
        value: ci-tests
      - name: COMMIT_SHA
        value: $(inputs.params.commit_sha)
  - name: mark-github-ci-status-success
    runAfter:
    - run-tests
    taskRef:
      name: update-github-status
    params:
    - name: STATUS
      value: success
    - name: CONTEXT
      value: ci-tests
    - name: COMMIT_SHA
      value: $(inputs.params.commit_sha)
  # repeat pending for ci-lint context
  - name: run-lint
    taskRef:
      name: golangci-lint
    errorStrategy:
      taskRef: update-commit-status
      params:
      - name: STATUS
        value: failed
      - name: CONTEXT
        value: ci-lint
      - name: COMMIT_SHA
        value: $(inputs.params.commit_sha)
  # repeat success for ci-lint context
```

---
One specific use case that I don't think has been explicitly mentioned above is a fan-in/fan-out scenario. For example, if my pipeline is

Here, I would want to always run the

To me, this reads a lot like conditional execution, but more like conditional failure. Perhaps this could be served as an extension to the conditions that already exist. If you wanted to "always execute B after A", your condition could simply always return true to override the default behavior of "execute B after A if A is successful".

---
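A sketch of the fan-out/fan-in shape being described. The task names are made up, and `strategy: AlwaysRun` is one of the names floated earlier in this thread, not a real field:

```yaml
# Sketch: setup fans out to two test suites; teardown fans back in and
# should run even if one of the suites fails.
tasks:
- name: setup
  taskRef:
    name: provision-env
- name: test-a
  runAfter: [setup]
  taskRef:
    name: run-suite-a
- name: test-b
  runAfter: [setup]
  taskRef:
    name: run-suite-b
- name: teardown # fan-in: should always run once both suites finish
  runAfter: [test-a, test-b]
  strategy: AlwaysRun # proposed field, not real Tekton API
  taskRef:
    name: destroy-env
```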
@pritidesai @bigkevmcd @pierretasci Thanks a lot for adding such detailed use cases! Very very helpful 🙏
---
One idea, instead of:

```yaml
- name: task1
  conditions:
  - conditionRef: "condition-that-sometime-fails"
  taskRef: { name: "my-task" }
- name: runIfTask1Fails
  runAfter: task1
  runOn: ["failure"]
- name: runIfTask1Succeeds
  runAfter: task1
  runOn: ["success"]
- name: runIfTask1IsSkipped
  runAfter: task1
  runOn: ["skip"]
```

What I like about this is that the field name is more succinct, and, for the user, instead of having to remember a bunch of magic strings (is it

A few more examples here: https://gist.github.com/dibyom/92dfd6ea20f13f5c769a21389df53977

---
I feel that

If I understood correctly, #2134 could help us organize this a little better, but the failure strategy would be the same: if one pipeline fails, it could still interrupt the scheduling of other pipelines (instead of tasks).

---
@RafaeLeal we have multiple TEPs proposed around

This can be addressed by ignoring non-required PR check execution status with TEP-0050.

Combining pipeline-in-pipeline with allowing a task or sub-pipeline to fail while continuing to execute the rest of the graph could help address this. Please let me know if these three proposals do not help solve your use cases.
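For reference, a sketch of what the TEP-0050 ("ignore task failures") proposal could look like on a pipeline task. The `onError: continue` field name is taken from the TEP as proposed at the time; check the TEP and release notes for the final API shape:

```yaml
# Sketch of TEP-0050: mark a task's failure as ignorable so the rest of
# the graph keeps executing instead of failing the whole PipelineRun.
spec:
  tasks:
  - name: flaky-optional-check
    onError: continue # failure is recorded but does not fail the PipelineRun
    taskRef:
      name: lint
  - name: build
    runAfter: [flaky-optional-check]
    taskRef:
      name: build-image
```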
---

We have a strict requirement of not stopping the pipeline when any task fails, and it cannot be achieved through

---
@email2smohanty the current behaviour of

If I understand correctly, you would like the

There are some features in Tekton today that you could use to achieve something like that, as mentioned in #1684 (comment), but they require changes to Tasks and Pipelines. If you need this feature, would you mind filing a separate issue about it?

---
Hey @email2smohanty, we have a proposal in

---
Yes @pritidesai, I am looking for the feature you mentioned, i.e. ignoring a task failure at pipeline authoring time. Is this feature currently available as an alpha feature?

---
@afrittoli thanks for responding to my comment. You mentioned that there are some features in Tekton today that we could use to achieve something like that, as mentioned in #1684 (comment). But my understanding is that whatever features are mentioned in said comment are in

---
Hey @email2smohanty, this feature is not implemented yet. We are looking for help; if someone is available, we can guide them on how to implement it. Once implemented, yes, it will be an alpha feature.

---

Hey @pritidesai! I just stumbled upon this since our organization is also looking for this feature of continuing a pipeline (executing tasks serially) in case one of the tasks fails. I would be happy to help; could you please guide me a bit on where to start?

---

@samagana -- @pritidesai is OOO right now. Here's the proposal to solve this issue: https://github.com/tektoncd/community/blob/main/teps/0050-ignore-task-failures.md#proposal @QuanZhang-William worked on the design and can help further. Also, we'd appreciate it if you could add your organization to https://github.com/tektoncd/community/blob/main/adopters.md

---

Hi @samagana! Yeah, I can help you get started with this feature. I will also be OOO for most of the time until the end of the year, but please don't hesitate to contact me via Slack 😄

---
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.

---

Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing.

---

/remove-lifecycle rotten

---

Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.

---

/remove-lifecycle stale
This is an attempt to support dependencies between finally Tasks, but it is not fully functional. Problems:

* Finally tasks cannot be cancelled by design but are supposed to run after cancellation (which doesn't meet our internal cb use case).
* A finally task is not allowed to refer to a normal DAG task's result for some reason. This should be possible according to the documentation, so it may be a Tekton bug.

Relates to CBP-969
Relates to tektoncd#1684
Relates to tektoncd#4130
Relates to tektoncd#6919
Relates to tektoncd#6903
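For context, `finally` is the existing Tekton Pipeline feature the commit above builds on: tasks listed under `spec.finally` run once all DAG tasks have completed, regardless of their outcome (and, as the commit notes, cannot currently depend on one another). A minimal sketch:

```yaml
# Minimal sketch of a finally task used for cleanup; task names are illustrative.
spec:
  tasks:
  - name: integration-tests
    taskRef:
      name: run-some-tests
  finally:
  - name: cleanup # runs whether integration-tests succeeds or fails
    taskRef:
      name: delete-integration-namespaces
```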
The goal is to come up with a design to handle failing task runs in a pipelinerun. Today, we simply fail the entire pipelinerun if a single taskrun fails.
Current Status
Summary in this comment: #1684 (comment)
Ideas
Here are a couple of ideas from @sbwsg and me:

- An errorStrategy field in PipelineTasks, similar to the idea in "Allow steps to run regardless of previous step errors" (#1573). errorStrategy could be under the runAfter field.
- Possible values: FailPipeline, which is the default today, and ContinuePipeline, which will continue running the whole pipeline.

Additional Info
@sbwsg has some strawperson YAMLs:

- RunNextTasks for an integration test cleanup scenario
- FailPipeline (default) for a unit test failing before a deploy task
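A condensed sketch of those two strawperson scenarios side by side. The `errorStrategy` field and its values are proposals from this issue, not part of the Tekton API:

```yaml
# Strawperson: "errorStrategy" is a proposed field, not real Tekton API.
tasks:
- name: uts
  taskRef:
    name: run-unit-tests
  errorStrategy: FailPipeline # default: a failure here stops the whole PipelineRun
- name: integration
  runAfter: uts
  taskRef:
    name: run-integration-tests
  errorStrategy: RunNextTasks # let cleanup run even if the tests fail
- name: cleanup
  runAfter: integration
  taskRef:
    name: cleanup-integration-test-junk
```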
Use Cases
Related Issues
The Epic #1376 has all the related issues