Describe the bug
When recovering an execution with tasks that were SKIPPED due to an upstream node failure, these tasks are not retried, and the execution is incorrectly marked as SUCCEEDED anyway.
Expected behavior
SKIPPED tasks should also be retried on recovery if the upstream node succeeds.
Additional context to reproduce
import random

from flytekit import task, workflow
from flytekit.core.workflow import WorkflowFailurePolicy


@task
def pass_through(input1: int) -> int:
    return input1


@task
def fail(input1: int) -> int:
    # Fails non-deterministically: randint(0, 10) is inclusive, so 7 of 11 draws fail.
    if random.randint(0, 10) < 7:
        assert False
    return input1


@workflow(failure_policy=WorkflowFailurePolicy.FAIL_AFTER_EXECUTABLE_NODES_COMPLETE)
def wf(wf_input: int) -> tuple[int, int]:
    a = fail(input1=wf_input)
    b = pass_through(input1=wf_input)
    # c depends on a, so it is SKIPPED whenever a fails.
    c = pass_through(input1=a)
    return b, c
Execute against a flyte-sandbox like so:
pyflyte run --remote --image cr.flyte.org/flyteorg/flytekit:py3.10-latest test/recovery.py wf --wf_input 3
Since the task failure is non-deterministic, re-run the workflow until the first node (fail) fails. The last node (the pass_through call that consumes its output) should now be marked as SKIPPED. Then recover the execution, repeating until the first node succeeds, and observe the behavior.
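The outcome expected from these steps can be sketched as a small toy model. This is not Flyte's implementation, only an illustration of the semantics requested in this report; the `Phase` enum, the `UPSTREAM` map, and the `recover` and `workflow_phase` helpers are hypothetical names, and the node names a, b, c mirror the reproduction workflow.

```python
from enum import Enum


class Phase(Enum):
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    SKIPPED = "SKIPPED"


# Hypothetical DAG mirroring the reproduction workflow: node -> upstream nodes.
UPSTREAM = {"a": [], "b": [], "c": ["a"]}
TOPOLOGICAL_ORDER = ("a", "b", "c")


def recover(previous, rerun_outcomes):
    """Toy model of the expected recovery semantics (an assumption, not Flyte code).

    previous:       phase of each node in the execution being recovered
    rerun_outcomes: phase a node would reach if it were executed again
    """
    result = {}
    for node in TOPOLOGICAL_ORDER:
        if previous[node] is Phase.SUCCEEDED:
            # Successful nodes are reused, not re-executed.
            result[node] = Phase.SUCCEEDED
        elif any(result[up] is not Phase.SUCCEEDED for up in UPSTREAM[node]):
            # An upstream node is still unsuccessful, so this node stays skipped.
            result[node] = Phase.SKIPPED
        else:
            # FAILED and SKIPPED nodes are both re-executed on recovery.
            result[node] = rerun_outcomes[node]
    return result


def workflow_phase(phases):
    """The workflow only succeeds if every node succeeded."""
    if all(p is Phase.SUCCEEDED for p in phases.values()):
        return Phase.SUCCEEDED
    return Phase.FAILED


initial = {"a": Phase.FAILED, "b": Phase.SUCCEEDED, "c": Phase.SKIPPED}
recovered = recover(initial, {"a": Phase.SUCCEEDED, "c": Phase.SUCCEEDED})
print(recovered["c"].value, workflow_phase(recovered).value)
```

In this model the SKIPPED node c is re-executed once a succeeds, and a recovered execution that still contains a SKIPPED node is never reported as SUCCEEDED, which is the combination the bug violates.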
Screenshots
Initial failure:
After "successful" recovery:
How it SHOULD work (behavior in Flyte v1.3.0):
Are you sure this issue hasn't been raised already?
Yes
Have you read the Code of Conduct?
Yes