Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Copy dynamicJobSpec from partially completed DynamicTask on recovery #518

Merged
merged 13 commits into from
Mar 4, 2023

Conversation

hamersaw
Copy link
Contributor

@hamersaw hamersaw commented Jan 19, 2023

DO NOT MERGE waiting on PRs:
flyteorg/flyteadmin#513

TL;DR

Since node IDs in dynamic job specs are not computed deterministically between executions recovering partially completed dynamic tasks can result in data corruptions. In this PR we attempt to recover the DynamicJobSpec from previous the previous DynamicTask execution during node recovery.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

^^^

Tracking Issue

fixes flyteorg/flyte#3121

Follow-up issue

NA

Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
@codecov
Copy link

codecov bot commented Jan 19, 2023

Codecov Report

Merging #518 (68e5c76) into master (03a6672) will increase coverage by 0.42%.
The diff coverage is 41.17%.

❗ Current head 68e5c76 differs from pull request most recent head 7ff7f96. Consider uploading reports for the commit 7ff7f96 to get more accurate results

@cosmicBboy
Copy link

@kumare3 @EngHabu friendly ping ^^

EngHabu
EngHabu previously approved these changes Feb 6, 2023
pkg/controller/nodes/executor.go Outdated Show resolved Hide resolved
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
@hamersaw hamersaw merged commit 5c7a31d into master Mar 4, 2023
@hamersaw hamersaw deleted the bug/recovering-dynamic-task branch March 4, 2023 00:53
@hamersaw hamersaw mentioned this pull request Apr 7, 2023
8 tasks
eapolinario pushed a commit to eapolinario/flytepropeller that referenced this pull request Aug 9, 2023
…lyteorg#518)

* storing compiled workflow

Signed-off-by: Daniel Rammer <[email protected]>

* working

Signed-off-by: Daniel Rammer <[email protected]>

* added unit tests

Signed-off-by: Daniel Rammer <[email protected]>

* fixed lint issues

Signed-off-by: Daniel Rammer <[email protected]>

* recovering inputs of partially complete dynamic task

Signed-off-by: Daniel Rammer <[email protected]>

* removed dead code

Signed-off-by: Daniel Rammer <[email protected]>

* updated flyteidl version

Signed-off-by: Daniel Rammer <[email protected]>

* returning inputs from recoverInputs function

Signed-off-by: Daniel Rammer <[email protected]>

* fixed tests

Signed-off-by: Daniel Rammer <[email protected]>

* updated flyteidl

Signed-off-by: Daniel Rammer <[email protected]>

---------

Signed-off-by: Daniel Rammer <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Incorrect output data when recovering partially completed dynamic workflows
4 participants