[BUG] Propeller PreviousCheckpointPath assumes attempt-1 #2894

Open
2 tasks done
andrewwdye opened this issue Sep 19, 2022 · 3 comments
Labels: bug (Something isn't working), stale

Comments

@andrewwdye
Contributor

Describe the bug

When retrying a task, we pass the previous checkpoint path in the task context. Today we assume that the most recent checkpoint comes from the immediately prior attempt (attempt-1). That assumption is wrong if the prior attempt failed before running the task or failed partway through writing its checkpoint. In either case we "lose" progress on retry and restart the task from the beginning.

Because checkpoints are opaque to Propeller, we likely need additional context from the task when a checkpoint is written (transacted).
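
To make the failure mode concrete, here is a minimal Python sketch. It is not Propeller's actual code: the per-attempt path layout, the `checkpoints` directory name, the bucket path, and both function names are hypothetical and chosen only for illustration. It models the "previous path = attempt-1" assumption and a retry sequence in which attempt 0 wrote a checkpoint and attempt 1 failed before running the task.

```python
from typing import Optional


def checkpoint_path(base: str, attempt: int) -> str:
    """Hypothetical layout: one checkpoint directory per attempt."""
    return f"{base}/{attempt}/checkpoints"


def assumed_previous_path(base: str, attempt: int) -> Optional[str]:
    """Model of the assumption described above: always point at attempt-1."""
    if attempt == 0:
        return None  # first attempt has no previous checkpoint
    return checkpoint_path(base, attempt - 1)


BASE = "s3://bucket/exec/n0"  # hypothetical output prefix

# Attempt 0 wrote a complete checkpoint; attempt 1 failed before the task ran,
# so nothing exists under its checkpoint directory.
written = {checkpoint_path(BASE, 0)}

# Attempt 2 is handed attempt 1's (empty) path and restarts from scratch,
# even though attempt 0's checkpoint is still available.
prev = assumed_previous_path(BASE, 2)
progress_lost = prev not in written
```

In this scenario `prev` points at attempt 1's directory, which holds no checkpoint, so the progress saved by attempt 0 is never offered to the retry.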

Expected behavior

RemoteCheckpointPaths.previousPath should refer to the last known valid checkpoint contents, rather than assuming that the prior attempt's checkpoint path contains the latest checkpoint.
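
One possible way to get this behavior, sketched in Python under stated assumptions: the task (or its plugin) records some signal once a checkpoint is fully written, and the retry logic walks back through earlier attempts until it finds one with that signal. The issue does not specify the mechanism, so the `is_committed` callback, the path layout, and the function name below are all hypothetical.

```python
from typing import Callable, Optional


def last_valid_previous_path(
    base: str,
    attempt: int,
    is_committed: Callable[[str], bool],
) -> Optional[str]:
    """Return the newest earlier attempt's checkpoint path that is marked
    as fully written, or None if no earlier attempt has one.

    `is_committed` is a hypothetical hook standing in for whatever
    "additional context from the task" ends up being used.
    """
    for prior in range(attempt - 1, -1, -1):
        candidate = f"{base}/{prior}/checkpoints"  # hypothetical layout
        if is_committed(candidate):
            return candidate
    return None


BASE = "s3://bucket/exec/n0"  # hypothetical output prefix

# Attempt 0 committed a checkpoint; attempt 1 never wrote one.
committed = {f"{BASE}/0/checkpoints"}

resolved = last_valid_previous_path(BASE, 2, lambda p: p in committed)
```

Here attempt 2 would be given attempt 0's path instead of attempt 1's empty one. The linear scan is only for clarity; tracking the last committed path in node state would avoid repeated storage lookups.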

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@andrewwdye andrewwdye added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Sep 19, 2022
@andrewwdye andrewwdye changed the title [BUG] PreviousCheckpointPath assumes attempt-1 [BUG] Propeller PreviousCheckpointPath assumes attempt-1 Sep 19, 2022
@eapolinario eapolinario removed the untriaged This issues has not yet been looked at by the Maintainers label Sep 23, 2022
@github-actions

github-actions bot commented Sep 4, 2023

Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot added the stale label Sep 4, 2023
@hamersaw
Contributor

hamersaw commented Sep 6, 2023

Commenting to keep open.

@github-actions github-actions bot removed the stale label Sep 7, 2023

github-actions bot commented Jun 4, 2024

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable.
Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot added the stale label Jun 4, 2024
3 participants