Describe the bug
When retrying a task, we pass the previous checkpoint path in the task context. Today we assume that the most recently available checkpoint is from the prior attempt; however, this may not be correct if the prior attempt failed before running the task or failed in the middle of writing the checkpoint. In either case we would "lose" progress on retry and end up restarting the task from the beginning.
Because checkpoints are opaque to propeller, we likely need additional context from the task when a checkpoint is written (transacted).
Expected behavior
RemoteCheckpointPaths.previousPath should refer to the last known valid checkpoint contents, instead of assuming that the prior attempt's checkpoint path contains the latest.
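One possible way to picture the expected behavior is the minimal Python sketch below. It assumes the task reports when a checkpoint write has been fully committed, and that this flag is recorded per attempt. The names `AttemptCheckpoint`, `committed`, and `select_previous_path`, as well as the `example://` paths, are hypothetical and only illustrate the selection rule; they are not part of propeller's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AttemptCheckpoint:
    """Hypothetical record of what a single attempt left behind."""
    attempt: int     # attempt number, starting at 0
    path: str        # checkpoint path assigned to this attempt
    committed: bool  # True only if the task reported a completed write


def select_previous_path(history: Sequence[AttemptCheckpoint]) -> Optional[str]:
    """Return the path of the newest committed checkpoint, or None if there is none."""
    # Walk attempts from newest to oldest and stop at the first committed one.
    for record in sorted(history, key=lambda r: r.attempt, reverse=True):
        if record.committed:
            return record.path
    return None


# Attempt 0 wrote a full checkpoint, attempt 1 failed before the task ran,
# and attempt 2 failed in the middle of writing its checkpoint.
history = [
    AttemptCheckpoint(attempt=0, path="example://checkpoints/attempt-0", committed=True),
    AttemptCheckpoint(attempt=1, path="example://checkpoints/attempt-1", committed=False),
    AttemptCheckpoint(attempt=2, path="example://checkpoints/attempt-2", committed=False),
]

previous_path = select_previous_path(history)
```

Under these assumptions the retry would receive the attempt 0 path as its previous path, rather than the empty or partial path left by the most recent attempt.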
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
Yes
Have you read the Code of Conduct?
Yes
Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏
Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable.
Thank you for your contribution and understanding! 🙏