queue: preserve checkpoints for failed experiments #8750
Comments
If I gracefully stop an experiment (
If I forcefully stop an experiment (
Although VS Code shows the failure under the same experiment:
This makes some sense to me, except for the failed checkpoint showing up separately from the rest of the experiment. What do you think?
@dberenbaum, yes, it looks like there is something wrong with my repository in the previous. After updating to the current
I came across this issue whilst working through the checkboxes in iterative/vscode-dvc#3091.
Generating the extra experiment record when killing an experiment running in the queue means that
Screen.Recording.2023-01-27.at.12.57.04.pm.mov
provided error is
Not a massive problem, as the records get removed. More of a minor annoyance.
Hmm, okay, raising this to P1, since removing any entry shown in the table shouldn't fail. Once we fix the problem raised by @mattseddon, I think we can close this one.
Dropping this to P2 while the discussion about whether to drop checkpoints support entirely is still ongoing.
This behavior is different from --temp, in which we return completed checkpoint results even if the tasks failed. I think the --temp behavior is more reasonable.

Originally posted by @karajan1001 in #8668 (comment)