integrations: Load model_file for resuming #140
It looks like, for some ML frameworks that have a "resuming" argument, this should be a documentation update.
Lowering the priority, as this depends on deciding on and documenting the recommended workflow for DVC checkpoints.
Back to
Relevant for the https://github.com/iterative/terraform-provider-iterative use case.
Also related: iterative/example-repos-dev#83 (comment)
Inside each integration, we should look for the ML-framework-specific flags and handle
In order to properly resume training with dvc checkpoints, the user needs to load the existing model_file at the beginning of training. Given that DVCLive integrations already take care of saving the model_file, I think it makes sense to also include some logic to load the model_file, if it already exists, on callback instantiation or in on_train_begin. This would simplify the usage of dvc checkpoints for resuming training.

PyTorch Lightning: Support saving model to model_file #170
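The loading logic proposed in the issue description (load model_file, if it already exists, either on callback instantiation or in on_train_begin) can be sketched as follows. This is a minimal, framework-agnostic sketch under stated assumptions: ResumableCallback, train, run and load_on_init are hypothetical names, not part of the DVCLive API, and a plain dict serialized to JSON stands in for real model weights. A real integration would use the framework's own save and load calls instead.

```python
import json
import os
import tempfile
from typing import Any, Dict


class ResumableCallback:
    """Hypothetical callback (not DVCLive's API): saves model state to
    model_file and reloads it if the file already exists."""

    def __init__(self, model: Dict[str, Any], model_file: str,
                 load_on_init: bool = False) -> None:
        self.model = model
        self.model_file = model_file
        self.resumed = False
        if load_on_init:
            # Option 1 from the issue: load on callback instantiation.
            self._maybe_load()

    def _maybe_load(self) -> None:
        # Resume only if a previous run (e.g. a DVC checkpoint) left model_file behind.
        if os.path.exists(self.model_file):
            with open(self.model_file) as f:
                self.model.update(json.load(f))
            self.resumed = True

    def on_train_begin(self) -> None:
        # Option 2 from the issue: load when training starts, unless already loaded.
        if not self.resumed:
            self._maybe_load()

    def on_epoch_end(self) -> None:
        # What integrations already do: persist the model after each epoch.
        with open(self.model_file, "w") as f:
            json.dump(self.model, f)


def train(model: Dict[str, Any], callback: ResumableCallback, epochs: int) -> None:
    callback.on_train_begin()
    for _ in range(epochs):
        model["step"] = model.get("step", 0) + 1  # stand-in for one epoch of training
        callback.on_epoch_end()


def run(model_file: str, epochs: int, load_on_init: bool = False) -> Dict[str, Any]:
    model: Dict[str, Any] = {}  # a fresh model, as in a new training process
    callback = ResumableCallback(model, model_file, load_on_init)
    train(model, callback, epochs)
    return model


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.json")
        print(run(path, 3))  # {'step': 3}: no model_file yet, starts from scratch
        print(run(path, 2))  # {'step': 5}: resumes from the saved step 3
```

Because the second run starts from an empty model but finds model_file on disk, it continues from the saved state without any extra user code, which is the simplification the issue asks for. Guarding with the resumed flag lets both hook points coexist without loading twice.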