Snapshot on signal #2012
Comments
I like this idea, though if it becomes the default I think there should be an easy way to disable it (e.g. a command-line flag). Otherwise, management scripts that call caffe have to worry about the extra disk space consumed when training is stopped early. I think it's also worth letting users configure which signal triggers a snapshot (e.g. in solver.prototxt). The default could be SIGINT (2), but another signal such as SIGHUP (1) would let you request a snapshot without interrupting training.
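To make the proposal above concrete, here is a minimal sketch of the behavior being discussed, written in Python for brevity rather than in Caffe's C++. It is not Caffe's implementation: `SignalPolicy`, `train`, `snapshot_on_stop`, and the `step`/`snapshot` callbacks are hypothetical names invented for illustration. It assumes a POSIX system (SIGHUP does not exist on Windows) and that handlers are installed from the main thread.

```python
import signal


class SignalPolicy:
    """Hypothetical helper: records which actions a signal has requested.

    Handlers only set flags; the training loop polls them between
    iterations so a snapshot is never taken mid-update.
    """

    def __init__(self, stop_signal=signal.SIGINT,
                 snapshot_signal=signal.SIGHUP, snapshot_on_stop=True):
        self.stop_signal = stop_signal
        self.snapshot_signal = snapshot_signal
        # Lets scripts opt out of the extra snapshot written on early stop.
        self.snapshot_on_stop = snapshot_on_stop
        self.stop_requested = False
        self.snapshot_requested = False

    def handle(self, signum, frame):
        if signum == self.stop_signal:
            self.stop_requested = True
            if self.snapshot_on_stop:
                self.snapshot_requested = True
        elif signum == self.snapshot_signal:
            self.snapshot_requested = True

    def install(self):
        # signal.signal must be called from the main thread.
        signal.signal(self.stop_signal, self.handle)
        signal.signal(self.snapshot_signal, self.handle)


def train(policy, max_iter, step, snapshot):
    """Run up to max_iter iterations; return how many were completed."""
    for it in range(max_iter):
        step(it)
        if policy.snapshot_requested:
            snapshot(it)
            policy.snapshot_requested = False
        if policy.stop_requested:
            return it + 1
    return max_iter
```

With this layout, a management script would call `policy.install()` once before `train(...)`. Sending SIGHUP then requests a snapshot and training continues, while SIGINT stops training and writes a final snapshot unless `snapshot_on_stop=False`.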
@seanbell agreed. All good points.
Is there already a ticket that collects all thoughts on error handling within Caffe? If not, it could make sense to create one with bullet points for sub-tasks. I would happily work on this once it has all been discussed and agreed upon, so that no effort is wasted.
Ref PR 2253.
Solved by #2253
Caffe should attempt to snapshot on SIGINT. Suggested by @longjon.