Snapshot on signal #2012

Closed
shelhamer opened this issue Mar 2, 2015 · 5 comments

Comments

@shelhamer
Member

Caffe should attempt to snapshot on SIGINT. Suggested by @longjon.

@seanbell

seanbell commented Mar 3, 2015

I like this idea, though I think it would be helpful to have a way of easily disabling this behavior if it's the default (e.g. command-line flag). Otherwise, management scripts that call caffe now have to worry about extra disk-space consequences of stopping training early.

I think it's also worth letting users configure which signal code will trigger a snapshot (e.g. in solver.prototxt). The default could be SIGINT (2), but other codes such as SIGHUP (1) would let you signal a snapshot without interrupting training.
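To make the proposal concrete, here is a minimal sketch of a configurable signal-to-action mapping, written in Python rather than against Caffe's actual code. Everything named here (`SignalActions`, `train`, the `SNAPSHOT` and `STOP` constants) is hypothetical and only illustrates the idea: handlers record what was requested, and the training loop checks between iterations. Leaving a signal out of the mapping disables the behavior for that signal. `SIGHUP` is only available on Unix-like systems.

```python
import signal

# Hypothetical action names for illustration; not taken from Caffe.
SNAPSHOT, STOP = "snapshot", "stop"


class SignalActions:
    """Record requested actions in handlers; the training loop polls them."""

    def __init__(self, actions):
        # `actions` maps a signal number to SNAPSHOT or STOP.
        # Signals left out of the mapping keep their default behavior.
        self._actions = dict(actions)
        self._pending = []
        for signum in self._actions:
            signal.signal(signum, self._handle)

    def _handle(self, signum, frame):
        # Keep the handler tiny: only remember what was requested.
        self._pending.append(self._actions[signum])

    def poll(self):
        """Return and clear the actions requested since the last poll."""
        pending, self._pending = self._pending, []
        return pending


def train(max_iter, signals, step, snapshot):
    """Run up to max_iter steps; return the number of completed iterations."""
    for it in range(max_iter):
        step(it)
        requested = signals.poll()
        if requested:
            # Both SNAPSHOT and STOP write a snapshot first.
            snapshot(it)
        if STOP in requested:
            return it + 1
    return max_iter
```

With a mapping such as `{signal.SIGINT: STOP, signal.SIGHUP: SNAPSHOT}`, Ctrl-C would snapshot and then stop, while `kill -HUP <pid>` would snapshot and keep training. Polling between iterations means a snapshot is never written in the middle of an update.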

@shelhamer
Member Author

@seanbell agreed. All good points.

@beniz

beniz commented Mar 3, 2015

Is there already a ticket that collects all thoughts on error handling within Caffe? If not, it might make sense to create one with bullet points for sub-tasks. I would happily work on this once it has all been discussed and agreed upon, so that no effort is wasted.

@jyegerlehner
Contributor

Ref PR 2253.

@shelhamer
Member Author

Solved by #2253
