blobstore job fails to start after VM crash or reboot #25
Comments
We have created an issue in Pivotal Tracker to manage this: https://www.pivotaltracker.com/story/show/126686223
The labels on this GitHub issue will be updated when the story is started.
It appears the same problem exists with the cloud controller:
Similar issues exist in Consul: cloudfoundry-attic/consul-release#31
We've made fixes in these two commits:
This will ensure that the directories in
When we switched our start commands to run as non-root, we moved all directory creation into pre-start, because we ran into problems with
Let us know if this fixes the issue for you, and close the issue if it does!
@sax && @adowns01
Thanks.
Issue
The blobstore job fails to start after the VM is rebooted.
Context
The nginx.stderr.log from the failure shows the following line hundreds of times:
The control script appears to rely on the pre-start script to set up that directory:
According to the timestamps in the log, the pre-start script did run three days before the reboot:
Unfortunately, most of the directories created by that script live on temporary file systems that BOSH sets up, in particular /var/vcap/sys/run:
Because it is a tmpfs, a memory-only file system, all of its contents are lost on reboot. That means the directory for the nginx pidfile is gone by the time the blobstore control script starts.
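To check which file system a directory such as /var/vcap/sys/run actually lives on, you can look it up in the kernel's mount table. Below is a minimal Python sketch, assuming the Linux /proc/mounts line format (device, mount point, type, options, dump, pass). `mount_fstype` and the `SAMPLE` table are hypothetical; the sample entries are made up for illustration and were not copied from a BOSH VM. The function picks the longest mount point that contains the path, because that is the mount that actually holds it.

```python
import os
from typing import Optional

# Hypothetical mount table in /proc/mounts format, for illustration only.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /var/vcap/store ext4 rw,relatime 0 0
tmpfs /var/vcap/sys/run tmpfs rw,size=1m 0 0
"""


def mount_fstype(path: str, mounts_text: str) -> Optional[str]:
    """Return the file system type of the mount containing `path`, or None."""
    path = os.path.normpath(path)
    best_mount, best_type = None, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mountpoint, fstype = fields[1], fields[2]
        # Note: escaped characters in mount points (e.g. \040 for a space)
        # are not decoded by this sketch.
        inside = (
            mountpoint == "/"
            or path == mountpoint
            or path.startswith(mountpoint.rstrip("/") + "/")
        )
        # Prefer the longest (most specific) mount point; on a tie the later
        # line wins, since a later mount shadows an earlier one.
        if inside and (best_mount is None or len(mountpoint) >= len(best_mount)):
            best_mount, best_type = mountpoint, fstype
    return best_type


if __name__ == "__main__":
    print(mount_fstype("/var/vcap/sys/run/blobstore", SAMPLE))  # tmpfs
```

On a live Linux host you would pass the contents of /proc/mounts instead of `SAMPLE`. Because mount points are compared as plain strings, resolving symlinks first with `os.path.realpath` avoids false answers for paths that go through a link.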
Steps to Reproduce
Expected result
The blobstore job recovers when the reboot is complete.
Current result
The blobstore job fails to recover. This causes the cloud controllers, cloud controller workers, and the runtimes to fail.
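One way to get the expected result is for the start path itself to recreate any runtime directory it needs, instead of relying only on pre-start, which had last run days before the reboot. The Python sketch below only illustrates the idea. The real BOSH job scripts are shell, and `ensure_run_dir`, `start`, the directory layout, and the pidfile name are hypothetical stand-ins, not the blobstore release's actual code. The demo simulates a reboot by deleting the run directory between two starts.

```python
import os
import shutil
import tempfile
from typing import Optional


def ensure_run_dir(path: str, owner: Optional[str] = None, mode: int = 0o755) -> str:
    """Create `path` if missing; safe to call on every start (hypothetical helper)."""
    # exist_ok=True makes this idempotent: a no-op if the directory still exists,
    # a fresh create if a reboot wiped the tmpfs it lived on.
    os.makedirs(path, exist_ok=True)
    # makedirs' mode is filtered by the umask, so set the final mode explicitly.
    os.chmod(path, mode)
    if owner is not None:
        # Assumes a group named like the user (e.g. vcap:vcap); chown normally
        # requires root, i.e. it must happen before privileges are dropped.
        shutil.chown(path, user=owner, group=owner)
    return path


def start(run_dir: str) -> str:
    """Hypothetical start step: ensure the run dir, then write a pidfile into it."""
    ensure_run_dir(run_dir)
    pidfile = os.path.join(run_dir, "nginx.pid")
    # Stand-in for launching nginx, which would write its own pidfile here.
    with open(pidfile, "w") as f:
        f.write(f"{os.getpid()}\n")
    return pidfile


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    run_dir = os.path.join(base, "sys", "run", "blobstore")
    start(run_dir)
    # Simulate a reboot: the tmpfs-backed run directory disappears.
    shutil.rmtree(os.path.join(base, "sys", "run"))
    print(os.path.exists(run_dir))  # False
    print(os.path.isfile(start(run_dir)))  # True: start recreated the directory
    shutil.rmtree(base)
```

In a shell control script, the equivalent idempotent step is usually a `mkdir -p` (plus a `chown` if the process later runs as non-root) placed just before the daemon is launched.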