-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deal with stale daemon PID files during stop attempts #2795
Conversation
@giovannipizzi , requested your review. Even though you co-authored the commit and the idea for the implementation was yours, you didn't touch any of the real code in the end so maybe it is okay for you to review. |
9a51e4e
to
6b42ad7
Compare
fcfc71e
to
4ea906c
Compare
What do we want to do get this over the line? Remove the dependence on psutil? I think this is the only place it's used in AiiDA now. |
In some circumstances, the circus PID file which stores the PID of the running circus process, can be corrupted or not deleted properly, causing calls to `is_daemon_running` to return a false positive. This situation can arise if a system is shut down suddenly and so the process is killed but the PID file is not deleted in time. Alternatively, another process or the user may have meddled with the PID file in some way, corrupting it. Here, we implement a function that checks for stale PID files by checking whether the PID contained in the PID file matches a valid running `verdi `process. If it does not, a warning is emitted and the PID file is deleted. This function is called when `verdi daemon stop` is called, making it function as a hard reset in a situation of a stale PID. Co-authored-by: Giovanni Pizzi <[email protected]>
For now, I think we can leave the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Straight up banging. Thanks @ConradJohnston
Fixes #2775 and fixes #1865
In some circumstances, the circus PID file which stores the PID of the running circus process can be corrupted, causing calls to the
get_daemon_pid()
method to return the wrong PID, or causing theis_daemon_running
attribute to returnTrue
when in reality the daemon is not running.This function checks if the PID contained in the circus pid file (
circus-{PROFILE_NAME}.pid
) matches a valid runningverdi
process. If it does not, the PID file is stale and will be removed.This situation can arise if a system is shut down suddenly and so the process is killed but the PID file is not deleted in time. Alternatively, another process or the user may have meddled with the PID file in some way, corrupting it.
Example of broken behaviour:
verdi daemon start
pid=$(cat ~/.aiida/daemon/circus-{PROFILE_NAME}.pid)
echo '9999999999' > ~/.aiida/daemon/circus-{PROFILE_NAME}.pid
kill $pid
verdi daemon stop
In this scenario, the
daemon stop
command will timeout, as will thestart
command, due to the stale PID file which is lying about there being a running daemon with the claimed PID.