Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deal with stale daemon PID files during stop attempts #2795

Merged
merged 1 commit into from
May 30, 2019

Conversation

ConradJohnston
Copy link
Contributor

@ConradJohnston ConradJohnston commented Apr 24, 2019

Fixes #2775 and fixes #1865

In some circumstances, the circus PID file which stores the PID of the running circus process can be corrupted, causing calls to the get_daemon_pid() method to return the wrong PID, or causing the is_daemon_running attribute to return True when in reality the daemon is not running.

This function checks if the PID contained in the circus pid file (circus-{PROFILE_NAME}.pid) matches a valid running verdi process. If it does not, the PID file is stale and will be removed.

This situation can arise if a system is shut down suddenly and so the process is killed but the PID file is not deleted in time. Alternatively, another process or the user may have meddled with the PID file in some way, corrupting it.

Example of broken behaviour:
verdi daemon start
pid=$(cat ~/.aiida/daemon/circus-{PROFILE_NAME}.pid)
echo '9999999999' > ~/.aiida/daemon/circus-{PROFILE_NAME}.pid
kill $pid
verdi daemon stop
In this scenario, the daemon stop command will timeout, as will the start command, due to the stale PID file which is lying about there being a running daemon with the claimed PID.

@ConradJohnston
Copy link
Contributor Author

@giovannipizzi , requested your review. Even though you co-authored the commit and the idea for the implementation was yours, you didn't touch any of the real code in the end so maybe it is okay for you to review.

@coveralls
Copy link

coveralls commented Apr 24, 2019

Coverage Status

Coverage decreased (-0.03%) to 72.87% when pulling 258949c on ConradJohnston:fix_2775 into 875f3db on aiidateam:develop.

@ConradJohnston
Copy link
Contributor Author

What do we want to do get this over the line? Remove the dependence on psutil? I think this is the only place it's used in AiiDA now.

In some circumstances, the circus PID file which stores the PID of the
running circus process, can be corrupted or not deleted properly,
causing calls to `is_daemon_running` to return a false positive.

This situation can arise if a system is shut down suddenly and so
the process is killed but the PID file is not deleted in time.
Alternatively, another process or the user may have meddled with the
PID file in some way, corrupting it.

Here, we implement a function that checks for stale PID files by
checking whether the PID contained in the PID file matches a valid
running `verdi `process. If it does not, a warning is emitted and the
PID file is deleted. This function is called when `verdi daemon stop` is
called, making it function as a hard reset in a situation of a stale PID.

Co-authored-by: Giovanni Pizzi <[email protected]>
@sphuber sphuber changed the title Implement delete_stale_pid_file function Deal with stale daemon PID files during stop attempts May 30, 2019
@sphuber
Copy link
Contributor

sphuber commented May 30, 2019

For now, I think we can leave the psutil dependency, because this fix is useful enough. Thanks a lot @ConradJohnston and sorry for the delay with the review

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Straight up banging. Thanks @ConradJohnston

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stale PID files are not always removed Daemon fails to start after being killed
3 participants