resmgr: write a PID file upon successful startup. #756
As a side note, I did look for a sane-looking, maintained existing pidfile package. I only found the Facebook abandonware and another one which, among other things, lacked any test cases. So I decided we'll roll our own.
Codecov Report
@@ Coverage Diff @@
## master #756 +/- ##
==========================================
+ Coverage 36.98% 37.43% +0.44%
==========================================
Files 54 55 +1
Lines 8035 8099 +64
==========================================
+ Hits 2972 3032 +60
+ Misses 4775 4771 -4
- Partials 288 296 +8
Do you think we should/could follow the pattern that leaves the pidfile open and locked by the daemon while it runs? It seems that 100 % of pid files under my /var/run have the same Recognizing a running cri-resmgr by the lock would be safer than relying on the pid written to the file. The current logic might encourage killing an unlucky process that has received the pid of a former cri-resmgr, in case
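To make the "open and locked" pattern concrete, here is a minimal Unix-only sketch. It is not code from this PR: `lockPIDFile`, `errAlreadyLocked` and `demoLock` are hypothetical names, and it assumes an advisory `flock(2)` lock taken through Go's `syscall.Flock`, which is not portable to every platform.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"syscall"
)

// errAlreadyLocked reports that some other open file description holds the lock.
var errAlreadyLocked = errors.New("PID file is locked by a running instance")

// lockPIDFile (hypothetical helper) opens path, takes a non-blocking exclusive
// flock on it and writes our PID. The caller must keep the returned file open
// for as long as the lock should be held.
func lockPIDFile(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0o644)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		f.Close()
		if errors.Is(err, syscall.EWOULDBLOCK) {
			return nil, errAlreadyLocked
		}
		return nil, err
	}
	// We own the lock: replace any stale content with our PID.
	if err := f.Truncate(0); err != nil {
		f.Close()
		return nil, err
	}
	if _, err := f.WriteAt([]byte(strconv.Itoa(os.Getpid())+"\n"), 0); err != nil {
		f.Close()
		return nil, err
	}
	return f, nil
}

// demoLock takes the lock in a temporary directory and reports whether a
// second attempt is refused while the first file is still open.
func demoLock() (bool, error) {
	dir, err := os.MkdirTemp("", "pidlock-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "demo.pid")

	held, err := lockPIDFile(path)
	if err != nil {
		return false, err
	}
	defer held.Close()

	second, err := lockPIDFile(path)
	if err == nil {
		second.Close()
	}
	return errors.Is(err, errAlreadyLocked), nil
}

func main() {
	refused, err := demoLock()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("second lock attempt refused:", refused)
}
```

The kernel drops the lock when the holder exits, so a leftover file with a recycled PID would not be mistaken for a live instance under this scheme.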
Keeping the PID file open should be fine and I can add it. I really wouldn't add locking, though... at least not before portable file locking lands in a Go release. AFAICT, this shouldn't be a problem, since we don't/won't use the PID file to protect against multiple concurrently running instances anyway (we use the relay socket for that). And keeping the file open alone should satisfy your need for the workaround to kill existing instances using
Looks basically good to me 👌 I wouldn't add locking, either, but keeping the file open sounds fine
@askervin Actually... do we really need this whole thing? You can already do an
Yeah, this is what I did in the draft PR #754, though it doesn't yet take the disambiguation (kubelet) into account. I kept the PR as a draft, as I doubted it would still work in the case where With this reasoning I'd still say this PR makes sense.
That's a good point. You would need to use the
@jukkar Sorry, I had to fix a few remaining test case name vs. test copy/paste typos in the tests and some comma spelling errors...
LGTM, works fine. Thanks @klihub!
Not related to this patch, but when I was testing this, the first thing I observed was something I didn't expect: if I start one cri-resmgr, suspend it in the terminal (Ctrl-Z), and start another one, the latter prints:
...
W0105 13:43:19.196350 19065 server.go:203] removing abandoned socket '/var/run/cri-resmgr/cri-resmgr.sock' in use...
...
and starts just fine. After a couple of repetitions this results in:
# fuser -v /var/run/cri-resmgr/cri-resmgr.sock
USER PID ACCESS COMMAND
/run/cri-resmgr/cri-resmgr.sock:
root 18963 F.... cri-resmgr
root 19065 F.... cri-resmgr
root 19400 F.... cri-resmgr
root 19631 F.... cri-resmgr
root 19733 F.... cri-resmgr
But obviously this is a corner case. The logic works fine and prints the PID of the running cri-resmgr when it runs normally in the background.
Hmm... indeed a corner case. But I bet that behavior could be improved a lot by better socket probing. Right now we're really dumb about it: we simply try to connect to the socket, and every failure is taken to indicate an abandoned socket.
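As an illustration of what "better socket probing" could look like, here is a rough sketch that only treats a refused connection as evidence of an abandoned socket and leaves every other failure undecided. It is not the project's code: `socketState`, `probeSocket` and `demoStates` are hypothetical names, and the one-second timeout is an arbitrary assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"syscall"
	"time"
)

// socketState is a hypothetical classification of a probe result.
type socketState int

const (
	socketAbsent  socketState = iota // no socket file at the path
	socketStale                      // file exists, but nobody is listening
	socketInUse                      // something accepted our connection
	socketUnknown                    // probe failed for another reason: do not remove
)

// probeSocket tries to connect to a Unix socket and classifies the outcome
// instead of treating every failure as "abandoned".
func probeSocket(path string) socketState {
	conn, err := net.DialTimeout("unix", path, time.Second)
	if err == nil {
		conn.Close()
		return socketInUse
	}
	switch {
	case errors.Is(err, syscall.ENOENT):
		return socketAbsent
	case errors.Is(err, syscall.ECONNREFUSED):
		return socketStale
	default: // timeouts, permission errors, and anything else
		return socketUnknown
	}
}

// demoStates probes a live listener, a leftover socket file, and a missing path.
func demoStates() ([]socketState, error) {
	dir, err := os.MkdirTemp("", "probe-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "demo.sock")

	l, err := net.Listen("unix", path)
	if err != nil {
		return nil, err
	}
	// Keep the socket file after Close to simulate a server that died uncleanly.
	l.(*net.UnixListener).SetUnlinkOnClose(false)

	var states []socketState
	states = append(states, probeSocket(path)) // listener alive
	l.Close()
	states = append(states, probeSocket(path)) // file left behind
	os.Remove(path)
	states = append(states, probeSocket(path)) // nothing there
	return states, nil
}

func main() {
	states, err := demoStates()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("probe results:", states)
}
```

With a classification like this, only `socketStale` would justify removing the socket file. A suspended instance may still accept the connection at the kernel level or make the probe time out, so it would land in `socketInUse` or `socketUnknown` rather than being removed.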
This corner case should be fixed by this PR.
Implement basic PID file handling. Write a PID file upon successful startup. If we fail to start up due to an existing active socket, try to read the process ID of the running instance from the PID file. If this succeeds, write a diagnostic/error message that includes the process ID we read.
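The write-then-read flow described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `writePIDFile`, `readPIDFile` and `roundTrip` are hypothetical names, and the file format (decimal PID followed by a newline) is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// writePIDFile (hypothetical helper) records the current process ID in path,
// creating the parent directory if necessary.
func writePIDFile(path string) error {
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	return os.WriteFile(path, []byte(strconv.Itoa(os.Getpid())+"\n"), 0o644)
}

// readPIDFile (hypothetical helper) returns the process ID stored in path.
func readPIDFile(path string) (int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		return 0, fmt.Errorf("malformed PID file %s: %w", path, err)
	}
	return pid, nil
}

// roundTrip writes a PID file into a temporary directory and reads it back.
func roundTrip() (int, error) {
	dir, err := os.MkdirTemp("", "pidfile-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "run", "demo.pid")
	if err := writePIDFile(path); err != nil {
		return 0, err
	}
	return readPIDFile(path)
}

func main() {
	pid, err := roundTrip()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	// A second instance that finds the socket busy could print a message like this.
	fmt.Printf("an instance appears to be running already (PID %d)\n", pid)
}
```

In a real daemon the write would happen only after startup has succeeded, and the read only on the failure path where the socket turns out to be in use.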