Merge #37390
37390: roachprod: remove monitor netcat command r=ajkr a=ajkr

`roachprod monitor` runs `nc localhost <port>` and assumes `nc` will exit as
soon as the Cockroach server exits. This is not actually the case in later
versions of netcat (tested on Ubuntu 18.04+).

This PR switches to a polling approach that calls `kill -0` once per second
to check the Cockroach server's liveness. This should improve portability,
and we verified that the overhead is low (~0.65ms of a CPU core's time per
`kill` invocation). Tested by running `roachprod monitor` locally, killing
the nodes one at a time, and observing the output:

```
3: 28342
1: 28176
2: 28257
3: kill exited nonzero
3: dead
2: kill exited nonzero
2: dead
1: kill exited nonzero
1: dead
```
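The ~0.65ms figure above comes without its measurement method. A rough way to get a comparable number on your own machine, assuming bash, is to time many `kill -0` calls against a PID that is known to be alive. This sketch is not the PR's method. `kill` is a bash builtin, so this loop never forks. A script that ran an external `kill` binary would pay process-creation cost on every call and would likely see a larger figure. The sketch also reports wall-clock rather than CPU time, and the call count `n` is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Rough sketch: estimate the per-call cost of `kill -0` by timing many calls
# against this shell's own PID ($$), which is alive for the whole run.
set -u
export LC_ALL=C   # keep "." as the decimal separator for `time` and awk

n=1000
TIMEFORMAT='%R'   # make `time` print only elapsed wall-clock seconds

# `time` writes its report to stderr; redirecting the group's stderr into
# the command substitution lets us capture the number.
elapsed=$( { time for ((i = 0; i < n; i++)); do kill -0 "$$"; done; } 2>&1 )

# Convert total seconds into milliseconds per call.
per_call_ms=$(awk -v t="${elapsed}" -v n="${n}" 'BEGIN { printf "%.4f", t * 1000 / n }')
echo "${n} calls: ${elapsed}s total, ~${per_call_ms} ms per call"
```

Results vary with hardware and load, so treat the output as an order-of-magnitude check rather than a benchmark.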

Fixes #37370.

Release note: None

Co-authored-by: Andrew Kryczka <[email protected]>
craig[bot] and ajkr committed May 9, 2019
2 parents 9e051cd + bb4cdf8 commit 12917e0
Showing 1 changed file with 4 additions and 2 deletions.
`pkg/cmd/roachprod/install/cluster_synced.go` (6 changes: 4 additions and 2 deletions)

    @@ -368,8 +368,10 @@ while :; do
     exit 0
     {{- end}}
     if [ -n "${lastpid}" ]; then
    -nc localhost {{.Port}} >/dev/null 2>&1
    -echo nc exited
    +while kill -0 "${lastpid}"; do
    +sleep 1
    +done
    +echo "kill exited nonzero"
     else
     sleep 1
     fi
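The loop in the diff can be tried outside roachprod with a small standalone sketch. The background `sleep 2` is a hypothetical stand-in for the Cockroach server, and `monitor_pid` is a helper name introduced only for this illustration. Unlike the diff, the sketch silences `kill`'s stderr so that only the status lines are printed.

```shell
#!/usr/bin/env bash
# Standalone sketch of the liveness poll from the diff above (hypothetical
# harness; not roachprod's actual script).
set -u

# monitor_pid is a hypothetical helper name. It polls once per second until
# `kill -0` fails. `kill -0` delivers no signal: it succeeds only while the
# PID exists and we are permitted to signal it.
monitor_pid() {
  local pid=$1
  while kill -0 "${pid}" 2>/dev/null; do
    sleep 1
  done
  echo "kill exited nonzero"
}

# A short-lived background process stands in for the cockroach server.
sleep 2 &
lastpid=$!
echo "${lastpid}"

monitor_pid "${lastpid}"
echo "dead"
```

Two caveats apply to this design. First, `kill -0` also fails with EPERM when the caller may not signal the PID, so a monitor running as a different user from the server would report it dead immediately. Second, `kill -0` still succeeds on an unreaped zombie. In this sketch the monitored process is the script's own child, and bash reaps it after it exits, so the loop terminates.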
