roachprod: monitor local script
Add a new monitor script for monitoring local processes. On OSX development
machines `systemctl` is not available, so we rely on the process status command,
`ps`, to determine which processes are running. One limitation of using `ps` is
that it cannot report the exit status of a process that has already exited. To
stay consistent with the output Monitor expects, we still append a
`status=unknown` line for each process.

The biggest difference between the new and old script is that this script does
not contain any logic to detect process changes; it only reports information
about cockroach processes back to the caller. It does, however, apply one small
optimisation: it does not resend the information if it has not changed. The
caller (Monitor) is now responsible for detecting process changes and emitting
events.
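
Since change detection now lives in the caller, the comparison it has to perform can be sketched in a few lines. This is an illustration only, written in shell to match the script. `frame_procs` and `diff_frames` are hypothetical helper names; the actual Monitor logic is not part of this commit and may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: derive start/stop events by comparing two frames.
# A frame is the text emitted by the monitor script: "label=pid" lines,
# each followed by a "status=..." line.

# Print only the "label=pid" lines of a frame, dropping status lines.
frame_procs() {
  printf '%s\n' "$1" | awk '/^status=/ { next } /^[A-Za-z0-9_.-]+=[0-9]+$/ { print }'
}

# Print "started label=pid" for entries that appear only in the new frame
# and "stopped label=pid" for entries that appear only in the old frame.
diff_frames() {
  local prev cur entry
  prev=$(frame_procs "$1")
  cur=$(frame_procs "$2")
  # Entries contain no whitespace, so word splitting is safe here.
  for entry in $cur; do
    printf '%s\n' "$prev" | grep -Fxq -- "$entry" || echo "started $entry"
  done
  for entry in $prev; do
    printf '%s\n' "$cur" | grep -Fxq -- "$entry" || echo "stopped $entry"
  done
}
```

Under this sketch, a process that restarts with a new PID shows up as a `started` line for the new PID and a `stopped` line for the old one. Because the script cannot report exit codes, a real caller would need to treat every stop as having an unknown status.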

Epic: None
herkolategan committed Dec 9, 2024
1 parent 6d97e73 commit 3bb2e18
Showing 1 changed file with 53 additions and 0 deletions.
53 changes: 53 additions & 0 deletions pkg/roachprod/install/scripts/monitor_local.sh
@@ -0,0 +1,53 @@
#!/usr/bin/env bash
#
# Copyright 2024 The Cockroach Authors.
#
# Use of this software is governed by the CockroachDB Software License
# included in the /LICENSE file.

# This script is used to monitor the status of cockroach processes on local nodes
# using `ps`. It does not support checking the exit status of a process, but
# still outputs the status line for consistency with the remote monitor.
# It produces output in the following format:
#cockroach-system=500
#status=unknown
#cockroach-tenant_0=501
#status=unknown
#\n = end of frame

roachprod_regex=#{shesc .RoachprodEnvRegex#}
one_shot=#{if .OneShot#}true#{end#}

prev_frame=""
while :; do
  # Get the PID, state and command (including environment) of all processes
  # that match the roachprod regex.
  ps_output=$(ps axeww -o pid,state,command | grep -v grep | grep -E "$roachprod_regex")
  frame=""
  while IFS= read -r line; do
    # Extract the PID, state and command from the line.
    read -r pid state command <<< "$line"
    # If the state contains a '+' character, the process is in the foreground
    # and is excluded.
    if [[ "$state" == *"+"* ]]; then
      continue
    fi
    # Extract the virtual cluster label from the command.
    vc_label=$(echo "$command" | grep -E -o 'ROACHPROD_VIRTUAL_CLUSTER=[^ ]*' | cut -d= -f2)
    # If the virtual cluster label is not empty, append the label and the PID
    # to the frame.
    if [ -n "$vc_label" ]; then
      frame+="$vc_label=$pid\n"
      # The process is local, so we can't check the status (exit code).
      frame+="status=unknown\n"
    fi
  done <<< "$ps_output"
  # Only print the frame if it has changed.
  if [ "$frame" != "$prev_frame" ]; then
    echo -e "$frame"
    prev_frame="$frame"
  fi
  # If one_shot is set, exit after the first iteration.
  if [[ -n "${one_shot}" ]]; then
    break
  fi
  sleep 1
done
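
To make the per-line parsing in the loop concrete, here is a small standalone sketch of the same steps applied to a single `ps` line. `frame_entry` is a hypothetical helper name, and it uses a here-document and `case` in place of the script's `<<<` and `[[ ]]` so that it also runs under plain `sh`. The sample lines in the usage note are made up for illustration; real `ps axeww` output carries the full process environment.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the per-line logic in the monitor loop.
# Given one line of `ps -o pid,state,command` output, print the frame
# entry for it, or nothing if the process should be skipped.
frame_entry() {
  local pid state command vc_label
  # Split the line into PID, state and the remainder (command + environment).
  read -r pid state command <<EOF
$1
EOF
  # Skip foreground processes, marked with '+' in the state column.
  case "$state" in
    *+*) return 0 ;;
  esac
  # Pull the virtual cluster label out of the environment, if present.
  vc_label=$(printf '%s\n' "$command" | grep -E -o 'ROACHPROD_VIRTUAL_CLUSTER=[^ ]*' | cut -d= -f2)
  if [ -n "$vc_label" ]; then
    printf '%s=%s\nstatus=unknown\n' "$vc_label" "$pid"
  fi
}
```

For example, `frame_entry '500 S cockroach start ROACHPROD_VIRTUAL_CLUSTER=cockroach-system'` would print a `cockroach-system=500` line followed by `status=unknown`. A line whose state is `S+`, or one without the `ROACHPROD_VIRTUAL_CLUSTER` variable, produces no output.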
