This repository houses `pgrt`, a small utility that connects
to all of the nodes in a PostgreSQL streaming replication cluster
and verifies the health of each node.
A healthy cluster, with some leeway on the replication lag (`-l`):
$ pgrt -M 10.244.232.2 -S 10.244.232.3 -S 10.244.232.4 -l 31768
10.244.232.2: 0/B30BA98
10.244.232.3: 0/B30BA98 (0) to 0/B30BC28 (-400)
10.244.232.4: 0/B30BDC0 (-808) to 0/B30BDC0 (0)
The same cluster, reporting as unhealthy because we only tolerate 800 bytes of replication lag (admittedly, fairly unrealistic):
$ pgrt -M 10.244.232.2 -S 10.244.232.3 -S 10.244.232.4 -l 800
10.244.232.2: 0/B17F098
10.244.232.3: 0/B17F230 (-408) to 0/B17F230 (0)
10.244.232.4: 0/B17F558 (-1216) to 0/B17F6E8 (-400) !! too far behind write master
FAILED
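The parenthetical figures in the output above can be reproduced by hand. PostgreSQL writes an xlog position as two hexadecimal numbers separated by a slash (high 32 bits, then low 32 bits). In the sample output, the first figure matches the master's position minus the node's first reported position, in bytes. The sketch below is an illustration of that arithmetic only; the helper names `lsn_to_bytes` and `lsn_diff` are made up for this example and are not part of `pgrt`.

```shell
#!/bin/sh
# Convert an xlog position such as "0/B30BA98" into a byte offset.
# The part before the slash is the high 32 bits, the part after it
# is the low 32 bits, both written in hexadecimal.
lsn_to_bytes() {
    hi=${1%%/*}
    lo=${1##*/}
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Difference in bytes between two positions (first minus second).
lsn_diff() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Master position vs. the first position reported for 10.244.232.4
# in the healthy example: prints -808
lsn_diff 0/B30BA98 0/B30BDC0
```

A negative figure means the node reported a position ahead of the one sampled from the master, which can happen when the master is queried first and replication continues in the meantime.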
`pgrt` exits 0 if it can contact all nodes, each node is playing
the part specified (i.e. the write master is a write master, and the read
slaves are actually read slaves), and the replication lag (the first
parenthetical figure) is below the acceptable lag (per `-l`).
It exits non-zero on failure, with the following meanings:
- 1 - Option processing or other non-runtime error. Check your flags.
- 2 - Connectivity to at least one node failed.
- 3 - A query to the write master failed.
- 4 - A query to one of the read slaves failed.
- 5 - xlog conversion failed (if this happens, something is terribly broken...)
- 6 - One or more of the read slaves was lagging too far
behind the master (based on `-l`).
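As a minimal sketch, a monitoring script could translate these exit codes into readable messages. The function name `pgrt_exit_meaning` is hypothetical, and the messages simply restate the list above.

```shell
#!/bin/sh
# Map a pgrt exit status to the meaning documented in the list above.
pgrt_exit_meaning() {
    case "$1" in
        0) echo "healthy" ;;
        1) echo "option processing or other non-runtime error" ;;
        2) echo "connectivity to at least one node failed" ;;
        3) echo "a query to the write master failed" ;;
        4) echo "a query to one of the read slaves failed" ;;
        5) echo "xlog conversion failed" ;;
        6) echo "one or more read slaves lagging too far behind the master" ;;
        *) echo "unknown exit status $1" ;;
    esac
}

# Hypothetical usage after running pgrt (hosts taken from the examples):
#   pgrt -M 10.244.232.2 -S 10.244.232.3 -S 10.244.232.4 -l 800
#   echo "pgrt: $(pgrt_exit_meaning $?)"
pgrt_exit_meaning 6
```

Keeping the mapping in one function means an alerting system can distinguish a lagging slave (6) from an unreachable node (2) without parsing the textual output.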
-M, --master Replication master host. May only be specified once
-S, --slave Replication slave host(s). May be specified more than once
-p, --port TCP port that Postgres listens on (default: 6432)
-u, --user User to connect as
-w, --password Password to connect with
-D, --debug Enable debugging output (to standard error)
-l, --lag Maximum acceptable lag behind the master xlog position (bytes) (default: 8192)