Consul: Add patroni_maximum_lag_on_replica #570
Test (dcs_type: "consul")
Consul DNS
Consul service config:
Test commands:
# Set recovery_min_apply_delay on replica
psql -h 10.172.0.21 -p 5432 -U postgres -c "alter system set recovery_min_apply_delay='2min'"
psql -h 10.172.0.21 -p 5432 -U postgres -c "select pg_reload_conf()"
# Observe the replication lag
for i in {1..600}; do psql -h 10.172.0.20 -U postgres -p 5432 -c " select now(),client_addr,pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn)) as total_lag from pg_stat_replication"; sleep 2; done
# Check the Patroni REST API replica endpoint with a lag threshold
for i in {1..600}; do echo $(date); curl -I "http://10.172.0.21:8008/replica?lag=100MB"; sleep 2; done
# Connect to replicas and check listen_addresses
for i in {1..600}; do echo $(date); psql -h replica.postgres-cluster.service.consul -p 6432 -U postgres -c "show listen_addresses"; sleep 2; done
# Generate data to create a lag
pgbench -h 10.172.0.20 -p 5432 -U postgres -i -s 10 postgres
Result:
While the replication lag exceeds 100MB (the threshold defined in patroni_maximum_lag_on_replica), replica 10.172.0.21 is removed from read traffic balancing (port 5001) and connections are routed only to the replica without high replication lag (10.172.0.22). Once the lag drops below the threshold, replica 10.172.0.21 becomes available again to balance read-only traffic.
Test passed.
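The behaviour described in the result follows from how the Patroni REST API answers the lag check: `/replica?lag=<value>` returns HTTP 200 only while the node is a running replica whose lag is below the given value, and 503 otherwise. The following is a minimal sketch of how a health check could interpret that answer. The function names `replica_state` and `check_replica` are hypothetical helpers, not part of this PR, and the host, port 8008, and 100MB threshold are simply taken from the test commands above.

```shell
#!/usr/bin/env bash
# Sketch only: interpret Patroni's /replica?lag=... check the way a
# service health check would. Helper names are hypothetical.

# Map the HTTP status code of the health check to a rotation state.
replica_state() {
  case "$1" in
    200) echo "in rotation" ;;      # running replica, lag within the threshold
    503) echo "out of rotation" ;;  # not a healthy replica, or lag too high
    *)   echo "unknown" ;;          # e.g. 000 when curl cannot connect
  esac
}

# Print only the HTTP status code for a given host and lag threshold.
# Assumes the Patroni REST API listens on port 8008, as in the test above.
check_replica() {
  local host="$1" lag="$2"
  curl -s -o /dev/null -w '%{http_code}' "http://${host}:8008/replica?lag=${lag}"
}

# Usage (requires network access to a Patroni node):
#   replica_state "$(check_replica 10.172.0.21 100MB)"
```

Running the usage line in a loop while pgbench generates data should show the state flip to "out of rotation" during the lag period and back once the replica catches up, assuming the environment matches the test setup.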
Related PR: #569