How should Postgres behave if etcd is unavailable / unpredictable? #7

Winslett · 2015-05-07T14:47:40Z

Some of the scenarios to consider are:

- The governor for the Postgres primary cannot communicate with etcd, but the rest of the cluster can.
- No governors in the Postgres cluster can communicate with etcd.
- etcd crashes and recovers. After recovery, the etcd leader TTLs have expired.
- etcd crashes and recovers. After recovery, the initialization key is empty. This would cause issues when new members come online and race to initialize.
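To make the last scenario concrete, here is a rough sketch of the initialization race. It does not use governor's code or a real etcd client: `FakeStore`, `create_if_absent`, the `/initialize` key name, and the member names are all hypothetical stand-ins for a key-value store with an atomic create-if-absent operation.

```python
import threading


class FakeStore:
    """In-memory stand-in for a key-value store (hypothetical, not etcd)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def create_if_absent(self, key, value):
        """Atomically create the key; return True only for the creator."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True


def race_to_initialize(store, member_names):
    """Each member tries to claim the initialization key; return the winners."""
    winners = []
    winners_lock = threading.Lock()

    def attempt(name):
        # "/initialize" is an assumed key name used only for illustration.
        if store.create_if_absent("/initialize", name):
            with winners_lock:
                winners.append(name)

    threads = [threading.Thread(target=attempt, args=(n,)) for n in member_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


# Normal bootstrap: exactly one of the original members wins the race.
store = FakeStore()
first_winners = race_to_initialize(store, ["pg1", "pg2", "pg3"])

# Simulated crash and recovery where the key was lost: the store is empty
# again, so a member joining later also "wins" and would try to initialize
# a cluster that already exists.
store = FakeStore()
second_winners = race_to_initialize(store, ["pg4", "pg5"])
```

The atomic create guarantees a single winner per race, but it cannot tell a fresh cluster from one whose key was lost. That is the part that needs an explicit decision.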

The first question to answer is: should the Postgres cluster go read-only if etcd fails? Or should the Postgres cluster keep the current Primary, but without automatic failover?

tvb · 2015-05-07T14:55:45Z

I think it is best to go read-only so replication from the newly elected master can continue.

bjoernbessert · 2015-05-08T17:09:28Z

In my opinion, it is better to keep the current Primary and not make any "decision" when you lose the "brain". In general, I try to avoid situations where the HA software/layer itself can bring down the protected application when the protected application itself has no problems at all.

The Postgres cluster can go read-only as an additional safety measure, but it's not a must-have.
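As a rough sketch of the two options discussed in this thread, the choice could be expressed as a single policy function. This is not governor's actual behavior: `Role`, `Action`, `on_etcd_unreachable`, and the `read_only_on_failure` flag are hypothetical names made up for illustration.

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    REPLICA = "replica"


class Action(Enum):
    KEEP_RUNNING = "keep accepting writes, automatic failover disabled"
    GO_READ_ONLY = "stop accepting writes until etcd is reachable again"
    KEEP_FOLLOWING = "keep replicating from the current primary"


def on_etcd_unreachable(role, read_only_on_failure=False):
    """Pick what a node does when it cannot reach etcd (hypothetical policy)."""
    if role is Role.REPLICA:
        # Without etcd a replica cannot safely win an election,
        # so it never promotes itself.
        return Action.KEEP_FOLLOWING
    if read_only_on_failure:
        # Optional extra safety measure: the primary stops taking writes.
        return Action.GO_READ_ONLY
    # Default in this sketch: the HA layer does not take the primary down.
    return Action.KEEP_RUNNING


for role in Role:
    for flag in (False, True):
        print(role.value, flag, "->", on_etcd_unreachable(role, flag).value)
```

Making read-only an opt-in flag keeps the default aligned with the "do not let the HA layer bring down a healthy application" argument, while still allowing the stricter behavior.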

Winslett mentioned this issue May 7, 2015

haproxy_status.sh should get leader status from etcd #2

Open
