Pub/Sub subscribe to +switch-master events. #25

ldm5180 · 2014-11-20T18:12:01Z

The master can change even without a disconnect happening to the client. In order to get this change the clients must subscribe to the +switch-master message. This needs to be subscribed before the client connection is established to avoid race conditions.

…eton. Problem: The original design of the sentinel with Pub/Sub to the switch-master message broke the last unit test because it was always connected to the previously successful servers. Analysis: In order to not have always stay connected to the old servers the subscription to the Pub/Sub message needs to only occur when the successful connect occurs.

jamessharp · 2014-11-21T11:56:23Z

This is great - thanks. I think there's a race condition

reconnectAllClients calls connection_gone, which in turn emits 'reconnecting' which causes redis-sentinel to request the master/slave/whatever endpoint again and then node_redis sets a timer to attempt a reconnect.
In the case where the sentinel responds with endpoint details before the retry timer is up then all is fine. node_redis just connects to the new endpoints.
However I doubt that will happen very often. In the case where the retry timer calls first then node_redis will reconnect back to the old master. Even when the sentinel response comes back we won't do a reconnect so we'll stay connected to the old master.
The fix is non-trivial because you can't just say 'when we get a response from sentinel check if the endpoints have changed and if they have reconnect'. This is because in the intervening period (when we've reconnected to the old master before the sentinel response has come back) commands could well have been issued to the old master and any writes to the DB will fail.

The solution is probably along these lines:

In refreshEndpoints initially null route the address (i.e. set client.connectionOption.host = '0.0.0.0')
Then once resolver returns set the correct address (as today) then call connection_gone again to make sure we do attempt a reconnect.

I'm not sure that setting the null address won't kill node_redis but is worth a try.

…plete. The resolver needs to run before the place to connect is determined, but node-redis will start attempting to reconnect immediately. Prevent this by setting the port and ip to the empty string and letting the resolver modify it. A better fix would be to modify node-redis to be able to not retry until told to do so.

ldm5180 · 2014-11-29T18:20:46Z

Yes, there is certainly a race there. I fixed the race with the fix you suggested and tested it using a slow Sentinel. It seems to work fine.

jamessharp · 2014-12-09T09:28:04Z

Looks good. Thanks!

Pub/Sub subscribe to +switch-master events.

James Allen and others added 2 commits November 20, 2014 10:56

Listen on Sentinel PUBSUB channel for +switch-master

c903a85

jamessharp added a commit that referenced this pull request Dec 9, 2014

Merge pull request #25 from ldm5180/pubsub-switchmaster

1a493ad

Pub/Sub subscribe to +switch-master events.

jamessharp merged commit 1a493ad into ortoo:master Dec 9, 2014

ldm5180 deleted the pubsub-switchmaster branch December 23, 2014 15:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pub/Sub subscribe to +switch-master events. #25

Pub/Sub subscribe to +switch-master events. #25

ldm5180 commented Nov 20, 2014

jamessharp commented Nov 21, 2014

ldm5180 commented Nov 29, 2014

jamessharp commented Dec 9, 2014

Pub/Sub subscribe to +switch-master events. #25

Pub/Sub subscribe to +switch-master events. #25

Conversation

ldm5180 commented Nov 20, 2014

jamessharp commented Nov 21, 2014

ldm5180 commented Nov 29, 2014

jamessharp commented Dec 9, 2014