
Sentinel failover doesn't appear to function correctly #1144

Closed
mikebell90 opened this issue Oct 4, 2019 · 22 comments
Labels
status: waiting-for-feedback We need additional information before we can continue type: bug A general bug

Comments

@mikebell90

Bug Report

So I'm a redis/lettuce n00b, and I may be missing something (probably am), but I can't get Redis with Sentinel (not Redis Cluster) to fail over correctly 100% of the time.

Current Behavior

So my goal was to follow the stated best practice: keep one cached connection for all my simple gets and puts. But I need HA and reasonably fast failover, so we have Sentinel.

I'll put wiring details below, but here are some relevant general usages:

  • The test harness simply does 1 put and then 100 gets, logging errors and retrying.
  • The RedisConnection is kept cached. Every 5 minutes (configurable), there's an unconditional recycling where the old connection is killed and a new one is provided as a lazy supplier. That's meant as a mitigation for unexpected conditions - not the primary fix.
  • I attempted to choose reasonable timeouts and reconnect behavior.
  • We provide 3 sentinel URIs as the "seed".
public RedisConfiguration(@Value("${redis.nodes:}") List<String> redisNodes,
                          @Value("${redis.enabled:true}") Boolean redisEnabled,
                          @Value("${redis.resetConnectionTimeMinutes:5}") Long resetConnectionTime,
                          @Value("${redis.commandTimeoutMils:250}") Long commandTimeout,
                          @Value("${redis.maxDelayReconnect:10000}") Long maxDelayReconnect,
                          @Value("${redis.password:}") String password) {

        this.redisEnabled = redisEnabled;

        this.resetConnectionTimeMinutes = resetConnectionTime;

        this.commandTimeout = commandTimeout;

        this.maxDelayReconnect = maxDelayReconnect;

        redisURIs = redisNodes.stream().map(el -> {
            final String host = el.split(":")[0];
            final Integer port = Integer.valueOf(el.split(":")[1]);
            final RedisURI.Builder builder = RedisURI.builder()
                    .withSentinel(host, port)
                    .withSentinelMasterId("dbatest");

            if (!StringUtils.isEmpty(password)) {
                builder.withPassword(password);
            }
            return builder.build();
        }).collect(Collectors.toList());

        redisClient.set(getRedisClient());
        redisConnection.set(getRedisConnection(redisClient.get()));
        redisCommand.set(getRedisCommand(redisConnection.get()));

        scheduledExecutorService.schedule(() -> {
            LOG.info("Running standard recycle");
            resetRedisConnectionWithScheduleReset(true);
            return null;
        }, resetConnectionTime, TimeUnit.MINUTES);

    }

    /**
     * Creates client resources.
     * @return freshly created default client resources
     */
    private ClientResources getClientResources() {
        return DefaultClientResources.create();
    }

    /**
     * Returns the Redis client, or null when Redis is disabled.
     * @return the configured RedisClient
     */
    private RedisClient getRedisClient() {

        if (!redisEnabled) {
            return null;
        }

        final SocketOptions socketOptions = SocketOptions.builder()
                                            .connectTimeout(Duration.ofMillis(200))
                                            .build();

        final ClientOptions clientOptions = ClientOptions.builder()
                                            .socketOptions(socketOptions)
                                            .autoReconnect(true)
                                            .suspendReconnectOnProtocolFailure(true)
                                            .cancelCommandsOnReconnectFailure(true)
                                            .pingBeforeActivateConnection(true)
                                            .disconnectedBehavior(ClientOptions.DisconnectedBehavior.REJECT_COMMANDS)
                                            .timeoutOptions(TimeoutOptions
                                                            .builder()
                                                            .fixedTimeout(Duration.ofMillis(commandTimeout))
                                                            .build())
                                            .build();
        RedisClient client = RedisClient.create(getClientResources());
        client.setOptions(clientOptions);

        return client;
    }

    public Boolean resetRedisConnectionWithScheduleReset(Boolean scheduleReset) {

        RedisClient redisClientBuffer = getRedisClient();
        StatefulRedisMasterSlaveConnection<String, String> redisConnectionBuffer = getRedisConnection(redisClientBuffer); //NOPMD
        RedisCommands<String,String> redisCommandBuffer = getRedisCommand(redisConnectionBuffer);

        RedisClient previousClient = redisClient.getAndSet(redisClientBuffer);
        StatefulRedisMasterSlaveConnection<String, String> previousConnection = redisConnection.getAndSet(redisConnectionBuffer); //NOPMD
        redisCommand.set(redisCommandBuffer);

        if (previousConnection != null) {
            previousConnection.close();
        }
        if (previousClient != null) {
            previousClient.getResources().shutdown();
            previousClient.shutdown();
        }

        if (scheduleReset) {
            this.scheduledExecutorService.schedule(() -> {
                LOG.info("Running (non seed)");
                resetRedisConnectionWithScheduleReset(true);
                return null;
            }, resetConnectionTimeMinutes, TimeUnit.MINUTES);
        }

        return true;
    }
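The recycle above is a swap-then-close pattern: the fresh client/connection is published first via getAndSet, and only then is the previous one closed. A minimal plain-Java sketch of just that pattern (the Conn type is hypothetical, not a lettuce class):

```java
import java.util.concurrent.atomic.AtomicReference;

public class SwapThenClose {
    // Stand-in for a closable connection; hypothetical, not a lettuce type.
    static class Conn implements AutoCloseable {
        final String name;
        volatile boolean closed;
        Conn(String name) { this.name = name; }
        @Override public void close() { closed = true; }
    }

    private final AtomicReference<Conn> current = new AtomicReference<>();

    // Publish the fresh connection first, then close the old one, so readers
    // of `current` never observe an already-closed connection.
    Conn recycle(Conn fresh) {
        Conn previous = current.getAndSet(fresh);
        if (previous != null) {
            previous.close();
        }
        return previous;
    }
}
```

The ordering matters: readers of the AtomicReference never see a connection that has already been closed, which mirrors what resetRedisConnectionWithScheduleReset does with redisClient and redisConnection.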

Expected behavior/code

  • When a Redis slave or master goes down, we get errors until the master is re-elected or we switch slaves automatically (the driver does this). This functions pretty well.
  • When a Sentinel node goes down, we might get an error or two, but the driver moves on to other Sentinel nodes it has discovered, and added/removed nodes are reflected.

Actual behavior

  • Redis failover itself works well
  • However, if the first listed Sentinel URI is brought down, the driver keeps throwing exceptions (can't connect to blah blah blah) and never tries another Sentinel URI, despite 2 other seeds being provided and an additional 2 non-seed Sentinels being available.

Am I missing something? The docs strongly imply that the topology is refreshed automatically. One would also assume that, since Sentinel itself never "gives up" on an old Sentinel node, the driver should try connecting to multiple Sentinels and reshuffle due to errors.

Environment

  • Lettuce 5.2
  • Redis 5
@mikebell90 mikebell90 added the type: bug A general bug label Oct 4, 2019
@mp911de
Collaborator

mp911de commented Oct 5, 2019

Can you attach a stack trace that shows the error? To rephrase your requirement: if any of the given Sentinels is not reachable, you still want to be able to connect to Redis.

This functionality is built into RedisClient and is expected to work. Having the stack trace should give us a hint what's going on.

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Oct 5, 2019
@mikebell90
Author

mikebell90 commented Oct 5, 2019 via email

@mikebell90
Author

Attached are the following:
our testbed code (check out subfolder testbed-sentinel)
the stack trace

Here is how we reproduced it:
sentinels 1-4
1 is master
sentinel 1 down, then redis 1 down: no issues, though it complains for a while

4 is master now; sentinel 4 is brought down along with the master. 1 goes back up, and 3 is elected
now we seem stuck in the reconnect
testbeds.zip
lettuce.txt.zip

@mikebell90
Author

@mp911de Let me know what else you need. FWIW, I couldn't find anything in the topology provider that ever resets the initial list of Sentinel nodes. You then have code looping through it, so whatever's first SEEMS like a point of failure. But I might have missed something. Thanks for your assistance.

@mikebell90
Author

More details:
I set up a test cluster - a, b, c.
Each has both sentinel and redis.

If and only if the FIRST node listed in the config.yaml is the MASTER, and I kill both its sentinel and redis, we get a deadlock of infinite reconnects.

If another node is listed first, all is well. If the first node is not the master, it appears to be fine (although that might just be luck).

I saw some code that always iterates through a list sequentially - I can't find it right now, but it looked like it needed an incrementing round-robin index to work properly.

Ah, it's SentinelTopologyRefresh. But to be fair, I've not checked whether that gets re-instantiated somewhere.

@mikebell90
Author

I should also point out that I provided my config and program. I may have changed a few defaults unwisely - see RedisConfiguration, which is also quoted above.

@mikebell90
Author

Oh, and it's Lettuce 5.1.7. Sorry, I was inaccurate before.

@mikebell90
Author

I tried Lettuce 5.2.0 - same behavior.

@mp911de mp911de removed the status: waiting-for-feedback We need additional information before we can continue label Oct 14, 2019
@mikebell90
Author

@mp911de Just confirming - do you need any additional information?

@mp911de
Collaborator

mp911de commented Oct 18, 2019

I'm currently buried under a pile of issues and bugs I need to follow up on.

@mp911de
Collaborator

mp911de commented Oct 25, 2019

I'm not sure I understand the issue correctly. I tried the following steps:

  1. Start the 3 sentinels, Master (Node 1) and Replica (Node 2)
  2. Kill the Master (Node 1) -> Errors until failover to replica is done. Node 2 is now the new master
  3. Kill the first Sentinel
  4. Start the old Master (Node 1) again. Comes back up as replica
  5. Kill the new Master that was elected in 2 (Node 2) -> Errors until failover to replica is done. Node 1 is now the new master.

In every case, I've seen a couple of RedisException: Currently not connected. Commands are rejected. errors until failover completed.

Can you help me reproduce the issue?

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Oct 25, 2019
@mikebell90
Author

Yes, I'll be happy to write up something more detailed. Hopefully I'll get to it in a few minutes. Thanks as always, Mark.

@mikebell90
Author

I've attached a detailed walkthrough. I hope it is sufficient. We've replicated this on multiple clusters here of various sizes. This is our tiny test cluster (3 nodes, the minimum for a Sentinel test).

testingredis.txt

@mikebell90
Author

There ya go, @mp911de. I hope it's adequate documentation.

@mp911de mp911de removed the status: waiting-for-feedback We need additional information before we can continue label Oct 30, 2019
@mp911de
Collaborator

mp911de commented Oct 30, 2019

Thanks for the update. The issue seems to be the way RedisURI is configured. The URI itself is a container for multiple Sentinel endpoints, so you should rather have a setup that correlates with:

RedisURI uri = RedisURI.builder()
        .withSentinel("arch-test-node-a-redis-ci-sf.otenv.com", 26379)
        .withSentinel("arch-test-node-b-redis-ci-sf.otenv.com", 26379)
        .withSentinel("arch-test-node-c-redis-ci-sf.otenv.com", 26379)
        .withSentinelMasterId("arch").build();

MasterReplica.connect(client, codec, uri);

The MasterReplica.connect(…) method that accepts a List<RedisURI> is primarily intended for static setups, not Redis Sentinel. That method picks the first RedisURI from the list and ignores the remaining RedisURI objects.

See also: https://github.com/lettuce-io/lettuce-core/blob/52ee0dca68fb9bcd011c1574e3ea0150ca350eb6/src/main/java/io/lettuce/core/masterslave/MasterSlave.java#L209-L210

It would make sense to add a bit of logging here to indicate that only the first URI was used, and if the List contains more than one entry there should be a warning. Otherwise, the remaining entries get dropped unnoticed.
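To make the pitfall concrete, here is a plain-Java illustration of the first-element-wins behavior (this is not the lettuce source - just a sketch of the semantics, plus the warning suggested above):

```java
import java.util.List;

public class FirstUriWins {
    // Mimics the Iterable<RedisURI> overload of MasterReplica.connect(…):
    // only the first entry is used; the rest are silently ignored.
    static String pickFirst(List<String> uris) {
        if (uris.isEmpty()) {
            throw new IllegalArgumentException("At least one URI is required");
        }
        if (uris.size() > 1) {
            // The warning suggested above: surface the dropped entries.
            System.err.println("WARN: only the first URI is used; ignoring "
                    + (uris.size() - 1) + " additional entries");
        }
        return uris.get(0);
    }
}
```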

Can you change your code and run that test again?

@mikebell90
Author

mikebell90 commented Oct 30, 2019

That appears to have fixed the issue. Thank you. However, I have two follow-up questions:

  • Rather than failing, even with a LOG.WARN, why not remove the errant method signature? The usage seems to be largely "of convenience" rather than "correct".
  • It leads to this much larger question. Imagine I provide 3 "seed" sentinel URIs. The actual cluster may have 1000 redis nodes and 1000 sentinel nodes (just an illustration). The cluster might change size, so including every sentinel node as a seed is not practical or desirable. Instead we choose a reasonable number of seed nodes spread across availability groups.

Here's what I'd expect

  • If all 3 of the seeds are down at startup, I fail - game over. After the initial connection and pub/sub, however...

  • If any master or replica goes down I'd expect failover - we get this

  • If any sentinel process goes down I'd also expect failover. In other words, after the initial connection you know about ALL 997 other sentinels, and hence EVEN if all 3 original seed sentinels go down, this is but a mere nuisance - you just connect to another one. Once/if they ever come back up, great.

It's the very last part I'm worried about. Sure, list.get(0) is terrible, because your seed is then only one node and successful startup of a connection is compromised. But once up, who cares? Take that node down permanently - it should not affect function.

In the sample code I used, we reused the same one RedisConnection, so failover should be easy. I agree that if we had grabbed a new RedisConnection and the first node were down, problem. (Actually I take that back, since there should be a backing provider that can potentially reuse the newly discovered pool.)

Am I missing something?

@mikebell90
Author

I have confirmed this, btw:

  • Provide only one URI for sentinel (the one that's on the master)
  • Kill both redis and sentinel
  • Endless loop again until the process is restarted.

This strikes me as a pretty problematic thing

  • "SENTINEL sentinels Show a list of sentinel instances for this master, and their state."
  • "Adding a new Sentinel to your deployment is a simple process because of the auto-discover mechanism implemented by Sentinel. All you need to do is to start the new Sentinel configured to monitor the currently active master. Within 10 seconds the Sentinel will acquire the list of other Sentinels and the set of replicas attached to the master.

If you need to add multiple Sentinels at once, it is suggested to add it one after the other, waiting for all the other Sentinels to already know about the first one before adding the next. This is useful in order to still guarantee that majority can be achieved only in one side of a partition, in the chance failures should happen in the process of adding new Sentinels."

@mikebell90
Author

To clarify, that's from the Redis documentation: https://redis.io/topics/sentinel
As an experiment I made a single seed node pointing to master/sentinel, waited 3 minutes (for replication, per their comments). I then stopped the sentinel and redis. Boom, endless loop.

@mp911de
Collaborator

mp911de commented Oct 31, 2019

Rather than failing, even with a LOG.WARN, why not remove the errant method signature. The usage seems to be largely "of convenience", rather than "correct".

That's not fully true. Master/Replica setups can be connected using:

  • Master/Replica using auto-discovery (e.g. you run your Redis servers yourself and point the URI to one of the nodes)
  • Sentinel
  • Master/Replica with a static setup that cannot be auto-discovered (e.g. Redis is proxied and internal topology details use internal, non-reachable IP addresses. AWS ElastiCache with Replication is such a case)

For the last case, we require multiple URIs, as we cannot determine the topology from Redis.

It leads to this much larger question. Imagine I provide 3 "seed" sentinel uris. The actual cluster may have 1000 redis nodes and 1000 sentinel nodes (just an illustration).

The RedisURI should contain more than one Sentinel endpoint, but probably not 1000. A reasonable number is between 3 and 5, balancing resource usage vs. availability. It's mostly up to your requirements: how likely it is that all Sentinels become unreachable and then a master node dies, and how much failover you want to provide.

Regarding adding additional sentinels during runtime:

Lettuce doesn't connect automatically to newly discovered Sentinel nodes. Primarily because no one has asked for it, and secondly because it raises a security issue: do you want this, and what happens if a malicious Sentinel is added to your Sentinel cluster? Do you want to trust it automatically?
Lettuce has no configuration for these aspects of Sentinel-based discovery. It is also a difficult balance between all the things you could want to configure and the complexity of usage.

@mp911de
Collaborator

mp911de commented Oct 31, 2019

Based on your replies, when you use the connect(…) method accepting a single RedisURI instead of Iterable<RedisURI>, everything works as expected. May I close this ticket?

@mikebell90
Author

You may. Although I'd like to open another ticket to further discuss Sentinel auto-discovery, if I may.

@mikebell90
Author

Closing, as the bug per se is resolved; the larger design issue is now discussed in #1168.
