
Sentinel failover doesn't appear to function correctly #1144

Closed
mikebell90 opened this issue Oct 4, 2019 · 22 comments
Labels
status: waiting-for-feedback We need additional information before we can continue type: bug A general bug

Comments

@mikebell90

Bug Report

So I'm a redis/lettuce n00b, and I may be missing something (probably am), but I can't get Redis with Sentinel (not Redis Cluster) to fail over correctly 100% of the time.

Current Behavior

So my goal was to follow the stated best practice: keep one cached connection for all my simple gets and puts. But I need HA and reasonably fast failover, so we have Sentinel.

I'll put wiring details below, but here are some relevant general usages:

  • The test harness simply does 1 put and then 100 gets, logging errors and retrying.
  • The RedisConnection is kept cached. Every 5 minutes (configurable), there's an unconditional recycling where the old connection is killed and a new one is provided as a lazy supplier. That's meant as a mitigation for unexpected conditions - not the primary fix.
  • I attempted to choose reasonable timeouts and reconnect behavior.
  • We provide 3 sentinel URIs as the "seed".
public RedisConfiguration(@Value("${redis.nodes:}") List<String> redisNodes,
                          @Value("${redis.enabled:true}") Boolean redisEnabled,
                          @Value("${redis.resetConnectionTimeMinutes:5}") Long resetConnectionTime,
                          @Value("${redis.commandTimeoutMils:250}") Long commandTimeout,
                          @Value("${redis.maxDelayReconnect:10000}") Long maxDelayReconnect,
                          @Value("${redis.password:}") String password) {

        this.redisEnabled = redisEnabled;

        this.resetConnectionTimeMinutes = resetConnectionTime;

        this.commandTimeout = commandTimeout;

        this.maxDelayReconnect = maxDelayReconnect;

        redisURIs = redisNodes.stream().map(el -> {
            final String host = el.split(":")[0];
            final Integer port = Integer.valueOf(el.split(":")[1]);
            final RedisURI.Builder builder = RedisURI.builder()
                    .withSentinel(host, port)
                    .withSentinelMasterId("dbatest");

            if (!StringUtils.isEmpty(password)) {
                builder.withPassword(password);
            }
            return builder.build();
        }).collect(Collectors.toList());

        redisClient.set(getRedisClient());
        redisConnection.set(getRedisConnection(redisClient.get()));
        redisCommand.set(getRedisCommand(redisConnection.get()));

        scheduledExecutorService.schedule(() -> {
            LOG.info("Running standard recycle");
            resetRedisConnectionWithScheduleReset(true);
            return null;
        }, resetConnectionTime, TimeUnit.MINUTES);

    }

    /**
     * Creates client resources.
     * @return freshly created default client resources
     */
    private ClientResources getClientResources() {
        return DefaultClientResources.create();
    }

    /**
     * Returns the Redis client, or null when Redis is disabled.
     * @return the configured RedisClient
     */
    private RedisClient getRedisClient() {

        if (!redisEnabled) {
            return null;
        }

        final SocketOptions socketOptions = SocketOptions.builder()
                                            .connectTimeout(Duration.ofMillis(200))
                                            .build();

        final ClientOptions clientOptions = ClientOptions.builder()
                                            .socketOptions(socketOptions)
                                            .autoReconnect(true)
                                            .suspendReconnectOnProtocolFailure(true)
                                            .cancelCommandsOnReconnectFailure(true)
                                            .pingBeforeActivateConnection(true)
                                            .disconnectedBehavior(ClientOptions.DisconnectedBehavior.REJECT_COMMANDS)
                                            .timeoutOptions(TimeoutOptions
                                                            .builder()
                                                            .fixedTimeout(Duration.ofMillis(commandTimeout))
                                                            .build())
                                            .build();
        RedisClient client = RedisClient.create(getClientResources());
        client.setOptions(clientOptions);

        return client;
    }

    public Boolean resetRedisConnectionWithScheduleReset(Boolean scheduleReset) {

        RedisClient redisClientBuffer = getRedisClient();
        StatefulRedisMasterSlaveConnection<String, String> redisConnectionBuffer = getRedisConnection(redisClientBuffer); //NOPMD
        RedisCommands<String,String> redisCommandBuffer = getRedisCommand(redisConnectionBuffer);

        RedisClient previousClient = redisClient.getAndSet(redisClientBuffer);
        StatefulRedisMasterSlaveConnection<String, String> previousConnection = redisConnection.getAndSet(redisConnectionBuffer); //NOPMD
        redisCommand.set(redisCommandBuffer);

        if (previousConnection != null) {
            previousConnection.close();
        }
        if (previousClient != null) {
            previousClient.getResources().shutdown();
            previousClient.shutdown();
        }

        if (scheduleReset) {
            this.scheduledExecutorService.schedule(() -> {
                LOG.info("Running (non seed)");
                resetRedisConnectionWithScheduleReset(true);
                return null;
            }, resetConnectionTimeMinutes, TimeUnit.MINUTES);
        }

        return true;
    }
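The recycle above is a swap-then-close pattern: the fresh client/connection is published first via getAndSet, and only then is the previous one closed. A minimal plain-Java sketch of just that pattern (the Conn type is hypothetical, not a lettuce class):

```java
import java.util.concurrent.atomic.AtomicReference;

public class SwapThenClose {
    // Stand-in for a closable connection; hypothetical, not a lettuce type.
    static class Conn implements AutoCloseable {
        final String name;
        volatile boolean closed;
        Conn(String name) { this.name = name; }
        @Override public void close() { closed = true; }
    }

    private final AtomicReference<Conn> current = new AtomicReference<>();

    // Publish the fresh connection first, then close the old one, so readers
    // of `current` never observe an already-closed connection.
    Conn recycle(Conn fresh) {
        Conn previous = current.getAndSet(fresh);
        if (previous != null) {
            previous.close();
        }
        return previous;
    }
}
```

The ordering matters: readers of the AtomicReference never see a connection that has already been closed, which mirrors what resetRedisConnectionWithScheduleReset does with redisClient and redisConnection.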

Expected behavior/code

  • When a Redis slave or master goes down, we get errors until the master is re-elected or we switch slaves automatically (the driver does this). This functions pretty well.
  • When a Sentinel node goes down, we might get an error or two, but the driver moves on to other Sentinel nodes it has discovered, and added/removed nodes are reflected.

Actual behavior

  • Redis failover itself works well
  • However, if the first listed Sentinel URI is brought down, the driver keeps throwing exceptions (can't connect to blah blah blah) and never tries another Sentinel URI, despite 2 other seeds being provided and an additional 2 non-seed Sentinels being available.

Am I missing something? The docs strongly imply that the topology is refreshed automatically. One would also assume that, since Sentinel itself never "gives up" on an old Sentinel node, the driver should try connecting to multiple Sentinels and reshuffle due to errors.

Environment

  • Lettuce 5.2
  • Redis 5
@mikebell90 mikebell90 added the type: bug A general bug label Oct 4, 2019
@mp911de
Collaborator

mp911de commented Oct 5, 2019

Can you attach a stack trace that shows the error? To rephrase your requirement: if any of the given Sentinels is not reachable, you still want to be able to connect to Redis.

This functionality is built into RedisClient and is expected to work. Having the stack trace should give us a hint what's going on.

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Oct 5, 2019
@mikebell90
Author

mikebell90 commented Oct 5, 2019 via email

@mikebell90
Author

Attached are the following:
our testbed code (check out subfolder testbed-sentinel)
the stack trace

Here is how we reproduced it:
sentinels 1-4
1 is master
sentinel 1 down, then redis 1 down: no issues, though it complains for a while

4 is master now; sentinel 4 is brought down along with the master. 1 goes back up, and 3 is elected
now we seem stuck in the reconnect
testbeds.zip
lettuce.txt.zip

@mikebell90
Author

@mp911de Let me know what else you need. FWIW, I couldn't find anything in the topology provider that ever resets the initial list of Sentinel nodes. You then have code looping through it, so whatever's first SEEMS like a point of failure. But I might have missed something. Thanks for your assistance.

@mikebell90
Author

More details:
I set up a test cluster - a, b, c.
Each has both sentinel and redis.

If and only if the FIRST node listed in the config.yaml is the MASTER, and I kill both its sentinel and redis, we get a deadlock of infinite reconnects.

If another node is listed first, all is well. If the first node is not the master, it appears to be fine (although that might just be luck).

I saw some code that always iterates through a list sequentially - I can't find it right now, but it looked like it needed an incrementing round-robin index to work properly.

Ah, it's SentinelTopologyRefresh. But to be fair, I've not checked whether that gets re-instantiated somewhere.

@mikebell90
Author

I should also point out that I provided my config and program. I may have changed a few defaults unwisely - see RedisConfiguration, which is also quoted above.

@mikebell90
Author

Oh, and it's Lettuce 5.1.7. Sorry, I was inaccurate before.

@mikebell90
Author

I tried Lettuce 5.2.0 - same behavior.

@mp911de mp911de removed the status: waiting-for-feedback We need additional information before we can continue label Oct 14, 2019
@mikebell90
Author

@mp911de Just confirming - do you need any additional information?

@mp911de
Collaborator

mp911de commented Oct 18, 2019

I'm currently buried under a pile of issues and bugs I need to follow up on.

@mp911de
Collaborator

mp911de commented Oct 25, 2019

I'm not sure I understand the issue correctly. I tried the following steps:

  1. Start the 3 sentinels, Master (Node 1) and Replica (Node 2)
  2. Kill the Master (Node 1) -> Errors until failover to replica is done. Node 2 is now the new master
  3. Kill the first Sentinel
  4. Start the old Master (Node 1) again. Comes back up as replica
  5. Kill the new Master that was elected in 2 (Node 2) -> Errors until failover to replica is done. Node 1 is now the new master.

In every case, I've seen a couple of RedisException: Currently not connected. Commands are rejected. errors until failover completed.

Can you help me reproduce the issue?

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Oct 25, 2019
@mikebell90
Author

Yes, I'll be happy to write up something more detailed. Hopefully I'll get to it in a few minutes. Thanks as always, Mark.

@mikebell90
Author

I've attached a detailed walkthrough. I hope it is sufficient. We've replicated this on multiple clusters here of various sizes. This is our tiny test cluster (3 nodes, the minimum for a Sentinel test).

testingredis.txt

@mikebell90
Author

There ya go, @mp911de. I hope it's adequate documentation.

@mp911de mp911de removed the status: waiting-for-feedback We need additional information before we can continue label Oct 30, 2019
@mp911de
Collaborator

mp911de commented Oct 30, 2019

Thanks for the update. The issue seems to be the way RedisURI is configured. The URI itself is a container for multiple Sentinel endpoints, so you should rather have a setup that correlates with:

RedisURI uri = RedisURI.builder()
        .withSentinel("arch-test-node-a-redis-ci-sf.otenv.com", 26379)
        .withSentinel("arch-test-node-b-redis-ci-sf.otenv.com", 26379)
        .withSentinel("arch-test-node-c-redis-ci-sf.otenv.com", 26379)
        .withSentinelMasterId("arch").build();

MasterReplica.connect(client, codec, uri);

The MasterReplica.connect(…) method that accepts a List<RedisURI> is primarily intended for static setups, not Redis Sentinel. That method picks the first RedisURI from the list and ignores the remaining RedisURI objects.

See also: https://github.com/lettuce-io/lettuce-core/blob/52ee0dca68fb9bcd011c1574e3ea0150ca350eb6/src/main/java/io/lettuce/core/masterslave/MasterSlave.java#L209-L210

It would make sense to add a bit of logging here to indicate that only the first URI was used, and if the List contains more than one entry there should be a warning. Otherwise, the remaining entries get dropped unnoticed.
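To make the pitfall concrete, here is a plain-Java illustration of the first-element-wins behavior (this is not the lettuce source - just a sketch of the semantics, plus the warning suggested above):

```java
import java.util.List;

public class FirstUriWins {
    // Mimics the Iterable<RedisURI> overload of MasterReplica.connect(…):
    // only the first entry is used; the rest are silently ignored.
    static String pickFirst(List<String> uris) {
        if (uris.isEmpty()) {
            throw new IllegalArgumentException("At least one URI is required");
        }
        if (uris.size() > 1) {
            // The warning suggested above: surface the dropped entries.
            System.err.println("WARN: only the first URI is used; ignoring "
                    + (uris.size() - 1) + " additional entries");
        }
        return uris.get(0);
    }
}
```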

Can you change your code and run that test again?

@mikebell90
Author

mikebell90 commented Oct 30, 2019

That appears to have fixed the issue. Thank you. However, I have two follow-up questions:

  • Rather than failing, even with a LOG.WARN, why not remove the errant method signature? The usage seems to be largely "of convenience" rather than "correct".
  • It leads to this much larger question. Imagine I provide 3 "seed" sentinel URIs. The actual cluster may have 1000 redis nodes and 1000 sentinel nodes (just an illustration). The cluster might change size, so including every sentinel node as a seed is not practical or desirable. Instead we choose a reasonable number of seed nodes spread across availability groups.

Here's what I'd expect

  • If all 3 of the seeds are down at startup, I fail - game over. After the initial connection and pub/sub, however...

  • If any master or replica goes down I'd expect failover - we get this

  • If any sentinel process goes down I'd also expect failover. In other words, after the initial connection you know about ALL 997 other sentinels, and hence EVEN if all 3 original seed sentinels go down, this is but a mere nuisance - you just connect to another one. Once/if they ever come back up, great.

It's the very last part I'm worried about. Sure, list.get(0) is terrible, because your seed is then only one node and successful startup of a connection is compromised. But once up, who cares? Take that node down permanently - it should not affect function.

In the sample code I used, we reused the same one RedisConnection, so failover should be easy. I agree that if we had grabbed a new RedisConnection and the first node were down, problem. (Actually I take that back, since there should be a backing provider that can potentially reuse the newly discovered pool.)

Am I missing something?

@mikebell90
Author

I have confirmed this, btw:

  • Provide only one URI for sentinel (the one that's on the master)
  • Kill both redis and sentinel
  • Endless loop again until the process is restarted.

This strikes me as a pretty problematic thing

  • "SENTINEL sentinels Show a list of sentinel instances for this master, and their state."
  • "Adding a new Sentinel to your deployment is a simple process because of the auto-discover mechanism implemented by Sentinel. All you need to do is to start the new Sentinel configured to monitor the currently active master. Within 10 seconds the Sentinel will acquire the list of other Sentinels and the set of replicas attached to the master.

If you need to add multiple Sentinels at once, it is suggested to add it one after the other, waiting for all the other Sentinels to already know about the first one before adding the next. This is useful in order to still guarantee that majority can be achieved only in one side of a partition, in the chance failures should happen in the process of adding new Sentinels."

@mikebell90
Author

To clarify, that's from the Redis documentation: https://redis.io/topics/sentinel
As an experiment I made a single seed node pointing to master/sentinel, waited 3 minutes (for replication, per their comments). I then stopped the sentinel and redis. Boom, endless loop.

@mp911de
Collaborator

mp911de commented Oct 31, 2019

Rather than failing, even with a LOG.WARN, why not remove the errant method signature. The usage seems to be largely "of convenience", rather than "correct".

That's not fully true. Master/Replica setups can be connected using:

  • Master/Replica using auto-discovery (e.g. you run your Redis servers yourself and point the URI to one of the nodes)
  • Sentinel
  • Master/Replica with a static setup that cannot be auto-discovered (e.g. Redis is proxied and internal topology details use internal, non-reachable IP addresses. AWS ElastiCache with Replication is such a case)

For the last case, we require multiple URIs, as we cannot determine the topology from Redis.

It leads to this much larger question. Imagine I provide 3 "seed" sentinel uris. The actual cluster may have 1000 redis nodes and 1000 sentinel nodes (just an illustration).

The RedisURI should contain more than one Sentinel endpoint, but probably not 1000. A reasonable number is between 3 and 5, balancing resource usage vs. availability. It's mostly up to your requirements: how likely it is that all Sentinels become unreachable and then a master node dies, and how much failover you want to provide.

Regarding adding additional sentinels during runtime:

Lettuce doesn't connect automatically to newly discovered Sentinel nodes. Primarily because no one has asked for it, and secondly because it raises a security issue: do you want this, and what happens if a malicious Sentinel is added to your Sentinel cluster? Do you want to trust it automatically?
Lettuce has no configuration for these aspects of Sentinel-based discovery. It is also a difficult balance between all the things you could want to configure and the complexity of usage.

@mp911de
Collaborator

mp911de commented Oct 31, 2019

Based on your replies, when you use the connect(…) method accepting a single RedisURI instead of Iterable<RedisURI>, everything works as expected. May I close this ticket?

@mikebell90
Author

You may. Although I'd like to open another ticket to further discuss Sentinel auto-discovery, if I may.

@mikebell90
Author

Closing, as the bug per se is resolved; the larger design issue is now discussed in #1168.
