Sentinel failover doesn't appear to function correctly #1144
Can you attach a stack trace that shows the error? To rephrase your requirement: if any of the given Sentinels is not reachable, then you still want to be able to connect to Redis. This functionality is built in.
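For readers following along, a minimal sketch of the behavior being referenced, assuming placeholder hostnames and the conventional `mymaster` master ID: declaring several Sentinels on one `RedisURI` lets the driver skip an unreachable Sentinel during the lookup.

```java
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.codec.StringCodec;
import io.lettuce.core.masterreplica.MasterReplica;
import io.lettuce.core.masterreplica.StatefulRedisMasterReplicaConnection;

public class SentinelFailoverSketch {
    public static void main(String[] args) {
        // Placeholder hosts; "mymaster" is Sentinel's conventional default name.
        RedisURI uri = RedisURI.builder()
                .withSentinel("sentinel-1", 26379)
                .withSentinel("sentinel-2", 26379)
                .withSentinel("sentinel-3", 26379)
                .withSentinelMasterId("mymaster")
                .build();

        RedisClient client = RedisClient.create();

        // All Sentinels live on one URI, so an unreachable Sentinel is
        // skipped and the next one is tried during the lookup.
        StatefulRedisMasterReplicaConnection<String, String> connection =
                MasterReplica.connect(client, StringCodec.UTF8, uri);

        connection.close();
        client.shutdown();
    }
}
```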
I will whip up another test and attach the info with the Lettuce logger at trace. Might be Monday or Tuesday.
Attached is the following. Here is how we reproduced it: 4 is master; Sentinel 4 is brought down along with the master; 1 goes back up, and 3 is elected.
@mp911de Let me know what else you need. FWIW, I couldn't find anything in the topology provider that ever resets the initial list of Sentinel nodes. You then have code looping through it, so whatever's first SEEMS like a point of failure. But I might have missed something. Thanks for your assistance.
More details: if and only if the FIRST node listed in the config.yaml is the one that is MASTER, and I kill both Sentinel and Redis, we get a deadlock of infinite reconnects. If we have another node listed first, all is well. If the first node is not master, it appears to be fine (although that might just be luck). I saw some code that always iterated through a list sequentially; I can't find it right now, but it looked like it needed an incrementing round-robin weight to work properly. Ah, it's SentinelTopologyRefresh. But to be fair, I've not checked whether that gets reinstantiated somewhere.
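For illustration only, and not Lettuce's actual implementation: a hypothetical sketch of the sequential, first-wins iteration pattern the commenter suspects, where a permanently dead first entry penalizes every attempt.

```java
import java.util.List;

// Illustration only, NOT Lettuce's code: a first-wins loop over a fixed
// Sentinel list. If the first entry is permanently down and the list is
// never reordered or reset, every pass pays the connect timeout on the
// dead node first, matching the "whatever's first is a point of failure"
// suspicion above.
class FirstWinsSketch {

    String pickSentinel(List<String> sentinels) {
        for (String address : sentinels) {   // always starts at index 0
            if (isReachable(address)) {      // hypothetical health check
                return address;
            }
        }
        throw new IllegalStateException("no reachable Sentinel");
    }

    boolean isReachable(String address) {
        // Placeholder; a real check would attempt a TCP connect with a timeout.
        return false;
    }
}
```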
I should also point out that I provided my config and program. I may have changed a few defaults unwisely; see RedisConfiguration, which is also quoted above.
Oh, and it's Lettuce 5.1.7. Sorry, I was inaccurate before.
I tried Lettuce 5.2.0; same behavior.
@mp911de Just confirming: do you need any additional information?
I'm currently buried under a pile of issues and bugs I need to follow up on.
I'm not sure I understand the issue correctly. I tried the following steps:
In every case, I've seen a couple … Can you help me reproduce the issue?
Yes, I'll be happy to write up a more detailed thing. Hopefully I'll get to it in a few minutes. Thanks as always, Mark.
I've attached a detailed walkthrough. I hope it is sufficient. We've replicated this on multiple clusters here of various sizes; this is our tiny test cluster (3 nodes, the minimum for a Sentinel test).
There ya go, @mp911de. I hope it's adequate documentation.
Thanks for the update. The issue seems to be the way the connection is created. All Sentinels should be specified on a single RedisURI:

```java
RedisURI uri = RedisURI.builder()
        .withSentinel("arch-test-node-a-redis-ci-sf.otenv.com", 26379)
        .withSentinel("arch-test-node-b-redis-ci-sf.otenv.com", 26379)
        .withSentinel("arch-test-node-c-redis-ci-sf.otenv.com", 26379)
        .withSentinelMasterId("arch").build();

MasterReplica.connect(client, codec, uri);
```

The connect variant that accepts multiple RedisURI objects uses only the first URI. It would make sense to add a bit of logging here to indicate that only the first URI was used. Can you change your code and run that test again?
That appears to have fixed the issue. Thank you. However, I have two follow-up questions:
Here's what I'd expect:
It's the very last part I'm worried about, because in the sample code I used we reused the same single RedisConnection, so failover should be easy. I agree that if we had grabbed a new RedisConnection and the first node is down, that's a problem. (Actually, I take that back, since there should be a backing provider that can potentially reuse the newly discovered pool.) Am I missing something?
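A minimal sketch of the reuse pattern described in that comment, using a hypothetical holder class: the application caches one connection, and failover is handled by the driver reconnecting that same object rather than by callers opening new connections.

```java
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.codec.StringCodec;
import io.lettuce.core.masterreplica.MasterReplica;
import io.lettuce.core.masterreplica.StatefulRedisMasterReplicaConnection;

// Hypothetical holder: one long-lived Master/Replica connection shared by
// all callers. After a failover, Lettuce reconnects this same object, so
// callers never re-resolve Sentinels themselves.
public final class SharedRedis {

    private final StatefulRedisMasterReplicaConnection<String, String> connection;

    public SharedRedis(RedisClient client, RedisURI sentinelUri) {
        this.connection = MasterReplica.connect(client, StringCodec.UTF8, sentinelUri);
    }

    public String get(String key) {
        return connection.sync().get(key); // same cached connection every call
    }

    public void put(String key, String value) {
        connection.sync().set(key, value);
    }
}
```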
I have confirmed this, btw:
This strikes me as a pretty problematic thing:
"If you need to add multiple Sentinels at once, it is suggested to add it one after the other, waiting for all the other Sentinels to already know about the first one before adding the next. This is useful in order to still guarantee that majority can be achieved only in one side of a partition, in the chance failures should happen in the process of adding new Sentinels."
To clarify, that's from the Redis documentation: https://redis.io/topics/sentinel
That's not fully true. Master/Replica setups can be connected by using:
- Redis Sentinel
- a single node from which the topology is auto-discovered
- a static setup with predefined node addresses

For the last case, we require multiple URIs, as we cannot determine the topology from Redis.
Regarding adding additional Sentinels during runtime: Lettuce doesn't connect automatically to newly discovered Sentinel nodes. Primarily because no one asked for it, and the second aspect is a security issue: do you want this, and what happens if a malicious Sentinel is added to your Sentinel cluster? Do you want to automatically trust it?
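A minimal sketch of that last, static case, with placeholder hostnames: every node address is supplied up front because the topology cannot be discovered from Redis (`client` as before).

```java
// Static Master/Replica setup: no Sentinel and no auto-discovery, so all
// node addresses (placeholder hosts here) are listed explicitly.
List<RedisURI> nodes = Arrays.asList(
        RedisURI.create("redis://redis-master:6379"),
        RedisURI.create("redis://redis-replica:6379"));

StatefulRedisMasterReplicaConnection<String, String> connection =
        MasterReplica.connect(client, StringCodec.UTF8, nodes);
```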
Based on your replies, when you use the method …
You may. Although I'd like to open another ticket to further discuss Sentinel auto-discovery, if I may.
Closing, as the bug per se is resolved, and the larger design issue is now discussed in #1168.
Bug Report
So I'm a Redis/Lettuce n00b, and I may be missing something (probably am), but I can't get Redis with Sentinel (not Redis Cluster) to function 100% on failover.
Current Behavior
So my goal was to follow the stated best practice: keep one cached connection for all my simple gets and puts. But I need HA and reasonably fast failover, so we have Sentinel.
I'll put wiring details below, but here are some relevant general usages:
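A generic sketch of the "one cached connection for all gets and puts" pattern the report describes; hostnames here are placeholders, and note that the exact `RedisURI` construction turned out to be the crux of this issue, as diagnosed in the comments above.

```java
// Hypothetical sketch of the described wiring: a single cached
// Master/Replica connection over Sentinel, reused for all simple commands.
RedisClient client = RedisClient.create();

RedisURI uri = RedisURI.builder()
        .withSentinel("sentinel-a", 26379) // placeholder hosts
        .withSentinel("sentinel-b", 26379)
        .withSentinel("sentinel-c", 26379)
        .withSentinelMasterId("arch")
        .build();

StatefulRedisMasterReplicaConnection<String, String> connection =
        MasterReplica.connect(client, StringCodec.UTF8, uri);
RedisCommands<String, String> redis = connection.sync();

redis.set("greeting", "hello"); // all gets and puts share this connection
String value = redis.get("greeting");
```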
Expected behavior/code
Actual behavior
Am I missing something? The docs strongly imply that the topology is automatically refreshed, and one would also assume the driver understands Sentinel: it should never "give up" after losing an old Sentinel node, but should try connecting to multiple Sentinels and reshuffle them in response to errors.
Environment