Sentinel failover not detected when connection hangs #1314
Comments
Hi Andres, thanks for raising this and the detailed explanation! A PR would be welcome. |
@luin I'm thinking of two different approaches here:
The first one seems more reliable, but could get complicated when some sentinels are up and some are down. Also, it might not work well with the existing SentinelIterator logic. Any thoughts/recommendations? |
Yeah, it's more complex than I thought. I'd go with subscribing to all sentinels because, in case of a partition, as you said, it's very likely that the connected sentinel and the old master are in the same network, so we won't be able to get events from that sentinel.
As for implementation details, I think SentinelConnector will get a lastActiveSentinel property, which defaults to null. Once a node is resolved, the connector will subscribe to all sentinels provided by the user (I don't think it needs to be dynamic in v1, as that seems non-trivial to implement). Not sure if it's necessary, but a reasonable connection count limit could be applied in case the user provides too many sentinels. When a +switch-master is received, we set lastActiveSentinel to the sentinel that got the event and disconnect so Redis#connect() will kick in. Next time, SentinelConnector will try lastActiveSentinel first (and then reset it). Wdyt?
@ohadisraeli @leibale btw, do you have any input on the correct behavior here: should clients subscribe to all sentinels or not? Or maybe it should be behind an option so users can enable/disable it? |
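To make the lastActiveSentinel idea concrete, here is a minimal sketch of just the ordering logic. The SentinelOrder class, its method names, and the SentinelAddress type are hypothetical and only illustrate the proposal; this is not the actual SentinelConnector code.

```typescript
// Hypothetical types and names for illustration; not the ioredis implementation.
interface SentinelAddress {
  host: string;
  port: number;
}

const sameAddress = (a: SentinelAddress, b: SentinelAddress): boolean =>
  a.host === b.host && a.port === b.port;

class SentinelOrder {
  private readonly sentinels: SentinelAddress[];
  // Sentinel that most recently delivered a +switch-master event, if any.
  private lastActiveSentinel: SentinelAddress | null = null;

  constructor(sentinels: SentinelAddress[]) {
    this.sentinels = [...sentinels];
  }

  // Called when one of the subscribed sentinels reports +switch-master.
  noteSwitchMaster(from: SentinelAddress): void {
    this.lastActiveSentinel = from;
  }

  // Order to try on the next connect: the last active sentinel first, then
  // the remaining sentinels. The preference is reset once it has been used.
  nextAttemptOrder(): SentinelAddress[] {
    const preferred = this.lastActiveSentinel;
    this.lastActiveSentinel = null;
    if (preferred === null) {
      return [...this.sentinels];
    }
    const rest = this.sentinels.filter((s) => !sameAddress(s, preferred));
    return [preferred, ...rest];
  }
}
```

Resetting the preference after one use means a stale sentinel is not favored forever if it later becomes unreachable.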
🎉 This issue has been resolved in version 4.27.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Hi, I'm using version 4.27.6 and the issue still exists. |
Can you create a reproducible example? Otherwise it's extremely difficult to pinpoint the exact cause of the issue. |
Thanks for quick response.
|
I'm not familiar with this feature, could you provide a code example? |
Yes, sure. It's about using the keyspaces feature, for example:
|
Are you sure it's the same issue and ioredis fails to detect the failover? Or could it be that the failover is actually detected and the problem is more specifically about the subscription? You could try adding extra code, for example something that increments a Redis value every second and logs the new value to the console. |
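A rough sketch of the diagnostic suggested above, assuming a client object that exposes incr(key) returning a promise of the new value (an ioredis client does). The CounterClient interface and the heartbeatOnce and startHeartbeat helpers are hypothetical names introduced for this example.

```typescript
// Minimal interface for the single command used here. Declared locally so the
// sketch does not depend on any particular client library.
interface CounterClient {
  incr(key: string): Promise<number>;
}

// Performs one INCR and returns the line to log. Errors become a log line too,
// so a failing connection is visible instead of silently swallowed.
async function heartbeatOnce(client: CounterClient, key: string): Promise<string> {
  const stamp = new Date().toISOString();
  try {
    const value = await client.incr(key);
    return `${stamp} ${key}=${value}`;
  } catch (err) {
    return `${stamp} INCR failed: ${String(err)}`;
  }
}

// Runs heartbeatOnce on an interval (one second by default) and logs each
// result. Returns a function that stops the heartbeat.
function startHeartbeat(
  client: CounterClient,
  key: string,
  intervalMs: number = 1000,
  log: (line: string) => void = console.log,
): () => void {
  const timer = setInterval(() => {
    void heartbeatOnce(client, key).then(log);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

If the log lines stop during a failover and never resume, that points at the failover not being detected; if they resume but the subscription stays silent, the problem is more likely specific to the subscription.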
I think the subscription should be moved to another sentinel when a failover happens. From my point of view, it's related. Maybe I'm wrong; let me know if so. |
I agree
It is certainly related. And if the root cause of your problem is that failover detection fails, then it is also the exact same issue. If, however, it turns out that failover is successfully detected and the problem is that ioredis does not perform the necessary additional actions after successfully detecting a failover, then I would say it is a related, but separate issue. So my recommendation is to first try and find out if failover is actually detected or not. |
We're seeing this issue in version I have also tried executing the sleep command on the redis master instance, and that results in the same master election happening, with sentinel announcing new master, but our application does not initiate a new connection, so commands are still trying to execute against old master. I will try to implement the workaround that listens to the sentinel EDIT: I just noticed the option EDIT 2: I also see these errors when I turned on |
I have stumbled onto the same issue as these:
After some investigating, I concluded that ioredis currently relies on Redis closing the connection as described here
However, when the failover is initiated with the Redis DEBUG SLEEP command or docker pause, the connection simply hangs but doesn't terminate.
This could be solved by subscribing to sentinel messages on the +switch-master channel, described in the Sentinel docs as "the message most external users are interested in".
I've created a reproducible example: https://github.com/mjomble/ioredis-sentinel-issue
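For reference, the Sentinel docs give the +switch-master payload as the master name, the old IP and port, then the new IP and port, separated by spaces. Below is a small sketch of parsing that payload; the SwitchMasterEvent type and parseSwitchMaster function are hypothetical names, not part of ioredis.

```typescript
// Hypothetical shape for a parsed +switch-master payload.
interface SwitchMasterEvent {
  masterName: string;
  oldHost: string;
  oldPort: number;
  newHost: string;
  newPort: number;
}

// Parses "<master name> <oldip> <oldport> <newip> <newport>".
// Returns null when the message does not have that shape.
function parseSwitchMaster(message: string): SwitchMasterEvent | null {
  const parts = message.trim().split(/\s+/);
  if (parts.length !== 5) {
    return null;
  }
  const [masterName = "", oldHost = "", oldPortText = "", newHost = "", newPortText = ""] = parts;
  const oldPort = Number(oldPortText);
  const newPort = Number(newPortText);
  if (!Number.isInteger(oldPort) || !Number.isInteger(newPort)) {
    return null;
  }
  return { masterName, oldHost, oldPort, newHost, newPort };
}
```

Checking masterName matters when one sentinel group monitors several masters, since every failover is published on the same channel.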
This example listens to the message outside ioredis.
Once received, it uses internal/undocumented fields to call client.connector.stream.destroy() because redis.disconnect(true) (which calls stream.end()) leaves the connection open in this scenario.
Ideally, this could all happen inside ioredis. I could probably submit a PR if needed.
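The workaround described above could be wired up roughly as follows. This is only a sketch: the MessageSource and Destroyable interfaces and the destroyOnSwitchMaster function are hypothetical, and the subscriber is assumed to be a connection to a sentinel that has already subscribed to +switch-master and emits "message" events with (channel, message) arguments.

```typescript
// Anything with a destroy() method, e.g. a socket.
interface Destroyable {
  destroy(): void;
}

// Minimal view of a pub/sub subscriber that emits (channel, message) pairs.
interface MessageSource {
  on(event: "message", listener: (channel: string, message: string) => void): unknown;
}

// When a +switch-master event arrives for the given master, destroy the
// current stream so the hung connection is torn down and the client reconnects.
function destroyOnSwitchMaster(
  subscriber: MessageSource,
  masterName: string,
  getStream: () => Destroyable | undefined,
): void {
  subscriber.on("message", (channel, message) => {
    if (channel !== "+switch-master") {
      return;
    }
    // The payload starts with the name of the master that was switched.
    if (message.trim().split(/\s+/)[0] !== masterName) {
      return;
    }
    getStream()?.destroy();
  });
}
```

In the scenario from this issue, getStream would return the internal, undocumented client.connector.stream, so this kind of code can break between ioredis versions.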