redis + sentinel master pod reschedule / deletion results in two masters #5543
Note this is on 12.2.3 because that's the only version of the chart I can get working that doesn't initialise all instances as masters, as per #5347.
Hi, thanks for reporting. Pinging @rafariossaa as he is looking into the Redis + Sentinel issues.
Hi @aariacarterweir,
Hi,
@rafariossaa sorry I haven't gotten back to you. I will give this a shot soon, but:
Yup that's correct. For now I'm using the dandydeveloper chart as it works with pod deletion and also correctly promotes only one pod to master. I'll give this chart a spin again soon though and get back to you.
I'm having the same issue, with a different result. My problem is caused by the chart using:
Example below with kind:
→ kubectl logs redis-node-0 -c sentinel
14:17:44.81 INFO ==> redis-headless.default.svc.cluster.local has my IP: 10.244.0.72
14:17:44.83 INFO ==> Cleaning sentinels in sentinel node: 10.244.0.75
Could not connect to Redis at 10.244.0.75:26379: Connection refused
14:17:49.83 INFO ==> Cleaning sentinels in sentinel node: 10.244.0.74
1
14:17:54.84 INFO ==> Sentinels clean up done
Could not connect to Redis at 10.244.0.72:26379: Connection refused
→ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP
redis-node-0 1/2 CrashLoopBackOff 8 13m 10.244.0.72
redis-node-1 2/2 Running 0 12m 10.244.0.74
redis-node-2 0/2 CrashLoopBackOff 14 12m 10.244.0.75
→ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 23h
redis ClusterIP 10.96.155.117 <none> 6379/TCP,26379/TCP 14m
redis-headless ClusterIP None <none> 6379/TCP,26379/TCP 14m
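In case it helps with the debugging, a quick way to check which address each reachable sentinel currently reports as master is to query them directly. This is only a sketch: it assumes the chart's default master set name mymaster, with the pod and container names from the output above.
for i in 0 1 2; do
  # ask the sentinel container on each node which address it considers the master
  kubectl exec redis-node-$i -c sentinel -- redis-cli -p 26379 sentinel get-master-addr-by-name mymaster
done
Pods stuck in CrashLoopBackOff will simply fail the exec; the interesting part is whether the healthy sentinels agree on the same IP and port.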
Hi @GMartinez-Sisti,
Bumping this... this is a really nasty bug and I cannot make sense of it. The Bitnami Redis Sentinel setup is beyond unstable. I actually think this chart should be quarantined until this is resolved. I will continue to investigate and report back.
OK, so I have gotten to the bottom of this: if you lose the pod that hosts both the leader sentinel and the leader Redis, another sentinel is promoted to leader but continues to vote for the old Redis leader, which is down. When the pod comes back online, start-sentinel.sh polls the quorum for the leader and attempts a connection which, due to the above, points to its own IP. This might be an issue with Redis itself: it appears that if the leader sentinel goes down while it is failing over the leader Redis to a follower, the follower sentinels are unaware of the change and can never converge back on a consistent state.
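If the surviving sentinels really are stuck voting for a dead leader, one manual workaround to try is SENTINEL RESET, which makes a sentinel drop its state for the master set and re-discover replicas and other sentinels. This is only a sketch, assuming the chart's default master set name mymaster and the redis-node-N / sentinel naming from earlier in the thread; it is not something the chart does automatically.
# clear the stale state on each healthy sentinel so it re-discovers the topology
for i in 0 1 2; do
  kubectl exec redis-node-$i -c sentinel -- redis-cli -p 26379 sentinel reset mymaster
done
After the reset the healthy sentinels should converge on whichever node is actually serving as master instead of the old, dead leader.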
Hi,
Hi @rafariossaa, thanks for the follow-up. I was testing with:
kind create cluster --name=redis-test
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-release bitnami/redis --set=usePassword=false --set=cluster.slaveCount=3 --set=sentinel.enabled=true --set=sentinel.usePassword=false
And then executing
The good news is that I can't reproduce this problem again (just tried now with
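For reference, a failover check along these lines is roughly what I run after that install. It is a sketch that assumes the pods come up as my-release-redis-node-N with a container named redis and that node-0 currently holds the master role; adjust the names to whatever kubectl get pods reports.
# delete the pod hosting the master and wait for the statefulset to bring it back
kubectl delete pod my-release-redis-node-0
kubectl wait --for=condition=ready pod/my-release-redis-node-0 --timeout=300s  # re-run if the pod object has not been recreated yet
# the restarted node should now report role:slave, not role:master
kubectl exec my-release-redis-node-0 -c redis -- redis-cli info replication | grep ^role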
Hi,
Hi, I was dealing with the same issue and I can confirm that it seems resolved in the most recent 14.1.0 version (commit #6080). I was observing the same problem with the 14.0.2 version. It was not always reproducible, but I was not able to find a workaround. The problem was when the master Redis pod is restarted with
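For anyone else still on 14.0.x, upgrading the release to the fixed chart version should be enough. A minimal sketch, assuming the release is called my-release and you want to keep the values already set on it:
helm upgrade my-release bitnami/redis --version 14.1.0 --reuse-values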
Hi @serkantul,
Hi @rafariossaa,
Hi,
This issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
I am closing this issue.
Which chart:
bitnami/redis 12.7.4
Describe the bug
If the master pod is rescheduled or deleted manually, a new master is elected properly, but when the old master comes back online it elects itself as a master too.
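The quickest way to see the broken state is the replication role each node reports: right after the old master comes back, two nodes answer role:master. Sketch below, assuming the redis-node-N pod naming used elsewhere in this thread and a container named redis; adjust for your release name.
for i in 0 1 2; do
  kubectl exec redis-node-$i -c redis -- redis-cli info replication | grep ^role
done
# expected: one role:master and two role:slave; with this bug two pods report role:master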
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Expected the old master to rejoin as a slave.
Version of Helm and Kubernetes:
helm version:
kubectl version:
Additional context
Add any other context about the problem here.