[bitnami/redis] all redis/sentinel pods become master at initial install #5347
Comments
Hi,
We had a similar problem: we wanted to deploy a 3-node Redis cluster with sentinels. We used the latest chart from the "master" branch and didn't make any changes. After deploying, two of the nodes were able to communicate with each other and elect a master. The third one didn't connect with the rest and remained standalone. I believe the problem is in the startup scripts that generate the config; there is probably a race condition somewhere. Our solution was to go back to an older commit, 2cd3ec6 ([bitnami/redis] Fix Sentinel Redis with TLS). It was chosen somewhat randomly, so I'm not sure which commit introduced the regression. Probably the one that was supposed to fix sentinel synchronization.
I am seeing the same issue. Installing the chart with
Even replication is not set up. Manual workaround: after the chart is installed
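A manual fix of this kind typically means picking one pod as master, pointing the others at it, and resetting the sentinels. A minimal sketch, assuming the chart's default pod, container and headless-service names, the default master set name "mymaster", and auth disabled:
$ # sketch only: pod/service names and master set name are assumptions
$ # repoint the other pods at redis-node-0, which we keep as master
$ kubectl exec redis-node-1 -c redis -- redis-cli REPLICAOF redis-node-0.redis-headless 6379
$ kubectl exec redis-node-2 -c redis -- redis-cli REPLICAOF redis-node-0.redis-headless 6379
$ # ask each sentinel to rediscover the topology
$ kubectl exec redis-node-0 -c sentinel -- redis-cli -p 26379 SENTINEL RESET mymaster
$ kubectl exec redis-node-1 -c sentinel -- redis-cli -p 26379 SENTINEL RESET mymaster
$ kubectl exec redis-node-2 -c sentinel -- redis-cli -p 26379 SENTINEL RESET mymaster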
Similarly, I am seeing the same issue for many 12.y.z versions. By tuning some configuration under version 12.7.3, I managed to get 1 master pod and the rest of the pods as replicas (slaves), and everything seemed to work normally. The same worked for version 12.3.2 and could probably work for many other 12.y.z versions. Here is the override configuration that worked for me:
Increasing the value of "initialDelaySeconds" to 30 seconds gave the first pod enough time to register itself as "master"; the remaining pods then simply joined it as replicas (slaves). I tried with 10 seconds and then with 20 seconds, but I kept getting at least 2 masters out of 4 pods. Although this is working for me now, I would love to hear your feedback and understand why it takes so long for the first pod to register itself as master under the recent releases (12.y.z) while it took seconds in 10.8.1. Increasing "initialDelaySeconds" to 30 seconds solved the issue, but is there a better solution? Thanks.
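For reference, the same probe overrides can be passed on the command line. A minimal sketch, assuming the 12.x values layout and a release named myredis (the replica count and chart version are taken from the comment above):
$ # sketch only: release name 'myredis' is an assumption; probe keys per the 12.x chart values
$ helm install myredis bitnami/redis --version 12.7.3 \
    --set cluster.enabled=true,sentinel.enabled=true,cluster.slaveCount=4 \
    --set master.livenessProbe.initialDelaySeconds=30,master.readinessProbe.initialDelaySeconds=30 \
    --set slave.livenessProbe.initialDelaySeconds=30,slave.readinessProbe.initialDelaySeconds=30 \
    --set sentinel.livenessProbe.initialDelaySeconds=30,sentinel.readinessProbe.initialDelaySeconds=30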
Hi
@kzellag, I can confirm that your suggested settings work. I tested it out with 3 replicas. They all came up with timed gaps, and one became master, with the other two becoming replicas, as expected. The sentinels were set up correctly too.
@avadhanij, thanks for trying my settings and confirming it. I have another corner case, confirmed today, where those settings break:
That's strange. I just tried this (stop and then start) on my Minikube setup (minikube v1.17.1 on Darwin 10.15.7), and the pods came back up, and the replicas and sentinels are still correctly set up.
My single node is an AWS EC2 instance bootstrapped with Kubernetes. Whenever the instance is rebooted (or stopped and then started), Kubernetes starts all pods at the same time, which explains why there are multiple redis masters. This is in contrast to the initial deployment of redis, where the pods are deployed sequentially, which gives the first pod enough time to identify itself as master before the rest of the redis pods join it as replicas.
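A quick way to see whether a reboot produced multiple masters is to ask each pod for its replication role. A sketch, assuming the chart's default pod and container names and no auth:
$ # sketch only: pod names are assumptions; only one pod should report role:master
$ kubectl exec redis-node-0 -c redis -- redis-cli INFO replication | grep role
$ kubectl exec redis-node-1 -c redis -- redis-cli INFO replication | grep role
$ kubectl exec redis-node-2 -c redis -- redis-cli INFO replication | grep role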
Hi @avadhanij, @kzellag.
We again stumbled upon this problem, even on 2cd3ec6. It is not happening every time, but it still happens. On the latest commit from the master branch the problem happens on every deployment. Now we are trying to set "initialDelaySeconds" to 30s and this works, but we need to do more tests to be sure. We are deploying this chart on a 3-node Kubernetes 1.17 cluster. The cluster is somewhat unusual because 2 of the nodes are high-performance servers and the third one is a VM. The problem we encounter is that sometimes a high-performance node forms a cluster with the VM while the second high-performance node creates its own cluster, so we end up with two masters. One of the workarounds was to start with one Redis node and scale later; see the sketch below. Because of that I suspect some problems with generating configs during the startup phase. Summary: the only changed values are: sentinels enabled and slaveCount set to 3. (BTW, slaveCount for a sentinel deployment should probably just be replaced with nodeCount because it's somewhat confusing.)
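The start-with-one-node workaround mentioned above can be scripted. A rough sketch, assuming a release named myredis and the chart's <release>-node statefulset naming:
$ # sketch only: release name 'myredis' is an assumption
$ helm install myredis bitnami/redis --set sentinel.enabled=true --set cluster.slaveCount=1
$ # wait until myredis-node-0 is Running and reports itself as master, then scale out
$ kubectl rollout status statefulset myredis-node
$ kubectl scale statefulset myredis-node --replicas=3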
@rafariossaa, I tested with the following two configurations:
and
Hi, |
@rafariossaa, I am using the following versions: Chart - redis-12.7.3
Thanks for your feedback.
I'm seeing this happen too on a local kind cluster, as well as on GKE |
Hi, |
Hi, |
Hi everyone, I did the following tests. Test-1: I deployed the chart with the default values for livenessProbe/readinessProbe under "master:", "slave:" and "sentinel:", with 4 replicas. 3 redis pods started, where only one of them stayed in the "Running" state (myns-redis-node-1), while the two others (myns-redis-node-0 and myns-redis-node-2) kept switching between "Running" and "CrashLoopBackOff".
By checking the errors in the "Events" for the failing Pods (myns-redis-node-0 and myns-redis-node-2), I found that the liveness and readiness probes are failing. Events:
Test-2: I explicitly set the livenessProbe/readinessProbe values under "master:", "slave:" and "sentinel:" to 30 seconds, and then all 4 redis pods started properly, as shown in:
However, when I rebooted the node (an EC2 instance), none of the redis pods managed to reach the "Running" state, as shown in:
When I checked the "Events:" for one of the pods, I saw that it is also failing the liveness and readiness probes. Events:
Maybe I am missing something, but those were my observations from testing (bitnami/redis 12.7.7, 2dc23f8). Note that increasing the liveness/readiness probes to 30 seconds worked for me with some previous releases (like bitnami/redis 12.1.3, 544b7bc), but only for the initial deployment; after rebooting the node, all pods started in the "Running" state but all of them as "master". Thank you!
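For anyone reproducing this, the probe failures described in Test-1 and Test-2 can be inspected per pod. A sketch, with the namespace and pod names taken from the comment above:
$ # the Events section at the bottom of the describe output lists the failing probes
$ kubectl -n myns describe pod myns-redis-node-0
$ kubectl -n myns get events --field-selector involvedObject.name=myns-redis-node-0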
Hi @kzellag. Regarding the
I got the 5 redis nodes up and running without issues. Then I resized the cluster to 2 nodes, and a couple of redis pods needed to be redeployed on another k8s node, but it went without issues, and I got only 1 master and 4 slaves:
I got some restarts because the PVC took its time to move from one node to the other.
@rafariossaa, I pulled the latest chart and reinstalled it on my minikube cluster. I did not use the livenessProbe and readinessProbe 30-second values @kzellag provided as the initial workaround. It works. Even on the first bring-up, I can see that the master is correctly elected, the other two become replicas, and the sentinel info reflects it as well.
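A way to double-check what the sentinels think, assuming the chart's default container names, the default master set name "mymaster" and no auth:
$ # sketch only: pod name and master set name are assumptions
$ kubectl exec redis-node-0 -c sentinel -- redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
$ # on Redis older than 5.0 the subcommand is 'slaves' instead of 'replicas'
$ kubectl exec redis-node-0 -c sentinel -- redis-cli -p 26379 SENTINEL replicas mymaster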
Hi again,
Under KIND:
kubectl -n myns get pods
and here is information about the master/replicas:
myredis-node-0 : 10.244.0.5
myredis-node-1 : 10.244.0.6
myredis-node-2 : 10.244.0.7
myredis-node-3 : 10.244.0.8
Under an EC2 instance (similar to Test-2):
kubectl -n myns get pods
Logs under the "redis" container for the 3 nodes (myredis-node-0, myredis-node-1 and myredis-node-2):
kubectl -n myns logs myredis-node-0 -c redis
kubectl -n myns logs myredis-node-1 -c redis
kubectl -n myns logs myredis-node-2 -c redis
Logs under the "sentinel" container for the 3 nodes (myredis-node-0, myredis-node-1 and myredis-node-2):
kubectl -n myns logs myredis-node-0 -c sentinel
kubectl -n myns logs myredis-node-1 -c sentinel
kubectl -n myns logs myredis-node-2 -c sentinel
Here are my installation steps:
helm -n myns install myredis bitnami/redis --version 12.7.7 --values redisoverrides.yaml
where the file "redisoverrides.yaml" content is:
master:
slave:
cluster:
sentinel:
usePassword: false
metrics:
I have used the same config "redisoverrides.yaml" for both tests (against a KIND cluster and against an EC2 instance), but it works under the first and not under the second. For the same config, when I revert to previous releases (like 12.1.3), the pods at least start, though with multiple masters, and setting the livenessProbe/readinessProbe values to 30 seconds brings up a single master with the rest as replicas.
Hi guys, |
Any update on this issue? Thanks,
Hi @kzellag,
Hi @miguelaeh, Thanks,
I've just re-opened the internal task so we can further investigate these issues. We'll get back to you soon, but unfortunately I cannot give an ETA due to Easter holidays. |
Any news here? I'm getting the message
My cluster has 3 nodes and it works fine (the master is reallocated when necessary) until I delete the master and one slave at the same time. Then the cluster never comes up again. One of the killed nodes tries to come up, and warns about the above message. I've tried to set initialDelaySeconds to 30, but it's not helping at all.
cluster:
  enabled: true
  slaveCount: 2
usePassword: false
nameoverride: "redis"
architecture: replication
master:
  persistence:
    size: 10Gi
  livenessProbe:
    initialDelaySeconds: 30
  readinessProbe:
    initialDelaySeconds: 30
replica:
  persistence:
    size: 10Gi
  livenessProbe:
    initialDelaySeconds: 30
  readinessProbe:
    initialDelaySeconds: 30
sentinel:
  enabled: true
  usePassword: false
  downAfterMilliseconds: 20000
  failoverTimeout: 18000
  cleanDelaySeconds: 5
  livenessProbe:
    initialDelaySeconds: 30
  readinessProbe:
    initialDelaySeconds: 30
auth:
  enabled: false
  sentinel: false

$ kubectl get po
NAME READY STATUS RESTARTS AGE
redis-client 1/1 Running 0 114m
redis-node-0 0/2 Running 0 31s
redis-node-2 2/2 Running 0 38m
$ kubectl logs redis-node-0 -c redis
14:18:02.80 WARN ==> redis-headless.test-redis-cluster.svc.cluster.local does not contain the IP of this pod: 10.3.193.182
14:18:07.82 WARN ==> redis-headless.test-redis-cluster.svc.cluster.local does not contain the IP of this pod: 10.3.193.182
# ... after some time ...
$ kubectl get po
NAME READY STATUS RESTARTS AGE
redis-client 1/1 Running 0 117m
redis-node-0 0/2 Running 4 4m8s
redis-node-2 2/2 Running 4 42m
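One way to verify what that warning is complaining about is to compare the pod IP against the headless service endpoints; a sketch, with the namespace and names taken from the log lines above:
$ # an IP missing from the endpoints list will also be missing from the headless service DNS
$ kubectl -n test-redis-cluster get endpoints redis-headless
$ kubectl -n test-redis-cluster get pod redis-node-0 -o jsonpath='{.status.podIP}'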
Hi, |
Hi @rafariossaa! Thank you for your quick answer. I completely understand your point, and since the quorum is 2, the remaining node will never be master while it's alone. That's the desired behavior, no problem. I don't want the cluster to work under those conditions, but I do want the cluster to come up again at some point. Then, once the 3 nodes are restored, I want the cluster to resume operation. What is actually happening is that the 2 killed nodes never come up again. In the above example, I killed redis-node-0 (master) and redis-node-1 (slave). Then redis-node-0 tried to come up, but it was not able to. When I queried the logs for redis-node-0, the message was
OK, I see your point. Let me add that to the task. |
@rafariossaa I've tested the previous scenario again using the latest Helm chart & Redis versions (chart v15.3.2 & app v6.2.5). This seems to be fixed, and now the cluster comes up again as expected. Even when deleting all nodes, it comes up again, a master is elected, and all the data is there (I have persistence enabled). From my side this is successfully fixed. @abdularis, how is it on your side?
Hi @juan-vg,
I had that issue with the helm chart from yesterday's installation. I uninstalled it, then installed it again with the latest version 6.2.6. Is the issue addressed in the latest version?
Hi @koo9 |
Same issue. |
Tried the latest. So far so good here.
It only worked when I removed it and reinstalled it. On the first install it did not behave properly, even though it said it installed correctly without error.
In my previous installation it worked, then after a few days there was an error in the log complaining that the local cluster does not contain the IP of the pod. After re-installing it with the latest chart, it works so far.
Same, it failed after 1 day of running... I tried to use this in the past, over a year ago, and had the same exact problem. It's still simply not stable.
Ideally the sentinel and redis should run in separate containers; not sure if that has anything to do with what we are seeing.
@koo9 do you mean separate pods or containers? Right now they run in separate containers inside the same pod.
@javsalgar you are right, I meant separate pods.
Hi all. I am facing this issue with the latest chart. Is there any workaround for now?
Hi, |
Hmm, not really usable right now with these problems. I really think there should be a hint in the README noting that the sentinel integration is not stable.
Hi, this issue should be solved in recent versions of the container and Helm chart. Please, feel free to reopen this ticket if something doesn't work as expected. |
Excellent! Thanks
bitnami/redis:
redis-12.7.0
Describe the bug
Please help. I tried to install bitnami/redis with
sentinel.enabled=true
and cluster.slaveCount=3. When it is successfully installed, all 3 deployed pods become redis masters. When I scale the statefulset to, for example, 5, the new pods become slaves of one of the 3 masters. I think only one master should exist and the rest should be slaves.
Expected behavior
It should only have one master and the rest should be slaves.
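For reference, the reproduction described above boils down to roughly the following. A sketch, with the values taken from the report and an assumed release name of redis:
$ # sketch only: release name 'redis' and the resulting 'redis-node' statefulset name are assumptions
$ helm install redis bitnami/redis --version 12.7.0 --set sentinel.enabled=true --set cluster.slaveCount=3
$ # observed: all 3 pods come up claiming the master role; scaling then adds proper replicas
$ kubectl scale statefulset redis-node --replicas=5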