Changing replicas in the sequence 1 -> 3 -> 1 -> 3 fails on the last step: the second ZK pod doesn't start, due to a 'java.lang.RuntimeException: My id 2 not in the peer list' exception
#448
kubectl delete ns test
kubectl create ns test
kubectl apply -n test -f zookeeper-operator-1-node.yaml
sleep 300
kubectl get pods -n test -l app=zookeeper # returns zookeeper-0 in Running state
kubectl apply -n test -f zookeeper-operator-3-node.yaml
sleep 300
kubectl get pods -n test -l app=zookeeper # returns zookeeper-0, zookeeper-1, zookeeper-2 in Running state
kubectl apply -n test -f zookeeper-operator-1-node.yaml
sleep 300
kubectl get pods -n test -l app=zookeeper # returns zookeeper-0 in Running state
# error HERE !!
kubectl apply -n test -f zookeeper-operator-3-node.yaml
sleep 300
kubectl get pods -n test -l app=zookeeper # returns zookeeper-0 in Running and Ready state, zookeeper-1 in CrashLoopBackOff
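Not part of the original report, but when re-running this script it can help to replace the blind sleep 300 with a wait on the statefulset; the statefulset name "zookeeper" is assumed from the pod names:

# Wait for the statefulset to reach its desired ready count instead of
# sleeping for a fixed 300 seconds (statefulset name is an assumption).
kubectl rollout status statefulset/zookeeper -n test --timeout=300s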
Error message
2022-03-29 10:22:12,345 [myid:2] - ERROR [main:QuorumPeerMain@113] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: My id 2 not in the peer list
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1077)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
2022-03-29 10:22:12,346 [myid:2] - INFO [main:ZKAuditProvider@42] - ZooKeeper audit is disabled.
2022-03-29 10:22:12,348 [myid:2] - ERROR [main:ServiceUtils@42] - Exiting JVM with code 1
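The exception means server 2's own id is missing from the server list in the configuration it loaded, which points at stale on-disk state rather than a transient startup race. Two quick checks, sketched here with pod and PVC naming assumed from the default statefulset conventions:

# Full startup log of the crash-looping member, including the configuration
# it read before throwing the RuntimeException.
kubectl logs -n test zookeeper-1 --previous

# PVCs surviving the scale-down; stale claims keep the old myid and dynamic
# configuration around across the 3 -> 1 -> 3 cycle.
kubectl get pvc -n test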
@Slach I can see that the reclaimPolicy is set to Retain, due to which the PVC entries are not cleaned up correctly on scale down. Could you please set reclaimPolicy to Delete and perform the same operations?
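For reference, a hedged sketch of that change, assuming the field in question is spec.persistence.reclaimPolicy on the ZookeeperCluster resource and that the resource is named zookeeper:

# Switch the reclaim policy so the operator deletes PVCs on scale-down
# (field path and resource name are assumptions, not quoted from this issue).
kubectl patch zookeepercluster zookeeper -n test --type merge \
  -p '{"spec":{"persistence":{"reclaimPolicy":"Delete"}}}'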
I found myself with this issue, and setting reclaimPolicy: Delete did not cause the PVCs to get deleted on scale down. However, I dug into the code, found change #313, and discovered that my PVCs didn't have the uid label; I just saw logs like the following:
even though I expected it to find some PVCs and delete them.
I guess this means the statefulset was created prior to that change, so that label was never set on the PVCs belonging to my statefulset.
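A quick way to verify whether the PVCs carry that label (the exact label key isn't quoted in this thread, so just list all labels):

# List the data PVCs with their labels to see whether the uid label added by
# #313 is present on older claims.
kubectl get pvc -n test --show-labels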
Because I'm using solr-operator, which manages the zookeeperclusters.zookeeper.pravega.io resources for me, my solution was simply to delete the zookeeper resource and let it be recreated. In my situation this is OK; there might be neater ways of handling this.
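Sketched out, that workaround plus the finer-grained alternative implied by the thread; the resource and PVC names are assumptions based on the default naming:

# Option 1 (what worked here): delete the ZookeeperCluster resource and let
# solr-operator (or a re-apply of the manifest) recreate it from scratch.
kubectl delete zookeepercluster zookeeper -n test

# Option 2 (finer-grained): remove only the stale PVCs of the departed members
# before scaling back up.
kubectl delete pvc -n test data-zookeeper-1 data-zookeeper-2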
Description
Repeated scale-up / scale-down operations don't work.
Looks similar to #315, but increasing initialDelaySeconds: 30 doesn't help (a quick probe check is sketched below).
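For anyone comparing with #315, the probe the pods actually run with can be inspected like this (statefulset name and container index are assumptions from the pod names, not quoted in the issue):

# Print the readinessProbe the operator rendered into the statefulset, to
# confirm whether an initialDelaySeconds override was picked up.
kubectl get statefulset zookeeper -n test \
  -o jsonpath='{.spec.template.spec.containers[0].readinessProbe}'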
Importance
It is critically important for our environment: we scale up during heavy workloads and scale down to decrease costs.
Steps to reproduce
We use the latest 0.2.13 release and the following two manifests:
zookeeper-operator-1-node.yaml and zookeeper-operator-3-node.yaml (the same manifest, only replicas: 3 is changed; a minimal sketch of the 1-node manifest is given below).
The simple bash script to reproduce and the resulting error message are quoted at the top of this issue.
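The manifests themselves are not included above; here is a minimal sketch of what zookeeper-operator-1-node.yaml might contain (apiVersion and kind come from the ZookeeperCluster CRD; the name and everything else are assumptions, not copied from the issue):

# Apply a minimal single-member cluster; the 3-node variant only changes
# spec.replicas to 3.
kubectl apply -n test -f - <<'EOF'
apiVersion: zookeeper.pravega.io/v1beta1
kind: ZookeeperCluster
metadata:
  name: zookeeper
spec:
  replicas: 1
EOF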