
Redis cluster cannot be recovered by reverting CR after affinity cannot be satisfied #480

Closed
hoyhbx opened this issue Apr 3, 2023 · 1 comment
Labels
bug Something isn't working

Comments

hoyhbx (Contributor) commented Apr 3, 2023

What version of redis operator are you using?

redis-operator version: We are using redis-operator built from HEAD

Does this issue reproduce with the latest release?

Yes, it reproduces with quay.io/opstree/redis-operator:v0.10.0

What operating system and processor architecture are you using (kubectl version)?

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.1", GitCommit:"3ddd0f45aa91e2f30c70734b175631bec5b5825a", GitTreeState:"clean", BuildDate:"2022-05-24T12:26:19Z", GoVersion:"go1.18.2", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

What did you do?

I first created a 6-node Redis cluster with 3 leaders and 3 followers by applying the following YAML file. We will refer to this YAML file as the 'original' one in the rest of this issue.

apiVersion: redis.redis.opstreelabs.in/v1beta1
kind: RedisCluster
metadata:
  name: test-cluster
spec:
  clusterSize: 3
  kubernetesConfig:
    image: quay.io/opstree/redis:v6.2.5
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 101m
        memory: 128Mi
      requests:
        cpu: 101m
        memory: 128Mi
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi

We later added an affinity rule to the CR, but then realized the rule could not be satisfied: one Redis pod was always left unscheduled. We then tried to recover the cluster by reverting to the original CR, to remove the unsatisfiable affinity rule. The redis-operator does update the StatefulSet to remove the affinity rule, but the already-created pods still carry the old, bad rule, so the Redis cluster stays in the error state.
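
For illustration, the change we applied was roughly of the following shape (a hedged sketch, not the exact rule we used; the label key is made up, and the redisLeader.affinity field is assumed from the CRD):

spec:
  redisLeader:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            # Hypothetical label that no node in the cluster carries, so the next
            # pod rolled out with this template gets stuck in Pending.
            - key: example.com/nonexistent-label
              operator: In
              values:
              - "true"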

What did you expect to see?

We expected the affinity rule to be removed from the pods after removing it from the CR.

What did you see instead?

The Redis cluster continues to run with one fewer replica than desired.

Possible root cause and Comments
It may be caused by this known limitation of StatefulSets: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#forced-rollback
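
A manual recovery sketch based on that limitation (the command and pod name are illustrative, assuming the operator's <name>-leader StatefulSet naming):

# After reverting the CR, the StatefulSet template no longer has the bad affinity,
# but the pod that was already created with it stays Pending and is not replaced
# automatically. Per the forced-rollback note, it has to be deleted by hand so the
# controller recreates it from the reverted template.
$ kubectl delete pod test-cluster-leader-2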

iamabhishek-dubey (Member) commented

We have already introduced force recreation of the StatefulSet for issues like this; please upgrade the operator to the latest version.
