Incorrect Follower Upscaling Behavior #310

Closed
hoyhbx opened this issue Jul 19, 2022 · 0 comments · Fixed by #361
Labels
bug Something isn't working

Comments


hoyhbx commented Jul 19, 2022

Before going into the report, thank you so much for acknowledging our previous issues and even providing fixes for them. We really appreciate it!

What version of redis operator are you using?

redis-operator version: We are using redis-operator built from commit f1c547

Does this issue reproduce with the latest release?

Yes, it reproduces with quay.io/opstree/redis-operator:v0.11.0

What operating system and processor architecture are you using (kubectl version)?

kubectl version output:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.1", GitCommit:"3ddd0f45aa91e2f30c70734b175631bec5b5825a", GitTreeState:"clean", BuildDate:"2022-05-24T12:26:19Z", GoVersion:"go1.18.2", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

What did you do?

I first created a 6-node redis cluster with 3 leaders and 3 followers by applying the following YAML file. We will refer to this YAML file as the 'original' one in the rest of this issue.

apiVersion: redis.redis.opstreelabs.in/v1beta1
kind: RedisCluster
metadata:
  name: test-cluster
spec:
  clusterSize: 3
  kubernetesConfig:
    image: quay.io/opstree/redis:v6.2.5
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 101m
        memory: 128Mi
      requests:
        cpu: 101m
        memory: 128Mi
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi

Due to production needs, I later tried to upscale the follower replicas independently of the leader replicas. I found a problem with upscaling followers in the case where #followers > #leaders:

# upscale followers to 5
apiVersion: redis.redis.opstreelabs.in/v1beta1
kind: RedisCluster
metadata:
  name: test-cluster
spec:
  clusterSize: 3
  kubernetesConfig:
    image: quay.io/opstree/redis:v6.2.5
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 101m
        memory: 128Mi
      requests:
        cpu: 101m
        memory: 128Mi
  redisFollower:
    replicas: 5
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi

What did you expect to see?

We expected the redis cluster to be upscaled from 3 leaders and 3 followers to 3 leaders and 5 followers.

What did you see instead?

After applying the new YAML file, the test-cluster-follower StatefulSet creates 2 extra pods, as expected. However, the new pods never join the cluster, causing a resource leak.
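For anyone reproducing this, the leak can be confirmed with standard tooling (commands are illustrative; pod names assume the operator's <name>-follower-N convention and no Redis auth):

$ kubectl get pods    # test-cluster-follower-3 and test-cluster-follower-4 are Running
$ kubectl exec test-cluster-leader-0 -- redis-cli cluster nodes    # the two new followers never appear in the topology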

In addition, the operator emits the following error log:

1.6551893711748614e+09	ERROR	controller_redis	Error in getting redis pod IP	{"Request.RedisManager.Namespace": "redis-operator", "Request.RedisManager.Name": "test-cluster-leader-3", "error": "pods \"test-cluster-leader-3\" not found"}
redis-operator/k8sutils.createRedisReplicationCommand
	/workspace/k8sutils/redis.go:91
redis-operator/k8sutils.ExecuteRedisReplicationCommand
	/workspace/k8sutils/redis.go:124
redis-operator/controllers.(*RedisClusterReconciler).Reconcile
	/workspace/controllers/rediscluster_controller.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227

Possible root cause and comments

The operator assumes a one-to-one relationship between leaders and followers, as revealed in redis.go/ExecuteRedisReplication#L112-L121. If the operator finds a follower that is not yet in the cluster, it adds the follower to the cluster by making it a replica of the "corresponding" leader: for example, follower-4 becomes a replica of leader-4. However, since only 3 leader replicas are present, leader-4 does not exist, leading to the error above.
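To make the failure mode concrete, here is a simplified paraphrase of that pairing logic (a sketch in Go; the function and variable names are ours, not the actual identifiers in k8sutils/redis.go):

package main

import "fmt"

// pairFollowers paraphrases the operator's replication setup:
// follower-i is always made a replica of leader-i, with no bounds
// check against the number of leaders that actually exist.
func pairFollowers(clusterName string, followerReplicas int) {
	for i := 0; i < followerReplicas; i++ {
		follower := fmt.Sprintf("%s-follower-%d", clusterName, i)
		leader := fmt.Sprintf("%s-leader-%d", clusterName, i) // assumes leader-i exists
		fmt.Printf("%s -> replica of %s\n", follower, leader)
	}
}

func main() {
	// With 5 followers but only 3 leaders, iterations i=3 and i=4
	// reference test-cluster-leader-3/-4, which were never created,
	// so the operator's pod IP lookup fails with "pods ... not found".
	pairFollowers("test-cluster", 5)
}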

We are not sure whether this is a bug, since it is unspecified how the operator should configure the redis cluster when there are more follower replicas than leader replicas. However, we do think this problem is worth looking into, since the CRD allows specifying different replica counts for leaders and followers. One possible behavior is sketched below.
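For discussion, one option would be to distribute the extra followers round-robin across the existing leaders, e.g. by taking the leader index modulo the leader count (a minimal sketch of the idea, not necessarily what the eventual fix implements):

package main

import "fmt"

// pairFollowersRoundRobin sketches a round-robin assignment:
// extra followers wrap around to existing leaders instead of
// targeting leaders that do not exist.
func pairFollowersRoundRobin(clusterName string, leaderReplicas, followerReplicas int) {
	for i := 0; i < followerReplicas; i++ {
		follower := fmt.Sprintf("%s-follower-%d", clusterName, i)
		leader := fmt.Sprintf("%s-leader-%d", clusterName, i%leaderReplicas)
		fmt.Printf("%s -> replica of %s\n", follower, leader)
	}
}

func main() {
	// follower-3 -> leader-0, follower-4 -> leader-1
	pairFollowersRoundRobin("test-cluster", 3, 5)
}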

If you have an idea about how to handle this, we are more than happy to discuss it with you and provide fixes (if fixes are necessary).
