
K8SSAND-1467 ⁃ The newly spawned pods do not use the up-to-date server-config-init under certain conditions #324

Closed
hoyhbx opened this issue Apr 21, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@hoyhbx

hoyhbx commented Apr 21, 2022

What did you do?

As mentioned in #150, the resource configuration of the server-config-init init container can be set via configBuilderResources. We realized that under certain conditions the new pods do not necessarily use the most up-to-date resource configuration. For example, as shown in the steps below, when we scale up the cluster and change the resource configuration at the same time, the newly spawned pod does not use the updated configBuilderResources.

We also found that even if we explicitly separate the configBuilderResources change and the scale-up into two steps, the StatefulSet does not get updated immediately if some pods are not yet ready between the two operations, and the newly spawned pods still use the old configBuilderResources.
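For reference, here is a minimal sketch of how we separated the two changes in the second experiment; the file names cr_resources_only.yaml and cr_scale_only.yaml are hypothetical stand-ins for a CR revision that only changes configBuilderResources and one that only changes size. The numbered steps below reproduce the simultaneous-change case.

# Hypothetical two-step variant: change configBuilderResources first ...
kubectl apply -f cr_resources_only.yaml
# ... wait for the StatefulSet rollout triggered by the resource change to settle ...
kubectl rollout status statefulset/cluster1-cassandra-datacenter-default-sts
# ... and only then scale up
kubectl apply -f cr_scale_only.yaml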

# Step1: Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.7.1/cert-manager.yaml

# Step2: Install operator
kubectl apply -f init.yaml
kubectl apply --force-conflicts --server-side -k 'github.com/k8ssandra/cass-operator/config/deployments/cluster?ref=v1.10.3'

# Step3: Apply custom resource
kubectl apply -f cr1_spec.yaml

# Step4: Check CR for the "Config Builder Resources" field => The field is the same as cr1_spec.yaml
kubectl describe cassandradatacenters.cassandra.datastax.com cassandra-datacenter

#  Config Builder Resources:
#    Requests:
#      Cpu:     512m
#      Memory:  100m

# Step5: Check statefulset for the resource request config of server-config-init => Same as cr1_spec.yaml
kubectl describe statefulsets.apps cluster1-cassandra-datacenter-default-sts

#   server-config-init:
#    Image:      datastax/cass-config-builder:1.0.4-ubi7
#    Port:       <none>
#    Host Port:  <none>
#    Requests:
#      cpu:     512m
#      memory:  100m

# Step6: Update cassandra-datacenter
kubectl apply -f cr2_spec.yaml

# Step7: Check CR for the "Config Builder Resources" field => The field is the same as cr2_spec.yaml
kubectl describe cassandradatacenters.cassandra.datastax.com cassandra-datacenter

#  Config Builder Resources:
#    Requests:
#      Cpu:     1024m
#      Memory:  200m

# Step8: Check Pods for the resource request config of server-config-init => Not the same as cr2_spec.yaml
kubectl describe pod cluster1-cassandra-datacenter-default-sts-0

#   server-config-init:
#    Image:      datastax/cass-config-builder:1.0.4-ubi7
#    Port:       <none>
#    Host Port:  <none>
#    Requests:
#      cpu:     512m
#      memory:  100m

kubectl describe pod cluster1-cassandra-datacenter-default-sts-1

#   server-config-init:
#    Image:      datastax/cass-config-builder:1.0.4-ubi7
#    Port:       <none>
#    Host Port:  <none>
#    Requests:
#      cpu:     512m
#      memory:  100m
  • init.yaml
 apiVersion: storage.k8s.io/v1
 kind: StorageClass
 metadata:
   # Changing the name to server-storage is the only change we have made compared to upstream
   name: server-storage
 provisioner: rancher.io/local-path
 volumeBindingMode: WaitForFirstConsumer
 reclaimPolicy: Delete
  • cr1_spec.yaml
 apiVersion: cassandra.datastax.com/v1beta1
 kind: CassandraDatacenter
 metadata:
   name: cassandra-datacenter
 spec:
   clusterName: cluster1
   config:
     cassandra-yaml:
       authenticator: org.apache.cassandra.auth.PasswordAuthenticator
       authorizer: org.apache.cassandra.auth.CassandraAuthorizer
       role_manager: org.apache.cassandra.auth.CassandraRoleManager
     jvm-options:
       initial_heap_size: 800M
       max_heap_size: 800M
   configBuilderResources:
     requests:
       memory: 100m
       cpu: 512m
   managementApiAuth:
     insecure: {}
   serverType: cassandra
   serverVersion: 3.11.7
   size: 1
   storageConfig:
     cassandraDataVolumeClaimSpec:
       accessModes:
       - ReadWriteOnce
       resources:
         requests:
           storage: 3Gi
       storageClassName: server-storage
  • cr2_spec.yaml (Update size and configBuilderResources)
 apiVersion: cassandra.datastax.com/v1beta1
 kind: CassandraDatacenter
 metadata:
   name: cassandra-datacenter
 spec:
   clusterName: cluster1
   config:
     cassandra-yaml:
       authenticator: org.apache.cassandra.auth.PasswordAuthenticator
       authorizer: org.apache.cassandra.auth.CassandraAuthorizer
       role_manager: org.apache.cassandra.auth.CassandraRoleManager
     jvm-options:
       initial_heap_size: 800M
       max_heap_size: 800M
   configBuilderResources:
     requests:
       memory: 200m
       cpu: 1024m
   managementApiAuth:
     insecure: {}
   serverType: cassandra
   serverVersion: 3.11.7
   size: 2
   storageConfig:
     cassandraDataVolumeClaimSpec:
       accessModes:
       - ReadWriteOnce
       resources:
         requests:
           storage: 3Gi
       storageClassName: server-storage

Did you expect to see something different?

In Step8, I expected server-config-init to have the same resource request configuration as cr2_spec.yaml. As mentioned in #150, the resource configuration of server-config-init can be set via configBuilderResources. We updated the size field in cr2_spec.yaml, so the number of Pods increases from 1 to 2. The new Pod runs its init containers before the app containers (k8s doc), so at least the new Pod should reflect the new resource requests specified in cr2_spec.yaml (memory: 200m, cpu: 1024m).
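One way to check just the init container's requests, instead of scanning the full describe output, is a jsonpath query against the newly spawned pod (same pod name as in Step8):

# Print only the server-config-init resource requests of the new pod
kubectl get pod cluster1-cassandra-datacenter-default-sts-1 \
  -o jsonpath='{.spec.initContainers[?(@.name=="server-config-init")].resources.requests}'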

Environment

  • Cass Operator version:

    docker.io/k8ssandra/cass-operator:v1.10.3

  • Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:11:29Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:25:06Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}


  • Kubernetes cluster kind:

minikube start --vm-driver=docker --cpus 4 --memory 4096 --kubernetes-version v1.21.0


Root cause

We did some investigation and found the following possible root cause. The function `ReconcileAllRacks` updates the replica count of the StatefulSet (i.e. `size` in cr1_spec.yaml and cr2_spec.yaml) before it updates the podTemplate in the StatefulSet. Hence, when we update the `size` and `configBuilderResources` fields at the same time, the new Pod is created with a stale podTemplate.

To elaborate, the function `ReconcileAllRacks` updates the replica count of the StatefulSet at [reconcile_racks.go#L2416](https://github.com/k8ssandra/cass-operator/blob/c9020efb832cbfad60142194659db230e1f6995d/pkg/reconciliation/reconcile_racks.go#L2416) and updates the podTemplate in the StatefulSet at [reconcile_racks.go#L2440](https://github.com/k8ssandra/cass-operator/blob/c9020efb832cbfad60142194659db230e1f6995d/pkg/reconciliation/reconcile_racks.go#L2440). In my opinion, swapping the order of L2416 and L2440 would cause new Pods to be spawned with the new podTemplate.
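A quick way to observe this from the cluster side is to compare the StatefulSet's template revisions with the revision each pod was actually created from; the cassandra.datastax.com/datacenter label selector below is an assumption about how the operator labels its pods, so adjust it if it differs:

# Show which template revisions the StatefulSet currently tracks
kubectl get statefulset cluster1-cassandra-datacenter-default-sts \
  -o jsonpath='{.status.currentRevision}{"\n"}{.status.updateRevision}{"\n"}'

# Show which revision each pod was created from (controller-revision-hash label)
kubectl get pods -l cassandra.datastax.com/datacenter=cassandra-datacenter \
  -L controller-revision-hash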



┆Issue is synchronized with this [Jira Task](https://k8ssandra.atlassian.net/browse/K8SSAND-1467) by [Unito](https://www.unito.io)
┆friendlyId: K8SSAND-1467
┆priority: Medium
@hoyhbx hoyhbx added the bug Something isn't working label Apr 21, 2022
@sync-by-unito sync-by-unito bot changed the title The newly spawned pods do not use the up-to-date server-config-init under certain conditions K8SSAND-1467 ⁃ The newly spawned pods do not use the up-to-date server-config-init under certain conditions Apr 21, 2022
@burmanm
Contributor

burmanm commented Apr 21, 2022

Swapping the lines you proposed would not work, as the reconcile would then modify the existing pods while trying to scale (among other things). The scaling behavior is intended to use the previously observed state of the CassandraDatacenter. Otherwise you would run into a weird state where changing the CassDc while scaling would, instead of finishing the scaling, suddenly start modifying and restarting the existing pods before continuing with the scaling (or down-scaling / decommission, etc.).

So at this point, the behavior is as intended. Scaling Cassandra up or down is usually a time- and resource-consuming process, and it probably shouldn't be combined with a large number of rolling restarts of existing pods unless one wants to endanger production.

@hoyhbx
Author

hoyhbx commented May 4, 2022

Thanks for the detailed explanation! I will close this issue for now.
