# What did you do?
As mentioned in #150, the resource configuration of the `server-config-init` init container can be set via `configBuilderResources`. We realized that under certain conditions, new pods do not necessarily use the most up-to-date resource configuration. For example, as shown in the steps below, when we scale up the cluster and change the resource configuration at the same time, the newly spawned pod does not use the updated `configBuilderResources`.
We also found that even if we explicitly separate the `configBuilderResources` change and the scale-up into two steps, the StatefulSet is not updated immediately if some pods are not yet ready between the two operations, and the newly spawned pods still use the old `configBuilderResources`.
# Step1: Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.7.1/cert-manager.yaml
# Step2: Install operator
kubectl apply -f init.yaml
kubectl apply --force-conflicts --server-side -k 'github.com/k8ssandra/cass-operator/config/deployments/cluster?ref=v1.10.3'
# Step3: Apply custom resource
kubectl apply -f cr1_spec.yaml
# Step4: Check CR for the "Config Builder Resources" field => The field is the same as cr1_spec.yaml
kubectl describe cassandradatacenters.cassandra.datastax.com cassandra-datacenter
#   Config Builder Resources:
#     Requests:
#       Cpu:     512m
#       Memory:  100m
# Step5: Check statefulset for the resource request config of server-config-init => Same as cr1_spec.yaml
kubectl describe statefulsets.apps cluster1-cassandra-datacenter-default-sts
#   server-config-init:
#     Image:      datastax/cass-config-builder:1.0.4-ubi7
#     Port:       <none>
#     Host Port:  <none>
#     Requests:
#       cpu:      512m
#       memory:   100m
# Step6: Update cassandra-datacenter
kubectl apply -f cr2_spec.yaml
# Step7: Check CR for the "Config Builder Resources" field => The field is the same as cr2_spec.yaml
kubectl describe cassandradatacenters.cassandra.datastax.com cassandra-datacenter
#   Config Builder Resources:
#     Requests:
#       Cpu:     1024m
#       Memory:  200m
# Step8: Check Pods for the resource request config of server-config-init => Not the same as cr2_spec.yaml
kubectl describe pod cluster1-cassandra-datacenter-default-sts-0
#   server-config-init:
#     Image:      datastax/cass-config-builder:1.0.4-ubi7
#     Port:       <none>
#     Host Port:  <none>
#     Requests:
#       cpu:      512m
#       memory:   100m
kubectl describe pod cluster1-cassandra-datacenter-default-sts-1
#   server-config-init:
#     Image:      datastax/cass-config-builder:1.0.4-ubi7
#     Port:       <none>
#     Host Port:  <none>
#     Requests:
#       cpu:      512m
#       memory:   100m
init.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  # Changing the name to server-storage is the only change we have made compared to upstream
  name: server-storage
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
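cr1_spec.yaml and cr2_spec.yaml are referenced in the steps above but not reproduced here. As a rough sketch of their contents (the cluster/datacenter names, `size`, the storage class, and the `configBuilderResources` values come from the outputs above; `serverType`, `serverVersion`, and the storage request are illustrative assumptions), cr1_spec.yaml looks approximately like:

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: cassandra-datacenter
spec:
  clusterName: cluster1
  serverType: cassandra            # assumption, not shown in the report
  serverVersion: "3.11.11"         # assumption, not shown in the report
  size: 1
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage   # matches init.yaml above
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi             # assumption, not shown in the report
  configBuilderResources:          # applied to the server-config-init init container
    requests:
      cpu: 512m
      memory: 100m
```

cr2_spec.yaml is the same spec with the scale-up and the new init-container requests:

```yaml
  size: 2
  configBuilderResources:
    requests:
      cpu: 1024m
      memory: 200m
```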
# Did you expect to see something different?
In Step8, I expected `server-config-init` to have the same resource request configuration as cr2_spec.yaml. As mentioned in #150, the resource configuration of `server-config-init` can be set via `configBuilderResources`. We updated the variable `size` in cr2_spec.yaml, so the number of Pods increases from 1 to 2. The new Pod triggers the init containers, which run before the app containers ([k8s doc](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)). Hence, at least the new Pod should reflect the new resource requirements specified in cr2_spec.yaml (memory: 200m, cpu: 1024m).
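For reference, a quick way to read the effective `server-config-init` requests straight from the pod spec (equivalent to the `kubectl describe` output in Step 8; pod names follow the naming used above):

```sh
kubectl get pod cluster1-cassandra-datacenter-default-sts-1 \
  -o jsonpath='{.spec.initContainers[?(@.name=="server-config-init")].resources.requests}'
# prints the requests actually set on the new pod, e.g. map[cpu:512m memory:100m]
```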
# Root cause
We did some investigation and found the following possible root cause.
The function `ReconcileAllRacks` updates the replica count of the StatefulSet (i.e. `size` in cr1_spec.yaml and cr2_spec.yaml) before it updates the podTemplate of the StatefulSet. Hence, when we update the fields `size` and `configBuilderResources` at the same time, the new Pod is created from a stale podTemplate.
To elaborate, `ReconcileAllRacks` updates the replica count of the StatefulSet at [reconcile_racks.go#L2416](https://github.com/k8ssandra/cass-operator/blob/c9020efb832cbfad60142194659db230e1f6995d/pkg/reconciliation/reconcile_racks.go#L2416) and updates the podTemplate of the StatefulSet at [reconcile_racks.go#L2440](https://github.com/k8ssandra/cass-operator/blob/c9020efb832cbfad60142194659db230e1f6995d/pkg/reconciliation/reconcile_racks.go#L2440). In my opinion, we need to swap the order of L2416 and L2440 so that the new Pod is spawned with the new podTemplate.
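If this reading of the code is right, the mismatch should be observable on the StatefulSet itself right after Step6: the replica count already reflects the new `size` while the pod template still carries the old requests. A quick check (names as in the steps above):

```sh
# Replica count is already bumped to the new size
kubectl get statefulset cluster1-cassandra-datacenter-default-sts \
  -o jsonpath='{.spec.replicas}'

# ...but the template still carries the old configBuilderResources,
# so the new pod is created from the stale template
kubectl get statefulset cluster1-cassandra-datacenter-default-sts \
  -o jsonpath='{.spec.template.spec.initContainers[?(@.name=="server-config-init")].resources.requests}'
```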
---

Swapping the lines you proposed would not work, as the reconcile would then modify the existing pods while trying to scale (among other things). The scaling behavior is intended to use the previously observed state of the CassandraDatacenter. Otherwise you would run into a weird state where changing the CassDc while scaling would, instead of finishing the scaling, suddenly start modifying and restarting the existing pods before continuing with the scaling (or downscaling / decommission, etc.).
So at this point, the behavior is as intended. Scaling Cassandra up or down is usually a time- and resource-consuming process, and it probably shouldn't be combined with a large number of rolling restarts of existing pods unless one wants to endanger production.
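Given that, a minimal sketch of an ordering that should avoid the stale template (file names here are hypothetical; the key point, per the report above, is letting the resource change fully roll out before scaling):

```sh
# 1. Apply only the configBuilderResources change, keeping size unchanged
kubectl apply -f cr_resources_only.yaml   # hypothetical manifest

# 2. Wait until the rolling update triggered by the template change is done
#    and all pods are ready again
kubectl rollout status statefulset/cluster1-cassandra-datacenter-default-sts

# 3. Only then apply the size change
kubectl apply -f cr_scaled.yaml           # hypothetical manifest
```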
# Environment

* Cass Operator version: docker.io/k8ssandra/cass-operator:v1.10.3
* Kubernetes version information:
  * Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:11:29Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"darwin/amd64"}
  * Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:25:06Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
* Kubernetes cluster: minikube, started with `minikube start --vm-driver=docker --cpus 4 --memory 4096 --kubernetes-version v1.21.0`