Deleting a single cluster causes other non-affected clusters to abort all PODs & restart #615
Comments
It may be related to the fact that we have 2 C* clusters in the same k8s namespace across 2 k8s clusters, but we have named the datacenters the same way - primary and secondary. This resulted in a configuration overlap: since the Cassandra DC name is the same per namespace, the operator was trying to manage the same DC name across the 2 clusters.
@andrey-dubnik can you please provide the K8ssandraCluster manifests so we can test and try to reproduce?
Sure - here is the template below; just use the same template to create cluster1, and after cluster1 is online create cluster2. I have tested the behaviour with different DC names for cluster1 and cluster2 and there is no mass restart, so I suspect this is down to both clusters pointing to the same DC name.
@andrey-dubnik Thanks for sharing your spec. It was easy enough to reproduce the behavior. You are 100% correct that there is a naming collision. We need to adopt a naming convention for the CassandraDatacenters like we do with other resources, which is to prefix their names with the K8ssandraCluster name. For your cluster, we would wind up with a CassandraDatacenter whose object name carries that prefix, while we still want Cassandra itself to recognize and store the plain DC name from the spec.
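To make the convention concrete, here is a minimal Go sketch; the helper name `DatacenterObjectName` is hypothetical, not the operator's actual API:

```go
package naming

import "fmt"

// DatacenterObjectName illustrates the proposed convention: the Kubernetes
// object name of the CassandraDatacenter is prefixed with the
// K8ssandraCluster name, so two K8ssandraClusters in the same namespace can
// each declare a DC called "dc1" without colliding.
// NOTE: hypothetical helper for illustration only.
func DatacenterObjectName(k8ssandraClusterName, dcName string) string {
	return fmt.Sprintf("%s-%s", k8ssandraClusterName, dcName)
}
```

Under that convention, the two example manifests below would yield CassandraDatacenters named `tes1-dc1` and `tes2-dc1` rather than two objects both named `dc1`, while Cassandra itself would still report the datacenter as `dc1`.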
Here is a more scaled down example that I used to reproduce on my local kind cluster.

```yaml
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: tes1
spec:
  cassandra:
    serverVersion: "4.0.3"
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: standard
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
    config:
      jvmOptions:
        heapSize: 1Gi
    networking:
      hostNetwork: true
    datacenters:
      - metadata:
          name: dc1
        size: 1
```

and

```yaml
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: tes2
spec:
  cassandra:
    serverVersion: "4.0.3"
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: standard
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
    config:
      jvmOptions:
        heapSize: 1Gi
    networking:
      hostNetwork: true
    datacenters:
      - metadata:
          name: dc1
        size: 1
```
Some extra care will have to be taken with existing K8ssandraClusters to avoid downtime of C* and data loss. During the reconciliation loop, the K8ssandraCluster controller fetches the CassandraDatacenter and creates it if it isn't found. That logic will need to be updated to also look it up by the old name if the CassandraDatacenter isn't found under the new naming scheme. If we find it under the old name, we want to recreate it with the new naming format. We can do a non-cascading delete and then recreate it; this way the StatefulSets, pods, etc. will remain intact.
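A rough sketch of what that fallback lookup and non-cascading recreate could look like with controller-runtime; the function name, key parameters, and error handling are simplified assumptions, not the operator's actual code:

```go
package controllers

import (
	"context"

	cassdcapi "github.com/k8ssandra/cass-operator/apis/cassandra/v1beta1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// migrateOrCreateDatacenter (hypothetical name) looks for the
// CassandraDatacenter under its new, prefixed name, falls back to the legacy
// name, and if the legacy object is found, recreates it under the new name
// using a non-cascading delete so StatefulSets, pods, and PVCs stay intact.
func migrateOrCreateDatacenter(ctx context.Context, c client.Client,
	newKey, legacyKey types.NamespacedName,
	desired *cassdcapi.CassandraDatacenter) error {

	actual := &cassdcapi.CassandraDatacenter{}

	// 1. Already migrated? Nothing to do.
	err := c.Get(ctx, newKey, actual)
	if err == nil {
		return nil
	}
	if !errors.IsNotFound(err) {
		return err
	}

	// 2. Fall back to the legacy, unprefixed name.
	if err := c.Get(ctx, legacyKey, actual); err != nil {
		if errors.IsNotFound(err) {
			// 3. Neither name exists yet: create it fresh under the new name.
			return c.Create(ctx, desired)
		}
		return err
	}

	// 4. Found under the legacy name: orphan-delete it so its children are
	// not garbage collected (cass-operator's finalizer cleanup is a separate
	// concern, discussed below)...
	if err := c.Delete(ctx, actual,
		client.PropagationPolicy(metav1.DeletePropagationOrphan)); err != nil {
		return err
	}

	// 5. ...and recreate it under the new, prefixed name.
	return c.Create(ctx, desired)
}
```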
I think the real fix requires some additional steps as well. The references (if not a Kubernetes controller reference, then something in the Status of the K8ssandraCluster) should always point correctly to the objects that have been created, including the cluster each one resides in. We should not just rely on a naming convention and hope the names won't clash; it could even go as far as generating the names with a UUID.

As for the non-cascading delete, that step requires uninstalling cass-operator (without removing the CRDs) before removing the CassandraDatacenter. Otherwise, cass-operator will try to delete the underlying PVCs and remove secret annotations, and if that fails, it will hold on to the finalizer and not allow deletion of the CassandraDatacenter.

Also, we might want to reconsider the "create if not found" policy. If a resource we expect to create is already found, perhaps we should abort at that point? Otherwise a K8ssandraCluster could overwrite an existing CassandraDatacenter elsewhere as well.
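For illustration, a hypothetical shape for the kind of explicit reference being suggested here; the type and field names are assumptions, not an existing API:

```go
package v1alpha1

// DatacenterReference sketches how the K8ssandraCluster status could record
// exactly which CassandraDatacenter objects it created and where they live,
// instead of re-deriving names and hoping they do not clash.
// NOTE: hypothetical type for illustration only.
type DatacenterReference struct {
	// K8sContext identifies the Kubernetes cluster the object resides in.
	K8sContext string `json:"k8sContext,omitempty"`
	// Namespace and Name identify the CassandraDatacenter object itself.
	Namespace string `json:"namespace,omitempty"`
	Name      string `json:"name,omitempty"`
	// UID pins the reference to one concrete object, even if another object
	// with the same name is created later.
	UID string `json:"uid,omitempty"`
}
```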
We cannot rely on controller references since the objects we're dealing with can span multiple Kubernetes clusters. While I am not a huge fan of the datacenters map we currently have in the status (since it just copies the CassandraDatacenters' status verbatim), it does identify each CassandraDatacenter. I'm not sure why we would also need to store the cluster, since we can get that from the spec.

I'd say that prefixing the name of the CassandraDatacenter with the name of the K8ssandraCluster or the Cassandra cluster is more than simply hoping that the names won't clash. It will prevent collisions within a namespace-scoped deployment of the operator, even for multi-cluster, and it should be sufficient for cluster-scoped deployments of the operator as well. This is the approach taken for other child resources.

My suggestion about the non-cascading delete had a slight oversight 😅 What about adding an annotation to the CassandraDatacenter that tells cass-operator it is a non-cascading delete?

I'm not sure about aborting if the resource is found when we expect to create it. We always check first to see if the object exists and then create it if it's not found; we don't do it the other way around, attempting to create the object first.
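A minimal sketch of that annotation idea, assuming a made-up annotation key that cass-operator would have to be taught to honour in its finalizer before cleaning up PVCs and secrets:

```go
package controllers

import (
	"context"

	cassdcapi "github.com/k8ssandra/cass-operator/apis/cassandra/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// skipFinalizerCleanupAnnotation is a made-up key; cass-operator would need
// to check it in its finalizer and skip PVC/secret cleanup when it is set.
const skipFinalizerCleanupAnnotation = "cassandra.datastax.com/skip-finalizer-cleanup"

// deleteWithoutCleanup marks the CassandraDatacenter so its deletion is
// treated as non-cascading, then deletes the object.
func deleteWithoutCleanup(ctx context.Context, c client.Client, dc *cassdcapi.CassandraDatacenter) error {
	if dc.Annotations == nil {
		dc.Annotations = map[string]string{}
	}
	dc.Annotations[skipFinalizerCleanupAnnotation] = "true"
	if err := c.Update(ctx, dc); err != nil {
		return err
	}
	return c.Delete(ctx, dc)
}
```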
This is a stale ticket (there have been name overrides for a while), so I'm closing it despite the syncing solution waking this up.
What happened?
Hi,
We have 2 clusters running in a single k8s namespace. We dropped one cluster, which resulted in the other cluster aborting all of its pods at the same time and restarting, triggering a complete loss of service.
Did you expect to see something different?
We did not expect the cluster that was not targeted by the delete operation to be affected at all.
How to reproduce it (as minimally and precisely as possible):
1. Create 2 clusters in the same namespace (if relevant)
2. Delete one of the clusters
3. The remaining cluster will terminate all pods and they will go into restart mode
Environment
Issue is synchronized with this Jira Story by Unito
Issue Number: K8OP-189