K8SSAND-1349 ⁃ Hostname lookups on Cassandra pods fail #304

Closed
jsanda opened this issue Mar 24, 2022 · 4 comments · Fixed by #305 or #309
Assignees: jsanda
Labels: bug (Something isn't working), zh:To-Do

Comments

jsanda (Contributor) commented Mar 24, 2022

What happened?
I created a CassandraDatacenter with a cluster name of test, no racks, and 3 C* nodes. This results in the following pods:

  • test-dc1-default-sts-0
  • test-dc1-default-sts-1
  • test-dc1-default-sts-2

As per the Kubernetes docs, there should be a DNS record for each pod, so the following hostname should be resolvable (assuming the CassandraDatacenter is created in the cass-operator namespace):

test-dc1-default-sts-0.test-dc1-all-pods-service.cass-operator.svc.cluster.local

Note that test-dc1-all-pods-service is the name of the headless service with which cass-operator configures the StatefulSet. Specifically, cass-operator sets the StatefulSet.Spec.ServiceName field to the name of the all-pods service.

The DNS lookup fails because the StatefulSet.Spec.ServiceName is set to the empty string.

Did you expect to see something different?
The StatefulSet.Spec.ServiceName property should be set to the all-pods service name so that hostname lookups work.
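
For illustration, here is a minimal Go sketch of what the fix amounts to. The builder function and its arguments are hypothetical, not the actual cass-operator code; the key detail is that ServiceName must name the headless all-pods service so the per-pod DNS records get created.

package sketch

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newStatefulSet is a hypothetical helper, not the real cass-operator builder.
// ServiceName must name the headless all-pods service (e.g.
// "test-dc1-all-pods-service"); leaving it empty is the bug described above.
func newStatefulSet(name, namespace, allPodsService string, replicas int32,
	labels map[string]string, podSpec corev1.PodSpec) *appsv1.StatefulSet {
	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace, Labels: labels},
		Spec: appsv1.StatefulSetSpec{
			ServiceName: allPodsService, // empty on affected versions, so per-pod DNS records are missing
			Replicas:    &replicas,
			Selector:    &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec:       podSpec,
			},
		},
	}
}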

How to reproduce it (as minimally and precisely as possible):
Create a CassandraDatacenter, e.g.,

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: test
  config:
    jvm-server-options:
      initial_heap_size: 512M
      max_heap_size: 512M
  serverType: cassandra
  serverVersion: 4.0.3
  size: 1
  storageConfig:
    cassandraDataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: standard

Wait for the operator to create the StatefulSet and then see that the ServiceName property is not set.
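
One way to confirm this programmatically is sketched below using the controller-runtime client; the StatefulSet name comes from the example above, and the cass-operator namespace is assumed as in the DNS example earlier.

package sketch

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// serviceNameOf fetches the generated StatefulSet and returns spec.serviceName.
// On affected cass-operator versions this returns "" for the example above.
func serviceNameOf(ctx context.Context, c client.Client) (string, error) {
	var sts appsv1.StatefulSet
	key := client.ObjectKey{Namespace: "cass-operator", Name: "test-dc1-default-sts"}
	if err := c.Get(ctx, key, &sts); err != nil {
		return "", err
	}
	return sts.Spec.ServiceName, nil
}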

Environment

  • Cass Operator version:
I believe this affects versions 1.7.1 to 1.10.1

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1349
┆priority: Medium

jsanda added the bug (Something isn't working) label Mar 24, 2022
jsanda self-assigned this Mar 24, 2022
sync-by-unito bot changed the title from "Hostname lookups on Cassandra pods fail" to "K8SSAND-1349 ⁃ Hostname lookups on Cassandra pods fail" Mar 24, 2022
jsanda reopened this Mar 28, 2022

jsanda (Contributor, Author) commented Mar 28, 2022

I am reopening this since #305 does not fix the problem for upgrade scenarios. With the changes in my PR, the operator tries to update the ServiceName property of the StatefulSet, which fails with an error like this:

1.6484380208310935e+09 ERROR controllers.CassandraDatacenter.cassandradatacenter_controller.controller.cassandradatacenter-controller Reconciler error {"reconciler group": "cassandra.datastax.com", "reconciler kind": "CassandraDatacenter", "name": "dc1", "namespace": "test-upgrade-operator", "error": "StatefulSet.apps \"cluster1-dc1-r1-sts\" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden"}

Since the ServiceName property is immutable, we have to recreate the StatefulSet. Simply deleting the StatefulSet would cause the operator to recreate it with ServiceName set correctly, but it would also delete the Cassandra pods, which is undesirable. There is a workaround for this.

We remove the StatefulSet owner reference from all Cassandra pods and replace it with an owner reference to the CassandraDatacenter. Then we delete the StatefulSet; the pods remain intact (and so do the PVCs). After deleting the StatefulSet, we remove the CassandraDatacenter owner reference from the pods so that, when the StatefulSet is recreated, the pods end up with the correct owner references. This additional logic needs to be done in the CheckRackCreation method.
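
A rough Go sketch of the core of this workaround, using the controller-runtime client. The function name and structure are illustrative, not actual cass-operator code, and for brevity it skips the temporary CassandraDatacenter owner reference (a follow-up comment below notes that step turns out to be unnecessary).

package sketch

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// recreateStatefulSet orphans the pods and deletes the StatefulSet so that the
// normal reconcile path (CheckRackCreation in this issue) can recreate it with
// spec.serviceName set, without taking down the Cassandra pods or their PVCs.
func recreateStatefulSet(ctx context.Context, c client.Client, sts *appsv1.StatefulSet) error {
	// 1. Strip the StatefulSet owner reference from each Cassandra pod so that
	//    deleting the StatefulSet does not cascade to the pods.
	var pods corev1.PodList
	if err := c.List(ctx, &pods, client.InNamespace(sts.Namespace),
		client.MatchingLabels(sts.Spec.Selector.MatchLabels)); err != nil {
		return err
	}
	for i := range pods.Items {
		pod := &pods.Items[i]
		kept := pod.OwnerReferences[:0]
		for _, ref := range pod.OwnerReferences {
			if ref.UID != sts.UID {
				kept = append(kept, ref)
			}
		}
		pod.OwnerReferences = kept
		if err := c.Update(ctx, pod); err != nil {
			return err
		}
	}

	// 2. Delete the StatefulSet; the now-ownerless pods (and their PVCs) stay put.
	return c.Delete(ctx, sts)
}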

burmanm (Contributor) commented Mar 28, 2022

Similar to #103

burmanm (Contributor) commented Mar 28, 2022

Unless you have already started working on this, I could grab it. There are also potential pre-1.7.1 users who still have the old ServiceName, so we might want to get rid of that if statement and simply update the StatefulSet in every case.

Also, we need to check in the upgrade_operator test that we don't accidentally delete the cluster completely, since the test would probably still pass if the pods were deleted and the newly created StatefulSet created new pods. That's not acceptable.

jsanda (Contributor, Author) commented Mar 29, 2022

I did some more testing and want to note a couple of things. First, it is unnecessary to add a CassandraDatacenter owner reference to the pods. It is sufficient to simply remove the StatefulSet owner reference. This is what happens with kubectl delete --cascade=orphan.

Secondly, adding back an owner reference to the new StatefulSet causes the pods to be recreated. I spent some time reviewing the StatefulSet controller code to see if I could track down what triggers the update, but I came up short. While it would be nice to avoid a rolling restart of Cassandra, it's manageable.
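
For reference on the first point, here is a minimal sketch of the in-code equivalent of kubectl delete --cascade=orphan with the controller-runtime client; the helper name is made up.

package sketch

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteOrphaningPods deletes the StatefulSet with the orphan propagation policy,
// which leaves its pods (and PVCs) running, like `kubectl delete --cascade=orphan`.
func deleteOrphaningPods(ctx context.Context, c client.Client, sts *appsv1.StatefulSet) error {
	return c.Delete(ctx, sts, client.PropagationPolicy(metav1.DeletePropagationOrphan))
}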
