
Use controller references to clean up DC components #1022

Open · wants to merge 12 commits into main from refactor/cleanup-with-refs

Conversation

@olim7t (Contributor) commented Jul 14, 2023:

What this PR does:
Set an owner reference to the CassandraDatacenter on remote resources. This avoids having to clean them up manually.
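
As a rough illustration of the mechanism (a minimal sketch, not the PR's actual code; the helper name and arguments are invented for illustration):

package k8ssandra

import (
	"context"

	cassdcapi "github.com/k8ssandra/cass-operator/apis/cassandra/v1beta1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// ownResource marks obj as controlled by dc, so that deleting the
// CassandraDatacenter cascades to obj via Kubernetes garbage collection.
func ownResource(ctx context.Context, c client.Client, scheme *runtime.Scheme,
	dc *cassdcapi.CassandraDatacenter, obj client.Object) error {
	if err := controllerutil.SetControllerReference(dc, obj, scheme); err != nil {
		return err
	}
	return c.Update(ctx, obj)
}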

Which issue(s) this PR fixes:
Fixes #961
Fixes #992

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CHANGELOG.md updated (not required for documentation PRs)
  • CLA Signed: DataStax CLA

sonarcloud bot commented Jul 14, 2023:

SonarCloud Quality Gate failed.

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No coverage information
4.8% duplication


@olim7t force-pushed the refactor/cleanup-with-refs branch from fbd57a4 to 87babeb on September 7, 2024.

if r.deleteServices(ctx, kc, dcTemplate, namespace, remoteClient, logger) {
hasErrors = true
}
@olim7t (Contributor Author) commented:

The deployments and services cleanup was added for #980, but we changed the way Medusa is deployed since then, so this was already stale code.

@olim7t force-pushed the refactor/cleanup-with-refs branch 4 times, most recently from b138101 to a2ebd0e on September 12, 2024.
@adejanovski (Contributor) commented:

@olim7t , is this ready for review?

@olim7t force-pushed the refactor/cleanup-with-refs branch from a2ebd0e to fd1877e on September 12, 2024.
@olim7t marked this pull request as ready for review on September 12, 2024, and requested a review from a team as a code owner.
@olim7t (Contributor Author) commented Sep 13, 2024:

is this ready for review?

It is now. I've tested manually to confirm that all impacted resources are correctly cleaned up:

  • Stargate
  • Reaper
  • Medusa: ConfigMap and CronJob
  • -metrics-agent ConfigMap
  • -per-node-config ConfigMap
  • -cass-vector ConfigMap

I'm still seeing some secrets left behind after I delete a DC, for example <cluster-name>-medusa, <cluster-name>-reaper and <cluster-name>-reaper-ui.
But those are cluster-level secrets that get replicated, not DC-level. So they weren't cleaned up by the code that I removed. I think we should assign them an owner reference to the K8ssandraCluster (assuming the replicated-secret controller cleans up the replicated copies when they get deleted).

@vincentcreusot left a comment:

Pretty good. I was able to run the unit tests (all pass) and to test removing the CassandraDatacenter, which removes the Stargate resource correctly.

@olim7t force-pushed the refactor/cleanup-with-refs branch from fd1877e to 616cf6d on September 24, 2024.
sonarcloud bot commented Sep 24, 2024.

@Miles-Garnsey (Member) left a comment:

I started reviewing this before I noticed Vincent had already done some work on it. I had a few comments and suggestions, plus one big architecture question: whether this is going to orphan resources at the worst possible time for users.

I'm still testing manually. I had issues doing so on a multi-cluster setup (no log output to indicate an error, but the K8ssandraCluster never reconciled to a CassandraDatacenter), so I'm going to see if a single-cluster setup works better.

controllers/k8ssandra/auth_test.go (review thread resolved)
@@ -78,18 +76,63 @@ func createSingleDcClusterWithMetricsAgent(t *testing.T, ctx context.Context, f
t.Log("check that the datacenter was created")
dcKey := framework.ClusterKey{NamespacedName: types.NamespacedName{Namespace: namespace, Name: "dc1"}, K8sContext: f.DataPlaneContexts[0]}
require.Eventually(f.DatacenterExists(ctx, dcKey), timeout, interval)

t.Log("update datacenter status to ready")
Member:

Suggestion: I don't love having so much indirection and would almost rather keep this logic inline. But there is a function that already does this and might be useful to you here.

func (r *K8ssandraClusterReconciler) deleteServices(
// setDcOwnership loads the remote resource identified by controlledKey, sets dc as its owner, and writes it back. If
// the remote resource does not exist, this is a no-op.
func setDcOwnership[T client.Object](
Member:

Issue: Wouldn't you want to error when the resource isn't found? It seems likely that there could be plenty of situations where this might get called before a resource exists, given how asynchronous things can be in k8s.

Member:

Oh actually... This comment is just wrong, it does error, it isn't a no-op. Can you fix this?

@olim7t (Contributor Author):

It is a no-op:

	if err := remoteClient.Get(ctx, controlledKey, controlled); err != nil {
		if errors.IsNotFound(err) {
			return result.Continue()
		}

I did it that way because it's not always trivial to determine if the resource is supposed to exist (based on the K8ssandraClusterSpec); we'd have to duplicate logic from the creation code.

All those resources are required for the CassandraDatacenter to start correctly, so I think we can safely assume that they exist.
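
For reference, the helper under discussion can be reconstructed roughly as follows from the fragments quoted in this thread (a sketch; everything beyond the quoted lines is an assumption):

func setDcOwnership[T client.Object](
	ctx context.Context,
	dc *cassdcapi.CassandraDatacenter,
	controlledKey client.ObjectKey,
	controlled T,
	remoteClient client.Client,
	scheme *runtime.Scheme,
	logger logr.Logger,
) result.ReconcileResult {
	if err := remoteClient.Get(ctx, controlledKey, controlled); err != nil {
		// Treat a missing resource as a no-op, per the discussion above.
		if errors.IsNotFound(err) {
			return result.Continue()
		}
		return result.Error(err)
	}
	if err := controllerutil.SetControllerReference(dc, controlled, scheme); err != nil {
		return result.Error(err)
	}
	if err := remoteClient.Update(ctx, controlled); err != nil {
		return result.Error(err)
	}
	return result.Continue()
}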

Member:

I have reservations about this because if the resource name changes (e.g. due to cluster name overrides or some other weird piece of logic) we could easily end up in a situation where the resources aren't marked with an owner correctly, and this failure would happen silently.

// setupMedusaCleanup adds owner references to ensure that the remote resources created by reconcileMedusa are correctly
// cleaned up when the CassandraDatacenter is deleted. We do that in a second pass because the CassandraDatacenter did
// not exist yet at the time those resources were created.
func (r *K8ssandraClusterReconciler) setupMedusaCleanup(
Member:

Issue: I have a slight concern here but I'm not sure if we can do anything about it without some deep changes. If the cassDC creation fails, then doesn't the running order here mean that deletion of the k8ssandraCluster will not result in cleanup of the descendant resources?

This seems like it'll cause some problems, because when a cassDC creation fails, users will often want to blow away the k8ssandraCluster and start fresh. That'll orphan the Medusa/Reaper resources at the moment, AFAICT, because they won't yet have the required owner reference to the cassDC?

Member:

How much effort would it be to ensure that Reaper and Medusa are created only after the CassDC? That should already be the case I thought, but maybe I'm wrong here.

If it isn't the case, there are workarounds for this that aren't too arduous, but they require a new type (like CassandraDatacenterRequest) to exist, which takes ownership until the actual cassDC is created.
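
Purely as an illustration, such a placeholder kind might look like this (hypothetical; no such type exists in the codebase today):

// CassandraDatacenterRequest would stand in as the owner of resources created
// before the real CassandraDatacenter exists, so that deleting the request
// still cascades to them if the cassDC never comes online.
type CassandraDatacenterRequest struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec mirrors what will become the real CassandraDatacenter.
	Spec cassdcapi.CassandraDatacenterSpec `json:"spec,omitempty"`
}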

@olim7t (Contributor Author):

If the cassDC creation fails, then doesn't the running order here mean that deletion of the k8ssandraCluster will not result in cleanup of the descendant resources?

Correct, that is an issue.

How much effort would it be to ensure that Reaper and Medusa are created only after the CassDC?

That would help, but I don't think it can be done for all resources, some must exist before the CassandraDatacenter (e.g. ConfigMaps that we mount as volumes on the pod).

Member:

If this is a great deal of effort (and it sounds like it might be...) can I suggest that we raise a refactoring ticket to implement a new kind: CassandraDatacenterRequest which is used as the owner until the real CassDC comes online? Then the Request object can be cleaned up and trigger the cleanups we need, ensuring that this happens even if the CassDC creation fails.

Contributor:

I'd rather not introduce a new bogus object to deal with cleanups on failed CassDC creations.
The assumption is that whatever prevents the creation of the cassdc will get resolved through another apply operation to patch the original K8ssandraCluster.

Member:

I don't think I'd describe this as a "bogus object"; this is similar to the pattern followed by cert-manager and k8s native APIs for cert issuance, where things might fail without any fault on the part of the object creator.

I'd rather not introduce a new bogus object to deal with cleanups on failed CassDC creations.
The assumption is that whatever prevents the creation of the cassdc will get resolved through another apply operation to patch the original K8ssandraCluster.

I don't understand why we'd make that assumption. There are too many things that could fail here (sometimes in non-recoverable ways). E.g., from memory, the storage class can't be changed in an existing DC; if the user then wants to delete the DC to re-create it with a new storage class, that will always result in orphaned objects under the current design.

If we're fine with orphaning objects then we can leave it as is, but I think this is going to cause some level of pain down the track.

Name: fmt.Sprintf("%s-medusa", kc.SanitizedName()),
}
result := setDcOwnership(ctx, dc, configMapKey, &corev1.ConfigMap{}, remoteClient, r.Scheme, logger)
if result.Completed() {
Member:

Issue: This doesn't seem like the right use of Completed(). Actually, I try not to use Completed() at all - it isn't clear what it means because the callBackSoon implementation of ReconcileResult also returns Completed() == true.

I would use one of these:

	IsError() bool
	IsRequeue() bool
	IsDone() bool

If you get an IsError() or IsRequeue(), then you should return out to bail on the reconcile loop and allow the controller to requeue it. Conversely, if it is IsDone(), then you can continue the reconciliation.

Right now, I'm pretty sure you'll end up bailing early on requeues which isn't what you want.
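
In code, the suggested pattern would look roughly like this (a sketch, using the methods listed above):

recResult := setDcOwnership(ctx, dc, configMapKey, &corev1.ConfigMap{}, remoteClient, r.Scheme, logger)
if recResult.IsError() || recResult.IsRequeue() {
	// Bail out and let the controller requeue the reconcile loop.
	return recResult
}
// Otherwise continue the reconciliation (e.g. move on to the CronJob).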

@olim7t (Contributor Author):

Conversely, if it is IsDone() then you can continue the reconciliation

No, I think what IsDone() means is "the current reconciliation loop is done", in other words: drop everything we're doing and exit the main reconciliation function without any requeues.

When we want to continue the reconciliation, we use Continue(), and Continue().Completed() is false. In fact it's the only implementation that returns false, so Completed() means "is not Continue()".

So I think the use of Completed() here is correct: we set the ownership of the ConfigMap;

  • if that succeeded, it returns Continue() and we proceed with the ownership of the CronJob
  • if that didn't succeed (it errored out, was requeued, or we decided the whole reconciliation loop was fulfilled), then we return without addressing the CronJob

Member:

No, I think what IsDone() means is "the current reconciliation loop is done", in other words: drop everything we're doing and exit the main reconciliation function without any requeues.

The thing is, the intention has never been documented, so whatever any of us think, it has been used in various ways throughout the codebase 😅

Because ReconcileObject is meant to be the core reconciliation abstraction in the codebase, I think that's as good a reference as any. ReconcileObject does not function the way you're describing; it actually returns a result.Done() just when the reconciliation is successful. If you bubble this unmodified back to the top of the stack (which you probably should), then by your own logic your function is going to terminate early. You probably don't want this.

The issue I have with Completed() is that it is unclear what is completed. It doesn't seem to match the use of Done(), so that's problematic IMO.

The problem here is just that your usage isn't aligning with ReconcileObject from what I can see. If you think the ReconcileObject behaviour should change, that's fine (I don't have an opinion, and again, this has never been properly thrashed out), but we'd need to tackle that separately.

Contributor:

The issue I have with Completed() is that it is unclear what is completed. It doesn't seem to match the use of Done(), so that's problematic IMO.

Done() and Completed() are different. Done() also counts as Completed, but Error() and CallbackSoon() are not Done(), yet they are Completed().

Completed() means the current reconcile call is done, as far as we can go right now, but it does not mean our entire reconciliation process is done. We might requeue (through an error or a requeue request) to process it again. Done() is exactly that: we will not return to this reconcile part without external changes, and this process will not requeue.

This behavior hasn't changed since 2020, when it was copied over from cass-operator. The tests indicate that as well.

https://github.com/datastax/cass-operator/blob/v1.0.0/operator/internal/result/result_helper_test.go
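
Spelled out with the helper constructors (a sketch; semantics as described above):

result.Continue().Completed() // false: keep executing the current reconcile pass
result.Done().Completed()     // true:  fully reconciled; nothing left to do, no requeue
result.Error(err).Completed() // true:  stop this pass; the error triggers a requeue
// The callBackSoon result is likewise Completed() but not Done().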

Member:

I don't think those tests are sufficient documentation to indicate that the reconcile.Result type is meant to work that way @burmanm. They are talking about "reconciling being complete" but that clearly isn't what Completed() is actually intended to mean. As you say, it is probably intended that "completed" here means that the current reconciliation run is completed, but reconciliation as a whole is not complete.

Again, what particular reconciliation action is done or completed, or needs to be requeued is not clear from any of this logic, nor the tests. And for ReconcileObject, Done() is used to indicate that the action was successful. So that's why I suggest we make things more explicit by avoiding the use of Completed() and instead returning when something explicitly needs to be requeued or has errored.

@olim7t (Contributor Author):

what particular reconciliation action is done or completed [...] is not clear

As I understand it, the action under consideration is always the top-level reconciliation loop, for example K8ssandraClusterReconciler.Reconcile(). So if a component returns Done(), it should "bubble up" the stack: every caller stops what it was doing and immediately returns, until we reach Reconcile() and return ctrl.Result{} to the runtime.

@burmanm would you agree with that assessment?


@@ -172,3 +172,20 @@ func (r *K8ssandraClusterReconciler) reconcileUserProvidedPerNodeConfiguration(

return result.Continue()
}

// setupMedusaCleanup adds an owner reference to ensure that the remote ConfigMap created by
Member:

Nitpick: comment doesn't match the function name.

controllers/k8ssandra/reaper.go (review thread resolved)
@@ -254,7 +254,10 @@ func createMultiDcClusterWithControlPlaneReaper(t *testing.T, ctx context.Contex
Name: "reaper"},
}
t.Log("check that control plane reaper is created")
require.Eventually(f.ReaperExists(ctx, cpReaperKey), timeout, interval)
withReaper := f.NewWithReaper(ctx, cpReaperKey)
Member:

Issue: Can we have an assertion here to ensure that the right ownerReferences are present? Can we also ensure that all the other resources you're modifying with the ownerReference have a test somewhere in the envtests to ensure that's getting placed?

@olim7t (Contributor Author):

This one is actually a bit of a special case, because the control plane Reaper is not owned by a DC.

All other cases go through f.ReaperExists(), which now has the ownership check baked in.
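
Such an ownership check might look roughly like this (a sketch; the exact framework helpers and keys are assumptions):

require.Eventually(func() bool {
	reaper := &reaperapi.Reaper{}
	if err := f.Get(ctx, reaperKey, reaper); err != nil {
		return false
	}
	// Verify that the CassandraDatacenter is the controlling owner.
	ref := metav1.GetControllerOf(reaper)
	return ref != nil && ref.Kind == "CassandraDatacenter" && ref.Name == dcKey.Name
}, timeout, interval)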

controllers/k8ssandra/stargate.go (review thread resolved)
// not exist yet at the time those resources were created.
func (r *K8ssandraClusterReconciler) setupVectorCleanup(
ctx context.Context,
kc *k8ssandraapi.K8ssandraCluster,
Member:

Suggestion: with these functions, it would be nice not to pass the whole chunk of kc state and just pass a string instead, since I think we're just grabbing kc.SanitizedName() in all cases.
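
I.e. something like this (a sketch; the parameter list is abbreviated, and "-cass-vector" is the ConfigMap suffix mentioned earlier in the thread):

func (r *K8ssandraClusterReconciler) setupVectorCleanup(
	ctx context.Context,
	clusterSanitizedName string, // instead of kc *k8ssandraapi.K8ssandraCluster
	dc *cassdcapi.CassandraDatacenter,
	remoteClient client.Client,
	logger logr.Logger,
) result.ReconcileResult {
	configMapKey := client.ObjectKey{
		Namespace: dc.Namespace,
		Name:      fmt.Sprintf("%s-cass-vector", clusterSanitizedName),
	}
	return setDcOwnership(ctx, dc, configMapKey, &corev1.ConfigMap{}, remoteClient, r.Scheme, logger)
}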

@Miles-Garnsey (Member) commented:

I'm having trouble manually testing this; I've tried a few times now.

I'm running:

make single-up && kubectl apply -f build/k8ssandra-cluster.yaml

But this doesn't reconcile to a cassDC - it just gives me:

Every 2.0s: kubectl --context kind-k8ssandra-0 get cassandradatacenters --all-namespaces                                                                                  Unknown-CJR0QR29NM: Tue Sep 24 12:37:16 2024

No resources found

and

2024-09-24T10:28:40.825Z	INFO	k8ssandracluster-webhook	validate K8ssandraCluster create	{"K8ssandraCluster": "test"}
W0924 10:34:24.982370       1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
W0924 10:34:44.960130       1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
W0924 10:35:04.961546       1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
W0924 10:35:30.957564       1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
W0924 10:36:23.959466       1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
W0924 10:36:38.953898       1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
2024-09-24T10:36:39.866Z	INFO	k8ssandracluster-webhook	validate K8ssandraCluster update	{"K8ssandraCluster": "test"}
W0924 10:36:53.953320       1 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.

from the logs.

I'll try again with the same procedure on the latest release branch and see if it has the same behaviour. My K8ssandraCluster might have something broken, since I'm trying to turn on both Reaper and Medusa, configured with medusa: {} and reaper: {}, and that might not be supported.

@adejanovski (Contributor) commented:

@Miles-Garnsey, you shouldn't be able to enable Medusa by creating an empty object; it requires some properties to be defined.
Reaper will deploy fine with reaper: {}, though.

@Miles-Garnsey (Member) commented:

I tried a variety of things to test this, including removing the medusa: {} block, and also reverting to the simplest manifest I could think of (on main, no less!).

I still see no progress towards rollout and no logs from the operator. Will continue trying to test tomorrow.

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test
spec:
  cassandra:
    serverVersion: 4.0.4
    datacenters:
      - metadata:
          name: dc1
        size: 1
        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: standard
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 5Gi
        config:
          jvmOptions:
            heapSize: 384Mi

@Miles-Garnsey (Member) commented:

I can't figure out what I'm doing wrong here and am giving up on manually testing this. I'm running:

make single-up && \
kubectl apply -f test/testdata/fixtures/single-dc/k8ssandra.yaml

This is on main and uses a cluster definition taken directly from the e2e tests. Somehow I still can't get a cluster up. The last log line I see from k8ssandra-operator is: 2024-09-25T14:28:29.609Z INFO k8ssandracluster-webhook validate K8ssandraCluster create {"K8ssandraCluster": "test"}

No errors are logged, so I'm not sure what's causing the blockage. Obviously whatever test scripts I had working previously no longer work, or something is broken in the makefile on main.

I'll leave my comments as they are, since I think there are some questions that we need to answer that entail either some changes in this PR or some follow up refactoring tickets, since as things stand I think we're going to end up orphaning resources quite often.

Successfully merging this pull request may close these issues:

  • Refactor K8ssandraCluster deletion cleanup operation
  • Refactor cleanup

5 participants