prune should respect dependsOn in a reverse order #301
Comments
Succinctly, I think the logic here is: this specifically needs to be done in the finalizer of the Kustomization that is being depended on.
@stealthybox we could update the Ready condition to make the deadlocks visible to users, as the Ready condition is shown in flux/kubectl get output.
Since this Finalizer behavior depends on the lifecycle of other objects, it could cause a state where an administrator's or parent team's repository becomes undesirably unreconcilable, simply because of another team's repo. This would be improved by the fluxcd/flux2#582 proposal, however, which would provide an RBAC solution for revoking access to dependsOn!

We should consider what the workflow or options could be for a team to forcefully indicate that their Kustomization should be deleted regardless of the consequences for dependent Kustomizations. Perhaps we don't want to support that.

Option B: when a Kustomization contains a CRD, make the Finalizer block until there are no more dependent CRs in the entire cluster. This seems reasonable but might be hard to query for across all Namespaces; maybe there's a performant mechanism? A simple boolean could then opt out of this, specifically for CRDs. This doesn't cover other types of dependencies.

Technically, it's possible (but complicated) for both of these behaviors to be implemented.
Regarding general Kubernetes objects: it is very difficult to delete these resources atomically, because deleting them in the wrong order can cause issues. fluxcd/helm-controller#270 explores some strategies to manage these needs using annotations, Finalizers, and Admission Control. I suspect that this is a limitation of Kubernetes API design, since dependencies are not required information. For Kustomizations and HelmReleases, we have dependsOn. Perhaps there is an already existing community component that can enforce ordered deletes between arbitrary resources in a generic, non-Flux-specific way.
Client-side CLI removal using the user's privileged credentials could help users reconcile undesired removal situations.
This caused problems for me as well. I had to write custom logic in a wrapper script to handle deletion correctly (which kinda contradicts the GitOps deployment principle).
You can do this via Git: delete the dependants first and commit the changes, and Flux will finalise them accordingly. Then delete the top Kustomization and commit that.
@stefanprodan My case is: we do blue/green Kubernetes cluster upgrades. Instead of upgrading the cluster in place, we bring up a second cluster with the same apps deployed, switch traffic to it, then delete the original cluster, ensuring safe upgrades with zero downtime and the ability to instantly roll back or abort the upgrade in case of problems. The GitOps controller (Flux) ensures that the new cluster has the same apps (both clusters monitor the same Git repository), so deleting via Git commits is not possible.

What actually causes the problem: we install the Istio operator controller with one Kustomization, then install the IstioOperator resource with another Kustomization that depends on the first one. Creation works fine, but deletion ignores the order and removes the operator controller before the Istio control plane/ingress is deleted, which causes the deletion of Istio to hang indefinitely. And we cannot just ignore Istio and delete the cluster anyway, because Istio creates a Service of type LoadBalancer, which creates an actual load balancer that will be orphaned if it is not deleted from the cluster correctly.

Current workaround: patch the entrypoint Kustomization in the old cluster to exclude the IstioOperator resource Kustomization, wait until the Istio operator controller actually deletes Istio, then delete the entrypoint Kustomization.
Are there any plans to get this worked on?
Has anyone developed any workarounds for this behavior? We're running into the same issue when trying to clean up resources deployed within a vcluster. vcluster creates a secret in the cluster that is used in subsequent HelmReleases to deploy things into the vcluster, similar to the following:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: example-release
  namespace: example-namespace
spec:
  interval: 5m0s
  dependsOn:
    - name: vcluster
  targetNamespace: default
  storageNamespace: default
  kubeConfig:
    secretRef:
      name: vc-vcluster
      key: config
  chart:
    spec:
      ...

Notice the kubeConfig secret reference (vc-vcluster), which is the secret created by vcluster. It is worth noting that in our case, first removing all HelmReleases except for the vcluster release, and removing that one last, does clean up correctly; the problem is that this ordering has to be performed manually.
@sgerace We made a workaround, but it's not really suitable for the general case. In our case, Kustomizations are installed from a HelmRelease (for reasons unrelated to this issue), and we made a custom pre-delete Helm hook to handle deletion in the proper order.
@artem-nefedov, that's interesting. I'm wondering if you'd be willing to share any of the details around the pre-delete Helm hook that you've developed? That's the direction we've started investigating, so any information you can provide would help us come up with a suitable workaround for our situation.
@sgerace We have multiple Kustomizations installed from a single Helm chart via a HelmRelease. The chart has a pre-delete hook Job (meaning it runs before any actual object is deleted by Helm). The logic inside the Job is pretty simple: it deletes the Kustomizations that should be deleted first, based on a label selector, then sleeps for some time and exits. After that, the remaining Kustomizations are deleted by Helm itself.

Unrelated to the topic, but also worth mentioning: we also have post-install and post-upgrade hooks that wait for successful reconciliation of the Kustomizations, because helm-controller can't perform health checks on custom resources (even ones that are part of the Flux suite). This can be circumvented if you instead define healthChecks in the Kustomization that creates the HelmRelease.
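For readers looking for a starting point, here is a minimal sketch of what such a pre-delete hook Job could look like, based on the description above rather than the actual chart. The image, label selector, namespace, service account, and sleep duration are all assumptions, and the service account would need RBAC permission to delete Kustomizations:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: delete-dependent-kustomizations
  annotations:
    "helm.sh/hook": pre-delete                 # run before Helm deletes any chart objects
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: kustomization-cleaner   # hypothetical SA with delete rights on Kustomizations
      restartPolicy: Never
      containers:
        - name: cleanup
          image: bitnami/kubectl:latest            # any image containing kubectl works
          command:
            - /bin/sh
            - -c
            - |
              # Delete the Kustomizations that must go first, selected by a (hypothetical) label,
              # then give their finalizers time to run before Helm removes the rest of the chart.
              kubectl delete kustomizations.kustomize.toolkit.fluxcd.io \
                -n flux-system -l cleanup-order=first --wait=true
              sleep 60
```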
I wonder if a temporary workaround would be to support a manual depends-on annotation for child resources, so the reconciler could just do a quick topological sort before deleting. Even an (opt-in?) topological sort at the level of Kustomizations could be enough to fix this bug, at least assuming the garbage collector could accumulate all of the resources across Kustomizations before pruning.
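To make the suggestion concrete, here is a purely hypothetical sketch of such an annotation. Flux has no such annotation today; the key example.com/depends-on and the reference format are invented only for illustration:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: usage-config
  namespace: apps
  annotations:
    # Hypothetical annotation: declares that this object depends on the referenced object,
    # so the garbage collector would prune this object before the object it depends on
    # (i.e. delete in reverse dependency order after a topological sort).
    example.com/depends-on: "apps/Deployment/operator-controller"
data:
  setting: "value"
```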
When two Kustomizations exist and one dependsOn the other, pruning should happen in the reverse order.

Use case:
- Kustomization operator contains a CRD and the operator for this CRD.
- Kustomization usage contains a CR of that CRD, on which the operator places a finalizer.

When a commit deletes both usage.kustomization.yaml and operator.kustomization.yaml, garbage collection should respect dependsOn in reverse order:
- First, usage.kustomization.yaml should be deleted (with e.g. a namespace which, when pruned, deletes all custom resources; since operator.kustomization.yaml is not deleted at this point in time, the finalizers can still be processed).
- Then operator.kustomization.yaml can be deleted, as no depending Kustomization exists anymore.

Currently, operator.kustomization.yaml and usage.kustomization.yaml are deleted at the same time, which may result in unfinishable finalizers on resources from usage.kustomization.yaml because the operator may already be gone.