
Re-define the scope of clusterctl move #3354

Closed
fabriziopandini opened this issue Jul 16, 2020 · 13 comments
Labels
area/clusterctl Issues or PRs related to clusterctl
kind/feature Categorizes issue or PR as related to a new feature.
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

@fabriziopandini
Member

Detailed Description

Clusterctl move was introduced in v0.3 as an evolution of the pivot command, but with a smaller scope than the original command.

The moving process was designed around the bootstrap use case, and it relies on three core ideas:

  1. identify all the objects that are candidates for move, using as a starting point the list of CRDs installed by clusterctl, plus Secrets & ConfigMaps.
  2. use the ownerRef chain to filter out the exact list of objects to be moved, using the Cluster object as a root.
  3. use the ownerRef chain to define the order for creating and deleting objects.

However, a set of new requirements has recently been presented, so, considering also the discussions about the v1alpha4 roadmap, it is now the right time to consider whether the scope of this command should be re-defined.

The following topics should be addressed IMO:

What to include in a move operation

The initial design was conceived to move all the Clusters in a namespace.
Recently this was extended to include ClusterResourceSet as a root (#3243); there is a WIP PR for identity principals (#3254) and a PR for moving objects which are not related to a Cluster API provider (#3337).

  • Should we definitely drop the idea to support moving a single Cluster, given the recent changes?
  • Should we support moving Clusters in all namespaces in a single operation?
  • What is the contract around moving global objects (not in a namespace)? What happens if they are used by two namespaces? Should we leave them in the original cluster? What happens if they get moved twice?
  • What is the contract around moving objects which are not related to a Cluster API provider? What happens if they are used by two namespaces? Should we leave them in the original cluster? What happens if we are moving an object while it is being reconciled?
  • Should we continue to use the ownerRef chain to identify the exact list of objects to be moved, or find a more generic mechanism that does not require a code change every time a new object gets into scope? If not, how can we exclude e.g. Secrets or ConfigMaps not linked to any cluster?
  • How can we ensure that the entire tree of objects which are not related to a Cluster API provider is moved?
  • Should we include moving providers in the scope?

What use cases are we covering

  • Should we support move after the initial bootstrap? What are the different requirements for this use case?

/kind feature

@wfernandes @vincepri

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 16, 2020
@ncdc ncdc added this to the v0.4.0 milestone Jul 16, 2020
@detiber
Member

detiber commented Jul 16, 2020


  • Should we definitely drop the idea to support moving a single Cluster, given the recent changes?

I could potentially see two different use cases for move:

  • I want to move management of this single Cluster to a different management cluster
    • not completely sure this is needed
  • I want to move management of all Clusters to a different management cluster
    • might want to limit to a single namespace if using namespaces for multi-tenancy
    • applies to "pivot" use case for bootstrapping
    • applies to the "pivot" use case for deleting a management cluster
    • applies to the general use case of wanting to migrate where the management cluster is running
  • Should we support moving Clusters in all namespaces in a single operation?

I think so, otherwise we are deferring the complexity of doing so onto the user.

  • What is the contract around moving global objects (not in a namespace)? What happens if they are used by two namespaces? Should we leave them in the original cluster? What happens if they get moved twice?

If we do not move them for the user, then we should block operations if they do not already exist, otherwise we will create a situation where the management cluster behaves very differently pre- and post-move.

If we do decide that we should delete them as part of the move, we should probably only do so after they are no longer in use by any Clusters that remain in the source management cluster. Ideally I'd like to rely on ownerReferences for this, but I don't think we can since they do not have a Namespace field. We also cannot rely on finalizers since an owning controller would not be blocked on performing reconciliation of deletion. Which leaves two options:

  • Make it part of the contract that any controllers for global resources that should be moved block deletion if any Clusters they apply to still exist (pushes the responsibility on external tooling to solve the problem, some of which may not know anything about Cluster API)
  • Somehow add logic into clusterctl move to do the right thing and only delete if nothing is referencing it (will be difficult since there is no way to necessarily know how global resources are linked to Clusters, at least for global resources that are not defined in-tree for Cluster API)
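The ownerReferences limitation mentioned above can be illustrated with a simplified stand-in for the reference type (fields mirrored from Kubernetes' metav1.OwnerReference, reproduced here only for illustration): the reference carries no Namespace field, so it is always resolved in the same namespace as the object that carries it.

```go
package main

import "fmt"

// ownerReference mirrors the shape of metav1.OwnerReference (simplified).
// Note there is no Namespace field: an ownerReference is always resolved
// in the namespace of the object carrying it, which is why it cannot link
// a cluster-scoped (global) resource back to a namespaced Cluster.
type ownerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
}

func main() {
	ref := ownerReference{
		APIVersion: "cluster.x-k8s.io/v1alpha3",
		Kind:       "Cluster",
		Name:       "cluster-1",
	}
	fmt.Println(ref.Kind, ref.Name)
}
```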

Given the above, I think we should probably support deletion for resources that we explicitly know about (defined in-tree for Cluster API), and defer handling of other global resources to provider (or integration) specific post-move instructions.

It might be good to start thinking longer term if we could support some of this through a plugin mechanism, though. Similar to the provider-specific pre-requisites, it would be nice to have a way to automate some of these provider-specific steps, but we should not necessarily own the implementation of them.

  • What is the contract around moving objects which are not related to a Cluster API provider? What happens if they are used by two namespaces? Should we leave them in the original cluster? What happens if we are moving an object while it is being reconciled?

We probably need to allow for providers/integrations to define pre (and post) steps needed around the use of clusterctl move.

That way they can specify any pre actions needed to "pause" reconciliation during the move and any "post" actions needed to resume reconciliation and possibly clean up resources left behind.

  • Should we continue to use the ownerRef chain to identify the exact list of objects to be moved, or find a more generic mechanism that does not require a code change every time a new object gets into scope? If not, how can we exclude e.g. Secrets or ConfigMaps not linked to any cluster?

I think use of the ownerRef chain is good for any resources that are namespace scoped and exist in the same namespace, but it definitely breaks down for global and cross-namespace references.

I don't think we should move away from the use of ownerRef, but we should likely explore an alternative that could be used in a common way for any global resources.

I'm not sure that we should support cross-namespace references unless there is a specific use case that is defined that we agree to support, though.

  • How can we ensure that the entire tree of objects which are not related to a Cluster API provider is moved?

I don't necessarily think that we can. We can define a convention for supported migration, but can likely leave the rest to pre/post steps needed for a specific provider/integration which may need it. In those cases, I'm wondering if it would help to add a flag to the move command to skip unpausing of the resources until the post steps can be done. We might need a pause/unpause command that can be run after any post steps are run to resume reconciliation of resources, though.
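The pause/unpause flow suggested above could look like this toy sketch. The types are hypothetical; Cluster API exposes a similar Spec.Paused field on Cluster objects, which controllers check before reconciling.

```go
package main

import "fmt"

// cluster is a hypothetical, simplified stand-in for a Cluster object.
type cluster struct {
	Name   string
	Paused bool
}

// pause stops reconciliation in the source management cluster before move.
func pause(c *cluster) { c.Paused = true }

// unpause resumes reconciliation, e.g. after provider-specific post steps.
func unpause(c *cluster) { c.Paused = false }

func main() {
	c := &cluster{Name: "cluster-1"}
	pause(c) // before move: controllers skip this Cluster
	fmt.Println("paused:", c.Paused)
	// ...move objects, run provider/integration post steps...
	unpause(c) // explicit step, as the pause/unpause command idea suggests
	fmt.Println("paused:", c.Paused)
}
```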

  • Should we include moving providers in the scope?

I don't think we should support moving the provider controllers, but to the extent that we can we should move provider resources.

Alternatively, going back to the idea of a plugin system, we could define provider-specific hooks that a provider can optionally implement, which would allow them to handle their resources in the appropriate ways.

What use cases are we covering

  • Should we support move after the initial bootstrap? What are the different requirements for this use case?

There are two that I can see:

  • management cluster deletion, since not supporting this would result in orphaned resources related to the self-managed Cluster
  • I want to move management of my clusters to another cluster
    • This could be that the hardware the management cluster is running on is being decommissioned or they just want to change where the management cluster is running (previously on a machine-based infrastructure, but decided to host the management cluster on a managed provider for example)

/kind feature

@wfernandes @vincepri

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 14, 2020
@fabriziopandini
Member Author

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 14, 2020
@fabriziopandini
Member Author

/area clusterctl

@voor
Member

voor commented Jan 27, 2021

  • When we create clusters we use an external tool, kapp, which creates ConfigMaps linked to the deployment of the Cluster CRs; being able to include additional ConfigMaps (which might reside in another namespace) in the move would be convenient and helpful for this use case.
  • Sometimes we need to organize clusters in namespaces different from where they were initially created; moving between namespaces would be helpful for this.

@wfernandes
Contributor

@voor Would it be possible to mark external objects like those ConfigMaps with the clusterctl.cluster.x-k8s.io/move label?

See https://cluster-api.sigs.k8s.io/clusterctl/provider-contract.html?highlight=move#move for more info

@vincepri
Member

@fabriziopandini Do we still need this issue?

@fabriziopandini
Member Author

Do we still need this issue?

IMO yes, at least until we can derive some action items from this discussion...

@sedefsavas

Moving global objects is needed for CAPA multi-tenancy, and we need to backport it to v0.3 as we will support multi-tenancy in the next release.

CAPZ also needs this support in v0.3 releases IIRC @nader-ziada

@fabriziopandini
Member Author

@sedefsavas I suggest opening a separate issue for the specific problem of moving global objects so we can rally around the details:

  • What is a generic procedure (that works across providers) that allows detecting which global objects to move?
  • What is the contract around moving global objects?
    • What happens if they already exist in the target cluster?
    • Should we leave them in the original cluster, given that they might be used by objects in other namespaces?

@fabriziopandini
Member Author

@vincepri I'm +1 for closing this issue given that we have an influx of activities on more detailed use cases.

@vincepri
Member

/close

@k8s-ci-robot
Contributor

@vincepri: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
