Test clusterctl move with ASO
#3795
Comments
/assign
I think as long as the ASO resource isn't being actively reconciled in both the source and destination clusters at the same time, this should work. To ensure this, the ASO resource needs to be annotated before being deleted from the source management cluster. In my first test, the ASO resource was annotated about 2s before it was deleted. That seems like a reasonably comfortable buffer to me if that's consistent.

This scenario is to prevent the ASO instances on the source and destination clusters from actively reconciling the same resource, which could pose problems if the resource gets modified during the move; that is avoided as long as the annotation lands before the resource is deleted from the source cluster. I'm not sure that's a problem worth worrying about, though: both ASO instances may be reconciling two definitions of the same resource, but as long as both definitions are equivalent during the move, the ASO control planes in each cluster will be doing redundant work without actively conflicting with each other. So I doubt this would cause any problems.

I'll do more testing to get a better sample size of the timings here and try with more clusters being moved at once. These timings should also be taken with a grain of salt since I'm not sure whether the capz-controller-manager Pod and clusterctl environments are using the same clock, or how well they're synced if not.

tl;dr: Things look ok without the CAPI change so far; more testing needed.
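As a rough illustration of that ordering, here is a minimal Go sketch. It assumes the pause mechanism is ASO v2's `serviceoperator.azure.com/reconcile-policy: skip` annotation and uses a generic controller-runtime client; the annotation key, GVK, and function name are assumptions for illustration, not taken from this issue or from CAPZ's implementation:

```go
package asomovesketch

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// pauseThenDelete annotates an ASO ResourceGroup so its controller stops
// reconciling (and won't delete) the underlying Azure resource, and only then
// deletes the Kubernetes object from the source management cluster.
// The annotation key/value and GVK below are assumptions based on ASO v2
// conventions, not taken from this issue.
func pauseThenDelete(ctx context.Context, c client.Client, key client.ObjectKey) error {
	rg := &unstructured.Unstructured{}
	rg.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "resources.azure.com", // assumed ASO v2 API group
		Version: "v1api20200601",       // assumed API version
		Kind:    "ResourceGroup",
	})
	if err := c.Get(ctx, key, rg); err != nil {
		return err
	}

	annotations := rg.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations["serviceoperator.azure.com/reconcile-policy"] = "skip" // assumed "pause" annotation
	rg.SetAnnotations(annotations)
	if err := c.Update(ctx, rg); err != nil {
		return err
	}

	// Deleting only after the annotation is persisted is what creates the
	// buffer measured above (~2s between annotate and delete).
	return c.Delete(ctx, rg)
}
```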
It looks like moving more (10) clusters gives us more buffer: about 6.5s between annotating and the first moved resource being created, and another ~13s before any moved resource gets deleted. Overall, I'm reasonably confident users won't run into issues even without the CAPI fix, at least for this first iteration of ASO in CAPZ that manages only resource groups. cc @dtzar

I'll look into what we might be able to add to the tests to catch when the annotation doesn't get applied in time, but I'm not optimistic we can do anything meaningful without tweaking clusterctl.
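One hedged idea for such a test-side check: run a watch on ASO ResourceGroups in the source cluster while `clusterctl move` runs and flag any delete event where the pause annotation is missing. This is only a sketch under the same assumed annotation key and API group as above, not something the e2e suite currently does:

```go
package asomovesketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/dynamic"
)

// watchForUnannotatedDeletes reports any ResourceGroup that is deleted from
// the source cluster without the assumed pause annotation. Start it before
// clusterctl move and treat anything received on the returned channel as a
// test failure. The GVR and annotation key are assumptions, as above.
func watchForUnannotatedDeletes(ctx context.Context, dyn dynamic.Interface, namespace string) (<-chan string, error) {
	gvr := schema.GroupVersionResource{
		Group:    "resources.azure.com", // assumed ASO v2 API group
		Version:  "v1api20200601",       // assumed API version
		Resource: "resourcegroups",
	}
	w, err := dyn.Resource(gvr).Namespace(namespace).Watch(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}

	violations := make(chan string, 10)
	go func() {
		defer w.Stop()
		defer close(violations)
		for event := range w.ResultChan() {
			if event.Type != watch.Deleted {
				continue
			}
			obj, ok := event.Object.(*unstructured.Unstructured)
			if !ok {
				continue
			}
			// A Deleted event carries the object's last known state, so the
			// annotation should already be visible here if it was applied in time.
			if obj.GetAnnotations()["serviceoperator.azure.com/reconcile-policy"] != "skip" {
				violations <- obj.GetName()
			}
		}
	}()
	return violations, nil
}
```

A helper like this could be started before the move and drained after it, failing the test if any name arrives on the channel.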
I think this point is probably already tested well enough for now, since the test will ensure that the Cluster is `Ready` after the move.
I suppose we could put a watch on the ResourceGroup while the move is taking place to check for this explicitly. I have a feeling that if resources really were being deleted, that would probably at least make the test time out when it has to recreate the cluster from scratch, if it doesn't completely blow up the test some other way.

I'll do this same testing once there are a few more ASO resources in the mix, but I'll close this for now. I don't think this is critical enough at the moment, and it doesn't seem like there's a simple, high-value check we can add to the tests.

/close
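For reference, a minimal sketch of the explicit check proposed above, written Gomega-style since that's common in e2e suites; the helper name, GVK, and polling window are illustrative assumptions rather than pieces of the CAPZ test framework:

```go
package asomovesketch

import (
	"context"
	"time"

	"github.com/onsi/gomega"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// assertNotDeletingDuringMove polls the ASO ResourceGroup for the duration of
// the move window and fails if it ever starts being deleted (i.e. gets a
// deletion timestamp). The GVK is the same assumption used in the sketches above.
func assertNotDeletingDuringMove(ctx context.Context, c client.Client, key client.ObjectKey, window time.Duration) {
	gomega.Consistently(func(g gomega.Gomega) {
		rg := &unstructured.Unstructured{}
		rg.SetGroupVersionKind(schema.GroupVersionKind{
			Group:   "resources.azure.com", // assumed ASO v2 API group
			Version: "v1api20200601",       // assumed API version
			Kind:    "ResourceGroup",
		})
		g.Expect(c.Get(ctx, key, rg)).To(gomega.Succeed())
		g.Expect(rg.GetDeletionTimestamp().IsZero()).To(gomega.BeTrue(),
			"ResourceGroup %s entered a deleting state during move", key.Name)
	}).WithTimeout(window).WithPolling(5 * time.Second).Should(gomega.Succeed())
}
```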
@nojnhuh: Closing this issue.

In response to this:

> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This issue aims to assess how necessary kubernetes-sigs/cluster-api#8473 is to address for CAPZ's ASO migration. The following are the criteria required for `clusterctl move` to work that are most at risk without a solution to the linked CAPI issue:

- The moved Cluster stays `Ready`.
- No resources enter a `Deleting` state during the move.

There is a chance the above criteria may be met without a solution to the above CAPI issue, in which case we may be able to afford to be more patient in addressing that.
See also:
#3525