
Move byohosts after clusterctl move to new management cluster #853

Open
robinAwallace opened this issue Oct 30, 2023 · 6 comments
@robinAwallace

What steps did you take and what happened:

Hello,

I have a BYOH cluster that I would like to move to a new management cluster using the clusterctl move command clusterctl move --kubeconfig <byoh-management-cluster> --to-kubeconfig <new-management-cluster>.

All resources are moved, but the ByoHosts are not, which is perhaps not that strange. However, I am not able to register the machines with the new management cluster.

I have tried to generate new bootstrap kubeconfigs for the new management cluster, send them to the machines, and restart the byoh-agent, with no success.

What did you expect to happen:

After clusterctl move, I would like to re-register the machines with the new management cluster.

Anything else you would like to add:

Environment:

  • Cluster-api-provider-bringyourownhost version: v0.4.0
  • Kubernetes version: (use kubectl version --short): v1.26.6
  • OS (e.g. from /etc/os-release): Ubuntu-20.04
@dharmjit
Contributor

dharmjit commented Nov 1, 2023

Hi @robinAwallace, I think the clusterctl move flow is not validated for BYOH, and it certainly requires an agent restart to talk to the new management cluster.

All resources are moved, but the ByoHosts are not, which is perhaps not that strange

This might be due to the permissions on ByoHost CRDs. Did you get any errors with clusterctl move?

restart the byoh-agent with no success

Can you share the agent output/errors too?

@robinAwallace
Author

Hello 🙂

No, there were no errors from clusterctl move. The only error I got was that the byoh controller could not find the ByoHosts.

But I got it to work.
After running clusterctl move, I had to move the ByoHost objects manually from the first cluster to the new management cluster. To do this I had to delete the byoh validating webhook, which otherwise stops you from adding ByoHost objects.

Then I had to create new kubeconfigs with the correct cert and IP for the new control plane. I also had to create a new CSR to validate the byoh agent user. Finally, I sent the new kubeconfig to the nodes at ~/.byoh/config and restarted the agent. Then everything worked fine 🥳
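Concretely, the host-side part of this looked roughly like the following. The host address, the CSR name, and the assumption that the agent runs as a systemd unit named byoh-agent are placeholders for a typical setup, not exact values from my environment:

```shell
# Copy the regenerated bootstrap kubeconfig (pointing at the new
# control plane's endpoint and CA cert) onto each host.
scp bootstrap-kubeconfig.conf user@<host>:~/.byoh/config

# On the new management cluster: the agent submits a CSR for the
# byoh agent user; find it and approve it.
kubectl get csr
kubectl certificate approve <csr-name>

# On the host: restart the agent so it picks up the new kubeconfig.
sudo systemctl restart byoh-agent
```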

@dharmjit
Contributor

dharmjit commented Nov 2, 2023

Awesome. There are still some UX gaps, but it would be nice to have the above manual process captured in some doc. Would you like to create a PR documenting the steps you followed?

@syndicut

But I got it to work. After running clusterctl move, I had to move the ByoHost objects manually from the first cluster to the new management cluster. To do this I had to delete the byoh validating webhook, which otherwise stops you from adding ByoHost objects.

Then I had to create new kubeconfigs with the correct cert and IP for the new control plane. I also had to create a new CSR to validate the byoh agent user. Finally, I sent the new kubeconfig to the nodes at ~/.byoh/config and restarted the agent. Then everything worked fine 🥳

Tried to follow the same process, but byoh-agent got stuck with:

I0315 15:46:24.611558   10704 host_reconciler.go:91]  "msg"="Machine ref not yet set"

I believe the reason for this is that the ByoHost Status is not copied to the destination cluster, and because the host carries an AttachedByoMachineLabel, the byoh infrastructure controller is not setting it. I tried deleting the AttachedByoMachineLabel label and restarting the byoh infrastructure controller, but it didn't help; the controller now says:

I0315 16:16:12.321312       1 byomachine_controller.go:270]  "msg"="Attempting host reservation" 
I0315 16:16:12.321493       1 byomachine_controller.go:519]  "msg"="No hosts found, waiting.."

@robinAwallace
Author

Hmm, I did not have this issue.

But yes, as you say, it does not copy over the ByoHosts when running the move command. So I had to copy them manually with kubectl get byohosts.infrastructure.cluster.x-k8s.io -n <namespace> <byohost> -oyaml and save the output to a file. But before I could apply it to the new management cluster, I had to temporarily remove the webhook, validatingwebhookconfigurations.admissionregistration.k8s.io byoh-validating-webhook-configuration.
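The manual ByoHost copy can be sketched roughly as follows. Namespace, host name, and kubeconfig file names are placeholders; backing up and re-applying the webhook afterwards is a precaution I'd suggest rather than something from the steps above:

```shell
# Export the ByoHost object from the old management cluster.
kubectl --kubeconfig old-mgmt.kubeconfig \
  get byohosts.infrastructure.cluster.x-k8s.io -n <namespace> <byohost> -o yaml \
  > byohost.yaml

# Back up, then temporarily remove, the validating webhook on the new
# cluster; it otherwise rejects externally created ByoHost objects.
kubectl --kubeconfig new-mgmt.kubeconfig \
  get validatingwebhookconfigurations byoh-validating-webhook-configuration -o yaml \
  > webhook-backup.yaml
kubectl --kubeconfig new-mgmt.kubeconfig \
  delete validatingwebhookconfigurations byoh-validating-webhook-configuration

# Apply the exported ByoHost on the new cluster, then restore the webhook.
kubectl --kubeconfig new-mgmt.kubeconfig apply -f byohost.yaml
kubectl --kubeconfig new-mgmt.kubeconfig apply -f webhook-backup.yaml
```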

I hope you get it to work 🙂

@syndicut

Hmm, I did not have this issue.

But yes, as you say, it does not copy over the ByoHosts when running the move command. So I had to copy them manually with kubectl get byohosts.infrastructure.cluster.x-k8s.io -n <namespace> <byohost> -oyaml and save the output to a file. But before I could apply it to the new management cluster, I had to temporarily remove the webhook, validatingwebhookconfigurations.admissionregistration.k8s.io byoh-validating-webhook-configuration.

I hope you get it to work 🙂

I think my problem was that I skipped that part:

Also I had to create a new CSR to validate the byoh agent user. Finally I sent the new kubeconfig to the nodes at ~/.byoh/config

But I got it to work, though I had to add a little patch (nebius#9)

This way the move process is very simple:

  1. You just do clusterctl move
  2. Then repeat steps defined here https://github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/blob/main/docs/getting_started.md#generating-the-bootstrap-kubeconfig-file (and copy kubeconfig to the host)
  3. Then just remove ~/.byoh/config and restart the byoh-agent; it then recreates the ByoHost in the new cluster and everything just works

I think I'll write some e2e tests and open a PR with them (and some documentation about the move process).
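For reference, the simplified flow above can be sketched end to end as below. The BootstrapKubeconfig fields are written from memory of the getting-started doc linked in step 2, so verify the exact shape against it; host address and kubeconfig paths are placeholders:

```shell
# 1. Move the Cluster API objects to the new management cluster.
clusterctl move --kubeconfig old-mgmt.kubeconfig --to-kubeconfig new-mgmt.kubeconfig

# 2. On the new management cluster, create a BootstrapKubeconfig
#    (per the getting_started.md steps) and extract its data.
cat <<EOF | kubectl --kubeconfig new-mgmt.kubeconfig create -f -
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: BootstrapKubeconfig
metadata:
  name: bootstrap-kubeconfig
  namespace: default
spec:
  apiserver: https://<new-control-plane-ip>:6443
  certificate-authority-data: <base64-encoded-CA>
EOF
kubectl --kubeconfig new-mgmt.kubeconfig get bootstrapkubeconfig bootstrap-kubeconfig \
  -n default -o=jsonpath='{.status.bootstrapKubeconfigData}' > bootstrap-kubeconfig.conf
scp bootstrap-kubeconfig.conf user@<host>:/tmp/

# 3. On the host: drop the cached config and restart the agent so it
#    re-registers and recreates its ByoHost in the new cluster.
rm ~/.byoh/config
sudo systemctl restart byoh-agent
```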
