Remove host from inventory #288

Open
kerenbenzion opened this issue Dec 21, 2021 · 13 comments
@kerenbenzion

I am trying to use the BYOH provider and I ran into an issue: I cannot delete a host that is not in use.
I uninstalled the agent on the host side and then ran delete on the management side, and I get the following error.

The command I am trying to run:

  • kubectl delete byohosts.infrastructure.cluster.x-k8s.io lab-preprod-k8s-worker-lab-5

The result:

  • Error from server (Forbidden): admission webhook "vbyohost.kb.io" denied the request: byohost.infrastructure.cluster.x-k8s.io "lab-preprod-k8s-worker-lab-5" is forbidden: cannot delete ByoHost when MachineRef is assigned

I have also tried to edit the ByoHost object, but I cannot commit any changes when I delete the machine ref.

Can you let me know what I should do?

  • Kubernetes version (kubectl version --short): Client Version: v1.22.4, Server Version: v1.22.4
  • OS: Ubuntu 20.04
@dharmjit
Contributor

Hi @kerenbenzion, there is a delete webhook that blocks deletion of a ByoHost CR whose status.MachineRef points to a ByoMachine. The Host-Agent should be running, and then you can try to scale down the MD/KCP.

Note that removing a specific ByoHost/ByoMachine from the workload cluster is not supported as of now.
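
To see which hosts the webhook would currently protect, it can help to list every ByoHost next to the machine it is bound to. This is a minimal sketch, assuming the reference is exposed at `.status.machineRef.name` (the thread only says it lives on the ByoHost status); the function name `list_byohost_bindings` is hypothetical.

```shell
# List each ByoHost with the ByoMachine it references.
# Assumption: the reference is stored at .status.machineRef.name.
# Hosts without a reference show <none> in the MACHINE column.
list_byohost_bindings() {
  kubectl get byohosts.infrastructure.cluster.x-k8s.io \
    -o custom-columns='HOST:.metadata.name,MACHINE:.status.machineRef.name'
}
```

Calling `list_byohost_bindings` against the management cluster prints one row per host; any host with a machine name in the second column would be rejected by the delete webhook.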

@kerenbenzion
Author

@dharmjit if I understood correctly, we can currently only scale up? We cannot decrease the number of hosts once a machine ref has been assigned?
Will this be supported?

@dharmjit
Contributor

> currently we can do only scale up

We can scale both up and down for both KCP and MD, as per the CAPI contract. You can use either of the commands below to update the number of replicas:

  • kubectl scale kcp|md
  • kubectl edit kcp|md

> We cannot decrease number of hosts.

If I understand your question correctly, you can detach hosts from the cluster with a scale-down operation. Once you have identified the host that was removed, you can stop the Host-Agent and delete that specific ByoHost CR.
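
The replica change described above can be sketched as a small helper that lowers a MachineDeployment by exactly one replica. This is an illustration only: the function name `scale_md_down_by_one` is hypothetical, and as noted in the thread it does not control which host is released.

```shell
# Reduce a MachineDeployment by one replica.
# Reads the current .spec.replicas, then asks for one fewer.
scale_md_down_by_one() {
  md="$1"
  current="$(kubectl get machinedeployment "$md" -o jsonpath='{.spec.replicas}')"
  if [ -z "$current" ] || [ "$current" -le 0 ]; then
    echo "nothing to scale down for $md" >&2
    return 1
  fi
  kubectl scale machinedeployment "$md" --replicas="$((current - 1))"
}
```

For example, `scale_md_down_by_one my-md` on a deployment with three replicas would request two.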

@kerenbenzion
Author

Hi,

I am trying to understand the following: say one of my hosts (worker or control plane) has issues (physical or OS) and I need some time to fix it.
Can I remove the ByoHost until the problem is fixed?

@dharmjit
Contributor

> Can I remove the byoh host until the problem is fixed?

Removing a specific host is not supported as of now.

@kerenbenzion
Author

Do you know if it will be supported?

@anusha94
Contributor

anusha94 commented Dec 22, 2021

Hey @kerenbenzion

Some context on why you are forbidden to delete a byohost directly: we want to avoid any accidental deletion of byohosts from the cluster, which can disrupt your cluster state and the workloads scheduled on that node.

Just reiterating what @dharmjit mentioned: for any maintenance activity on the host, you should first do a scale-down operation. As of now we can't choose a particular host while scaling down, so your best options are to tear down the cluster so that all hosts are released back to the capacity pool, or to scale down (control plane / worker) and hope your host is the one being deprovisioned.
Since MachineRef is set on the byohost.Status field, I'm not quite sure you can edit it, which would explain why you are not able to commit changes.

When doing a scale-down operation or cluster deprovision, you can check the agent logs and when you see something like MachineRef not set - that is when you are free to issue the kubectl delete byohost command.
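
As a complement to watching the agent logs, the same condition can be checked from the management cluster before deleting. This is a sketch under the assumption that an unbound host has no value at `.status.machineRef.name`; the function name `delete_byohost_if_released` is hypothetical.

```shell
# Delete a ByoHost only when no machine reference is set on it.
# Assumption: the reference is stored at .status.machineRef.name.
delete_byohost_if_released() {
  host="$1"
  ref="$(kubectl get byohosts.infrastructure.cluster.x-k8s.io "$host" \
    -o jsonpath='{.status.machineRef.name}')"
  if [ -n "$ref" ]; then
    echo "$host is still bound to $ref; scale down first" >&2
    return 1
  fi
  kubectl delete byohosts.infrastructure.cluster.x-k8s.io "$host"
}
```

Running `delete_byohost_if_released lab-preprod-k8s-worker-lab-5` returns a non-zero status without calling delete while the host is still bound, which avoids the webhook rejection.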

> Do you know if it will be supported?

This is a valid use case and thank you for pointing it out. I will open an issue regarding this (most probably after the holidays 😄 )

When the team is back we will look at prioritizing this issue and including it in one of the releases.

@huchen2021
Contributor

Hi,

> I am trying to understand if, let's say, one of my hosts (worker/controlplane) has issues (physical/OS) and I need some time to fix it. Can I remove the byoh host until the problem is fixed?

There is a way to do this, but be careful: it is not recommended in a production environment. Delete the entire BYOH provider resource first, and then clean up your cluster resources completely. That should solve your current dilemma.

@anusha94 anusha94 added this to the Next milestone Mar 4, 2022
@dharmjit
Contributor

dharmjit commented Apr 7, 2022

Hi @kerenbenzion, we were not aware of this earlier, but CAPI already provides a way to delete a specific Machine. You could perform the steps below.

  1. Identify the machine to delete via kubectl get machines
  2. Annotate the machine via kubectl annotate machine machine-name "cluster.x-k8s.io/delete-machine"="yes"
  3. Scale down the KCP/MD via kubectl scale md md-name --replicas=1

Let us know if this works for you.
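
The annotate and scale steps above can be combined into one helper. This is a minimal sketch: the function name `remove_machine` and its argument order are hypothetical, and the target replica count should be one less than the current count if only the annotated machine is meant to go.

```shell
# Mark one Machine for deletion, then scale its MachineDeployment down.
# Arguments: machine name, MachineDeployment name, target replica count.
# The scale step runs only if the annotation succeeds.
remove_machine() {
  machine="$1"
  md="$2"
  replicas="$3"
  kubectl annotate machine "$machine" "cluster.x-k8s.io/delete-machine=yes" &&
    kubectl scale machinedeployment "$md" --replicas="$replicas"
}
```

For example, `remove_machine machine-name md-name 1` mirrors the commands in the steps above.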

@keshavcruise

This might be an absolute hack, but I got around the problem of deleting stale ByoHosts by temporarily removing the ValidatingWebhookConfiguration for ByoHosts.
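
The comment above does not name the configuration object, so a first step is finding it. The sketch below only looks it up and changes nothing; it assumes the webhook is registered under the name shown in the error message (`vbyohost.kb.io`), and the function name `find_byohost_webhook_config` is hypothetical.

```shell
# Print the name of each ValidatingWebhookConfiguration that registers
# the webhook named in the error message (vbyohost.kb.io).
# Each jsonpath output line is: <config name> <webhook names...>
find_byohost_webhook_config() {
  kubectl get validatingwebhookconfigurations \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.webhooks[*].name}{"\n"}{end}' |
    awk '/vbyohost\.kb\.io/ { print $1 }'
}
```

Removing the configuration it reports disables the accidental-deletion guard described earlier in the thread, so it is worth saving a copy of the object first.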

@eugene-chernyshenko

I had a similar issue when removing a ByoHost:

k delete byoh byoh-1

Error from server (cannot delete ByoHost when MachineRef is assigned): admission webhook "vbyohost.kb.io" denied the request: cannot delete ByoHost when MachineRef is assigned

Then I tried to remove the machineRef

k edit byoh byoh-1

machineRef:
  apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
  kind: ByoMachine
  name: byoh-cluster-control-plane-jx95s
  namespace: default
  uid: f36c4978-089b-42a3-8d05-a4148f25b5a8

and got this error

error: byohosts.infrastructure.cluster.x-k8s.io "byoh-1" could not be patched: Internal error occurred: failed calling webhook "vbyohost.kb.io": failed to call webhook: Post "https://byoh-webhook-service.byoh-system.svc:443/validate-infrastructure-cluster-x-k8s-io-v1beta1-byohost?timeout=10s": EOF

k -n byoh-system logs -f byoh-controller-manager-6c4555b4dd-szlkp

2023/05/15 21:53:25 http: panic serving 10.42.0.1:48090: runtime error: index out of range [2] with length 2
goroutine 1881 [running]:
net/http.(*conn).serve.func1()
        /usr/local/go/src/net/http/server.go:1825 +0xbf
panic({0x1ae6a20, 0xc000b4a558})
        /usr/local/go/src/runtime/panic.go:844 +0x258
github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/apis/infrastructure/v1beta1.(*ByoHostValidator).handleCreateUpdate(0xc00044fc48, 0xc000c5adc0)
        /workspace/apis/infrastructure/v1beta1/byohost_webhook.go:59 +0x6d5
github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/apis/infrastructure/v1beta1.(*ByoHostValidator).Handle(_, {_, _}, {{{0xc000aff7d0, 0x24}, {{0xc000acfc40, 0x1f}, {0xc0005fe338, 0x7}, {0xc0005fe390, ...}}, ...}})
        /workspace/apis/infrastructure/v1beta1/byohost_webhook.go:35 +0x117
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle(_, {_, _}, {{{0xc000aff7d0, 0x24}, {{0xc000acfc40, 0x1f}, {0xc0005fe338, 0x7}, {0xc0005fe390, ...}}, ...}})
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/webhook.go:146 +0xa2
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP(0xc00083b180, {0x7fe8940926b8?, 0xc00047c690}, 0xc00066f100)
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/http.go:99 +0xe90
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1({0x7fe8940926b8, 0xc00047c690}, 0x1f2ee00?)
        /go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:40 +0xd4
net/http.HandlerFunc.ServeHTTP(0x1f2ee78?, {0x7fe8940926b8?, 0xc00047c690?}, 0xc000c5ba38?)
        /usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x1f2ee78?, 0xc0001880e0?}, 0xc00066f100)
        /go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:117 +0xaa
net/http.HandlerFunc.ServeHTTP(0x2d0b040?, {0x1f2ee78?, 0xc0001880e0?}, 0xc000c5b9c0?)
        /usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2({0x1f2ee78, 0xc0001880e0}, 0xc00066f100)
        /go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:84 +0xbf
net/http.HandlerFunc.ServeHTTP(0x7573f31c1d?, {0x1f2ee78?, 0xc0001880e0?}, 0xc000386070?)
        /usr/local/go/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc00067a33f?, {0x1f2ee78, 0xc0001880e0}, 0xc00066f100)
        /usr/local/go/src/net/http/server.go:2462 +0x149
net/http.serverHandler.ServeHTTP({0x1f21638?}, {0x1f2ee78, 0xc0001880e0}, 0xc00066f100)
        /usr/local/go/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc000782280, {0x1f2fca8, 0xc00084bbc0})
        /usr/local/go/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:3071 +0x4db

By the way, this happens whether the cluster still exists or has already been removed.

@haiwu

haiwu commented Jul 31, 2023

I just tried these steps, and it seems to work:

    Identify the machine to delete via kubectl get machines
    Annotate the machine via kubectl annotate machine machine-name "cluster.x-k8s.io/delete-machine"="yes"
    Scale down the KCP/MD via kubectl scale md md-name --replicas=1

@harsh-p-soni

Is there any solution for this? I am unable to delete a ByoHost even though the Machine resource is no longer present.
