
OpenShift Support #75

Closed
jsanchezmartinez opened this issue Mar 7, 2024 · 13 comments · Fixed by #76
Labels
enhancement New feature or request


@jsanchezmartinez

Hi,
When running in OpenShift, there are no VirtualMachineScaleSets (only VirtualMachines), and for that reason, the DaemonSet is crashing (attached logs below).
Can we request OpenShift support?

{"file":"github.com/maksim-paskal/aks-node-termination-handler/cmd/main.go:55","func":"main.main","level":"info","msg":"Starting 1.0.13-d8d5a71-1707463489...","time":"2024-03-07T10:48:45Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/alert/alert.go:29","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/alert.Init","level":"warning","msg":"not sending Telegram message, no token","time":"2024-03-07T10:48:45Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/client/client.go:45","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/client.Init","level":"info","msg":"No kubeconfig file use incluster","time":"2024-03-07T10:48:45Z"}
{"error":"error in getting azure resource name: azure:///subscriptions/dd6b40ef-de5f-4649-95a7-bd2337c71900/resourceGroups/ocp-azure-uat-euw-8npmn-rg/providers/Microsoft.Compute/virtualMachines/master-1: azureProviderID not valid","file":"github.com/maksim-paskal/aks-node-termination-handler/cmd/main.go:86","func":"main.main","level":"fatal","msg":"","time":"2024-03-07T10:48:45Z"}
@maksim-paskal
Owner

@jsanchezmartinez thanks for opening this issue. By default, this tool targets AKS (Azure Kubernetes Service) nodes. Every node that AKS creates has a .spec.providerID that matches this format:

AzureProviderID = "^azure:///subscriptions/(.+)/resourceGroups/(.+)/providers/Microsoft.Compute/virtualMachineScaleSets/(.+)/virtualMachines/(.+)$" //nolint:lll
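To see why the OpenShift node in the log above is rejected, here is a minimal Go sketch (not the project's actual code) that applies the quoted VMSS-only pattern with `regexp.FindStringSubmatch`. The function name `parseAKSProviderID` and the sample IDs are hypothetical; the error text only mirrors the log message.

```go
package main

import (
	"fmt"
	"regexp"
)

// The VMSS-based providerID format quoted above. Capture groups:
// subscription, resource group, scale set name, instance ID.
var aksProviderID = regexp.MustCompile(`^azure:///subscriptions/(.+)/resourceGroups/(.+)/providers/Microsoft.Compute/virtualMachineScaleSets/(.+)/virtualMachines/(.+)$`)

// parseAKSProviderID (hypothetical helper) returns the four captured parts,
// or an error when the ID is not in the VMSS format.
func parseAKSProviderID(providerID string) ([]string, error) {
	m := aksProviderID.FindStringSubmatch(providerID)
	if m == nil {
		return nil, fmt.Errorf("%s: azureProviderID not valid", providerID)
	}
	return m[1:], nil // drop the full match, keep the groups
}

func main() {
	ids := []string{
		// Hypothetical AKS node (VMSS instance 3).
		"azure:///subscriptions/sub/resourceGroups/mc_rg/providers/Microsoft.Compute/virtualMachineScaleSets/aks-nodepool1-vmss/virtualMachines/3",
		// Standalone VM, shaped like the OpenShift node in the log.
		"azure:///subscriptions/sub/resourceGroups/ocp-rg/providers/Microsoft.Compute/virtualMachines/master-1",
	}
	for _, id := range ids {
		parts, err := parseAKSProviderID(id)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("matched:", parts)
	}
}
```

The second ID has no `virtualMachineScaleSets/...` segment, so the pattern cannot match it. That is the same failure that makes the DaemonSet exit.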

I have never used OpenShift, but in theory this tool can work on OpenShift nodes. Can you share a recipe for building an OpenShift cluster on Azure?

I will try to create a cluster on Azure and run this tool on its nodes.

@maksim-paskal maksim-paskal added the enhancement New feature or request label Mar 7, 2024
@maksim-paskal maksim-paskal self-assigned this Mar 7, 2024
@jsanchezmartinez
Author

Hi @maksim-paskal,
The easiest way is to use the ARO (Azure Red Hat OpenShift) service (https://azure.microsoft.com/en-us/products/openshift/ and https://portal.azure.com/?feature.msaljs=true#view/HubsExtension/BrowseResource/resourceType/Microsoft.RedHatOpenShift%2FOpenShiftClusters).
In OpenShift there are no VirtualMachineScaleSets (only VMs), so the Azure provider ID is a bit different: "^azure:///subscriptions/(.+)/resourceGroups/(.+)/providers/Microsoft.Compute/virtualMachines/(.+)$"
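One way to support both formats is to try the AKS (VMSS) pattern first and fall back to the standalone-VM pattern quoted above. Below is a minimal Go sketch of that idea. It is not the change merged in #76, and the `azureResource` type and `parseProviderID` function are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

var (
	// AKS nodes: VM instances inside a VirtualMachineScaleSet.
	vmssProviderID = regexp.MustCompile(`^azure:///subscriptions/(.+)/resourceGroups/(.+)/providers/Microsoft.Compute/virtualMachineScaleSets/(.+)/virtualMachines/(.+)$`)
	// OpenShift / self-managed nodes: standalone VirtualMachines.
	vmProviderID = regexp.MustCompile(`^azure:///subscriptions/(.+)/resourceGroups/(.+)/providers/Microsoft.Compute/virtualMachines/(.+)$`)
)

// azureResource (hypothetical) holds the parsed parts of a node providerID.
type azureResource struct {
	SubscriptionID string
	ResourceGroup  string
	ScaleSet       string // empty for standalone VMs
	VMName         string // instance ID for VMSS nodes, VM name otherwise
}

// parseProviderID checks the VMSS format first, then falls back to the
// standalone-VM format.
func parseProviderID(id string) (azureResource, error) {
	if m := vmssProviderID.FindStringSubmatch(id); m != nil {
		return azureResource{SubscriptionID: m[1], ResourceGroup: m[2], ScaleSet: m[3], VMName: m[4]}, nil
	}
	if m := vmProviderID.FindStringSubmatch(id); m != nil {
		return azureResource{SubscriptionID: m[1], ResourceGroup: m[2], VMName: m[3]}, nil
	}
	return azureResource{}, errors.New("azureProviderID not valid")
}

func main() {
	// providerID taken from the log in the first comment.
	id := "azure:///subscriptions/dd6b40ef-de5f-4649-95a7-bd2337c71900/resourceGroups/ocp-azure-uat-euw-8npmn-rg/providers/Microsoft.Compute/virtualMachines/master-1"
	r, err := parseProviderID(id)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", r)
}
```

The two patterns do not overlap. The VM pattern needs `virtualMachines` directly after `Microsoft.Compute/`, so a VMSS ID never falls through to it.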

@maksim-paskal
Owner

@jsanchezmartinez can you reboot an OpenShift server (for example, a worker node) from the Azure Portal?

I created an OpenShift cluster on Azure, but all operations on its servers are forbidden for my user. I think the restriction was set on the resource group containing the OpenShift servers when the cluster was created.

I don't know how to test my changes.

@jsanchezmartinez
Author

OpenShift VMs cannot be restarted when using Azure ARO. I can restart them in some of our clusters because those are self-managed.
Why do you need to restart a VM from OpenShift to test the changes?

@maksim-paskal
Owner

aks-node-termination-handler listens for all events from Azure Scheduled Events, and a reboot is one of the events that Azure sends.
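A minimal Go sketch of reading Azure Scheduled Events follows. It is not the tool's own client. It assumes the documented IMDS endpoint `http://169.254.169.254/metadata/scheduledevents` with `api-version=2020-07-01` and the required `Metadata: true` header, and it decodes only a subset of the response fields. The function names are hypothetical.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// Scheduled Events endpoint of the Azure Instance Metadata Service (IMDS).
const scheduledEventsURL = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"

// scheduledEvent holds a subset of the documented event fields.
type scheduledEvent struct {
	EventID      string   `json:"EventId"`
	EventType    string   `json:"EventType"` // e.g. Reboot, Redeploy, Freeze, Preempt, Terminate
	ResourceType string   `json:"ResourceType"`
	Resources    []string `json:"Resources"`
	EventStatus  string   `json:"EventStatus"`
	NotBefore    string   `json:"NotBefore"`
}

type scheduledEvents struct {
	DocumentIncarnation int              `json:"DocumentIncarnation"`
	Events              []scheduledEvent `json:"Events"`
}

// parseScheduledEvents decodes a raw IMDS response body.
func parseScheduledEvents(data []byte) (scheduledEvents, error) {
	var se scheduledEvents
	err := json.Unmarshal(data, &se)
	return se, err
}

// fetchScheduledEvents performs one GET against the endpoint.
func fetchScheduledEvents(ctx context.Context, url string) (scheduledEvents, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return scheduledEvents{}, err
	}
	req.Header.Set("Metadata", "true") // IMDS rejects requests without this header
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return scheduledEvents{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return scheduledEvents{}, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return scheduledEvents{}, err
	}
	return parseScheduledEvents(body)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	se, err := fetchScheduledEvents(ctx, scheduledEventsURL)
	if err != nil {
		fmt.Println("cannot read scheduled events (not on an Azure VM?):", err)
		return
	}
	for _, e := range se.Events {
		fmt.Printf("%s %s on %v (status %s, not before %q)\n",
			e.EventID, e.EventType, e.Resources, e.EventStatus, e.NotBefore)
	}
}
```

A real handler would poll this in a loop. Because every event type arrives through the same endpoint, a reboot is just one of the types it can observe.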

How do you plan to use this tool? Do you plan to use Azure Spot?

@jsanchezmartinez
Author

Yes. We are currently running spot instances, and we mainly want to drain nodes when eviction events are detected.
Maybe you can test simulating eviction events through Azure API: https://learn.microsoft.com/en-us/rest/api/compute/virtual-machines/simulate-eviction?view=rest-compute-2023-10-02&tabs=HTTP

@maksim-paskal
Owner

How can I add spot instances to an OpenShift cluster?

@jsanchezmartinez
Author

Basically, you pick an existing worker MachineSet, copy it, and adapt it (https://learn.microsoft.com/en-us/azure/openshift/howto-spot-nodes).
This is the important part to be added:

      providerSpec:
        value:
          spotVMOptions: {}

@maksim-paskal
Owner

The simulation API is not available for OpenShift servers because of restrictions on the servers' resource group. If you have an OpenShift cluster with spot nodes, you can test my change in your cluster.

OpenShift clusters restrict pods that try to connect to 169.254.169.254, and aks-node-termination-handler needs access to that address to read events. You must allow this by installing the chart with --set hostNetwork=true:

helm upgrade aks-node-termination-handler \
--install \
--namespace kube-system \
https://github.com/maksim-paskal/aks-node-termination-handler/releases/download/v1.0.13/aks-node-termination-handler-1.1.5.tgz \
--set priorityClassName=system-node-critical \
--set image=paskalmaksim/aks-node-termination-handler:dev \
--set hostNetwork=true

If you can test it, that would be awesome. After testing, I will release the feature.

@jsanchezmartinez
Author

I'll try to test today or next Monday and come back. Thanks :)

@jsanchezmartinez
Author

It seems to be working fine (see the attached log screenshot). Do you need or want anything else to validate it?


@maksim-paskal
Owner

Let's keep an eye on it. If something goes wrong, please describe the problem here. If no issues are found, I will release these changes next week (Thursday).

maksim-paskal added a commit that referenced this issue Mar 8, 2024
Azure offers [Red Hat OpenShift](https://azure.microsoft.com/en-us/products/openshift/), and clients can also deploy self-managed clusters. The nodes of these clusters can use the [Azure Metadata Service](https://learn.microsoft.com/en-us/azure/virtual-machines/linux/scheduled-events) in the same way as [AKS](https://azure.microsoft.com/en-us/products/kubernetes-service)-managed nodes.

The difference is that AKS uses `virtualMachineScaleSets`, while OpenShift and self-managed clusters use `virtualMachines` for their nodes.

This change can help Azure customers that use OpenShift or self-managed Kubernetes clusters for their infrastructure.

Closes: #75

Signed-off-by: Maksim Paskal <[email protected]>
@maksim-paskal
Owner

These changes have been released. Please switch your dev installation to the production release:

helm repo add aks-node-termination-handler https://maksim-paskal.github.io/aks-node-termination-handler/
helm repo update

helm upgrade aks-node-termination-handler \
--install \
--namespace kube-system \
aks-node-termination-handler/aks-node-termination-handler \
--set priorityClassName=system-node-critical \
--set hostNetwork=true
