Windows nodes Support #67

Closed
kubebn opened this issue Jan 25, 2024 · 8 comments · Fixed by #68

@kubebn

kubebn commented Jan 25, 2024

Hello,

We've been lucky so far on AWS, where aws-handler does support Windows nodes.

We also have some Windows node pools running in AKS, so I am wondering whether there are any plans for Windows support. Thanks

@maksim-paskal maksim-paskal self-assigned this Jan 29, 2024
@maksim-paskal maksim-paskal added the enhancement New feature or request label Jan 29, 2024
@maksim-paskal
Owner

@kubebn The changes have been released. Please try running the pods on Windows nodes (you need to reinstall the stable chart).

@kubebn
Author

kubebn commented Feb 2, 2024

Hi @maksim-paskal, today I was planning to test Windows spot instances. However, I am getting confused by the configuration.

I have these values:

  image: paskalmaksim/aks-node-termination-handler:v1.0.12
  # imagePullPolicy: Always
  priorityClassName: system-node-critical
  securityContext:
    runAsNonRoot: true
    privileged: false
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    windowsOptions:
      runAsUserName: "ContainerUser"

  tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
  - effect: NoSchedule
    key: windows
    operator: Equal
    value: "true"

I am getting:

aks-node-termination-handler-4mbxh   1/1     Running            0               21h     10.61.118.114   aks-lmd8spot1e4d-83777213-vmss00003j   <none>           <none>
aks-node-termination-handler-4ncjp   0/1     ErrImagePull       0               113s    10.61.66.36     aksw8s3e400001n                        <none>           <none>
aks-node-termination-handler-4nl89   1/1     Running            0               21h     10.61.116.237   aks-lmd8spot3e4d-27803674-vmss000035   <none>           <none>
aks-node-termination-handler-57rm6   0/1     ErrImagePull       0               113s    10.61.115.207   aksw8s2e400001o                        <none>           <none>
aks-node-termination-handler-58bg4   0/1     ErrImagePull       0               114s    10.61.102.177   aksw8s3e4000021                        <none>           <none>
---
k describe pod aks-node-termination-handler-57rm6
...
Containers:
  aks-node-termination-handler:
    Container ID:
    Image:         paskalmaksim/aks-node-termination-handler:v1.0.12
    Image ID:
    Port:          17923/TCP
    Host Port:     0/TCP
...
  Normal   Pulling          49s (x4 over 2m19s)  kubelet            Pulling image "paskalmaksim/aks-node-termination-handler:v1.0.12"
  Warning  Failed           49s (x4 over 2m19s)  kubelet            Error: ErrImagePull
  Normal   BackOff          23s (x7 over 2m19s)  kubelet            Back-off pulling image "paskalmaksim/aks-node-termination-handler:v1.0.12"

Is there anything else that needs to be added so that it can read the Windows image manifest correctly?

@maksim-paskal
Owner

@kubebn We don't have any Windows servers in production. For my test I created a simple cluster with Windows and Linux nodes; see the README.

Your logs don't show any reason why the image is not pulled. Maybe your Windows nodes have some specific network settings, or it's an error specific to those instances.

Please try creating a new AKS cluster (see the README) and installing aks-node-termination-handler in that cluster with the default Helm chart settings:

helm upgrade aks-node-termination-handler \
--install \
--namespace kube-system \
aks-node-termination-handler/aks-node-termination-handler \
--set priorityClassName=system-node-critical

and then install the chart with your own values.yaml

@kubebn
Author

kubebn commented Feb 2, 2024

I tried to install it directly using the Windows image: paskalmaksim/aks-node-termination-handler:v1.0.12-windows-amd64

Got this error message:

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       StartError
      Message:      failed to create containerd task: failed to create shim task: hcs::CreateComputeSystem 182f8aea9ccc95e5b750a40a9e3a63bf0188ed4f2bfaff499a3052b25bfe4265: The container operating system does not match the host operating system.: unknown
      Exit Code:    128

Is it actually compatible with Windows 2019?

  OS Image:                   Windows Server 2019 Datacenter
  Operating System:           windows
  Architecture:               amd64

@maksim-paskal
Owner

@kubebn This is a Kubernetes error specific to Windows (more info here). It means that a Docker image built for Windows 2022 can't start on Windows 2019, and vice versa. It can only be fixed with separate Docker images for each Windows version.

I see that AKS clusters use Windows 2022 by default:

Windows Server 2022 is the default operating system for Kubernetes versions 1.25.0 and higher. Windows Server 2019 is the default OS for earlier versions.

I built test images for you to try. You can change the image of your pods to check whether they resolve your issue:
Windows 2022: paskalmaksim/aks-node-termination-handler:test-7772698645-windows-ltsc2022-amd64
Windows 2019: paskalmaksim/aks-node-termination-handler:test-7772698645-windows-ltsc2019-amd64
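
As a rough sketch of how one of these test images could be tried on the Windows 2019 nodes only, a values.yaml fragment might look like the following. It assumes the chart accepts `image` and `nodeSelector` the way the other values in this thread use them, and that the Windows 2019 nodes carry the `kubernetes.azure.com/os-sku: Windows2019` label; neither is verified here.

```yaml
# Hypothetical values.yaml for testing the ltsc2019 image
# (a sketch, not verified against the chart)
priorityClassName: system-node-critical

# Test image published in this thread for Windows Server 2019
image: paskalmaksim/aks-node-termination-handler:test-7772698645-windows-ltsc2019-amd64

# Assumption: Windows 2019 nodes carry this AKS label
nodeSelector:
  kubernetes.azure.com/os-sku: Windows2019
```

Tolerations for any taints on those nodes, such as the ones in the values posted earlier, would still need to be added.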

Can you migrate your workloads from Windows 2019 to Windows 2022?
Which operating systems does your cluster have (Linux + Windows 2019, or Linux + Windows 2019 + Windows 2022)?

@kubebn
Author

kubebn commented Feb 4, 2024

Hi, yes, we are aware that 2019 will be deprecated soon, but unfortunately we can't migrate all of them now.

I will try those images on Monday. I guess I will just create two DaemonSets for the different versions.

We have Linux and both Windows versions.

@maksim-paskal
Owner

There is a more elegant way to run pods in your landscape (Linux + Windows 2019 + Windows 2022): you need two installations of aks-node-termination-handler.

values.yaml of the first installation (excludes Windows 2019 nodes)

priorityClassName: system-node-critical

image: paskalmaksim/aks-node-termination-handler:latest

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.azure.com/os-sku
          operator: NotIn
          values:
          - Windows2019

values.yaml of the second installation (only Windows 2019 nodes)

priorityClassName: system-node-critical

image: paskalmaksim/aks-node-termination-handler:latest-ltsc2019

nodeSelector:
  kubernetes.azure.com/os-sku: Windows2019

This is my proof of concept for the new release; I will try to implement it this week.
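
If the `kubernetes.azure.com/os-sku` label is not available, the same split could probably be expressed with the upstream `node.kubernetes.io/windows-build` label, which the kubelet sets on Windows nodes. The sketch below assumes build 10.0.17763 for Windows Server 2019 (Windows Server 2022 reports 10.0.20348) and that the chart passes `nodeSelector` through unchanged.

```yaml
# Sketch of the second installation using upstream Kubernetes labels
# instead of the AKS-specific os-sku label (assumption, not from the README)
priorityClassName: system-node-critical

image: paskalmaksim/aks-node-termination-handler:latest-ltsc2019

nodeSelector:
  kubernetes.io/os: windows
  # Windows Server 2019 build number; quoted so YAML keeps it a string
  node.kubernetes.io/windows-build: "10.0.17763"
```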

@maksim-paskal
Owner

@kubebn Windows 2019 is now supported; see the README.
