Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Healthy check issue? #27

Open
hsyrjaos opened this issue Dec 23, 2024 · 0 comments
Open

Healthy check issue? #27

hsyrjaos opened this issue Dec 23, 2024 · 0 comments

Comments

@hsyrjaos
Copy link

Issue description

Healthy check compared to wrong field -> not working.
Here is the possible fix:
diff --git a/habanalabs.go b/habanalabs.go
index 8a545c6..000f6a5 100644
--- a/habanalabs.go
+++ b/habanalabs.go
@@ -158,8 +158,8 @@ func watchXIDs(ctx context.Context, devs []*pluginapi.Device, xids chan<- *plugi
continue
}

  •                   serial, err := dev.SerialNumber() // BUG: fix this was before UUID
    
  •                   if err != nil || len(serial) == 0 {
    
  •                   uuid, err := dev.UUID()
    
  •                   if err != nil || len(uuid) == 0 {
                              slog.Error("XidCriticalError: All devices will go unhealthy", "xid", e.Etype)
                              // All devices are unhealthy
                              for _, d := range devs {
    

@@ -169,7 +169,7 @@ func watchXIDs(ctx context.Context, devs []*pluginapi.Device, xids chan<- *plugi
}

                    for _, d := range devs {
  •                           if d.ID == serial {
    
  •                           if d.ID == uuid {
                                      slog.Error("XidCriticalError: the device will go unhealthy", "xid", e.Etype, "aip", d.ID)
                                      xids <- d
                              }
    

Steps to reproduce (describe as minimally and precisely as possible)?

unhealthy condition never match.

OS

Linux 6.11.10-061110-generic x86_64 PRETTY_NAME="Ubuntu 22.04.5 LTS" NAME="Ubuntu" VERSION_ID="22.04" VERSION="22.04.5 LTS (Jammy Jellyfish)" VERSION_CODENAME=jammy ID=ubuntu ID_LIKE=debian HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" UBUNTU_CODENAME=jammy

Kernel Version

6.11.10-061110-generic

Container Runtime Type/Version

any

K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS)

all

Extra logs and files

No response

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant