[Bug]: GPU Error shows up on Non-GPU Clusters #1422

andrewballantyne · 2023-06-23T19:48:48Z

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

On some clusters, we get an error on the Workbench page in relation to GPUs. This was reported on the Dev Sandbox as well as in another RHODS cluster.

The actual cause is unclear, but it looks like a 404 from the backend /api/gpu endpoint, which means one of the many calls the backend makes for that endpoint is likely failing. The backend code does not guard against rejected promises in its k8s calls, so it could simply be a failure to fetch some details.

Expected Behavior

Silently handle 404s and just report that there are no GPUs on this cluster.
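
A minimal sketch of that behavior, assuming the GPU lookup rejects with an object carrying a numeric `status` field. The `GpuInfo` shape, `NO_GPUS`, `isNotFound`, and `getGpuInfoOrEmpty` are hypothetical names for illustration, not the dashboard's actual code:

```typescript
// Hypothetical response shape; the real /api/gpu payload may differ.
type GpuInfo = { configured: boolean; available: number };

// Value reported when the cluster has no GPUs (assumed representation).
const NO_GPUS: GpuInfo = { configured: false, available: 0 };

// Assumption: a failed request rejects with an object that has a numeric `status`.
function isNotFound(e: unknown): boolean {
  return typeof e === 'object' && e !== null && (e as { status?: unknown }).status === 404;
}

// Resolve to "no GPUs" on a 404; rethrow anything else so real failures stay visible.
async function getGpuInfoOrEmpty(fetchGpuInfo: () => Promise<GpuInfo>): Promise<GpuInfo> {
  try {
    return await fetchGpuInfo();
  } catch (e) {
    if (isNotFound(e)) {
      return NO_GPUS;
    }
    throw e;
  }
}
```

Only the 404 case is swallowed here; other errors still surface, so a genuinely broken backend would not be hidden.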

Steps To Reproduce

Unknown.

Workaround (if any)

No response

What browsers are you seeing the problem on?

No response

Open Data Hub Version

Dashboard: v2.11.0

Anything else

This will be hard to reproduce -- best guess is it can't find the scale machines.

Investigation of the GPU endpoint will be needed to see if there is some triggering factor. Several calls there are unguarded and can throw with no catch in place, which turns the failure into a thread-breaking error.
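
One way to guard such calls, sketched under assumptions: `withFallback`, `countGpus`, and the two lookup parameters are hypothetical and do not reflect the actual backend code. Each call resolves to a fallback value when it rejects, so a single failed lookup cannot break the whole request:

```typescript
// Run a promise-returning call; on rejection (or a synchronous throw),
// report the error and resolve to the fallback instead of propagating.
function withFallback<T>(
  call: () => Promise<T>,
  fallback: T,
  onError?: (e: unknown) => void,
): Promise<T> {
  return Promise.resolve()
    .then(call)
    .catch((e: unknown) => {
      onError?.(e);
      return fallback;
    });
}

// Hypothetical aggregation: each lookup returns per-resource GPU counts.
// A failed lookup contributes an empty list rather than failing the request.
async function countGpus(
  listPodGpus: () => Promise<number[]>,
  listAutoscalerGpus: () => Promise<number[]>,
): Promise<number> {
  const [podGpus, scaleGpus] = await Promise.all([
    withFallback(listPodGpus, []),
    withFallback(listAutoscalerGpus, []),
  ]);
  return podGpus.concat(scaleGpus).reduce((sum, n) => sum + n, 0);
}
```

Passing a logger as `onError` would keep the failure visible in the backend logs while still returning a usable result.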


Gkrumbach07 · 2023-09-28T17:47:08Z

This issue is no longer relevant, as the api/gpu backend endpoint is deprecated in favor of accelerator profiles.

tracker for accelerators merging into main:

[UI] Habana Support Part 1 #1450

Gkrumbach07 · 2023-10-05T17:49:59Z

This can be closed with the completion of:

[Feature Request]: accelerator feature cleanup #1894
That issue will remove the deprecated gpu endpoint.

andrewballantyne added kind/bug Something isn't working untriaged Indicates the newly create issue has not been triaged yet labels Jun 23, 2023

github-project-automation bot added this to ODH Dashboard Planning Jun 23, 2023

github-project-automation bot moved this to Needs prioritization in ODH Dashboard Planning Jun 23, 2023

andrewballantyne added the feature/accelerator-support All things related to Accelerators label Jun 23, 2023

Gkrumbach07 added priority/normal An issue with the product; fix when possible and removed untriaged Indicates the newly create issue has not been triaged yet labels Jun 27, 2023

Gkrumbach07 moved this from Needs prioritization to To do in ODH Dashboard Planning Jun 27, 2023

Gkrumbach07 added this to the Current Release milestone Jun 27, 2023

andrewballantyne modified the milestones: Current Release, Upcoming Release Jul 14, 2023

jkoehler-redhat added this to ODH Feature Tracking Jul 19, 2023

jkoehler-redhat moved this to Dashboard in ODH Feature Tracking Jul 19, 2023

lucferbux modified the milestones: Current Release, Upcoming Release Aug 3, 2023

andrewballantyne removed this from the Current Release milestone Sep 15, 2023

andrewballantyne added this to Internal tracking Oct 5, 2023

dgutride assigned Gkrumbach07 Nov 20, 2023

dgutride moved this from Dev To do to Dev In progress in ODH Dashboard Planning Nov 20, 2023

Gkrumbach07 closed this as completed Nov 29, 2023

github-project-automation bot moved this from Dev In progress to Done in ODH Dashboard Planning Nov 29, 2023

github-project-automation bot moved this to Done in Internal tracking Nov 29, 2023

andrewballantyne linked a pull request Nov 29, 2023 that will close this issue

Cleanup deprecated usage of gpu #2182

Merged

7 tasks
