[Bug]: GPU Error shows up on Non-GPU Clusters #1422

Closed
1 task done
andrewballantyne opened this issue Jun 23, 2023 · 2 comments · Fixed by #2182
Assignees
Labels
feature/accelerator-support All things related to Accelerators field-priority Flag to track improvements that are for stability -- effort to put in front of new functionality kind/bug Something isn't working priority/high Important issue that needs to be resolved asap. Releases should not have too many of these.

Comments

@andrewballantyne
Member

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

(Two images attached.)

On some clusters, we get an error on the Workbench page in relation to GPUs. This was reported on the Dev Sandbox as well as in another RHODS cluster.

It's unclear what the actual cause is, but it looks to be a 404 from the backend /api/gpu endpoint, which means one of the many calls the backend makes is likely failing. The backend code does not guard against rejected promises in its k8s calls, so it could simply be a failure to fetch some details.

Expected Behavior

Silently handle 404s and just report that there are no GPUs on this cluster.
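
A minimal sketch of that behavior on the caller side, assuming the caller can inspect the HTTP status. The `GpuInfo` shape, the `getGpuInfo` name, and the injected `Fetcher` type are hypothetical and are not taken from the dashboard code; only the /api/gpu path comes from this issue.

```typescript
// Hypothetical shape of the GPU payload; the real /api/gpu response may differ.
interface GpuInfo {
  configured: boolean;
  available: number;
}

// Minimal subset of a fetch-style response that this sketch relies on.
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Injected so the sketch does not depend on a particular HTTP client.
type Fetcher = (url: string) => Promise<HttpResponse>;

const NO_GPUS: GpuInfo = { configured: false, available: 0 };

// Treat a 404 from the endpoint as "no GPUs on this cluster" instead of an error.
async function getGpuInfo(fetcher: Fetcher, url = '/api/gpu'): Promise<GpuInfo> {
  const response = await fetcher(url);
  if (response.status === 404) {
    return NO_GPUS;
  }
  if (!response.ok) {
    // Other failures still surface so that real outages are not hidden.
    throw new Error(`GPU request failed with status ${response.status}`);
  }
  return (await response.json()) as GpuInfo;
}
```

Only the 404 case is swallowed here; whether other status codes should also be mapped to "no GPUs" is a separate decision.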

Steps To Reproduce

Unknown.

Workaround (if any)

No response

What browsers are you seeing the problem on?

No response

Open Data Hub Version

Dashboard: v2.11.0

Anything else

This will be hard to reproduce -- best guess is it can't find the scale machines.

Investigation of the GPU endpoint will be needed to see if there is some triggering factor. Several calls are unbounded and can throw with no catch in place to handle them, which turns the failure into a thread-breaking error.
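
One way to guard such calls, sketched under the assumption that each lookup returns a promise and that a failed lookup can reasonably count as zero GPUs. `withFallback` and `countGpus` are hypothetical helpers for illustration, not functions from the backend.

```typescript
// Run one lookup and return a default value if it rejects or throws.
async function withFallback<T>(lookup: () => Promise<T>, fallback: T): Promise<T> {
  try {
    return await lookup();
  } catch {
    // A real handler would probably log the error before falling back.
    return fallback;
  }
}

// Hypothetical aggregate: sums GPU counts from several independent lookups,
// so that one failing lookup cannot break the whole request.
async function countGpus(lookups: Array<() => Promise<number>>): Promise<number> {
  const counts = await Promise.all(lookups.map((lookup) => withFallback(lookup, 0)));
  return counts.reduce((total, count) => total + count, 0);
}
```

With every lookup wrapped, a rejected k8s call contributes zero to the total rather than escaping as an unhandled rejection.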

@andrewballantyne andrewballantyne added kind/bug Something isn't working untriaged Indicates the newly create issue has not been triaged yet labels Jun 23, 2023
@github-project-automation github-project-automation bot moved this to Needs prioritization in ODH Dashboard Planning Jun 23, 2023
@andrewballantyne andrewballantyne added the feature/accelerator-support All things related to Accelerators label Jun 23, 2023
@Gkrumbach07 Gkrumbach07 added priority/normal An issue with the product; fix when possible and removed untriaged Indicates the newly create issue has not been triaged yet labels Jun 27, 2023
@Gkrumbach07 Gkrumbach07 moved this from Needs prioritization to To do in ODH Dashboard Planning Jun 27, 2023
@Gkrumbach07 Gkrumbach07 added priority/high Important issue that needs to be resolved asap. Releases should not have too many of these. field-priority Flag to track improvements that are for stability -- effort to put in front of new functionality and removed priority/normal An issue with the product; fix when possible labels Jun 27, 2023
@Gkrumbach07 Gkrumbach07 added this to the Current Release milestone Jun 27, 2023
@andrewballantyne andrewballantyne removed this from the Current Release milestone Sep 15, 2023
@Gkrumbach07
Member

This issue will no longer be relevant, as the api/gpu backend endpoint is deprecated in favor of accelerator profiles.

tracker for accelerators merging into main:

@Gkrumbach07
Member

This can be closed with the completion of:

@dgutride dgutride moved this from Dev To do to Dev In progress in ODH Dashboard Planning Nov 20, 2023
@github-project-automation github-project-automation bot moved this from Dev In progress to Done in ODH Dashboard Planning Nov 29, 2023
@andrewballantyne andrewballantyne linked a pull request Nov 29, 2023 that will close this issue
7 tasks
Projects
Status: Done
Status: Dashboard
Archived in project
3 participants