[Bug]: GPU Error shows up on Non-GPU Clusters #1422
Labels
feature/accelerator-support
All things related to Accelerators
field-priority
Flag to track improvements that are for stability -- effort to put in front of new functionality
kind/bug
Something isn't working
priority/high
Important issue that needs to be resolved asap. Releases should not have too many of these.
Is there an existing issue for this?
Current Behavior
On some clusters, we get an error on the Workbench page in relation to GPUs. This was reported on the Dev Sandbox as well as in another RHODS cluster.
It's unclear what the actual cause is, but it looks to be a 404 from the backend /api/gpu endpoint, which means it is likely one of the many calls made on the backend that triggers it. The backend code does not guard against promise failures in k8s calls, so it could simply be a failure to fetch some details.
Expected Behavior
Silently handle 404s and just report that there are no GPUs on this cluster.
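A minimal sketch of that behavior, under stated assumptions: the `GPUInfo` shape, the `statusCode` field on the error, and the `fetchGPUDetails` callback are all hypothetical stand-ins for illustration, not the actual backend code or the real /api/gpu payload.

```typescript
// Hypothetical payload shape; the real /api/gpu response may differ.
type GPUInfo = { configured: boolean; available: number };

const NO_GPUS: GPUInfo = { configured: false, available: 0 };

// Assumption: failed k8s calls reject with an object carrying `statusCode`.
function isNotFound(e: unknown): boolean {
  return (
    typeof e === 'object' &&
    e !== null &&
    (e as { statusCode?: number }).statusCode === 404
  );
}

// `fetchGPUDetails` stands in for whatever k8s calls the endpoint makes.
async function getGPUInfoSafe(
  fetchGPUDetails: () => Promise<GPUInfo>,
): Promise<GPUInfo> {
  try {
    return await fetchGPUDetails();
  } catch (e) {
    // A 404 is treated as "no GPUs on this cluster" instead of an error.
    if (isNotFound(e)) {
      return NO_GPUS;
    }
    // Anything else is still surfaced to the caller.
    throw e;
  }
}
```

Whether non-404 failures should also fall back to the "no GPUs" result is a design choice left open here; the sketch only swallows the 404 case described above.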
Steps To Reproduce
Unknown.
Workaround (if any)
No response
What browsers are you seeing the problem on?
No response
Open Data Hub Version
Anything else
This will be hard to reproduce -- best guess is it can't find the scale machines.
Investigation of the GPU endpoint will be needed to see if there is some triggering factor. Several calls are unbounded and can throw with no catch to handle them, which turns the failure into a thread-breaking error.
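One way the missing-catch problem could be contained is to give each backend call its own catch, so that a single rejected lookup contributes zero instead of breaking the whole request. The sketch below is illustrative only: `countGPUsSafely` and the `lookups` callbacks are hypothetical names, not functions from the existing backend.

```typescript
// Each lookup stands in for one k8s call that reports a GPU count.
async function countGPUsSafely(
  lookups: Array<() => Promise<number>>,
): Promise<number> {
  const counts = await Promise.all(
    lookups.map((lookup) =>
      // Promise.resolve().then(...) also captures synchronous throws.
      Promise.resolve()
        .then(lookup)
        // A failed lookup counts as zero GPUs rather than rejecting the batch.
        .catch(() => 0),
    ),
  );
  return counts.reduce((sum, n) => sum + n, 0);
}
```

With this shape, a lookup that cannot find its resource (for example, the scale machines mentioned above) no longer prevents the other lookups from being counted.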