Properly set number of GPUs in dask-sidecar #5094

Open
Tracked by #950
sanderegg opened this issue Nov 28, 2023 · 0 comments
Assignees
Labels
a:clusters-keeper a:dask-service Any of the dask services: dask-scheduler/sidecar or worker a:director-v2 issue related with the director-v2 service

sanderegg commented Nov 28, 2023

Currently, the dask-sidecar's handling of GPUs on the underlying machine relies on defaults:

  • on start, it counts the available GPUs by running nvidia-smi --list-gpus in a container,
  • it reports that number of GPUs to the dask-scheduler,
  • when a service requiring GPUs is started, it must inherit from an nvidia-based image so that it actually requests GPUs; by default, all of the machine's GPUs are then assigned to the container.
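
As an illustration of the counting step above, here is a minimal sketch of how the output of nvidia-smi --list-gpus could be turned into a GPU count. The function name count_gpus and the sample text are hypothetical and not taken from the dask-sidecar code; the sketch only assumes that the command prints one line per device, each starting with "GPU <index>:".

```python
def count_gpus(listing: str) -> int:
    """Count GPUs from the text printed by `nvidia-smi --list-gpus`.

    Assumption: one line per device, each beginning with "GPU <index>:".
    """
    return sum(
        1 for line in listing.splitlines() if line.strip().startswith("GPU ")
    )


# Hypothetical sample output for a two-GPU machine (names and UUIDs are made up).
sample = (
    "GPU 0: Example GPU (UUID: GPU-00000000-0000-0000-0000-000000000000)\n"
    "GPU 1: Example GPU (UUID: GPU-11111111-1111-1111-1111-111111111111)\n"
)

print(count_gpus(sample))
```

Keeping the parsing separate from the container call makes it easy to test without a GPU or a Docker daemon.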

This works as is, but it causes issues:

  • it is inconsistent with how CPUs and RAM are assigned to containers,
  • when a multi-GPU machine is shared by several single-GPU containers, it fails because every container gets all GPUs assigned.

This shall be fixed with the following changes:

  • DV-2/Clusters-keeper: set the assigned GPU resources correctly when the full machine is used,
  • Dask-sidecar: assign the correct number of GPUs to the container:
"HostConfig": {
    "DeviceRequests": [
        {
            "Driver": "nvidia",
            "Count": int(num_GPUs),
            "Capabilities": [["gpu"]],
        }
    ],
}
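
The fragment above could be produced by a small helper along these lines. This is only a sketch under stated assumptions: build_host_config is a hypothetical name, not the dask-sidecar's actual function, and the resulting dictionary is assumed to be merged into the container-creation payload sent to the Docker Engine API. The device request is added only when at least one GPU is requested, so CPU-only services are left unchanged.

```python
from __future__ import annotations

from typing import Any


def build_host_config(num_gpus: int) -> dict[str, Any]:
    """Build the HostConfig part of a container spec (hypothetical helper).

    A DeviceRequests entry is added only when GPUs are requested, so that
    the container gets exactly `num_gpus` devices rather than all of them.
    """
    host_config: dict[str, Any] = {}
    if num_gpus > 0:
        host_config["DeviceRequests"] = [
            {
                "Driver": "nvidia",
                "Count": int(num_gpus),
                "Capabilities": [["gpu"]],
            }
        ]
    return {"HostConfig": host_config}


print(build_host_config(1))
print(build_host_config(0))
```

With this shape, two single-GPU services on a two-GPU machine would each request one device instead of both receiving every GPU.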
sanderegg transferred this issue from ITISFoundation/osparc-issues on Nov 28, 2023
sanderegg self-assigned this on Nov 28, 2023
sanderegg added the a:director-v2, a:dask-service, and a:clusters-keeper labels on Nov 28, 2023
mrnicegyu11 changed the title from "Properly set number of GPUs" to "Properly set number of GPUs in dask-sidecar" on Aug 22, 2024