
Either 'device_ids' or 'count' must be specified otherwise no GPUs are loaded #21026

Closed
1 task done
StellaContrail opened this issue Sep 28, 2024 · 6 comments · Fixed by compose-spec/compose-go#689
Labels
area/compose Relates to docker-compose.yml spec or docker-compose binary lifecycle/locked status/triage Needs triage

Comments

StellaContrail commented Sep 28, 2024

Is this a docs issue?

  • My issue is about the documentation content or website

Type of issue

Information is incorrect

Description

Since moby v27.3.0, either 'device_ids' or 'count' must be specified to load GPUs. (moby/moby#48483)

The documentation says:

count. This value, specified as an integer or the value all, represents the number of GPU devices that should be reserved (providing the host holds that number of GPUs). If count is set to all or not specified, all GPUs available on the host are used by default.

device_ids. This value, specified as a list of strings, represents GPU device IDs from the host. You can find the device ID in the output of nvidia-smi on the host. If no device_ids are set, all GPUs available on the host are used by default.

However, omitting both parameters results in NVIDIA_VISIBLE_DEVICES=void, so no GPUs are loaded. To load all GPUs, you need to leave 'device_ids' unset and set 'count: all'.
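The behavior reported here can be modeled in a few lines. This is a simplified sketch, not moby's source: `gpuRequest` and `visibleDevices` are hypothetical names, and the conventions assumed are that a count of -1 means "all", 0 means "nothing requested", and a positive count exposes GPU indexes 0..n-1.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gpuRequest is a hypothetical, simplified stand-in for a GPU device request.
// Assumed convention: Count -1 means "all", 0 means "nothing requested",
// and n > 0 means n GPUs.
type gpuRequest struct {
	Count     int
	DeviceIDs []string
}

// visibleDevices models the NVIDIA_VISIBLE_DEVICES value described in this
// issue for moby v27.3.0 and later. It is an illustration, not moby's code.
func visibleDevices(req gpuRequest) string {
	switch {
	case len(req.DeviceIDs) > 0:
		// Explicit device_ids are passed through as a comma-separated list.
		return strings.Join(req.DeviceIDs, ",")
	case req.Count < 0:
		// count: all
		return "all"
	case req.Count > 0:
		// Assumption: a numeric count exposes GPU indexes 0..n-1.
		idx := make([]string, req.Count)
		for i := range idx {
			idx[i] = strconv.Itoa(i)
		}
		return strings.Join(idx, ",")
	default:
		// Neither count nor device_ids: no GPUs are exposed.
		return "void"
	}
}

func main() {
	fmt.Println(visibleDevices(gpuRequest{}))                              // void
	fmt.Println(visibleDevices(gpuRequest{Count: -1}))                     // all
	fmt.Println(visibleDevices(gpuRequest{Count: 2}))                      // 0,1
	fmt.Println(visibleDevices(gpuRequest{DeviceIDs: []string{"0", "3"}})) // 0,3
}
```

Under this model, the zero value (both fields omitted) is exactly the case that ends up as "void", which is why 'count: all' is needed to get every GPU.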

Location

https://docs.docker.com/compose/how-tos/gpu-support/#enabling-gpu-access-to-service-containers

Suggestion

Rewrite the descriptions as follows:

count. This value, specified as an integer or the value all, represents the number of GPU devices that should be reserved (providing the host holds that number of GPUs). If count is set to all and no device_ids are set, all GPUs available on the host are used by default.

device_ids. This value, specified as a list of strings, represents GPU device IDs from the host. You can find the device ID in the output of nvidia-smi on the host. If no device_ids are set and count is set to all, all GPUs available on the host are used by default.

Alternatively, move it into a separate note:

:::note important

If count is set to all and no device_ids are set, all GPUs available on the host are used by default.

@StellaContrail StellaContrail added the status/triage Needs triage label Sep 28, 2024
@StellaContrail StellaContrail changed the title from "Either 'device_ids' or 'count' must be specified." to "Either 'device_ids' or 'count' must be specified otherwise no GPUs are loaded" Sep 29, 2024
yanorei32 commented Sep 29, 2024

This feature breaks backwards compatibility.
In my case, I lost access to my GPU with this docker-compose.yml.

https://github.com/yr32infra/voicevox-deploy/blob/3f76d7acbd7a0c4e4000a6529a497f498c0b52c0/docker-compose.yml#L2-L20

I think this change was unintended and is a bug.

yanorei32 commented Sep 29, 2024

@thaJeztah @ezrasilvera @laurazard
These seem to be breaking changes to the docker-compose interface. Is this intended?

moby/moby#48482
moby/moby#48483

@aevesdocker aevesdocker added the area/compose Relates to docker-compose.yml spec or docker-compose binary label Sep 30, 2024
laurazard (Member) commented Sep 30, 2024

Hi all, thanks for the reports. I don't think this was intended. Likely what happened is that Compose was setting count to 0 when it was not specified, which, with this change on the daemon side, now causes GPUs to be explicitly disabled. I'll open a ticket on the Compose side to set it to -1 in this case.

I guess the other option is to fix up the docs, but since it's been explicitly documented that

If count is set to all or not specified, all GPUs available on the host are used by default.

it makes more sense to me that we try to preserve that behavior.

cc @ndeloof @glours @jhrotko
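The mechanism laurazard describes (an omitted count sent as 0, with the proposed fix sending -1) can be sketched as follows. This is a hypothetical model, not Compose's or compose-go's actual code: `requestCount` and its parameters are illustrative names, and it assumes count and device_ids must not both be set, so an omitted count stays 0 when device_ids are present.

```go
package main

import (
	"fmt"
	"strconv"
)

// requestCount is a hypothetical model of how a Compose implementation might
// translate the YAML `count` field into the request count sent to the daemon.
// raw is "" when count is omitted, otherwise "all" or a decimal integer.
// fixed switches between the reported behavior and the proposed one.
func requestCount(raw string, hasDeviceIDs, fixed bool) (int, error) {
	switch raw {
	case "all":
		return -1, nil
	case "":
		if fixed && !hasDeviceIDs {
			// Proposed: an omitted count means all GPUs, matching the docs.
			return -1, nil
		}
		// Reported behavior: an omitted count was sent as 0, which the
		// daemon now treats as "no GPUs". With device_ids set, 0 is kept
		// on the assumption that count and device_ids are mutually exclusive.
		return 0, nil
	default:
		return strconv.Atoi(raw)
	}
}

func main() {
	before, _ := requestCount("", false, false)
	after, _ := requestCount("", false, true)
	fmt.Println(before, after) // 0 -1
}
```

Mapping the omitted case to -1 on the client side keeps the documented "not specified means all GPUs" behavior without changing the daemon's stricter interpretation of 0.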

glours (Contributor) commented Sep 30, 2024

Thanks @laurazard, I'll take a look.

laurazard (Member) commented

My bad, @glours. I didn't even think about this, but we could have opened a PR or pinged you Compose folks so you'd be aware.

docker-robot bot commented Nov 1, 2024

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

/lifecycle locked

@docker-robot docker-robot bot locked and limited conversation to collaborators Nov 1, 2024

5 participants