Feature Request: Assign replicas to GPUs #11863

Open
kheyer opened this issue May 30, 2024 · 3 comments
@kheyer

kheyer commented May 30, 2024

Description

I have a service that runs on a single GPU. If I have multiple GPUs available, I would like to create one replica of this service for each GPU available.

Currently, I can only do this by defining the service multiple times in the docker-compose file and changing the device_ids entry in each copy's resources section:

  resources:
    reservations:
      devices:
      - driver: "nvidia"
        device_ids: ["0"]
        capabilities: [gpu]

Having to create 8 near-identical replicas of the same service in the config file is unwieldy.

I would like to define the service once and set the replica count with:

replicas: ${GPU_COUNT}

However, this requires some way for each replica to know which GPU to use. My understanding is that there is currently no way to do this, and there isn't a good way to derive it from a per-container "replica index" (see #9153).

Some method of mapping replicas of a service to different GPUs would be helpful.

@kheyer kheyer changed the title Assign replicas to GPUs Feature Request: Assign replicas to GPUs May 30, 2024
@ndeloof
Contributor

ndeloof commented May 30, 2024

In an ideal world, the engine (or the NVIDIA driver) would manage this as a pool of resources, just as you can declare a service that binds a port from a range and let the engine select an available port in that range, so that scaling is not an issue.

About #9153: I would not be comfortable relying on the container number to index replicas. While we try to keep it roughly sequential, there are many corner cases and no guarantee that you would always get a value within the [1..GPU_COUNT] interval.

@mvonpohle

This is the best solution I've found so far that uses only the docker compose file, without setting different environment variables in each container. Something akin to the way port ranges are assigned would be nice, though.


services:
  # using yaml fragment solution for multiple GPU selection at docker level - https://docs.docker.com/reference/compose-file/fragments/
  thingy-1: &default-service
    image: ubuntu
    build: .
    restart: always
    command: [ "nvidia-smi" ]
    # ports:
    #   - "8000-8004:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]
  thingy-2:
    <<: *default-service
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]
              capabilities: [gpu]
  thingy-3:
    <<: *default-service
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["2"]
              capabilities: [gpu]
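
If hand-copying the block for every GPU gets tedious, the same per-GPU layout could be generated instead. The following is only a rough sketch, not something Compose provides: `gpu_service` and `build_compose` are hypothetical helper names, and the image, command, and `thingy-N` naming are simply copied from the file above. It builds the structure as a plain dict and serializes it as JSON, on the assumption that Compose will load it because JSON is a subset of YAML.

```python
import json


def gpu_service(image, command, gpu_id):
    """Return one Compose service definition pinned to a single GPU.

    Hypothetical helper: mirrors the hand-written service block above.
    """
    return {
        "image": image,
        "restart": "always",
        "command": list(command),
        "deploy": {
            "resources": {
                "reservations": {
                    "devices": [
                        {
                            "driver": "nvidia",
                            # Compose expects device IDs as strings.
                            "device_ids": [str(gpu_id)],
                            "capabilities": ["gpu"],
                        }
                    ]
                }
            }
        },
    }


def build_compose(gpu_count, name="thingy", image="ubuntu", command=("nvidia-smi",)):
    """Build a Compose document with one service per GPU index 0..gpu_count-1."""
    services = {
        # Service names are 1-based (thingy-1, ...), GPU indices are 0-based.
        f"{name}-{index + 1}": gpu_service(image, command, index)
        for index in range(gpu_count)
    }
    return {"services": services}


def render(gpu_count):
    """Serialize the generated document as JSON text."""
    return json.dumps(build_compose(gpu_count), indent=2)
```

Writing the result of `render(8)` to a file and passing that file to `docker compose -f` would stand in for eight hand-written service entries. This still creates separate services rather than true replicas, so it only automates the workaround and does not address the original request.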

@ndeloof
Contributor

ndeloof commented Oct 8, 2024

I wonder whether you could rely on CDI (#12184), so that you don't need to explicitly assign a device ID to each container but can instead request a device by CDI specification and let the engine assign an actual device matching the request.
