Statistics of other types of gpu #2484

zjj2wry · 2024-10-31T13:17:50Z

Search before asking

I searched the issues and found no similar feature request.

Description

kuberay/ray-operator/controllers/ray/raycluster_controller.go

Line 1623 in 33ba385

if strings.HasSuffix(string(key), "gpu") && !val.IsZero() {

When using Aliyun Kubernetes GPU sharing, the GPU resource key is aliyun.com/gpu-mem:

    workerGroupSpecs:
            resources:
              limits:
                aliyun.com/gpu-mem: "1"
                cpu: "1"
                memory: 2Gi
              requests:
                aliyun.com/gpu-mem: "1"
                cpu: "1"
                memory: 2Gi
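
To make the mismatch concrete, here is a small Python sketch that mirrors the Go condition quoted above. The helper name `counts_as_gpu` is hypothetical; only the suffix test and the non-zero check come from the referenced line.

```python
def counts_as_gpu(key: str, value: int) -> bool:
    """Mirror of the quoted Go condition: key ends with "gpu" and value is non-zero."""
    return key.endswith("gpu") and value != 0


# The common NVIDIA key ends with "gpu", so it is counted.
print(counts_as_gpu("nvidia.com/gpu", 1))      # True
# The Aliyun GPU-share key ends with "gpu-mem", so it is skipped.
print(counts_as_gpu("aliyun.com/gpu-mem", 1))  # False
```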

The autoscaler does not work when a task requests a GPU resource:

(autoscaler +3m13s) Error: No available node types can fulfill resource request {'GPU': 1.0, 'CPU': 1.0}. Add suitable node types to this cluster to resolve this issue.

Code to reproduce:

import ray

ray.init()

# Request one GPU so the autoscaler has to provide a GPU worker.
@ray.remote(num_gpus=1)
def gpu_task():
    import torch
    # Run a matrix multiplication on the GPU.
    x = torch.rand(10000, 10000).cuda()
    y = torch.mm(x, x)
    return y.sum().item()

future = gpu_task.remote()
result = ray.get(future)

print("Result:", result)

ray.shutdown()

Use case

No response

Related issues

none

Are you willing to submit a PR?

Yes I am willing to submit a PR!


win5923 · 2024-10-31T15:21:47Z

Perhaps using strings.Contains could be a better way.
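
As a rough illustration of the trade-off, the sketch below compares a suffix match with a substring match, using Python's `str.endswith` and `in` as stand-ins for Go's `strings.HasSuffix` and `strings.Contains`. The key `example.com/gpu-driver-version` is made up to show that a substring match can also pick up keys that are not GPU counts.

```python
keys = [
    "nvidia.com/gpu",
    "aliyun.com/gpu-mem",
    "example.com/gpu-driver-version",  # hypothetical key, not a GPU count
]

# Suffix match: only keys that end with "gpu".
by_suffix = [k for k in keys if k.endswith("gpu")]

# Substring match: any key that contains "gpu" anywhere.
by_substring = [k for k in keys if "gpu" in k]

print(by_suffix)
print(by_substring)
```

The substring match catches the Aliyun key, but it would also count any unrelated resource whose name happens to contain "gpu".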

zjj2wry · 2024-11-04T03:58:39Z

https://github.com/ray-project/ray/blob/ba41ae99097c30cac2dd62e263bbe0b7b9bffc95/python/ray/autoscaler/_private/kuberay/autoscaling_config.py#L346-L351

By setting num-gpus, I can solve the problem of GPU workers not scaling up automatically. desireGPU is only used for display purposes.
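
The workaround can be pictured with the simplified sketch below. It assumes the linked autoscaler code prefers an explicit `num-gpus` start parameter and otherwise falls back to a suffix-based scan of the container resource limits; the function name `detect_num_gpus` and its signature are hypothetical and not the actual Ray implementation.

```python
from typing import Dict, Optional


def detect_num_gpus(ray_start_params: Dict[str, str],
                    limits: Dict[str, str]) -> Optional[int]:
    """Assumed detection order: explicit num-gpus first, then resource keys."""
    # An explicit num-gpus value takes precedence (assumption).
    if "num-gpus" in ray_start_params:
        return int(ray_start_params["num-gpus"])
    # Fallback: look for a resource key ending in "gpu" (assumption).
    for key, quantity in limits.items():
        if key.endswith("gpu"):
            return int(quantity)
    # No GPU detected, so the node type would advertise none.
    return None


limits = {"aliyun.com/gpu-mem": "1", "cpu": "1", "memory": "2Gi"}
print(detect_num_gpus({}, limits))                 # None
print(detect_num_gpus({"num-gpus": "1"}, limits))  # 1
```

Under these assumptions, the Aliyun key alone yields no GPU for the node type, which would be consistent with the "No available node types" error, while an explicit num-gpus restores it.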

andrewsykim · 2024-11-04T18:10:11Z

I suggest adding these to the list of well-known accelerators instead of using regex to parse GPU counts: https://github.com/ray-project/kuberay/blob/master/ray-operator/controllers/ray/common/pod.go#L41-L43
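
A minimal sketch of the allow-list idea, written in Python for brevity. The set `KNOWN_GPU_KEYS` and the helper `sum_known_gpus` are hypothetical; the actual contents of the list in pod.go are not reproduced here.

```python
# Hypothetical allow-list of resource keys to treat as GPUs.
KNOWN_GPU_KEYS = frozenset({"nvidia.com/gpu", "aliyun.com/gpu-mem"})


def sum_known_gpus(limits: dict) -> int:
    """Sum quantities only for keys that appear in the explicit allow-list."""
    return sum(int(v) for k, v in limits.items() if k in KNOWN_GPU_KEYS)


limits = {"aliyun.com/gpu-mem": "1", "cpu": "1", "memory": "2Gi"}
print(sum_known_gpus(limits))  # 1
```

An exact-match list avoids accidental matches on unrelated keys, at the cost of needing an update whenever a new accelerator key should be supported.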

zjj2wry added the enhancement (New feature or request) and triage labels Oct 31, 2024

zjj2wry changed the title ~~[Feature] autoscaler support custom gpu key~~ Statistics of other types of gpu Nov 4, 2024

zjj2wry linked a pull request Nov 4, 2024 that will close this issue

Supports sum the GPU counts of Aliyun and Volcano #2490

Open

4 tasks
