This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Wrong resource usage in UI #4602

Open
k1ngcyk opened this issue Jun 3, 2020 · 7 comments

Comments

@k1ngcyk

k1ngcyk commented Jun 3, 2020

Organization Name: EnjoyMusic Technology

Short summary about the issue/question:

Even when there is only one running job, configured with 4 CPU cores and 8 GB of memory, the UI shows half of the resources as used. The worker has 36 vCores and 128 GB of memory.

Screen Shot 2020-06-03 at 4 20 32 PM

Brief what process you are following:

How to reproduce it:

OpenPAI Environment:

  • OpenPAI version: v1.0.0
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Hardware (e.g. core number, memory size, storage size, GPU type etc.):
  • Others:

Anything else we need to know:

@Binyang2014
Contributor

@k1ngcyk it seems the job uses 1 GPU, and your bed only has 2 GPUs. The resource usage is max(usedGPU/totalGPU, usedMem/totalMem, usedCPU/totalCPU), so it shows 50%.
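
The formula above can be illustrated with a minimal Python sketch. The function name `displayed_usage` and the dictionary keys are hypothetical and not taken from OpenPAI's code; the inputs are the amounts mentioned in this thread (1 of 2 GPUs, 4 of 36 vCores, 8 of 128 GB).

```python
def displayed_usage(used, total):
    """Return the largest per-dimension utilization ratio (hypothetical helper)."""
    return max(used[key] / total[key] for key in total)


# Amounts taken from this thread; the key names are illustrative only.
used = {"gpu": 1, "cpu": 4, "mem_gb": 8}
total = {"gpu": 2, "cpu": 36, "mem_gb": 128}

usage = displayed_usage(used, total)
print(f"{usage:.0%}")  # the GPU ratio (1/2) dominates the CPU and memory ratios
```

Because the maximum is taken across dimensions, a job that is small in CPU and memory still drives the displayed figure if it holds a large share of the GPUs.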

@yqwang-ms
Member

I assume your job requests 1 GPU.
PAI uses the HiveD scheduler by default, and HiveD only schedules in units of cells, in which all resource dimensions increase or decrease proportionally.
See more in https://github.com/microsoft/hivedscheduler/blob/master/doc/user-manual.md#config-quickstart

So in your VC setup, 1 GPU corresponds to 1 cell, i.e. (1GPU, 17CPU, 63G Mem).
So (1GPU, 17CPU, 63G Mem) is allocated instead of just (1GPU, 4CPU, 8G Mem).
And if your job requests 2 GPUs, it requests 2 cells, i.e. 2 * (1GPU, 17CPU, 63G Mem) will be allocated.
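
The cell arithmetic above can be sketched as follows. This assumes, as in this VC setup, that one cell is (1GPU, 17CPU, 63G Mem) and that the number of cells equals the number of requested GPUs. The `Cell` type and the `allocated_for` function are hypothetical names for illustration and are not part of HiveD's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One scheduling unit; all dimensions scale together (illustrative type)."""
    gpu: int
    cpu: int
    mem_gb: int

    def times(self, n):
        # Scale every dimension by the same factor.
        return Cell(self.gpu * n, self.cpu * n, self.mem_gb * n)


# Cell size quoted in this thread for the reporter's VC setup.
CELL = Cell(gpu=1, cpu=17, mem_gb=63)


def allocated_for(requested_gpus):
    """Resources actually reserved, assuming one cell per requested GPU."""
    return CELL.times(requested_gpus)


one = allocated_for(1)
two = allocated_for(2)
print(one)
print(two)
```

Under these assumptions the CPU and memory figures in the job config (4 CPU, 8 GB) do not change what is reserved; only the GPU count does.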

We will refine the WebUI page below or the Job Submission Page (so that users can only specify cells or SKUs instead of resources in each dimension) to make it easier to understand.

image

@fanyangCS
Contributor

fanyangCS commented Jul 9, 2020

Related to #4601. We will use SKUs to improve the clarity of the error message.

@fanyangCS
Contributor

Related to #3273.

@fanyangCS
Contributor

Related to #3966.

@scarlett2018
Member

@fanyangCS @abuccts - Is this a bug? Has it been fixed by the related issues mentioned above? If not, may we plan it for the Aug release? Thanks.

@fanyangCS
Contributor

@fanyangCS @abuccts - Is this a bug? Has it been fixed by the related issues mentioned above? If not, may we plan it for the Aug release? Thanks.

Yes, it is planned for the Aug release.
