nodegroup plugin design doc #2227
- name: drf
- name: predicates
- name: proportion
- name: nodegroup
As it's a common scenario, let's discuss enhancing queue to support it instead of adding a plugin.
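@qiankunli I read through the design and have a few thoughts; please see whether they can be taken into account.

From the tenant resource perspective:

Tenant hard resources: these are the labeled resources. For nlp tasks:
1. When cluster resources are relatively idle: the queue=nlp part of the cluster is running non-nlp tasks. If the cluster has free resources, nlp tasks can be scheduled to other nodes.
2. When cluster resources are tight: the queue=nlp part of the cluster is running non-nlp tasks, but the cluster has no resources for newly arriving nlp tasks (and the total task resources of the nlp queue do not exceed its labeled resources). In that case, evict or stop the non-nlp tasks running in queue=nlp.

Every tenant can follow this baseline policy.

Tenant soft resources: tasks beyond the total of a tenant's labeled nodes are elastic soft resources. Depending on actual runtime conditions, a scoring algorithm decides which soft resources may be preempted, evicted, or stopped.

A fairly bad case: every queue fully uses its own resources; everyone is content and nobody owes anyone anything. The worst case: everyone's busy periods overlap and resources cannot be shared at all (for this, a reserve mechanism could be set on queues).

From the perspective of the resources themselves:

The reason for dividing tenants as above is essentially that the resources themselves differ: your resources, my resources. The dimensions Volcano scheduling currently uses to describe resources cannot yet express some attributes of the resource itself, such as CPU model or GPU card model. Different users may want different policies for these, so we need to do scheduling for specific scenarios based on labels such as nodegroup, or even multiple labels.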
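The hard/soft policy described in this comment can be sketched roughly as follows. This is only an illustration of the proposal, not Volcano code: `Task`, `split_hard_soft`, `eviction_candidates`, and the single CPU dimension are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    queue: str       # queue (tenant) that owns the task
    node_group: str  # node group of the node the task runs on
    cpu: int         # requested CPU, arbitrary units (hypothetical single dimension)


def split_hard_soft(tasks, queue, labeled_cpu):
    """Split a queue's tasks into hard (within its labeled capacity) and soft (beyond it)."""
    hard, soft, used = [], [], 0
    for t in tasks:
        if t.queue != queue:
            continue
        if used + t.cpu <= labeled_cpu:
            hard.append(t)
            used += t.cpu
        else:
            soft.append(t)  # elastic: may be preempted, evicted, or stopped
    return hard, soft


def eviction_candidates(tasks, queue, node_group, labeled_cpu, incoming_cpu, cluster_has_idle):
    """Return foreign tasks on `node_group` that may be evicted for an incoming task of `queue`."""
    if cluster_has_idle:
        return []  # case 1: the incoming task can simply run on other nodes
    used = sum(t.cpu for t in tasks if t.queue == queue)
    if used + incoming_cpu > labeled_cpu:
        return []  # the request would exceed the labeled (hard) resources, so it is soft
    # case 2: reclaim the labeled nodes from tasks of other queues
    return [t for t in tasks if t.node_group == node_group and t.queue != queue]
```

A real scheduler would rank the candidates with a scoring function and evict only as many as needed; the sketch returns all of them to keep the rule itself visible.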
How can we describe the relationship between queue and nodegroup in queue.spec?
Example 1: we can add v1.Affinity (in the k8s core API) in
Example 2: we can add a label (such as the
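Notes from the 5/20 community discussion:

Introducing an affinity mechanism between queue and nodegroup may carry a risk: a job is allowed to run according to the queue's rules, but no node satisfying the nodegroup affinity rules can be found, so the job stays pending indefinitely. In essence, the resource configuration of the queue and that of the nodegroup can easily become inconsistent.

First, the requirement "specific tasks run on specific types of nodes" does exist. Without a nodegroup mechanism, users would in practice fall back on taints and similar mechanisms, and they would hit the same problem.

To mitigate this problem, there are two options:

The "queue and nodegroup affinity configuration" becomes part of the queue's configuration.

The second option is more readable and also keeps the nodegroup concept, so we use the second option.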
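As a rough illustration of keeping the affinity on the queue, the sketch below filters candidate nodes by the node groups a queue is bound to. The label key `example.com/nodegroup`, the dictionary shapes, and the rule that a queue without affinity may use any node are assumptions for the example, not the API proposed in this PR.

```python
# Hypothetical label key; the real key is whatever the design doc settles on.
NODEGROUP_LABEL = "example.com/nodegroup"


def nodes_for_queue(queue_affinity, queue, nodes):
    """Return the nodes a task from `queue` may use.

    queue_affinity maps a queue name to the set of node groups it is bound to.
    Assumption: a queue with no entry is unconstrained and may use every node.
    """
    allowed = queue_affinity.get(queue)
    if allowed is None:
        return list(nodes)
    return [n for n in nodes if n["labels"].get(NODEGROUP_LABEL) in allowed]
```

For example, with `queue_affinity = {"nlp-queue": {"nlp"}}`, a task from `nlp-queue` only sees nodes whose label value is `nlp`, while unlabeled nodes are filtered out.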
6c80e04 to eb7ea2b
Is
The user/admin should make sure the configuration is reasonable; what we can do is report the related pending reason.
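One way to picture "report the related pending reason" is a check that distinguishes a queue-level shortage from a node-group-level one. The function name, the message strings, and the single CPU dimension below are hypothetical and only illustrate the idea.

```python
def pending_reason(task_cpu, queue_free_cpu, node_free_cpu, allowed_nodes):
    """Return a human-readable pending reason, or None if the task can be placed.

    node_free_cpu maps node name to free CPU; allowed_nodes are the nodes in the
    node groups the queue is bound to.
    """
    if task_cpu > queue_free_cpu:
        return "queue does not have enough free resources"
    if not any(node_free_cpu.get(n, 0) >= task_cpu for n in allowed_nodes):
        return "queue admits the task, but no node in its node groups has enough free resources"
    return None
```

The second message is the one that points the admin at a queue/nodegroup mismatch rather than at ordinary queue pressure.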
risk: The resources of the queue must not differ too much from the resources of the node-group; otherwise a task may be schedulable from the perspective of the queue but unable to find a suitable node.
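The risk above could be surfaced early with a simple configuration check. The sketch below compares a queue's CPU capability with the total CPU of its node groups; the function name and the 10% default tolerance are assumptions chosen for illustration, not values from the design doc.

```python
def capacity_mismatch(queue_cpu, group_cpu, tolerance=0.1):
    """Return the relative gap between queue and node-group CPU if it exceeds `tolerance`, else None."""
    if group_cpu <= 0:
        # A queue with capacity but no backing nodes is always a mismatch.
        return float("inf") if queue_cpu > 0 else None
    gap = abs(queue_cpu - group_cpu) / group_cpu
    return gap if gap > tolerance else None
```

A non-`None` result would be a hint for the admin to realign the queue and node-group configuration before jobs start pending.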
Please clarify the following in the design doc:
- Queue API change:
- Command line:
Do both kubectl and vcctl support displaying nodegroup info when querying a queue via the CLI?
eb7ea2b to 0704824
Signed-off-by: qiankunli <[email protected]>

update design
Signed-off-by: qiankunli <[email protected]>

fix doc
Signed-off-by: qiankunli <[email protected]>

fix doc
Signed-off-by: qiankunli <[email protected]>
fa0d57e to 1840715
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: william-wang
design doc for #2224
related issue #1830