nodegroup plugin design doc #2227

qiankunli · 2022-05-12T07:23:01Z

design doc for #2224
related issue #1830

william-wang · 2022-05-13T08:27:30Z

docs/design/node-group.md

+  - name: drf
+  - name: predicates
+  - name: proportion
+  - name: nodegroup


As it's a common scenario, let's discuss enhancing the queue to support it instead of using a plugin.

whybeyoung · 2022-05-13T14:36:36Z

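@qiankunli I've read through the design and have a few thoughts; please see whether they can be taken into account.

From the tenant-resource perspective:

Tenant hard resources: these are the labeled resources. For nlp tasks:

1. When cluster resources are relatively idle: non-nlp tasks are running on the nodes labeled queue=nlp. If the cluster has spare capacity, nlp tasks can be scheduled onto other nodes.

2. When cluster resources are tight: non-nlp tasks are running on the nodes labeled queue=nlp, but the cluster has no resources left for newly arriving nlp tasks [and the total resources of the nlp tasks in the queue do not exceed its labeled resources]. In that case, the non-nlp tasks on the queue-nlp nodes are evicted or stopped.

Every tenant can follow the baseline policy above.

Tenant soft resources: tasks that exceed the tenant's total labeled node capacity are all elastic soft resources. Depending on how they are actually running, soft resources may be preempted, evicted, or stopped according to some scoring algorithm.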
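The hard/soft split described above can be sketched as follows. This is only an illustration: the comment does not say how to decide which tasks count as "exceeding" the labeled capacity, so the sketch assumes tasks are considered in submission order and only CPU is tracked. `Task` and `split_hard_soft` are hypothetical names, not Volcano APIs.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpu: int  # requested CPU in millicores (assumed single resource dimension)


def split_hard_soft(tasks, labeled_capacity_cpu):
    """Split a tenant's tasks into hard and soft sets.

    Assumption: tasks are walked in submission order; a task is "hard" if it
    still fits within the tenant's labeled node capacity, otherwise "soft"
    (elastic, and therefore a candidate for preemption or eviction).
    """
    hard, soft = [], []
    used = 0
    for task in tasks:
        if used + task.cpu <= labeled_capacity_cpu:
            hard.append(task)
            used += task.cpu
        else:
            soft.append(task)
    return hard, soft
```

With a labeled capacity of 8000 millicores and tasks requesting 4000, 3000, and 2000 in that order, the first two would be treated as hard and the third as soft.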

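A fairly bad case: every queue fully uses its own resources; everyone gets along and nobody owes anyone anything.
A fairly good case: everyone time-shares, takes what they need, and feels that the cluster always meets their hard requirements.

The worst case: everyone's workloads overlap heavily in time and resources cannot be reused at all (this can be handled by setting some reserve mechanism on the queue).
A relatively bad case: my long-running soft-resource task is almost finished and gets killed by someone else [in that case, training scenarios need a checkpoint mechanism / resume-from-checkpoint training].

From the perspective of the resources themselves:

The reason for dividing tenants as above is ultimately that resources themselves differ: your resources, my resources. The dimensions Volcano scheduling currently uses to describe resources cannot yet express some properties of the resources themselves, such as CPU model or GPU card model, and different users may have different policies for these. So we need labels like nodegroup, or even multiple labels, as a basis for scheduling in specific scenarios.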

qiankunli · 2022-05-15T13:04:36Z

How can we describe the relationship between queue and nodegroup in queue.spec?

Example 1: we can add v1.Affinity (from the k8s core API) to queue.spec directly. This means we describe the relationship between queue and node (not nodegroup).

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: default
spec:
  guarantee: {}
  reclaimable: true
  weight: 1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: labelName
                operator: In
                values:
                  - labelValue
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: labelName
                operator: NotIn
                values:
                  - labelValue

Example 2: we can add a label (such as volcano.sh/node-group) to nodes and describe the relationship between queue and nodegroup simply.

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: default
spec:
  guarantee: {}
  reclaimable: true
  weight: 1
  affinity:
    nodeGroupAffinity:
      required:
        - groupname1
        - groupname2
      preferred:
        - groupname1
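
How a scheduler might interpret this proposal can be sketched as below. The comment does not define the semantics of `required` and `preferred`, so the sketch assumes that `required` filters nodes (an empty list means no restriction) and `preferred` only adds to a node's score. `evaluate_node` and the dictionary shape are hypothetical; only the label key `volcano.sh/node-group` comes from the example.

```python
NODE_GROUP_LABEL = "volcano.sh/node-group"  # label key taken from the example


def evaluate_node(queue_affinity, node_labels):
    """Return (allowed, score) for one node under a queue's nodeGroupAffinity.

    Assumptions (not confirmed by the thread):
    - an empty or missing "required" list places no restriction on nodes;
    - a node in a "preferred" group scores 1, any other allowed node scores 0;
    - a node without the node-group label belongs to no group.
    """
    group = node_labels.get(NODE_GROUP_LABEL)
    required = queue_affinity.get("required", [])
    preferred = queue_affinity.get("preferred", [])

    allowed = not required or group in required
    score = 1 if allowed and group in preferred else 0
    return allowed, score


affinity = {"required": ["groupname1", "groupname2"], "preferred": ["groupname1"]}
print(evaluate_node(affinity, {NODE_GROUP_LABEL: "groupname1"}))
print(evaluate_node(affinity, {NODE_GROUP_LABEL: "groupname3"}))
```

Under these assumptions a node in `groupname1` is allowed and preferred, a node in `groupname2` is allowed but not preferred, and a node in any other group is filtered out.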

qiankunli · 2022-05-20T10:47:43Z

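Conclusions from the May 20 community meeting:

Introducing an affinity mechanism between queue and nodegroup may bring a risk: a job can run according to the queue's rules, but no runnable node can be found under the nodegroup affinity rules, so it stays in the pending state. In essence, it is easy for the resource configuration of the queue and the nodegroup to become inconsistent.

First, the requirement that "specific tasks run on specific types of nodes" does exist. Without the nodegroup mechanism, users would in practice fall back on mechanisms such as taints and would hit the same problem.

There are two ways to mitigate this problem:

1. The "queue-nodegroup affinity configuration" can be a parameter of the nodegroup plugin: those who use the plugin bear the risk, and those who do not use it need not care. The drawback is that configuration is complex and users cannot view it through kubectl and similar tools. So we still lean toward making the affinity configuration an attribute of the queue.
2. We can compute an upper bound on the resources available to a queue from the "queue-nodegroup affinity configuration", so that queue.capacity = min(user-configured capacity, resource upper bound of the available nodegroups). But this may make the proportion plugin code too complex. We set this risk aside for now and leave it to those who use the mechanism, and will watch how requirements evolve.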
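
The formula queue.capacity = min(user-configured capacity, nodegroup resource upper bound) can be illustrated with a short sketch. It only shows the arithmetic; the thread sets this approach aside for now, and nothing here reflects the proportion plugin's actual code. `nodegroup_upper_bound`, `effective_capacity`, and the input dictionaries are hypothetical, and the upper bound is assumed to be the sum of resources over the queue's nodegroups.

```python
def nodegroup_upper_bound(node_groups, queue_groups):
    """Sum the resources of every nodegroup the queue may use.

    node_groups: {group name: {resource name: amount}} (assumed shape)
    queue_groups: names of the nodegroups the queue has affinity to
    """
    total = {}
    for name in queue_groups:
        for resource, amount in node_groups.get(name, {}).items():
            total[resource] = total.get(resource, 0) + amount
    return total


def effective_capacity(configured, node_groups, queue_groups):
    """Per resource: min(user-configured capacity, nodegroup upper bound)."""
    bound = nodegroup_upper_bound(node_groups, queue_groups)
    return {
        resource: min(amount, bound.get(resource, 0))
        for resource, amount in configured.items()
    }


node_groups = {
    "groupname1": {"cpu": 40, "memory": 128},
    "groupname2": {"cpu": 30, "memory": 128},
}
print(effective_capacity({"cpu": 100, "memory": 200}, node_groups,
                         ["groupname1", "groupname2"]))
```

In this example the configured CPU capacity of 100 exceeds the 70 available in the two nodegroups, so the effective CPU capacity would be capped, while the configured memory of 200 stays below the nodegroup total and is kept as is.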

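For expressing the "queue-nodegroup affinity configuration" on the queue, the second option (example 2, nodeGroupAffinity) is more readable and also keeps the nodegroup concept, so we adopt the second option.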

k82cn · 2022-05-23T01:24:22Z

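Is queue vs. nodeGroup 1:1? If not, how do we balance resources, e.g. how to preempt/reclaim, and how to allocate resources across nodeGroups?

> Introducing an affinity mechanism between queue and nodegroup may bring a risk: a job can run according to the queue's rules, but no runnable node can be found under the nodegroup affinity rules, so it stays in the pending state. In essence, it is easy for the resource configuration of the queue and the nodegroup to become inconsistent.

The user/admin should make sure the configuration is reasonable; what we can do is report the related pending reason.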

william-wang · 2022-06-24T07:02:02Z

docs/design/node-group.md

+```
+
+risk: The resources of the queue can not be too different from the resources of the node-group, otherwise it is easy that task can be scheduled to run from the perspective of the queue, but cannot find a suitable node.
+


Please clarify the following in the design doc:

Queue API change:

Command line:
Do both kubectl and vcctl support displaying nodegroup info when querying a queue via the CLI?

william-wang

/lgtm

volcano-sh-bot · 2022-06-28T02:09:49Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: william-wang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [william-wang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

volcano-sh-bot added the retest-not-required-docs-only label May 12, 2022

volcano-sh-bot requested review from k82cn and shinytang6 May 12, 2022 07:23

volcano-sh-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 12, 2022

william-wang reviewed May 13, 2022

qiankunli mentioned this pull request May 22, 2022

add nodeGroupAffinity in queue.spec volcano-sh/apis#72

Merged

qiankunli force-pushed the feature/node-group-doc branch from 6c80e04 to eb7ea2b Compare May 22, 2022 12:33

hwdef mentioned this pull request Jun 1, 2022

Unexpected Behaviour for AddJobEnqueueableFn #2252

Closed

william-wang reviewed Jun 24, 2022

qiankunli force-pushed the feature/node-group-doc branch from eb7ea2b to 0704824 Compare June 27, 2022 02:20

add node group design doc

1840715

Signed-off-by: qiankunli <[email protected]> update design Signed-off-by: qiankunli <[email protected]> fix doc Signed-off-by: qiankunli <[email protected]> fix doc Signed-off-by: qiankunli <[email protected]>

qiankunli force-pushed the feature/node-group-doc branch from fa0d57e to 1840715 Compare June 27, 2022 02:54

william-wang approved these changes Jun 28, 2022

volcano-sh-bot assigned william-wang Jun 28, 2022

volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Jun 28, 2022

volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 28, 2022

volcano-sh-bot merged commit cb39985 into volcano-sh:master Jun 28, 2022

william-wang mentioned this pull request Jun 29, 2022

support maping queue to node group #1830

Closed

3 tasks

This was referenced Jul 13, 2022

add details for nodegroup doc #2347

Merged

support nodegroup as a plugin #2352

Closed

wuyueandrew mentioned this pull request Sep 6, 2023

framework.go:40] Failed to get plugin nodegroup. #3091

Closed
