Jobs in two queues reclaim each other's tasks in an endless loop #3729

Closed
lowang-bh opened this issue Sep 14, 2024 · 1 comment · Fixed by #3696
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@lowang-bh

Description

The cluster has 11C of CPU in total. queue-a has deserved=5C and capability=10C, and queue-b is configured identically.
First, create job-a with replicas=5 and minAvailable=2. Each task requests 2C, so job-a takes 10C.
Then create job-b, identical to job-a but submitted to queue-b. It reclaims resources and evicts two of job-a's tasks. Now that queue-a's usage is below its deserved amount, queue-a in turn reclaims from queue-b, and the two queues keep evicting each other's tasks in an endless loop.
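
The ping-pong can be illustrated with a toy model. The sketch below is not Volcano's reclaim logic and is not necessarily the change made in #3696. It assumes a simplified rule set: a queue below its deserved CPU with pending tasks may reclaim, one task is evicted at a time, and gang `minAvailable` and memory are ignored. The names `QueueState`, `can_evict`, and `simulate` are hypothetical. The "naive" rule evicts from any queue whose usage is above deserved; the "strict" rule evicts only if the victim stays at or above deserved afterwards.

```python
from dataclasses import dataclass

CLUSTER_CPU = 11  # total CPU in the cluster (from the description)
DESERVED_CPU = 5  # deserved CPU of each queue
TASK_CPU = 2      # CPU request of each task
REPLICAS = 5      # tasks per job


@dataclass
class QueueState:
    name: str
    running: int  # number of tasks currently running

    @property
    def used(self) -> int:
        return self.running * TASK_CPU


def can_evict(victim: QueueState, strict: bool) -> bool:
    """Toy victim rule (an assumption, not Volcano's actual check)."""
    if strict:
        # Victim must still hold at least its deserved share after eviction.
        return victim.used - TASK_CPU >= DESERVED_CPU
    # Naive: any queue currently above its deserved share can be a victim.
    return victim.used > DESERVED_CPU


def simulate(strict: bool, rounds: int = 10):
    """Return (evictions per round, final (queue-a, queue-b) CPU usage)."""
    a = QueueState("queue-a", REPLICAS)  # job-a already holds 10C
    b = QueueState("queue-b", 0)         # job-b has just been created
    evictions = []
    for _ in range(rounds):
        evicted = 0
        for claimer, victim in ((b, a), (a, b)):
            # Only a queue below deserved with pending tasks tries to reclaim.
            if claimer.used >= DESERVED_CPU or claimer.running >= REPLICAS:
                continue
            free = CLUSTER_CPU - a.used - b.used
            if free < TASK_CPU and can_evict(victim, strict):
                victim.running -= 1
                evicted += 1
                free += TASK_CPU
            if free >= TASK_CPU:
                claimer.running += 1
        evictions.append(evicted)
    return evictions, (a.used, b.used)


if __name__ == "__main__":
    print("naive :", simulate(strict=False))
    print("strict:", simulate(strict=True))
```

Under these assumptions the naive rule keeps evicting tasks in every round, because with 2C tasks and a 5C deserved share each queue alternates between 4C and 6C. The strict rule stops after two evictions and settles at 6C for queue-a and 4C for queue-b.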


Steps to reproduce the issue

Use the following scheduler ConfigMap:

apiVersion: v1
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, reclaim, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
        enablePreemptable: false
      - name: conformance
    - plugins:
      - name: overcommit
      - name: drf
        enablePreemptable: false
      - name: predicates
      - name: capacity
      - name: nodeorder
      - name: binpack
kind: ConfigMap
  1. Apply queue-a and queue-b with the following YAML:
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: queue-a
spec:
  reclaimable: true
  deserved:
    cpu: 5
    memory: 2Gi
  capability:          
    cpu: 10
    memory: 5Gi
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: queue-b
spec:
  reclaimable: true
  deserved:
    cpu: 5
    memory: 2Gi
  capability:          
    cpu: 10
    memory: 5Gi
  2. Apply job-a.yaml:
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-a
spec:
  schedulerName: volcano
  queue: queue-a
  tasks:
    - replicas: 5
      minAvailable: 2
      name: "master"
      template:
        metadata:
          annotations:
            volcano.sh/preemptable: "true"
        spec:
          containers:
            - image: nginx:1.14.2
              name: nginx
              resources:
                requests:
                  cpu: "2"
                  memory: "50Mi"
          restartPolicy: OnFailure
  3. Apply job-b.yaml:
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-b
spec:
  schedulerName: volcano
  queue: queue-b
  tasks:
    - replicas: 5
      minAvailable: 2
      name: "worker"
      template:
        metadata:
          annotations:
            volcano.sh/preemptable: "true"
        spec:
          containers:
            - image: nginx:1.14.2
              name: nginx
              resources:
                requests:
                  cpu: "2"
                  memory: "50Mi"
          restartPolicy: OnFailure

Describe the results you received and expected

After upgrading the image to one that includes #3696, scheduling stays stable and the reclaim loop no longer occurs.


What version of Volcano are you using?

master

Any other relevant information

master branch at 95d5a92
