volcano scheduler crash when pv is not created #2481

Closed
jinzhejz opened this issue Sep 1, 2022 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.


jinzhejz commented Sep 1, 2022

What happened:
The Volcano scheduler crashes with a nil pointer dereference when the PV has not been created:

E0901 08:33:36.243987       1 cache.go:995] task default/zjin-lazy-0 bind Volumes failed: &fmt.wrapError{msg:"binding volumes: timed out waiting for the condition", err:(*errors.errorString)(0xc0001991c0)}
I0901 08:33:36.244016       1 cache.go:299] Revert assumed volumes for task default/zjin-lazy-0 on node zjin-vm
E0901 08:33:36.244110       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 156 [running]:
volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1a08e00, 0x2ff5020})
        /home/zjin/gopath/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x7d
volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xa})
        /home/zjin/gopath/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x75
panic({0x1a08e00, 0x2ff5020})
        /home/zjin/go/1.17.6/src/runtime/panic.go:1038 +0x215
volcano.sh/volcano/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding.(*volumeBinder).BindPodVolumes(0xc000446210, 0xc000180800, 0x0)
        /home/zjin/gopath/src/volcano.sh/volcano/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding/binder.go:453 +0x25c
volcano.sh/volcano/pkg/scheduler/cache.(*defaultVolumeBinder).BindVolumes(0xc00042e360, 0x1b50440, 0xc0002280a0)
        /home/zjin/gopath/src/volcano.sh/volcano/pkg/scheduler/cache/cache.go:338 +0x3c
volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).BindTask(0xc00050a000)
        /home/zjin/gopath/src/volcano.sh/volcano/pkg/scheduler/cache/cache.go:994 +0x122
volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).processBindTask(0xc00050a000)
        /home/zjin/gopath/src/volcano.sh/volcano/pkg/scheduler/cache/cache.go:988 +0x14e
volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000502f00)
        /home/zjin/gopath/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x67
volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0, {0x1eedfe0, 0xc00067e270}, 0x1, 0xc000434c60)
        /home/zjin/gopath/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x1312d00, 0x0, 0x0, 0x0)
        /home/zjin/gopath/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0x0, 0x0)
        /home/zjin/gopath/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x25
created by volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).Run
        /home/zjin/gopath/src/volcano.sh/volcano/pkg/scheduler/cache/cache.go:656 +0x25c
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x171179c]

goroutine 156 [running]:
volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xa})
        /home/zjin/gopath/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0xd8
panic({0x1a08e00, 0x2ff5020})
        /home/zjin/go/1.17.6/src/runtime/panic.go:1038 +0x215
volcano.sh/volcano/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding.(*volumeBinder).BindPodVolumes(0xc000446210, 0xc000180800, 0x0)
        /home/zjin/gopath/src/volcano.sh/volcano/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding/binder.go:453 +0x25c
volcano.sh/volcano/pkg/scheduler/cache.(*defaultVolumeBinder).BindVolumes(0xc00042e360, 0x1b50440, 0xc0002280a0)
        /home/zjin/gopath/src/volcano.sh/volcano/pkg/scheduler/cache/cache.go:338 +0x3c
volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).BindTask(0xc00050a000)
        /home/zjin/gopath/src/volcano.sh/volcano/pkg/scheduler/cache/cache.go:994 +0x122
volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).processBindTask(0xc00050a000)
        /home/zjin/gopath/src/volcano.sh/volcano/pkg/scheduler/cache/cache.go:988 +0x14e
volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000502f00)
        /home/zjin/gopath/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x67
volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0, {0x1eedfe0, 0xc00067e270}, 0x1, 0xc000434c60)
        /home/zjin/gopath/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x1312d00, 0x0, 0x0, 0x0)
        /home/zjin/gopath/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0x0, 0x0)
        /home/zjin/gopath/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x25
created by volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).Run
        /home/zjin/gopath/src/volcano.sh/volcano/pkg/scheduler/cache/cache.go:656 +0x25c
2022/09/01 08:33:36 maxprocs: Leaving GOMAXPROCS=6: CPU quota undefined
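
The stack appears twice in the log above because the panic is first observed and logged by `HandleCrash`, then raised again (the `[recovered]` line), which is what terminates the process. A minimal sketch of that log-then-re-panic pattern follows. The names `handleCrash`, `worker`, `runWorker`, and `task` are hypothetical and only illustrate the pattern; this is not the apimachinery implementation.

```go
package main

import "fmt"

// task is a hypothetical stand-in for the object the worker operates on.
type task struct{ volumes []string }

// handleCrash logs the panic value and then panics again with the same value.
// It must be installed with defer directly so that recover() takes effect.
func handleCrash() {
	if r := recover(); r != nil {
		fmt.Printf("Observed a panic: %v\n", r)
		panic(r) // re-panic: the caller still sees the original panic
	}
}

// worker dereferences t, which panics when t is nil.
func worker(t *task) {
	defer handleCrash()
	fmt.Println("volumes:", len(t.volumes))
}

// runWorker exists only so this demo can exit cleanly; it reports whether
// worker panicked. In the log above nothing catches the second panic.
func runWorker(t *task) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	worker(t)
	return false
}

func main() {
	fmt.Println("panicked:", runWorker(nil))
}
```

In this sketch the outer `recover` in `runWorker` is what keeps the demo alive; without it the re-panic would end the program, as it does for the scheduler.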

What you expected to happen:
The Volcano scheduler should not crash.
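
In the trace, `BindPodVolumes` appears to be called with a nil last argument (`0x0`), right after the volume binding timed out and the assumed volumes were reverted. Below is a minimal sketch of the kind of guard that would turn this into an ordinary error instead of a panic. The types `PodVolumes` and `TaskInfo`, the function `bindVolumes`, and the error value are hypothetical stand-ins for illustration; this is not Volcano's code and not necessarily the fix that was applied.

```go
package main

import (
	"errors"
	"fmt"
)

// PodVolumes is a hypothetical stand-in for the volume binding decisions
// attached to a task; it is not the real Kubernetes type.
type PodVolumes struct {
	StaticBindings []string
}

// TaskInfo is a hypothetical, simplified task record.
type TaskInfo struct {
	Name       string
	PodVolumes *PodVolumes
}

// errNoPodVolumes is returned when there is nothing to bind.
var errNoPodVolumes = errors.New("pod volumes not set")

// bindVolumes returns an error instead of dereferencing a nil PodVolumes.
func bindVolumes(task *TaskInfo) error {
	if task == nil || task.PodVolumes == nil {
		return errNoPodVolumes
	}
	for _, b := range task.PodVolumes.StaticBindings {
		fmt.Printf("binding %s for %s\n", b, task.Name)
	}
	return nil
}

func main() {
	// A task whose volumes were never assumed (or were reverted).
	task := &TaskInfo{Name: "default/zjin-lazy-0"}
	if err := bindVolumes(task); err != nil {
		fmt.Println("bind volumes failed:", err)
	}
}
```

With a guard like this the bind attempt fails and can be retried in a later scheduling cycle, rather than taking the whole scheduler down.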

How to reproduce it (as minimally and precisely as possible):

  1. Create a StorageClass with volumeBindingMode: WaitForFirstConsumer:
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
  name: csi-disk-lazy-provisoner
parameters:
  csi.storage.k8s.io/csi-driver-name: disk.csi.everest.io
  csi.storage.k8s.io/fstype: ext4
  everest.io/disk-volume-type: SATA
  everest.io/passthrough: "true"
provisioner: everest-csi-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
$ kubectl get storageclasses.storage.k8s.io 
NAME                                 PROVISIONER               RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
csi-disk-lazy-provisoner (default)   everest-csi-provisioner   Delete          WaitForFirstConsumer   true                   5h11m
  2. Submit a StatefulSet that uses the StorageClass csi-disk-lazy-provisoner and sets schedulerName: volcano:
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: zjin-lazy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zjin-lazy
  template:
    metadata:
      labels:
        app: zjin-lazy
      annotations:
        metrics.alpha.kubernetes.io/custom-endpoints: '[{"api":"","path":"","port":"","names":""}]'
        pod.alpha.kubernetes.io/initialized: 'true'
    spec:
      containers:
        - name: container-0
          image: centos:7
          command:
          - sleep
          - "36000"
          resources:
            limits:
              cpu: 200m
            requests:
              cpu: 200m
          volumeMounts:
            - name: pvc-random
              mountPath: /aaa
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext: {}
      schedulerName: volcano
  volumeClaimTemplates:
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: pvc-random
        namespace: default
        annotations:
          everest.io/disk-volume-type: SATA
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: csi-disk-lazy-provisoner
        volumeMode: Filesystem
      status:
        phase: Pending
  serviceName: zjin-lazy-headless
  podManagementPolicy: OrderedReady
  updateStrategy:
    type: RollingUpdate
  revisionHistoryLimit: 10
  3. Make sure the PV is not created automatically, so the PVC stays Pending:
$ kubectl get pvc
NAME                     STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS               AGE
pvc-random-zjin-lazy-0   Pending                                      csi-disk-lazy-provisoner   24m

$ kubectl get pv
No resources found
  4. The Volcano scheduler crashes.

Anything else we need to know?:

Environment:

  • Volcano Version: 1.6.0
  • Kubernetes version (use kubectl version): v1.21.0
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): centos 7
  • Kernel (e.g. uname -a): Linux zjin-vm 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: N/A
  • Others: N/A
@jinzhejz jinzhejz added the kind/bug Categorizes issue or PR as related to a bug. label Sep 1, 2022
@jinzhejz jinzhejz closed this as completed Sep 2, 2022