
storaged pod pending when restore cluster in GKE #491

Closed
MegaByte875 opened this issue Apr 3, 2024 · 4 comments
Labels
affects/none (PR/issue: this bug affects none version) · process/fixed (Process of bug) · severity/none (Severity of bug) · type/bug (Type: something is unexpected)


@MegaByte875
Contributor

Please check the FAQ documentation before raising an issue

Describe the bug (required)
[screenshots omitted]
After the new nodes were scaled up and became Ready, the pending pod was still not scheduled onto them.

Your Environments (required)

  • OS: uname -a
  • Commit id (e.g. a3ffc7d8)

How To Reproduce (required)

Steps to reproduce the behavior:

  1. Step 1
  2. Step 2
  3. Step 3

Expected behavior
The pending storaged pod is scheduled to the new nodes.

Additional context

MegaByte875 added the type/bug (Type: something is unexpected) label on Apr 3, 2024
MegaByte875 self-assigned this on Apr 3, 2024
github-actions bot added the affects/none (PR/issue: this bug affects none version) and severity/none (Severity of bug) labels on Apr 3, 2024
@jinyingsunny

storaged-0 did not download its data because pods on other nodes were still Pending; since not all of them were Ready, the command to pull the data was never issued.
However, even after the other nodes later became Ready, the data-pull command was still not triggered; it was only re-triggered after manually deleting storaged-0.
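
As a rough sketch of the manual workaround described above (the default namespace and the pod name nebula-storaged-0 are assumptions based on the logs later in this thread):

  # Delete the stuck pod so the StatefulSet controller recreates it and the
  # data-pull step is attempted again, then watch it come back up.
  kubectl delete pod nebula-storaged-0 -n default
  kubectl get pod nebula-storaged-0 -n default -w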

@MegaByte875
Contributor Author

--pod-max-in-unschedulable-pods-duration duration
    DEPRECATED: the maximum time a pod can stay in unschedulablePods. If a pod stays in
    unschedulablePods for longer than this value, the pod will be moved from unschedulablePods
    to backoffQ or activeQ. This flag is deprecated and will be removed in 1.26 (default 5m0s)
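
For context, a minimal sketch of how this timeout could be shortened when launching the scheduler, so unschedulable pods are moved back to the active queue sooner; the 1m value and the way the scheduler binary is invoked here are assumptions, not taken from this issue:

  # Hypothetical invocation of a scheduler-framework-based binary with the
  # standard kube-scheduler flag lowered from its 5m default.
  kube-scheduler \
    --config=/etc/kubernetes/scheduler-config.yaml \
    --pod-max-in-unschedulable-pods-duration=1m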

@MegaByte875
Contributor Author

MegaByte875 commented Apr 7, 2024

The normal events:

Events:
  Type     Reason            Age   From                Message
  ----     ------            ----  ----                -------
  Warning  FailedScheduling  10s   nebula-scheduler    0/11 nodes are available: 4 Insufficient cpu, 7 node(s) didn't match Pod's node affinity/selector. preemption: 0/11 nodes are available: 4 No preemption victims found for incoming pod, 7 Preemption is not helpful for scheduling..
  Normal   TriggeredScaleUp  7s    cluster-autoscaler  pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/vesoft-public-405209/zones/us-central1-c/instanceGroups/gke-snap-test-sk1-4c81a7ee-grp 1->2 (max: 2)}]

The failed events:

Events:
  Type     Reason            Age                    From                Message
  ----     ------            ----                   ----                -------
  Warning  FailedScheduling  4m19s                  nebula-scheduler    0/11 nodes are available: 4 Insufficient cpu, 7 node(s) didn't match Pod's node affinity/selector. preemption: 0/11 nodes are available: 4 No preemption victims found for incoming pod, 7 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  3m42s                  nebula-scheduler    0/12 nodes are available: 1 node(s) didn't match the requested topology zone, 4 Insufficient cpu, 7 node(s) didn't match Pod's node affinity/selector. preemption: 0/12 nodes are available: 4 No preemption victims found for incoming pod, 8 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  3m29s (x2 over 3m38s)  nebula-scheduler    0/12 nodes are available: 1 node(s) didn't match the requested topology zone, 4 Insufficient cpu, 7 node(s) didn't match Pod's node affinity/selector. preemption: 0/12 nodes are available: 4 No preemption victims found for incoming pod, 8 Preemption is not helpful for scheduling..
  Normal   TriggeredScaleUp  4m16s                  cluster-autoscaler  pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/vesoft-public-405209/zones/us-central1-c/instanceGroups/gke-snap-test-sk1-4c81a7ee-grp 1->2 (max: 2)}]
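
Events like the two sets above are typically collected with a describe on the pending pod (pod name and namespace assumed from the scheduler logs below):

  # Show the scheduling events recorded for the pending storaged pod.
  kubectl describe pod nebula-storaged-4 -n default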

The scheduler logs:

I0407 09:17:28.638056       1 scheduling_queue.go:798] "Pod moved to an internal scheduling queue" pod="default/nebula-storaged-4" event={"Resource":"*","ActionType":63,"Label":"UnschedulableTimeout"} queue="Active"
I0407 09:17:28.638114       1 scheduling_queue.go:1137] "About to try and schedule pod" pod="default/nebula-storaged-4"
I0407 09:17:28.638123       1 schedule_one.go:80] "Attempting to schedule pod" pod="default/nebula-storaged-4"
I0407 09:17:28.638196       1 binder.go:796] "PVC is not bound" PVC="default/storaged-data-nebula-storaged-4"
I0407 09:17:28.638498       1 csi.go:269] "Persistent volume had no name for claim" PVC="default/storaged-data-nebula-storaged-4"
I0407 09:17:28.638551       1 binder.go:281] "FindPodVolumes" pod="default/nebula-storaged-4" node="gke-snap-test-sk1-4c81a7ee-6pg4"
I0407 09:17:28.638569       1 binder.go:917] "No matching volumes for pod" pod="default/nebula-storaged-4" PVC="default/storaged-data-nebula-storaged-4" node="gke-snap-test-sk1-4c81a7ee-6pg4"
I0407 09:17:28.638599       1 binder.go:980] "Provisioning for claims of pod that has no matching volumes..." claimCount=1 pod="default/nebula-storaged-4" node="gke-snap-test-sk1-4c81a7ee-6pg4"
I0407 09:17:28.638627       1 node_zone.go:218] Available topology zones: [us-central1-b us-central1-c us-central1-f]
I0407 09:17:28.638658       1 node_zone.go:245] Anchor pod nebula-storaged-3 zone us-central1-f shift 2
I0407 09:17:28.638669       1 node_zone.go:248] Pod [default/nebula-storaged-4] not fit node gke-snap-test-sk1-4c81a7ee-6pg4 in zone us-central1-c, ideal zone us-central1-b
I0407 09:17:28.638744       1 preemption.go:236] "Selecting candidates from a pool of nodes" potentialNodesCount=4 offset=0 sampleLength=4 sample=["gke-snap-test-sk1-b7b0d5ef-prql","gke-snap-test-sk1-4c81a7ee-c2fn","gke-snap-test-sk1-9b3451e6-wgdh","gke-snap-test-sk1-9b3451e6-6h5n"] candidates=4
I0407 09:17:28.638939       1 schedule_one.go:159] "Status after running PostFilter plugins for pod" pod="default/nebula-storaged-4" status="preemption: 0/12 nodes are available: 4 No preemption victims found for incoming pod, 8 Preemption is not helpful for scheduling."
I0407 09:17:28.638976       1 schedule_one.go:889] "Unable to schedule pod; no fit; waiting" pod="default/nebula-storaged-4" err="0/12 nodes are available: 1 node(s) didn't match the requested topology zone, 4 Insufficient cpu, 7 node(s) didn't match Pod's node affinity/selector. preemption: 0/12 nodes are available: 4 No preemption victims found for incoming pod, 8 Preemption is not helpful for scheduling.."
I0407 09:17:28.639078       1 scheduling_queue.go:531] "Pod moved to an internal scheduling queue" pod="default/nebula-storaged-4" event="ScheduleAttemptFailure" queue="Unschedulable"
I0407 09:17:28.639125       1 schedule_one.go:965] "Updating pod condition" pod="default/nebula-storaged-4" conditionType="PodScheduled" conditionStatus="False" conditionReason="Unschedulable"
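
The node_zone lines above show the pod being rejected because the candidate node sits in us-central1-c while the ideal zone is us-central1-b. A generic way to cross-check node zones against that expectation (a diagnostic sketch, not specific to this issue):

  # List nodes with their zone label to see which zones have schedulable capacity.
  kubectl get nodes -L topology.kubernetes.io/zone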

@MegaByte875
Contributor Author

#495

github-actions bot added the process/fixed (Process of bug) label on Apr 10, 2024