
storaged pod pending when restore cluster in GKE #491

Closed
MegaByte875 opened this issue Apr 3, 2024 · 4 comments
Labels
affects/none (PR/issue: this bug affects none version) · process/fixed (Process of bug) · severity/none (Severity of bug) · type/bug (Type: something is unexpected)


@MegaByte875
Contributor

Please check the FAQ documentation before raising an issue

Describe the bug (required)
[screenshots omitted]
After the new nodes were scaled up and became Ready, the pending pod was still not scheduled onto them.

Your Environments (required)

  • OS: uname -a
  • Commit id (e.g. a3ffc7d8)

How To Reproduce (required)

Steps to reproduce the behavior:

  1. Step 1
  2. Step 2
  3. Step 3

Expected behavior
The pending storaged pod is scheduled to the new nodes.

Additional context

MegaByte875 added the type/bug (Type: something is unexpected) label on Apr 3, 2024
MegaByte875 self-assigned this on Apr 3, 2024
github-actions bot added the affects/none (PR/issue: this bug affects none version) and severity/none (Severity of bug) labels on Apr 3, 2024
@jinyingsunny

storaged-0 did not download its data because pods on other nodes were still Pending; since not all of them were Ready, the command to pull the data was never issued.
However, even after the other nodes later became Ready, the data-pull command was still not triggered; it was only re-triggered after manually deleting storaged-0.
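
As a rough sketch of the manual workaround described above (the default namespace and the pod name nebula-storaged-0 are assumptions based on the logs later in this thread):

  # Delete the stuck pod so the StatefulSet controller recreates it and the
  # data-pull step is attempted again, then watch it come back up.
  kubectl delete pod nebula-storaged-0 -n default
  kubectl get pod nebula-storaged-0 -n default -w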

@MegaByte875
Contributor Author

--pod-max-in-unschedulable-pods-duration duration
    DEPRECATED: the maximum time a pod can stay in unschedulablePods. If a pod stays in
    unschedulablePods for longer than this value, the pod will be moved from unschedulablePods
    to backoffQ or activeQ. This flag is deprecated and will be removed in 1.26 (default 5m0s)
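
For context, a minimal sketch of how this timeout could be shortened when launching the scheduler, so unschedulable pods are moved back to the active queue sooner; the 1m value and the way the scheduler binary is invoked here are assumptions, not taken from this issue:

  # Hypothetical invocation of a scheduler-framework-based binary with the
  # standard kube-scheduler flag lowered from its 5m default.
  kube-scheduler \
    --config=/etc/kubernetes/scheduler-config.yaml \
    --pod-max-in-unschedulable-pods-duration=1m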

@MegaByte875
Contributor Author

MegaByte875 commented Apr 7, 2024

The normal events:

Events:
  Type     Reason            Age   From                Message
  ----     ------            ----  ----                -------
  Warning  FailedScheduling  10s   nebula-scheduler    0/11 nodes are available: 4 Insufficient cpu, 7 node(s) didn't match Pod's node affinity/selector. preemption: 0/11 nodes are available: 4 No preemption victims found for incoming pod, 7 Preemption is not helpful for scheduling..
  Normal   TriggeredScaleUp  7s    cluster-autoscaler  pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/vesoft-public-405209/zones/us-central1-c/instanceGroups/gke-snap-test-sk1-4c81a7ee-grp 1->2 (max: 2)}]

The failed events:

Events:
  Type     Reason            Age                    From                Message
  ----     ------            ----                   ----                -------
  Warning  FailedScheduling  4m19s                  nebula-scheduler    0/11 nodes are available: 4 Insufficient cpu, 7 node(s) didn't match Pod's node affinity/selector. preemption: 0/11 nodes are available: 4 No preemption victims found for incoming pod, 7 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  3m42s                  nebula-scheduler    0/12 nodes are available: 1 node(s) didn't match the requested topology zone, 4 Insufficient cpu, 7 node(s) didn't match Pod's node affinity/selector. preemption: 0/12 nodes are available: 4 No preemption victims found for incoming pod, 8 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  3m29s (x2 over 3m38s)  nebula-scheduler    0/12 nodes are available: 1 node(s) didn't match the requested topology zone, 4 Insufficient cpu, 7 node(s) didn't match Pod's node affinity/selector. preemption: 0/12 nodes are available: 4 No preemption victims found for incoming pod, 8 Preemption is not helpful for scheduling..
  Normal   TriggeredScaleUp  4m16s                  cluster-autoscaler  pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/vesoft-public-405209/zones/us-central1-c/instanceGroups/gke-snap-test-sk1-4c81a7ee-grp 1->2 (max: 2)}]
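
Events like the two sets above are typically collected with a describe on the pending pod (pod name and namespace assumed from the scheduler logs below):

  # Show the scheduling events recorded for the pending storaged pod.
  kubectl describe pod nebula-storaged-4 -n default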

The scheduler logs:

I0407 09:17:28.638056       1 scheduling_queue.go:798] "Pod moved to an internal scheduling queue" pod="default/nebula-storaged-4" event={"Resource":"*","ActionType":63,"Label":"UnschedulableTimeout"} queue="Active"
I0407 09:17:28.638114       1 scheduling_queue.go:1137] "About to try and schedule pod" pod="default/nebula-storaged-4"
I0407 09:17:28.638123       1 schedule_one.go:80] "Attempting to schedule pod" pod="default/nebula-storaged-4"
I0407 09:17:28.638196       1 binder.go:796] "PVC is not bound" PVC="default/storaged-data-nebula-storaged-4"
I0407 09:17:28.638498       1 csi.go:269] "Persistent volume had no name for claim" PVC="default/storaged-data-nebula-storaged-4"
I0407 09:17:28.638551       1 binder.go:281] "FindPodVolumes" pod="default/nebula-storaged-4" node="gke-snap-test-sk1-4c81a7ee-6pg4"
I0407 09:17:28.638569       1 binder.go:917] "No matching volumes for pod" pod="default/nebula-storaged-4" PVC="default/storaged-data-nebula-storaged-4" node="gke-snap-test-sk1-4c81a7ee-6pg4"
I0407 09:17:28.638599       1 binder.go:980] "Provisioning for claims of pod that has no matching volumes..." claimCount=1 pod="default/nebula-storaged-4" node="gke-snap-test-sk1-4c81a7ee-6pg4"
I0407 09:17:28.638627       1 node_zone.go:218] Available topology zones: [us-central1-b us-central1-c us-central1-f]
I0407 09:17:28.638658       1 node_zone.go:245] Anchor pod nebula-storaged-3 zone us-central1-f shift 2
I0407 09:17:28.638669       1 node_zone.go:248] Pod [default/nebula-storaged-4] not fit node gke-snap-test-sk1-4c81a7ee-6pg4 in zone us-central1-c, ideal zone us-central1-b
I0407 09:17:28.638744       1 preemption.go:236] "Selecting candidates from a pool of nodes" potentialNodesCount=4 offset=0 sampleLength=4 sample=["gke-snap-test-sk1-b7b0d5ef-prql","gke-snap-test-sk1-4c81a7ee-c2fn","gke-snap-test-sk1-9b3451e6-wgdh","gke-snap-test-sk1-9b3451e6-6h5n"] candidates=4
I0407 09:17:28.638939       1 schedule_one.go:159] "Status after running PostFilter plugins for pod" pod="default/nebula-storaged-4" status="preemption: 0/12 nodes are available: 4 No preemption victims found for incoming pod, 8 Preemption is not helpful for scheduling."
I0407 09:17:28.638976       1 schedule_one.go:889] "Unable to schedule pod; no fit; waiting" pod="default/nebula-storaged-4" err="0/12 nodes are available: 1 node(s) didn't match the requested topology zone, 4 Insufficient cpu, 7 node(s) didn't match Pod's node affinity/selector. preemption: 0/12 nodes are available: 4 No preemption victims found for incoming pod, 8 Preemption is not helpful for scheduling.."
I0407 09:17:28.639078       1 scheduling_queue.go:531] "Pod moved to an internal scheduling queue" pod="default/nebula-storaged-4" event="ScheduleAttemptFailure" queue="Unschedulable"
I0407 09:17:28.639125       1 schedule_one.go:965] "Updating pod condition" pod="default/nebula-storaged-4" conditionType="PodScheduled" conditionStatus="False" conditionReason="Unschedulable"
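
The node_zone lines above show the pod being rejected because the candidate node sits in us-central1-c while the ideal zone is us-central1-b. A generic way to cross-check node zones against that expectation (a diagnostic sketch, not specific to this issue):

  # List nodes with their zone label to see which zones have schedulable capacity.
  kubectl get nodes -L topology.kubernetes.io/zone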

@MegaByte875
Contributor Author

#495

github-actions bot added the process/fixed (Process of bug) label on Apr 10, 2024