
[webhook] When storage scale-out is pending because of insufficient resources, scale-in cannot be executed; it seems stuck #320

Open
jinyingsunny opened this issue Oct 9, 2023 · 2 comments
Labels
affects/none · severity/none · type/bug

Comments

jinyingsunny commented Oct 9, 2023

With the admission webhook enabled, I scaled out storaged, but the new pod failed to schedule because there was not enough CPU.
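For context, a scale-out like this is typically done by raising the storaged replica count on the NebulaCluster resource. A minimal sketch of such a command, assuming the cluster is named nebulazone (the target count 10 is only inferred from the Pending pod index -9 below):

$ # Sketch: scale out storaged by raising replicas on the NebulaCluster spec.
$ # The count 10 is an assumption inferred from the Pending pod nebulazone-storaged-9.
$ kubectl -n nebula patch nebulacluster nebulazone --type merge \
    -p '{"spec": {"storaged": {"replicas": 10}}}'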

$ kubectl -n nebula describe pod nebulazone-storaged-9
Name:             nebulazone-storaged-9
Namespace:        nebula
Priority:         0
Service Account:  nebula-sa
Node:             <none>
Labels:           app.kubernetes.io/cluster=nebulazone
                  app.kubernetes.io/component=storaged
                  app.kubernetes.io/managed-by=nebula-operator
                  app.kubernetes.io/name=nebula-graph
                  controller-revision-hash=nebulazone-storaged-5b568d554c
                  statefulset.kubernetes.io/pod-name=nebulazone-storaged-9
Annotations:      cloud.google.com/cluster_autoscaler_unhelpable_since: 2023-10-09T09:58:34+0000
                  cloud.google.com/cluster_autoscaler_unhelpable_until: Inf
                  nebula-graph.io/cm-hash: 760645648930d20e
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/nebulazone-storaged
Containers:
  storaged:
    Image:       asia-east2-docker.pkg.dev/nebula-cloud-test/poc/rc/nebula-storaged-ent:v3.5.0-sc
    Ports:       9779/TCP, 19789/TCP, 9778/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Command:
      /bin/sh
      -ecx
      exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --meta_server_addrs=nebulazone-metad-0.nebulazone-metad-headless.nebula.svc.cluster.local:9559,nebulazone-metad-1.nebulazone-metad-headless.nebula.svc.cluster.local:9559,nebulazone-metad-2.nebulazone-metad-headless.nebula.svc.cluster.local:9559 --local_ip=$(hostname).nebulazone-storaged-headless.nebula.svc.cluster.local --ws_ip=$(hostname).nebulazone-storaged-headless.nebula.svc.cluster.local --daemonize=false --ws_http_port=19789
    Limits:
      cpu:     3
      memory:  16Gi
    Requests:
      cpu:        2
      memory:     8Gi
    Readiness:    http-get http://:19789/status delay=10s timeout=5s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /usr/local/nebula/data from storaged-data (rw,path="data")
      /usr/local/nebula/etc/nebula-storaged.conf from nebulazone-storaged (rw,path="nebula-storaged.conf")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j86r9 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  storaged-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  storaged-data-nebulazone-storaged-9
    ReadOnly:   false
  nebulazone-storaged:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nebulazone-storaged
    Optional:  false
  kube-api-access-j86r9:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    ConfigMapOptional:        <nil>
    DownwardAPI:              true
QoS Class:                    Burstable
Node-Selectors:               <none>
Tolerations:                  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  topology.kubernetes.io/zone:DoNotSchedule when max skew 1 is exceeded for selector app.kubernetes.io/cluster=nebulazone,app.kubernetes.io/component=storaged,app.kubernetes.io/managed-by=nebula-operator,app.kubernetes.io/name=nebula-graph
Events:
  Type     Reason             Age   From                Message
  ----     ------             ----  ----                -------
  Warning  FailedScheduling   48s   nebula-scheduler    0/3 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
  Warning  FailedScheduling   45s   nebula-scheduler    0/3 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
  Normal   NotTriggerScaleUp  46s   cluster-autoscaler  pod didn't trigger scale-up:
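At this point, trying to scale storaged back in is what the title describes as stuck. A minimal sketch of such a scale-in attempt, assuming the same NebulaCluster resource (the target count 9 is illustrative):

$ # Sketch: attempt to revert the scale-out while nebulazone-storaged-9 is still Pending.
$ kubectl -n nebula patch nebulacluster nebulazone --type merge \
    -p '{"spec": {"storaged": {"replicas": 9}}}'
$ # With the admission webhook enabled, the change is blocked while the cluster
$ # is in an intermediate (scaling) state, so the spec cannot be reverted.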

Your Environments (required)

nebula-operator: snap1.19

Expected behavior

When the pod is Pending because of insufficient resources, stop the scale-out and return to the previous state.

jinyingsunny added the type/bug label on Oct 9, 2023
github-actions bot added the affects/none and severity/none labels on Oct 9, 2023
jinyingsunny (Author) commented

I resolved the problem by editing the nebula-operator deployment and setting --enable-admission-webhook=false to stop the webhook.

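A sketch of this workaround, assuming the operator runs as a Deployment named nebula-operator-controller-manager (the deployment name and namespace are assumptions; check your install):

$ # Find the operator deployment (name and namespace vary by install).
$ kubectl get deploy -A | grep nebula-operator
$ # Edit the controller-manager container args so that the webhook flag reads:
$ #   --enable-admission-webhook=false
$ kubectl -n nebula-operator-system edit deployment nebula-operator-controller-manager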

MegaByte875 (Contributor) commented

I don't think the insufficient-resources problem is a bug; the admission webhook is used to prevent operations while the cluster is in an intermediate state.
