Bug Description
I am running into an issue where the istio-ingressgateway-workload pod/container is crash-looping because it gets OOM-killed.
I manually patched the deployment to use 2Gi instead of 1Gi, and after a few hours of monitoring, memory usage has been continuously but very slowly increasing.
As of writing this, it has gone from 1019280 -> 1032556 -> 1056748, and so far it has never stopped increasing (in the few hours it has been running).
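For reference, a minimal sketch of how a limit bump like this can be applied and then watched; the deployment name istio-ingressgateway-workload, the container name istio-proxy, and the kubeflow namespace are assumptions here, so substitute whatever your cluster actually shows:

```shell
# Inspect the current resources on the gateway deployment
# (deployment/container names below are assumptions, not taken from this report)
kubectl -n kubeflow get deployment istio-ingressgateway-workload \
  -o jsonpath='{.spec.template.spec.containers[*].resources}'

# Bump the memory limit from 1Gi to 2Gi via a strategic merge patch
kubectl -n kubeflow patch deployment istio-ingressgateway-workload \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"istio-proxy","resources":{"limits":{"memory":"2Gi"}}}]}}}}'

# Watch container memory over time (requires metrics-server)
kubectl -n kubeflow top pod --containers | grep istio-ingressgateway
```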
To Reproduce
Hard to say, since it is a complicated deployment that has evolved over months and has a lot of workload on it.
Environment
App  Version  Status  Scale  Charm  Channel  Rev  Address  Exposed  Message
admission-webhook  res:oci-image@2d74d1b  active  1  admission-webhook  1.7/stable  205
argo-controller  res:oci-image@669ebd5  active  1  argo-controller  3.3/stable  236  no
argo-server  res:oci-image@576d038  active  1  argo-server  3.3/stable  185  no
dex-auth  active  1  dex-auth  2.31/stable  224
grafana-agent-k8s  0.32.1  waiting  1  grafana-agent-k8s  latest/stable  38
istio-ingressgateway  active  1  istio-gateway  1.16/stable  551
istio-pilot  active  1  istio-pilot  1.16/stable  551
jupyter-controller  res:oci-image@1167186  active  1  jupyter-controller  1.7/stable  607  no
jupyter-ui  .../9lw7s63ewtlyew486jjn1ez...  active  1  jupyter-ui  25
katib-controller  res:oci-image@111495a  active  1  katib-controller  0.15/stable  282
katib-db  mariadb/server:
katib-db-manager  res:oci-image@16b33a5  active  1  katib-db-manager  0.15/stable  253
katib-ui  res:oci-image@c7dc04a  active  1  katib-ui  0.15/stable  267
kfp-api  res:oci-image@bf747d5  active  1  kfp-api  2.0-alpha.7/stable  935
kfp-db  mariadb/server:
kfp-persistence  res:oci-image@ebed770  active  1  kfp-persistence  2.0-alpha.7/stable  939  no
kfp-profile-controller  res:oci-image@aa75b0c  active  1  kfp-profile-controller  2.0-alpha.7/stable  899
kfp-schedwf  res:oci-image@2cb9087  active  1  kfp-schedwf  2.0-alpha.7/stable  952  no
kfp-ui  res:oci-image@ae72602  active  1  kfp-ui  2.0-alpha.7/stable  934
kfp-viewer  res:oci-image@899e25f  active  1  kfp-viewer  2.0-alpha.7/stable  964  no
kfp-viz  res:oci-image@ffaf37e  active  1  kfp-viz  2.0-alpha.7/stable  889
knative-eventing  active  1  knative-eventing  1.8/stable  224
knative-operator  active  1  knative-operator  1.8/stable  199
knative-serving  active  1  knative-serving  1.8/stable  224
kserve-controller  active  1  kserve-controller  0.
kubeflow-dashboard  res:oci-image@6fe6eec  active  1  kubeflow-dashboard  1.7/stable  307
kubeflow-profiles  res:profile-image@cfd6935  active  1  kubeflow-profiles  1.7/stable  269
kubeflow-roles  active  1  kubeflow-roles  1.7/stable  113
kubeflow-volumes  res:oci-image@d261609  active  1  kubeflow-volumes  1.7/stable  178
metacontroller-operator  active  1  metacontroller-operator  2.0/stable  117
minio  res:oci-image@1755999  active  1  minio  ckf-1.7/stable  186
namespace-node-affinity  active  1  namespace-node-affinity  0.1/beta  5
oidc-gatekeeper  res:oci-image@6b720b8  active  1  oidc-gatekeeper  ckf-1.7/stable  176
seldon-controller-manager  res:oci-image@eb811b6  active  1  seldon-core  1.15/stable  354
tensorboard-controller  res:oci-image@c52f7c2  active  1  tensorboard-controller  1.7/stable  156
tensorboards-web-app  res:oci-image@929f55b  active  1  tensorboards-web-app  1.7/stable  158
training-operator  active  1  training-operator  1.6/stable  215
The jupyter-ui charm is a custom charm based on the regular jupyter charm revision that kf 1.7/stable tracked (a few months ago), with a modified spawner UI config.yaml. (I have also intentionally hidden the addresses.)
Relevant Log Output
from container logs
2024-01-29T11:30:33.035914Z warn Envoy may have been out of memory killed. Check memory usage and limits.
2024-01-29T11:30:33.036001Z error Envoy exited with error: signal: killed
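(A sketch of how the crashed container's logs can be pulled; the pod and container names here are placeholders/assumptions:)

```shell
# --previous shows logs from the last terminated instance of the container
kubectl -n kubeflow logs <istio-ingressgateway-workload-pod> -c istio-proxy --previous
```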
from pod description
Last State: Terminated
Reason: OOMKilled
Exit Code: 0
Started: Mon, 29 Jan 2024 11:37:38 +0000
Finished: Mon, 29 Jan 2024 11:41:11 +0000
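(The terminated-state details above are typically visible in the pod description; the pod name is a placeholder:)

```shell
kubectl -n kubeflow describe pod <istio-ingressgateway-workload-pod>
```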
Additional Context
No response