istio ingressgateway's envoy process is taking more memory than defined in the charm's pod manifest #376

nishant-dash · 2024-01-29T17:11:09Z

Bug Description

I am running into an issue where the istio-ingressgateway-workload pod/container is crash-looping because it keeps getting OOM-killed.

istio-ingressgateway-workload-5dcdfb989-d52q2          1/1     Running   683 (7m23s ago)   46d

I manually patched the deployment to use 2Gi instead of 1Gi, and after a few hours of monitoring, memory usage has been increasing continuously but very slowly.
As of writing, it has gone from 1019280 -> 1032556 -> 1056748, and it has not stopped increasing in the few hours it has been running.

To Reproduce

Hard to say, since it's a complicated deployment that has evolved over months and has a lot of workload on it.

Environment

App                        Version                         Status   Scale  Charm                    Channel             Rev  Address         Exposed  Message
admission-webhook          res:oci-image@2d74d1b           active       1  admission-webhook        1.7/stable          205  
argo-controller            res:oci-image@669ebd5           active       1  argo-controller          3.3/stable          236                  no       
argo-server                res:oci-image@576d038           active       1  argo-server              3.3/stable          185                  no       
dex-auth                                                   active       1  dex-auth                 2.31/stable         224  
grafana-agent-k8s          0.32.1                          waiting      1  grafana-agent-k8s        latest/stable        38  
istio-ingressgateway                                       active       1  istio-gateway            1.16/stable         551  
istio-pilot                                                active       1  istio-pilot              1.16/stable         551  
jupyter-controller         res:oci-image@1167186           active       1  jupyter-controller       1.7/stable          607                  no       
jupyter-ui                 .../9lw7s63ewtlyew486jjn1ez...  active       1  jupyter-ui                                    25  
katib-controller           res:oci-image@111495a           active       1  katib-controller         0.15/stable         282  
katib-db                   mariadb/server:
katib-db-manager           res:oci-image@16b33a5           active       1  katib-db-manager         0.15/stable         253  
katib-ui                   res:oci-image@c7dc04a           active       1  katib-ui                 0.15/stable         267  
kfp-api                    res:oci-image@bf747d5           active       1  kfp-api                  2.0-alpha.7/stable  935  
kfp-db                     mariadb/server:
kfp-persistence            res:oci-image@ebed770           active       1  kfp-persistence          2.0-alpha.7/stable  939                  no       
kfp-profile-controller     res:oci-image@aa75b0c           active       1  kfp-profile-controller   2.0-alpha.7/stable  899  
kfp-schedwf                res:oci-image@2cb9087           active       1  kfp-schedwf              2.0-alpha.7/stable  952                  no       
kfp-ui                     res:oci-image@ae72602           active       1  kfp-ui                   2.0-alpha.7/stable  934  
kfp-viewer                 res:oci-image@899e25f           active       1  kfp-viewer               2.0-alpha.7/stable  964                  no       
kfp-viz                    res:oci-image@ffaf37e           active       1  kfp-viz                  2.0-alpha.7/stable  889  
knative-eventing                                           active       1  knative-eventing         1.8/stable          224  
knative-operator                                           active       1  knative-operator         1.8/stable          199  
knative-serving                                            active       1  knative-serving          1.8/stable          224  
kserve-controller                                          active       1  kserve-controller        0.
kubeflow-dashboard         res:oci-image@6fe6eec           active       1  kubeflow-dashboard       1.7/stable          307  
kubeflow-profiles          res:profile-image@cfd6935       active       1  kubeflow-profiles        1.7/stable          269  
kubeflow-roles                                             active       1  kubeflow-roles           1.7/stable          113  
kubeflow-volumes           res:oci-image@d261609           active       1  kubeflow-volumes         1.7/stable          178  
metacontroller-operator                                    active       1  metacontroller-operator  2.0/stable          117  
minio                      res:oci-image@1755999           active       1  minio                    ckf-1.7/stable      186  
namespace-node-affinity                                    active       1  namespace-node-affinity  0.1/beta              5  
oidc-gatekeeper            res:oci-image@6b720b8           active       1  oidc-gatekeeper          ckf-1.7/stable      176  
seldon-controller-manager  res:oci-image@eb811b6           active       1  seldon-core              1.15/stable         354  
tensorboard-controller     res:oci-image@c52f7c2           active       1  tensorboard-controller   1.7/stable          156  
tensorboards-web-app       res:oci-image@929f55b           active       1  tensorboards-web-app     1.7/stable          158  
training-operator                                          active       1  training-operator        1.6/stable          215

The jupyter-ui charm is a custom charm based on the regular jupyter charm revision that kf 1.7/stable tracked a few months ago, with a modified spawner UI config.yaml. I have also intentionally hidden the addresses.

Relevant Log Output

From the container logs:

2024-01-29T11:30:33.035914Z     warn    Envoy may have been out of memory killed. Check memory usage and limits.
2024-01-29T11:30:33.036001Z     error   Envoy exited with error: signal: killed

From the pod description:

Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    0
  Started:      Mon, 29 Jan 2024 11:37:38 +0000
  Finished:     Mon, 29 Jan 2024 11:41:11 +0000
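
For anyone triaging, here is a rough sketch of the arithmetic implied by the numbers above: how long the last container run survived before it was killed, and the average spacing between restarts. The timestamps, restart count and pod age are copied from this report. The function names are made up for illustration, and the even-spacing assumption is only an approximation, since restarts are unlikely to be uniformly distributed over the pod's lifetime.

```python
from email.utils import parsedate_to_datetime

# Values copied from the "Last State" block and the pod listing in this report.
STARTED = "Mon, 29 Jan 2024 11:37:38 +0000"
FINISHED = "Mon, 29 Jan 2024 11:41:11 +0000"
RESTARTS = 683
POD_AGE_DAYS = 46


def run_seconds(started: str, finished: str) -> float:
    """Seconds between two timestamps in the format shown in the pod description."""
    delta = parsedate_to_datetime(finished) - parsedate_to_datetime(started)
    return delta.total_seconds()


def mean_minutes_between_restarts(restarts: int, age_days: int) -> float:
    """Average restart spacing, ASSUMING restarts were spread evenly over the pod's age."""
    return age_days * 24 * 60 / restarts


if __name__ == "__main__":
    # The last terminated run lasted 213 seconds (3 min 33 s).
    print(run_seconds(STARTED, FINISHED))
    # Roughly one restart every 97 minutes on average over 46 days.
    print(round(mean_minutes_between_restarts(RESTARTS, POD_AGE_DAYS), 1))
```

A run of only a few minutes before the kill, against a much longer average spacing, would be consistent with memory pressure varying with load, but the report does not contain enough data to confirm that.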

Additional Context

No response


syncronize-issues-to-jira · 2024-01-29T17:11:17Z

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5262.

This message was autogenerated

nishant-dash · 2024-01-30T07:40:07Z

Memory usage is still rising; it is currently at 1103120 KB.
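
To put the readings in this thread next to the 1Gi and 2Gi limits, here is a small sketch. It assumes all four readings are in KiB (only the latest one was given with a unit, "KB") and that they measure the same thing the container memory limit is enforced against, which may not be exactly true. The helper names are hypothetical.

```python
# Memory readings reported in this issue, ASSUMED to all be in KiB.
READINGS_KIB = [1019280, 1032556, 1056748, 1103120]

KIB_PER_GIB = 1024 * 1024


def usage_fraction(reading_kib: int, limit_gib: int) -> float:
    """Return a reading as a fraction of a memory limit expressed in Gi."""
    return reading_kib / (limit_gib * KIB_PER_GIB)


def first_reading_over(readings, limit_gib):
    """Return the first reading that exceeds the limit, or None if none does."""
    limit_kib = limit_gib * KIB_PER_GIB
    for reading in readings:
        if reading > limit_kib:
            return reading
    return None


if __name__ == "__main__":
    for reading in READINGS_KIB:
        print(
            f"{reading} KiB: "
            f"{usage_fraction(reading, 1):.1%} of 1Gi, "
            f"{usage_fraction(reading, 2):.1%} of 2Gi"
        )
    print("first reading above 1Gi:", first_reading_over(READINGS_KIB, 1))
```

Under those assumptions, the third reading (1056748) is already above 1Gi (1048576 KiB), and the latest reading is a little over half of the patched 2Gi limit.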

nishant-dash added the bug Something isn't working label Jan 29, 2024

github-project-automation bot added this to MLOps Solution Issues Aug 29, 2024

github-project-automation bot moved this to Needs Triage in MLOps Solution Issues Aug 29, 2024
