Reports of browsers running out of memory when tailing log files in UI #7156

Closed
gaktive opened this issue Oct 11, 2022 · 4 comments · Fixed by #7511

Comments

@gaktive
Member

gaktive commented Oct 11, 2022

Internal reference: SURE-5383
Reported in Rancher 2.6.8.

While troubleshooting an issue involving websockets, I received a related report: when using the Vue UI to tail log files, the browser stops responding.

Browser memory is consumed until the tab is killed. If you leave a tab open long enough, it will eventually die. If you pull up pod logs for a busy pod, though, you can kill it within a few minutes (Rancher kubectl UI). We have tried Edge, Chrome and Brave, and they all exhibit the symptom.

The browser tab started at 650 MB before opening logs, showed high CPU usage while streaming them, and its memory grew rapidly. The tab crashed at a memory footprint of about 2.5 GB.

We'll need to see if we can repro this to narrow down what's going on. We may need busy log activity to fully reproduce.

Workaround:
Restart the browser tab every few minutes, or sometimes after less than a minute.
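
The thread does not describe how the fix in #7511 works. As a minimal sketch of why an unbounded tail exhausts memory, and of one common mitigation, the TypeScript below keeps only the most recent lines. `BoundedLogBuffer`, its method names, and the 1,000-line cap are all hypothetical and are not taken from the Rancher dashboard code.

```typescript
// Hypothetical helper for illustration only; not the Rancher implementation.
class BoundedLogBuffer {
  private readonly maxLines: number;
  private lines: string[] = [];
  private dropped = 0;

  constructor(maxLines: number) {
    if (!Number.isInteger(maxLines) || maxLines <= 0) {
      throw new RangeError("maxLines must be a positive integer");
    }
    this.maxLines = maxLines;
  }

  // Append a batch of lines, then discard the oldest lines beyond the cap.
  push(batch: string[]): void {
    this.lines.push(...batch);
    const overflow = this.lines.length - this.maxLines;
    if (overflow > 0) {
      this.lines.splice(0, overflow);
      this.dropped += overflow;
    }
  }

  get size(): number {
    return this.lines.length;
  }

  get droppedCount(): number {
    return this.dropped;
  }

  // Copy of the retained lines, oldest first.
  snapshot(): string[] {
    return this.lines.slice();
  }
}

// Simulate a busy container emitting 100,000 lines in batches of 500.
const buffer = new BoundedLogBuffer(1000);
for (let start = 0; start < 100000; start += 500) {
  const batch = Array.from({ length: 500 }, (_, k) => `${start + k}: line`);
  buffer.push(batch);
}

console.log(buffer.size, buffer.droppedCount); // prints: 1000 99000
```

With a cap like this, retained memory is bounded by the cap rather than by how long the tab stays open. The cost is that older lines are no longer available in the view.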

gaktive added this to the v2.7.1 milestone on Oct 11, 2022
gaktive added the JIRA label on Oct 11, 2022
gaktive changed the title from "Need to repro: reports of browsers running out of memory when tailing log files in UI" to "Reports of browsers running out of memory when tailing log files in UI" on Oct 11, 2022
@seanwcom

I work for one of your customers that reported this issue. I can replicate it within a minute or less when viewing logs for a very busy pod (nginx ingress, for example). But just so that it's noted, I can log in and never look at a log, and eventually the browser tab will crash from high memory usage. So it's not specific to log viewing.

@gaktive
Member Author

gaktive commented Oct 19, 2022

Based on additional feedback, with @Sean-McQ observing the behaviour, other pages such as v1 Project Monitoring, Deployments and Pods are showing this memory growth too.

We are determining whether to open separate tickets per page or keep this issue generic. 2.6.9 does offer some improvement, but we have more digging to do.

@gaktive
Member Author

gaktive commented Oct 20, 2022

Some connection to #7247

@brudnak
Member

brudnak commented Jan 25, 2023

✅ PASSED

Reproduction Environment

| Component | Version / Type |
| --- | --- |
| Rancher version | 2.7.0 |
| Installation option | docker |
| Cert Details | docker install with --acme-domain |
| Docker version | 20.10.7, build f0df350 |
| Helm version | v2.16.8-rancher2 |
| Downstream cluster type | not applicable |
| Downstream K8s version | not applicable |
| Authentication providers enabled | local |
| Logged in user role | admin, standard user |
| Browser type | google chrome |
| Browser version | 109.0.5414.87 (Official Build) (x86_64) |
🚨 Additional Reproduction Setup Details:

Docker Rancher install setup with Terraform: https://github.com/brudnak/linode-docker-cattle

Reproduction steps

  1. Set up Rancher
  2. Starting from the default Rancher homepage /dashboard/home
  3. Click hamburger menu >>> local >>> Kubectl Shell
  4. Copy the following deployment yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
  name: test-logs
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
    spec:
      affinity: {}
      containers:
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep .001; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: fast
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: slower
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 10; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: sloooow
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  5. Paste this into a file in the Kubectl Shell and apply it:
vim deploy.yml

# paste above yaml into file and exit vim

kubectl apply -f deploy.yml
  6. Once deployed, navigate to local >>> Workload >>> Deployments >>> test-logs
  7. For the pod running in the test-logs deployment, click the ellipsis (three dots) >>> click View Logs
  8. Once you see the logs populating:
    • right-click anywhere on the page in Chrome
    • click Inspect >>> ellipsis (three dots) in Chrome DevTools >>> More tools >>> Performance monitor
  9. Let this run for ~20 minutes
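
The Performance monitor readings from the steps above can also be approximated from page script. This is a minimal sketch, assuming Chrome: `performance.memory` is a non-standard, Chrome-only API, and counting attached elements is only a rough proxy for the monitor's DOM Nodes figure, which can also include text and detached nodes. The names `MetricSample`, `takeSample`, and `heapGrowthMbPerMinute` are hypothetical.

```typescript
interface MetricSample {
  timestampMs: number;
  usedHeapBytes: number | null;   // null when performance.memory is unavailable
  domElementCount: number | null; // null when there is no document (e.g. Node)
}

// Take one reading. Uses `globalThis as any` because performance.memory is
// non-standard and missing from the standard type definitions.
function takeSample(): MetricSample {
  const g = globalThis as any;
  const memory = g.performance?.memory;
  return {
    timestampMs: Date.now(),
    usedHeapBytes: memory ? Number(memory.usedJSHeapSize) : null,
    domElementCount: g.document
      ? g.document.getElementsByTagName("*").length
      : null,
  };
}

// Average heap growth in MB per minute between the first and last samples
// that carry heap data. Returns null when there is not enough data.
function heapGrowthMbPerMinute(samples: MetricSample[]): number | null {
  const usable = samples.filter((s) => s.usedHeapBytes !== null);
  if (usable.length < 2) {
    return null;
  }
  const first = usable[0];
  const last = usable[usable.length - 1];
  const minutes = (last.timestampMs - first.timestampMs) / 60000;
  if (minutes <= 0) {
    return null;
  }
  const deltaBytes =
    (last.usedHeapBytes as number) - (first.usedHeapBytes as number);
  return deltaBytes / (1024 * 1024) / minutes;
}

// Example: collect a sample every 30 seconds while the log view is open.
// const samples: MetricSample[] = [];
// setInterval(() => samples.push(takeSample()), 30000);
```

A steadily positive growth rate over the ~20 minute run, rather than a plateau, is the pattern the reproduction is looking for.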

Additional Info

RESULTS

✅ Expected

For the Rancher UI to continue running without any issues

❌ Actual

The UI became unusable after ~20 minutes.

| Metric | Value |
| --- | --- |
| JS heap size | 1692 MB |
| DOM Nodes | 720,256 |

Validation Environment

| Component | Version / Type |
| --- | --- |
| Rancher version | v2.7-bd652cb9126f80238e5bfc063a551d6de03fc4b7-head |
| Rancher commit link | rancher/rancher@bd652cb |
| Installation option | docker |
| Cert Details | docker install with --acme-domain |
| Docker version | 20.10.7, build f0df350 |
| Helm version | v2.16.8-rancher2 |
| Downstream cluster type | not applicable |
| Downstream K8s version | not applicable |
| Authentication providers enabled | local |
| Logged in user role | admin, standard user |
| Browser type | google chrome |
| Browser version | 109.0.5414.87 (Official Build) (x86_64) |
🚨 Additional Reproduction Setup Details:

Docker Rancher install setup with Terraform: https://github.com/brudnak/linode-docker-cattle

Validation steps

  1. Set up Rancher
  2. Starting from the default Rancher homepage /dashboard/home
  3. Click hamburger menu >>> local >>> Kubectl Shell
  4. Copy the following deployment yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
  name: test-logs
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
    spec:
      affinity: {}
      containers:
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep .001; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: fast
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: slower
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 10; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: sloooow
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  5. Paste this into a file in the Kubectl Shell and apply it:
vim deploy.yml

# paste above yaml into file and exit vim

kubectl apply -f deploy.yml
  6. Once deployed, navigate to local >>> Workload >>> Deployments >>> test-logs
  7. For the pod running in the test-logs deployment, click the ellipsis (three dots) >>> click View Logs
  8. Once you see the logs populating:
    • right-click anywhere on the page in Chrome
    • click Inspect >>> ellipsis (three dots) in Chrome DevTools >>> More tools >>> Performance monitor
  9. Let this run for ~20 minutes

Additional Info

RESULTS

✅ Expected

For the Rancher UI to continue running without any issues

✅ Actual

No issues with Rancher after ~20 mins and drastically lower metrics

| Metric | Value | Improvement % |
| --- | --- | --- |
| JS heap size | 146 MB | 91.3% |
| DOM Nodes | 7,434 | 98.9% |
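
The Improvement % figures are consistent with the relative reduction from the reproduction readings (1692 MB, 720,256 nodes) to the validation readings (146 MB, 7,434 nodes), truncated to one decimal place. A short sketch of that arithmetic follows. The function name `reductionPercent` is hypothetical, and truncation rather than rounding is an assumption inferred from the reported figures.

```typescript
// Percentage reduction from a "before" reading to an "after" reading,
// truncated (not rounded) to one decimal place.
function reductionPercent(before: number, after: number): number {
  if (before <= 0) {
    throw new RangeError("before must be positive");
  }
  const percent = ((before - after) / before) * 100;
  return Math.trunc(percent * 10) / 10;
}

const heapReduction = reductionPercent(1692, 146);   // JS heap size, MB
const domReduction = reductionPercent(720256, 7434); // DOM nodes

console.log(`JS heap size: ${heapReduction}%`); // prints: JS heap size: 91.3%
console.log(`DOM nodes: ${domReduction}%`);     // prints: DOM nodes: 98.9%
```

Rounding instead of truncating would give 91.4% and 99.0% for the same readings.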
