Reports of browsers running out of memory when tailing log files in UI #7156

gaktive · 2022-10-11T15:06:14Z

Internal reference: SURE-5383
Reported in Rancher 2.6.8.

While troubleshooting an issue involving websockets, I received a related report: when using the Vue UI to tail log files, the browser stops responding.

Browser memory is consumed until the tab is killed. If you leave one open long enough, it will eventually die. If you pull up pod logs for a busy pod, though, you can kill it in a few minutes (Rancher kubectl UI). We have tried Edge, Chrome, and Brave, and they all exhibit the symptom.

The browser tab started at 650 MB before opening logs, with high CPU usage while streaming them and memory growing rapidly. The tab crashed at a memory footprint of about 2.5 GB.

We'll need to see if we can repro this to narrow down what's going on. We may need busy log activity to fully reproduce.

Workaround:
Restart the browser tab every few minutes, or sometimes after less than one minute.

seanwcom · 2022-10-12T04:01:30Z

I work for one of your customers that has reported this issue. I can replicate it within a minute or less when viewing logs for a very busy pod (nginx ingress, for example). But just so that it's noted, I can log in and never look at a log, and eventually the browser tab will crash from high memory usage. So it's not specific to log viewing.

gaktive · 2022-10-19T22:47:39Z

Based on additional feedback from @Sean-McQ, who has been observing the behaviour, other pages such as v1 Project Monitoring, Deployments and Pods are seeing this memory usage too.

We are determining whether to spawn separate tickets per page or keep this one more generic. 2.6.9 does offer some improvement, but we have more digging to do.

gaktive · 2022-10-20T16:56:31Z

Some connection to #7247

brudnak · 2023-01-25T23:18:02Z

✅ PASSED

Reproduction Environment

Component	Version / Type
Rancher version	2.7.0
Installation option	docker
Cert Details	docker install with `--acme-domain`
Docker version	20.10.7, build f0df350
Helm version	v2.16.8-rancher2
Downstream cluster type	not applicable
Downstream K8s version	not applicable
Authentication providers enabled	local
Logged in user role	admin, standard user
Browser type	Google Chrome
Browser version	109.0.5414.87 (Official Build) (x86_64)

Additional Reproduction Setup Details

Docker Rancher install setup with Terraform: https://github.com/brudnak/linode-docker-cattle

Reproduction steps

Set up Rancher
Starting from the default Rancher homepage /dashboard/home
Click hamburger menu >>> local >>> Kubectl Shell
Copy the following deployment yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
  name: test-logs
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
    spec:
      affinity: {}
      containers:
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep .001; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: fast
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: slower
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 10; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: sloooow
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

Paste this into a file in the Kubectl Shell and run it:

vim deploy.yml

# paste above yaml into file and exit vim

kubectl apply -f deploy.yml

Once deployed, navigate to local >>> Workload >>> Deployments >>> test-logs
For the pod running in the test-logs deployment, click the ellipsis (three dots) >>> View Logs
Once you see the logs populating:
- right-click anywhere on the page in Chrome
- click Inspect >>> ellipsis (three dots) in DevTools >>> More tools >>> Performance monitor
Let this run for ~20 minutes
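
To get a feel for how much output the test deployment can produce during the ~20 minute run, the sleep intervals in the YAML above can be turned into a rough upper bound on line counts. This is only a sketch: the helper name `maxLines` is made up for illustration, and real throughput will be lower, because each loop iteration also runs `date` and `sleep .001` rarely sleeps for exactly one millisecond.

```typescript
// Rough upper bound on the lines one container can emit, based only on the
// sleep interval in its shell loop. The per-iteration cost of `echo` and
// `date` is ignored, so actual counts will be lower.
function maxLines(sleepMillis: number, runMinutes: number): number {
  const runMillis = runMinutes * 60 * 1000;
  return Math.floor(runMillis / sleepMillis);
}

// Container names and sleep intervals from the test-logs deployment.
const containers: Array<[string, number]> = [
  ["fast", 1],        // sleep .001
  ["slower", 1000],   // sleep 1
  ["sloooow", 10000], // sleep 10
];

const runMinutes = 20;
for (const [name, sleepMillis] of containers) {
  console.log(`${name}: at most ${maxLines(sleepMillis, runMinutes)} lines in ${runMinutes} min`);
}
```

Nearly all of the volume comes from the `fast` container, so that is the one to select in the log view when trying to trigger the problem quickly.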

Additional Info

RESULTS

✅ Expected

For the Rancher UI to continue running without any issues

❌ Actual

The UI became unusable after ~20 minutes.

Metric	value
JS heap size	1692 MB
DOM Nodes	720,256
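
A DOM node count this high is consistent with the log view holding on to every line it has received. One common way to keep memory flat in a streaming viewer is to cap how many lines are retained and discard the oldest. The sketch below illustrates that general idea only. It is not taken from the Rancher dashboard code, the names `BoundedLogBuffer` and `maxLines` are hypothetical, and this thread does not describe how the actual fix works.

```typescript
// Keeps only the most recent `maxLines` log lines; older lines are discarded.
class BoundedLogBuffer {
  private lines: string[] = [];
  private dropped = 0;
  private readonly maxLines: number;

  constructor(maxLines: number) {
    if (!Number.isInteger(maxLines) || maxLines <= 0) {
      throw new RangeError("maxLines must be a positive integer");
    }
    this.maxLines = maxLines;
  }

  // Append a line, trimming from the front once the cap is exceeded.
  push(line: string): void {
    this.lines.push(line);
    const overflow = this.lines.length - this.maxLines;
    if (overflow > 0) {
      this.lines.splice(0, overflow);
      this.dropped += overflow;
    }
  }

  // Number of lines discarded so far, e.g. to show "N earlier lines hidden".
  get droppedCount(): number {
    return this.dropped;
  }

  // Copy of the retained lines, oldest first.
  snapshot(): string[] {
    return this.lines.slice();
  }
}

const buffer = new BoundedLogBuffer(1000);
for (let i = 0; i < 5000; i++) {
  buffer.push(`${i}: line`);
}
console.log(buffer.snapshot().length, buffer.droppedCount);
```

With a cap like this, the number of retained lines stays constant no matter how long the stream runs; the trade-off is that older output is no longer available in the view.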

Validation Environment

Component	Version / Type
Rancher version	v2.7-bd652cb9126f80238e5bfc063a551d6de03fc4b7-head
Rancher commit link	rancher/rancher@`bd652cb`
Installation option	docker
Cert Details	docker install with `--acme-domain`
Docker version	20.10.7, build f0df350
Helm version	v2.16.8-rancher2
Downstream cluster type	not applicable
Downstream K8s version	not applicable
Authentication providers enabled	local
Logged in user role	admin, standard user
Browser type	Google Chrome
Browser version	109.0.5414.87 (Official Build) (x86_64)

Additional Reproduction Setup Details

Docker Rancher install setup with Terraform: https://github.com/brudnak/linode-docker-cattle

Validation steps

Set up Rancher
Starting from the default Rancher homepage /dashboard/home
Click hamburger menu >>> local >>> Kubectl Shell
Copy the following deployment yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
  name: test-logs
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
    spec:
      affinity: {}
      containers:
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep .001; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: fast
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: slower
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 10; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: sloooow
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

Paste this into a file in the Kubectl Shell and run it:

vim deploy.yml

# paste above yaml into file and exit vim

kubectl apply -f deploy.yml

Once deployed, navigate to local >>> Workload >>> Deployments >>> test-logs
For the pod running in the test-logs deployment, click the ellipsis (three dots) >>> View Logs
Once you see the logs populating:
- right-click anywhere on the page in Chrome
- click Inspect >>> ellipsis (three dots) in DevTools >>> More tools >>> Performance monitor
Let this run for ~20 minutes

Additional Info

RESULTS

✅ Expected

For the Rancher UI to continue running without any issues

✅ Actual

No issues with Rancher after ~20 mins and drastically lower metrics

Metric	value	Improvement %
JS heap size	146 MB	91.3%
DOM Nodes	7,434	98.9%
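
The improvement column can be rechecked from the reproduction and validation measurements. A small sketch follows; the helper name `percentReduction` is illustrative. Computed this way, the reductions come out at about 91.37% and 98.97%, so the figures in the table appear to be truncated to one decimal place rather than rounded.

```typescript
// Percentage by which a metric dropped between two measurements.
function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// Values from the reproduction (2.7.0) and validation (v2.7-head) runs.
const heapReduction = percentReduction(1692, 146);
const domReduction = percentReduction(720256, 7434);

console.log(`JS heap size: ${heapReduction.toFixed(2)}%`);
console.log(`DOM Nodes: ${domReduction.toFixed(2)}%`);
```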

gaktive added [zube]: To Triage kind/bug labels Oct 11, 2022

gaktive added this to the v2.7.1 milestone Oct 11, 2022

gaktive added the JIRA label Oct 11, 2022

gaktive changed the title ~~Need to repro: reports of browsers running out of memory when tailing log files in UI~~ Reports of browsers running out of memory when tailing log files in UI Oct 11, 2022

gaktive added the priority/1 label Oct 19, 2022

gaktive added the [zube]: Backlog label Nov 1, 2022

zube bot removed the [zube]: To Triage label Nov 1, 2022

gaktive assigned mantis-toboggan-md Nov 1, 2022

gaktive added [zube]: Next Up and removed [zube]: Backlog labels Nov 1, 2022

mantis-toboggan-md added [zube]: Working and removed [zube]: Next Up labels Nov 7, 2022

mantis-toboggan-md added [zube]: Working and removed [zube]: Next Up labels Nov 15, 2022

mantis-toboggan-md mentioned this issue Nov 21, 2022

Improve performance of pod logs #7511

Merged

github-actions bot added [zube]: Review and removed [zube]: Working labels Nov 21, 2022

mantis-toboggan-md closed this as completed in #7511 Dec 2, 2022

zube bot added [zube]: Done and removed [zube]: Review labels Dec 2, 2022

github-actions bot reopened this Dec 2, 2022

zube bot added [zube]: To Triage and removed [zube]: Done labels Dec 2, 2022

github-actions bot removed the [zube]: To Triage label Dec 2, 2022

github-actions bot added the [zube]: To Test label Dec 2, 2022

Sahota1225 added the team/area2 Hostbusters label Dec 15, 2022

Josh-Diamond added team/area1 Team Neo and removed team/area2 Hostbusters labels Dec 22, 2022

dasarinaidu assigned brudnak Jan 3, 2023

brudnak added [zube]: QA Working and removed [zube]: To Test labels Jan 24, 2023

brudnak closed this as completed Jan 25, 2023

zube bot added [zube]: Done and removed [zube]: QA Working labels Jan 25, 2023

gaktive mentioned this issue Feb 9, 2023

[2.6.x backport] Reports of browsers running out of memory when tailing log files in UI #8135

Closed

zube bot removed the [zube]: Done label Apr 26, 2023
