jb: java perf monitoring #9796

akosyakov · 2022-05-05T15:12:16Z

Description

See internal doc for results of perf testing.

This PR adds tooling that we or end users can use to control and monitor the resource consumption of workspaces in general and of JB backends in particular:

Adds a supervisor endpoint to fetch resource status (memory and CPU) based on cgroup v1.
Provides an internal CLI command, /.supervisor/supervisor top, to monitor resource status via that endpoint.
Contributes an env var for each JB product to control the max heap size (Xmx) via [product]_XMX, e.g. INTELLIJ_XMX=4096M.
Monitors the max/used memory of JB backends in bytes and reports it to Prometheus for each workspace pod.
Contributes a dev-time script, node ./dev/ide/profile-workspace.js, which captures the avg/max/90th percentile of memory and CPU usage, as well as the max/used JB backend memory, for a single workspace running on a VM.
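
To illustrate the [product]_XMX variable, here is a minimal Go sketch of how a launcher might turn it into a JVM option. It is not the code from this PR (the JB integration is written in Kotlin). The function name xmxOption and the validation pattern are assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
)

// heapSize matches a JVM heap size such as 4096M, 4096m or 4g.
// The accepted suffixes are an assumption for this sketch.
var heapSize = regexp.MustCompile(`^[0-9]+[kKmMgG]?$`)

// xmxOption (hypothetical helper) returns the -Xmx option for a product,
// given a lookup function for environment variables. The second result is
// false when the variable is unset or does not look like a heap size.
func xmxOption(product string, lookup func(string) (string, bool)) (string, bool) {
	value, ok := lookup(strings.ToUpper(product) + "_XMX")
	if !ok || !heapSize.MatchString(value) {
		return "", false
	}
	return "-Xmx" + value, true
}

func main() {
	if opt, ok := xmxOption("intellij", os.LookupEnv); ok {
		fmt.Println("would start the backend with", opt)
	} else {
		fmt.Println("INTELLIJ_XMX is unset or invalid; keeping the default heap size")
	}
}
```

Passing the lookup function in, instead of calling os.LookupEnv directly, keeps the helper easy to exercise without touching the real environment.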

Out of scope

extract cgroup library from ws-daemon in supervisor #10043
support of cgroup v2 for resources endpoint [(add a link to new issue here)]
making it convenient to configure XMX for JB products via .gitpod.yml [1]
adding cgroup based stats to JB perf indicator [(add a link to new issue here)]

Related Issue(s)

Fixes #9521

How to test

Important: only one person can test at a time. Comment below that you are testing, and remove the comment when you are done.

Start a dev workspace https://gitpod.io#https://github.com/gitpod-io/gitpod/tree/refs/heads/ak/java_perf
Run ./dev/preview/install-k3s-kubeconfig.sh to configure k8s context.
Run ./dev/preview/portforward-monitoring-satellite.sh -c harvester to port forward prometheus API endpoint.
Run node ./dev/ide/profile-workspace.js to start monitoring the resource usage of a workspace pod in the preview environment.
Configure the preview environment with IntelliJ IDEA: https://ak-java-perf.preview.gitpod-dev.com/preferences
Start a workspace in the preview environment: https://ak-java-perf.preview.gitpod-dev.com/#INTELLIJ_XMX=4096m,JB_MEM_PROFILE=true/https://github.com/eclipse/xtext-core
Connect with IntelliJ.
Run curl http://localhost:22999/_supervisor/v1/status/resources to check resource status of supervisor.
Use /.supervisor/supervisor top to see the resource status of the current workspace via the internal CLI.
Capture CPU, memory and heap metrics for the backend from a profile file in the dev workspace: watch tail -n 1 /workspace/gitpod/dev/ide/perf.log
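
For scripted checks, the curl step above could be reproduced from Go along these lines. This is a sketch under assumptions: the thread does not show the JSON shape of the response, so the body is decoded only into its top-level fields. The helper names decodeSections and fetchSections are made up for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// resourcesURL is the endpoint used in the curl step above.
const resourcesURL = "http://localhost:22999/_supervisor/v1/status/resources"

// decodeSections splits a JSON object into its top-level fields without
// assuming anything about the shape of each field.
func decodeSections(body []byte) (map[string]json.RawMessage, error) {
	var sections map[string]json.RawMessage
	if err := json.Unmarshal(body, &sections); err != nil {
		return nil, fmt.Errorf("decode resources status: %w", err)
	}
	return sections, nil
}

// fetchSections performs the GET request and decodes the response body.
func fetchSections(client *http.Client, url string) (map[string]json.RawMessage, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return decodeSections(body)
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	sections, err := fetchSections(client, resourcesURL)
	if err != nil {
		// Expected outside a workspace, where supervisor is not listening.
		fmt.Fprintln(os.Stderr, "cannot read resources status:", err)
		return
	}
	for name, raw := range sections {
		fmt.Printf("%s: %s\n", name, raw)
	}
}
```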

Release Notes

NONE

Documentation

/werft with-vm
/werft without-vm
/werft with-clean-slate-deployment

components/supervisor-api/status.proto

mustard-mh · 2022-05-13T09:49:32Z

Followed the How to test section; ~~works well for me~~ needs further testing.

curl	top-cli	watch-log

...nts/ide/jetbrains/backend-plugin/src/main/kotlin/io/gitpod/jetbrains/remote/GitpodManager.kt

mustard-mh · 2022-05-13T10:05:13Z

I'm testing this PR

mustard-mh · 2022-05-13T10:26:04Z

How to check if xmx=4096m works? @akosyakov

I ran watch tail -n 2 /tmp/jb_mem_profile.log and thought it would show 4096m, but it seems unrelated.

Every 2.0s: tail -n 2 /tmp/jb_mem_profile.log                  eclipse-xtextcore-umw6kzpgi0x: Fri May 13 10:28:03 2022

allocated - current 1245M, avg 1235M, max 1245M, 90% percentile 1245M
used - current 984M,avg 766M, max 1128M, 90% percentile 996M
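
For readers unfamiliar with the columns in these log lines, the following Go sketch shows one way to derive current, avg, max and a 90th percentile from a series of samples. It is an illustration only: the profiling code in this PR is not shown in the thread, and the nearest-rank percentile method used here is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// summary mirrors the four columns of the log lines above.
type summary struct {
	Current, Avg, Max, P90 float64
}

// summarize computes the columns from samples given in recording order.
// It reports false when there are no samples.
func summarize(samples []float64) (summary, bool) {
	n := len(samples)
	if n == 0 {
		return summary{}, false
	}
	sorted := append([]float64(nil), samples...) // keep the input order intact
	sort.Float64s(sorted)

	sum := 0.0
	for _, v := range samples {
		sum += v
	}
	// Nearest-rank percentile: rank = ceil(0.9 * n), in integer arithmetic.
	rank := (9*n + 9) / 10
	return summary{
		Current: samples[n-1],
		Avg:     sum / float64(n),
		Max:     sorted[n-1],
		P90:     sorted[rank-1],
	}, true
}

func main() {
	// Made-up heap readings in megabytes, not data from this PR.
	samples := []float64{700, 820, 760, 900, 850}
	if s, ok := summarize(samples); ok {
		fmt.Printf("used - current %.0fM, avg %.0fM, max %.0fM, 90%% percentile %.0fM\n",
			s.Current, s.Avg, s.Max, s.P90)
	}
}
```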

akosyakov · 2022-05-13T10:29:02Z

How to check if xmx=4096m works? @akosyakov

It is a bit tricky:

In the performance indicator, click ... and then select the main window. It will render the backend UI in another window.
In the main window, press Shift twice and then enable the Show Memory Usage toggle.
Check the max heap size in the status bar of this window.

mustard-mh · 2022-05-13T10:32:43Z

Tested and works well. (Do we need the expected result in the How to test section?) Should we explain what /workspace/gitpod/dev/ide/perf.log and /tmp/jb_mem_profile.log are used for?

mustard-mh

/hold

Code looks good

csweichel · 2022-05-13T13:59:09Z

components/supervisor/pkg/supervisor/top.go

+func Top(ctx context.Context) (*api.ResourcesStatusResponse, error) {
+	memory, err := resolveMemoryStatus()
+	if err != nil {
+		return nil, err
+	}
+	cpu, err := resolveCPUStatus()
+	if err != nil {
+		return nil, err
+	}
+	return &api.ResourcesStatusResponse{
+		Memory: memory,
+		Cpu:    cpu,
+	}, nil
+}


We should harmonise this with the way ws-daemon does it.
Either copy over some of that code, or stick it in something like common-go/cgroup.

@Furisto @utam0k could you guide here how to proceed?

At the moment, the code in ws-daemon is not written to be used by other components. Factoring it out would be equivalent to starting our own cgroup library. Considering that the other options, like using containerd's cgroup package, are awkward to use, I still think this is the best way forward. This does not need to be done in this PR, though.

Could you point me to relevant code please? 🙏

There can also be slight differences, e.g. I subtract evictable memory to give the user more useful data, similar to what GCP shows. I am not sure what ws-daemon does.
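
As an illustration of subtracting evictable memory under cgroup v1, here is a minimal Go sketch. It is not the supervisor code from this PR. It assumes the standard cgroup v1 mount path and treats total_inactive_file from memory.stat as the evictable part, which is how the kubelet computes its working set. The thread does not say which counters supervisor actually uses.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// statValue returns the value of one key from the text of a cgroup v1
// memory.stat file, where each line has the form "<key> <value>".
func statValue(stat, key string) (uint64, bool) {
	scanner := bufio.NewScanner(strings.NewReader(stat))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 2 && fields[0] == key {
			v, err := strconv.ParseUint(fields[1], 10, 64)
			return v, err == nil
		}
	}
	return 0, false
}

// usedWithoutCache subtracts the inactive file cache from the raw usage.
// Assumption: total_inactive_file approximates the evictable memory.
func usedWithoutCache(usage uint64, stat string) uint64 {
	inactive, ok := statValue(stat, "total_inactive_file")
	if !ok {
		return usage
	}
	if inactive >= usage {
		return 0
	}
	return usage - inactive
}

func main() {
	const dir = "/sys/fs/cgroup/memory" // usual cgroup v1 mount point
	rawUsage, err := os.ReadFile(dir + "/memory.usage_in_bytes")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cgroup v1 memory files are not available:", err)
		return
	}
	usage, err := strconv.ParseUint(strings.TrimSpace(string(rawUsage)), 10, 64)
	if err != nil {
		fmt.Fprintln(os.Stderr, "unexpected usage value:", err)
		return
	}
	stat, err := os.ReadFile(dir + "/memory.stat")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read memory.stat:", err)
		return
	}
	fmt.Printf("used (without evictable cache): %d bytes\n", usedWithoutCache(usage, string(stat)))
}
```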

CPU:
v1: https://github.com/gitpod-io/gitpod/blob/main/components/ws-daemon/pkg/cpulimit/cfs.go
v2: https://github.com/gitpod-io/gitpod/blob/main/components/ws-daemon/pkg/cpulimit/cfs_v2.go

Memory:
v1: https://github.com/gitpod-io/gitpod/blob/main/components/ws-daemon/pkg/cgroup/plugin_cachereclaim.go
v2: n.a.

IO:
v1: https://github.com/gitpod-io/gitpod/blob/main/components/ws-daemon/pkg/cgroup/plugin_iolimit_v1.go
v2: https://github.com/gitpod-io/gitpod/blob/main/components/ws-daemon/pkg/cgroup/plugin_iolimit_v2.go
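
For context on the CPU side, the following Go sketch shows how a limit and a usage figure can be derived from cgroup v1 files. It is not taken from ws-daemon or supervisor. The mount paths are the usual cgroup v1 locations, and the helper names limitMilli and usedMilli are assumptions for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// limitMilli converts a CFS quota and period (both in microseconds) into
// millicores. It reports false when no limit is set (quota is -1 in v1).
func limitMilli(quotaUs, periodUs int64) (int64, bool) {
	if quotaUs <= 0 || periodUs <= 0 {
		return 0, false
	}
	return quotaUs * 1000 / periodUs, true
}

// usedMilli converts two cpuacct.usage readings (cumulative nanoseconds)
// taken elapsedNs nanoseconds apart into millicores.
func usedMilli(prevNs, curNs, elapsedNs int64) int64 {
	if curNs < prevNs || elapsedNs <= 0 {
		return 0
	}
	return (curNs - prevNs) * 1000 / elapsedNs
}

// readInt reads a file that contains a single integer.
func readInt(path string) (int64, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(data)), 10, 64)
}

func main() {
	const cpuDir = "/sys/fs/cgroup/cpu"      // usual cgroup v1 mount point
	const acctDir = "/sys/fs/cgroup/cpuacct" // usual cgroup v1 mount point
	quota, errQ := readInt(cpuDir + "/cpu.cfs_quota_us")
	period, errP := readInt(cpuDir + "/cpu.cfs_period_us")
	prev, errA := readInt(acctDir + "/cpuacct.usage")
	if errQ != nil || errP != nil || errA != nil {
		fmt.Fprintln(os.Stderr, "cgroup v1 CPU files are not available here")
		return
	}
	start := time.Now()
	time.Sleep(time.Second)
	cur, err := readInt(acctDir + "/cpuacct.usage")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read cpuacct.usage:", err)
		return
	}
	used := usedMilli(prev, cur, time.Since(start).Nanoseconds())
	if limit, ok := limitMilli(quota, period); ok {
		fmt.Printf("cpu: %dm used of %dm\n", used, limit)
	} else {
		fmt.Printf("cpu: %dm used, no limit\n", used)
	}
}
```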

We should harmonise this with the way ws-daemon does it.

After looking at the code, I don't think it can be harmonised without refactoring. ws-daemon lacks some functions, such as getting used memory without evictable caches. Instead, it compares page caches after trying to reclaim memory. It also has special cases for different workspace states, which do not apply inside a workspace.

I'm not sure that refactoring to create a reusable library makes sense right now. The code in ws-daemon is not really a source of truth, i.e. there should not be a situation where you change one and need to change the other.

I also wonder why ws-daemon does not use container runtime endpoints but needs to read from the disk?

Thanks for your feedback! Sorry, I couldn't get what the disk is.

Thanks for your feedback! Sorry, I couldn't get what the disk is.

I mean that the container runtime has an API that can return usage/limit stats for CPU and memory: https://github.com/opencontainers/runc/blob/067aaf8548d78269dcb2c13b856775e27c410f9c/libcontainer/cgroups/stats.go#L158-L161

Supervisor cannot reach this endpoint, so we have to read cgroup data from the filesystem. I am asking whether ws-daemon could rely on it instead and provide this information to supervisor somehow.

Using the API you linked would mean executing runc stat and then parsing the output; it is not a service endpoint. We also need to write to the cgroup filesystem, not only read from it, and we should strive to do so in a runtime-agnostic manner. We could ask containerd for this information, but if we allow any CRI-compatible runtime in the future (like CRI-O), we would have to adapt the code again.

OK, sorry, I don't really know this area. I thought opencontainers/runc was an abstraction over different runtimes and that using this API would, on the contrary, make things more stable.

@csweichel Would it be alright to continue as suggested here? Could you approve it please if so? 🙏

akosyakov · 2022-05-17T12:37:01Z

/werft run

👍 started the job as gitpod-build-ak-java-perf.41
(with .werft/ from main)

mustard-mh · 2022-05-17T14:16:21Z

Code looks good, will test it with Prometheus

mustard-mh

Tested good

used	max

Another way to check xmx=4096m in the thin client (related: #9796 (comment))

mustard-mh · 2022-05-17T14:42:59Z

Feel free to unhold it 👍 @akosyakov

akosyakov · 2022-05-18T07:35:40Z

/unhold

roboquat added do-not-merge/work-in-progress do-not-merge/release-note-label-needed size/XXL labels May 5, 2022

akosyakov force-pushed the ak/java_perf branch 5 times, most recently from 3f08535 to cba707d Compare May 10, 2022 13:00

akosyakov mentioned this pull request May 11, 2022

Investigate reported performance of JetBrains #8704

Closed

akosyakov force-pushed the ak/java_perf branch from cba707d to a957239 Compare May 12, 2022 13:45

roboquat added release-note-none and removed do-not-merge/release-note-label-needed labels May 12, 2022

akosyakov changed the title ~~java perf testing~~ java perf testing tooling May 12, 2022

akosyakov changed the title ~~java perf testing tooling~~ jb: java perf testing tooling May 12, 2022

akosyakov changed the title ~~jb: java perf testing tooling~~ jb: java perf monitoring tooling May 12, 2022

akosyakov changed the title ~~jb: java perf monitoring tooling~~ jb: java perf monitoring May 12, 2022

akosyakov force-pushed the ak/java_perf branch 2 times, most recently from 9469cea to 110b544 Compare May 13, 2022 09:09

akosyakov marked this pull request as ready for review May 13, 2022 09:13

akosyakov requested a review from a team May 13, 2022 09:13

akosyakov requested a review from csweichel as a code owner May 13, 2022 09:13

roboquat removed the do-not-merge/work-in-progress label May 13, 2022

github-actions bot added the team: IDE label May 13, 2022

akosyakov commented May 13, 2022

components/supervisor-api/status.proto

mustard-mh reviewed May 13, 2022

...nts/ide/jetbrains/backend-plugin/src/main/kotlin/io/gitpod/jetbrains/remote/GitpodManager.kt Outdated

mustard-mh approved these changes May 13, 2022

roboquat added the do-not-merge/hold label May 13, 2022

akosyakov force-pushed the ak/java_perf branch from 110b544 to aa75ee7 Compare May 13, 2022 13:38

csweichel reviewed May 13, 2022

akosyakov marked this pull request as draft May 13, 2022 14:46

roboquat added the do-not-merge/work-in-progress label May 13, 2022

akosyakov mentioned this pull request May 17, 2022

extract cgroup library from ws-daemon in supervisor #10043

Closed

akosyakov force-pushed the ak/java_perf branch from 22bc28e to 2d35e3e Compare May 17, 2022 08:40

akosyakov added 3 commits May 17, 2022 10:38

[jb] allow to control xmx via [product]_XMX env var

961928a

[supervisor] fix #9521: add resources endpoint respecting cgroup v1

e735a44

jb: push backend memory metrics to prometheus

0ccdb00

akosyakov force-pushed the ak/java_perf branch from 2d35e3e to 0ccdb00 Compare May 17, 2022 10:45

akosyakov marked this pull request as ready for review May 17, 2022 13:53

roboquat removed the do-not-merge/work-in-progress label May 17, 2022

Furisto approved these changes May 17, 2022

mustard-mh self-requested a review May 17, 2022 14:16

csweichel approved these changes May 17, 2022

mustard-mh approved these changes May 17, 2022

utam0k approved these changes May 17, 2022

roboquat removed the do-not-merge/hold label May 18, 2022

roboquat merged commit faa6b30 into main May 18, 2022

roboquat deleted the ak/java_perf branch May 18, 2022 07:36

roboquat added the deployed: IDE (IDE change is running in production) and deployed (Change is completely running in production) labels May 19, 2022

yaohui-wyh mentioned this pull request May 22, 2022

[jb] configure vmoptions for intellij backend server #10175

Merged

andreafalzetti mentioned this pull request May 25, 2022

[JetBrains] Show notification when port becomes available 🔔 #10107

Merged

3 tasks

csweichel May 13, 2022

Furisto May 16, 2022

akosyakov May 16, 2022

Furisto May 16, 2022

akosyakov May 16, 2022

utam0k May 17, 2022

akosyakov May 17, 2022

Furisto May 17, 2022

akosyakov May 17, 2022

akosyakov May 17, 2022
