Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CELEBORN-688] Add JVM metrics grafana template #1964

Closed
wants to merge 3 commits into from

Conversation

onebox-li
Copy link
Contributor

What changes were proposed in this pull request?

Currently there is no JVM metrics grafana template, nor in grafana labs. For better use, it is necessary to add one.
According the change in #1939
This template uses two variables(instance, pool).
The layout is divided into 5 rows.
image

The panels with g1 look like below:
image

JVM Memory Pools row uses replicated panel mode which panels are automatically deplicated by pool variables.
image
image
image
image

Why are the changes needed?

Ditto

Does this PR introduce any user-facing change?

Yes, this dashboard is based on changes in #1939

How was this patch tested?

Cluster test

@codecov
Copy link

codecov bot commented Oct 9, 2023

Codecov Report

Merging #1964 (d395548) into main (b2412d0) will decrease coverage by 0.09%.
Report is 9 commits behind head on main.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #1964      +/-   ##
==========================================
- Coverage   47.10%   47.00%   -0.09%     
==========================================
  Files         164      164              
  Lines       10361    10369       +8     
  Branches      956      957       +1     
==========================================
- Hits         4879     4873       -6     
- Misses       5166     5178      +12     
- Partials      316      318       +2     

see 6 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

"uid": "_7trpcMIk",
"version": 5,
"weekStart": ""
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a new blank line in the end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@AngersZhuuuu
Copy link
Contributor

I want to know if the metrics name will be different in different GC?

@onebox-li
Copy link
Contributor Author

I want to know if the metrics name will be different in different GC?

No, It will only change related metrics labels. For specific details, you can refer to #1939. When use ParNew + CMS, the JVM Memory Pools / GC rows look like below:
image
image

@AngersZhuuuu
Copy link
Contributor

How about G1? I remember when I use G1 in a test cluster, it was not same in origin code

@onebox-li
Copy link
Contributor Author

How about G1? I remember when I use G1 in a test cluster, it was not same in origin code

You can find the example graphs with g1 in the top PR proposal. Different collectors only have different labels in the new jvm metrics. For pools metrics with different collectors, it can be generated dynamically by the benefit of grafana's replicate panels function(Repeat this panel for each value in the selected variable).

@AngersZhuuuu
Copy link
Contributor

Can you provide a screanshot when it shows the whole cluster? Or it only support show one worker?

@onebox-li
Copy link
Contributor Author

Can you provide a screanshot when it shows the whole cluster? Or it only support show one worker?

This template now just supports one instance. For JVM metrics, one panel often contains serveal metrics so that we can easily obtain information by comparing related metrics. The cluster perspective may make the diagram a mess. Users can customize the jvm metrics panel of the entire cluster they care about or use explore. But I will change the variable to be multi-selected later to make the template easier to use.

@AngersZhuuuu
Copy link
Contributor

Yea, for me, I always check the whole cluster's status to see if the cluster is healthy.

@onebox-li
Copy link
Contributor Author

Yea, for me, I always check the whole cluster's status to see if the cluster is healthy.

Hi, @AngersZhuuuu, I've supported selecting multi/all instances, snapshot like below, please take a look again if have time.
image

Copy link
Contributor

@AngersZhuuuu AngersZhuuuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, cc @FMX

Copy link
Contributor

@waitinfuture waitinfuture left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except a question: is it right that CPU Usage's type is timeseries?

@onebox-li
Copy link
Contributor Author

LGTM except a question: is it right that CPU Usage's type is timeseries?

I have called a increase function on it. https://github.com/apache/incubator-celeborn/pull/1964/files#diff-68dcac07da1db4f5aa934eb21ef636d9fc903f02e85db78acb9f59988baaed0eR946

@waitinfuture
Copy link
Contributor

LGTM except a question: is it right that CPU Usage's type is timeseries?

I have called a increase function on it. https://github.com/apache/incubator-celeborn/pull/1964/files#diff-68dcac07da1db4f5aa934eb21ef636d9fc903f02e85db78acb9f59988baaed0eR946

Thanks, merging to main/0.3

waitinfuture pushed a commit that referenced this pull request Oct 13, 2023
### What changes were proposed in this pull request?
Currently there is no JVM metrics grafana template, nor in grafana labs. For better use, it is necessary to add one.
According the change in #1939
This template uses two variables(instance, pool).
The layout is divided into 5 rows.
![image](https://github.com/apache/incubator-celeborn/assets/19429353/732cff90-463c-47b5-89b8-fa8dbbf33b1e)

The panels with g1 look like below:
![image](https://github.com/apache/incubator-celeborn/assets/19429353/919b7e9e-f86a-4341-a004-7f0394e1d8b2)

JVM Memory Pools row uses replicated panel mode which panels are automatically deplicated by `pool` variables.
![image](https://github.com/apache/incubator-celeborn/assets/19429353/3bdf7a3c-d4e0-42ea-bbe0-012da55a61d1)
![image](https://github.com/apache/incubator-celeborn/assets/19429353/8feaf9b7-156d-453e-8188-40a0399ea516)
![image](https://github.com/apache/incubator-celeborn/assets/19429353/cba4b61c-7d66-4893-9f07-6157c64869bd)
![image](https://github.com/apache/incubator-celeborn/assets/19429353/09b473ef-434c-4fd0-aa4b-084f7588a4f7)

### Why are the changes needed?
Ditto

### Does this PR introduce _any_ user-facing change?
Yes, this dashboard is based on changes in #1939

### How was this patch tested?
Cluster test

Closes #1964 from onebox-li/add-jvm-dashboard.

Authored-by: onebox-li <[email protected]>
Signed-off-by: zky.zhoukeyong <[email protected]>
(cherry picked from commit 2b79692)
Signed-off-by: zky.zhoukeyong <[email protected]>
@onebox-li onebox-li deleted the add-jvm-dashboard branch October 13, 2023 03:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants