Enhancement of JVM Thread metrics #101

Closed
lenin-jaganathan opened this issue Jun 12, 2023 · 12 comments

@lenin-jaganathan
Contributor

Currently, the OTel semantic conventions for process.runtime.jvm include the metric below to observe the number of threads:

### Metric: `process.runtime.jvm.threads.count`
This metric is [recommended][MetricRecommended].
This metric is obtained from [`ThreadMXBean#getDaemonThreadCount()`](https://docs.oracle.com/javase/8/docs/api/java/lang/management/ThreadMXBean.html#getDaemonThreadCount--) and
[`ThreadMXBean#getThreadCount()`](https://docs.oracle.com/javase/8/docs/api/java/lang/management/ThreadMXBean.html#getThreadCount--).
Note that this is the number of platform threads (as opposed to virtual threads).
<!-- semconv metric.process.runtime.jvm.threads.count(metric_table) -->
| Name | Instrument Type | Unit (UCUM) | Description |
| -------- | --------------- | ----------- | -------------- |
| `process.runtime.jvm.threads.count` | UpDownCounter | `{thread}` | Number of executing platform threads. |
<!-- endsemconv -->
<!-- semconv metric.process.runtime.jvm.threads.count(full) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `daemon` | boolean | Whether the thread is daemon or not. | | Recommended |
<!-- endsemconv -->

I would like to propose an enhancement to this metric, or additional metrics about threads:

  • A metric that identifies how many threads are in each thread state, where the state is one of the values of Thread.State. This is very useful both for getting a view of thread usage patterns and for reasoning about threads that are blocked or in timed waiting.
  • The rate of thread creation, as obtained from ThreadMXBean#getTotalStartedThreadCount(). This is really useful for understanding application behavior and identifying bad or ineffective thread management by applications.
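
As a rough illustration of the first point, per-state counts can be read from the standard `ThreadMXBean` with JDK classes only. This is a minimal sketch, not how any OpenTelemetry instrumentation implements it; the class name `ThreadStateSnapshot` and method `countByState` are hypothetical names chosen for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of any OpenTelemetry API.
class ThreadStateSnapshot {

    /** Counts live threads per Thread.State using the platform ThreadMXBean. */
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // Start every state at zero so that absent states are still reported.
        for (Thread.State state : Thread.State.values()) {
            counts.put(state, 0);
        }
        // getThreadInfo returns a null entry for a thread that has already ended.
        for (ThreadInfo info : bean.getThreadInfo(bean.getAllThreadIds())) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        countByState().forEach((state, n) -> System.out.println(state + " " + n));
    }
}
```

Sampling this on every collection interval would give the time series per state that the proposal describes, at the cost of one `getThreadInfo` call per collection.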
@mateuszrzeszutek
Member

Hey @lenin-jaganathan ,

  • A metric that identifies how many threads are in each thread state, where the state is one of the values of Thread.State. This is very useful both for getting a view of thread usage patterns and for reasoning about threads that are blocked or in timed waiting.

We've previously discussed this in the Java instrumentation SIG (see open-telemetry/opentelemetry-java-instrumentation#7006 and open-telemetry/opentelemetry-java-instrumentation#7636) and came to the conclusion that there is no good reason to include thread state.

@lenin-jaganathan
Contributor Author

Thanks for sharing this @mateuszrzeszutek.
I just went through the thread, and it does cover the first point I brought up, thread states. I see there are discussions about JFR and related tooling. But we have seen a lot of cases where thread states are not constant behavior that we can analyze with a one-off snapshot. Traffic patterns don't stay constant over an app's lifecycle, and having metrics about thread states over time has been very useful, specifically when dealing with blocked/waiting/timed_waiting threads. This helps greatly in understanding how things fare under different traffic scenarios. Although one could argue that only thread dumps help in understanding which threads are in which state, the same is true of most metrics use cases: investigation starts with a metric and then moves to traces and logs. This is also very useful at the infrastructure level, where infra/framework teams can use these metrics to arrive at an optimal resource configuration.

Also, is there any discussion of ThreadMXBean#getTotalStartedThreadCount() that you are aware of?

@arminru arminru assigned trask and unassigned arminru Jun 13, 2023
@trask
Member

trask commented Jun 13, 2023

hi @lenin-jaganathan!

the JVM runtime metrics WG isn't considering any additions until the JVM runtime metrics have been marked stable.

Adding opt-in metric attributes and adding new metrics are both things that can be done post-stability.

@lenin-jaganathan
Contributor Author

@trask Thanks for the response.

I just wanted to kick-start a conversation on this, since it seems like a useful metric to have. Also, are there any tentative timelines for stability?

@trask
Member

trask commented Aug 28, 2023

I realized that since adding new dimensions to a metric is considered a breaking semconv change, we should probably make a decision on this pre-stability.

@trask trask moved this from Not needed for initial stability to Todo in Spec: JVM runtime metric stability Aug 28, 2023
@aheling11

aheling11 commented Aug 29, 2023

I would like to propose a possible use case.

For example, suppose we have a dashboard that displays the jvm.thread.count metric as a line chart, broken down by a thread.state dimension. When an application misbehaves, or during routine inspections, we can check the dashboard and may notice that the number of blocked/waiting threads is abnormal. That gives us a direction for investigating the problem, and we can then use other tools for more detailed analysis.

@lenin-jaganathan
Contributor Author

lenin-jaganathan commented Aug 29, 2023

+1 on @aheling11's comment. This has proved very effective in the past. The same goes for the rate of thread creation, which helps in understanding the behavior of services and identifying ineffective use of resources. It is even more useful when we consider more than one service and monitor at the infrastructure level.
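
For reference, a thread creation rate could be derived from two samples of the cumulative `ThreadMXBean#getTotalStartedThreadCount()` value. The sketch below only illustrates the arithmetic; `ThreadStartRate` and `perSecond` are hypothetical names, and in practice a metrics backend would usually compute the rate from an exported cumulative counter.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; not part of any OpenTelemetry API.
class ThreadStartRate {

    /** Threads started per second between two samples of the cumulative counter. */
    static double perSecond(long startedBefore, long startedAfter, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (startedAfter - startedBefore) * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long before = bean.getTotalStartedThreadCount();
        long t0 = System.nanoTime();
        // Start a few short-lived threads so the counter moves between samples.
        for (int i = 0; i < 5; i++) {
            Thread t = new Thread(() -> {});
            t.start();
            t.join();
        }
        long elapsedMillis = Math.max(1, (System.nanoTime() - t0) / 1_000_000);
        long after = bean.getTotalStartedThreadCount();
        System.out.println(perSecond(before, after, elapsedMillis) + " threads/s");
    }
}
```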

@jack-berg
Member

We could record the thread state but use the attributes advice to exclude it by default. That would allow users to opt into the attribute but avoid the extra dimensions by default.

@trask
Member

trask commented Aug 31, 2023

@jonatan-ivanov does micrometer report thread state as a dimension?

@jonatan-ivanov
Member

@trask Yes, in Prometheus format it looks like this:

jvm_threads_states_threads{state="runnable"} 7.0
jvm_threads_states_threads{state="blocked"} 0.0
jvm_threads_states_threads{state="waiting"} 13.0
jvm_threads_states_threads{state="timed-waiting"} 9.0
jvm_threads_states_threads{state="new"} 0.0
jvm_threads_states_threads{state="terminated"} 0.0
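
The label values in this output look like `Thread.State` names lowercased with underscores turned into hyphens. A small sketch of that mapping, under that assumption (`ThreadStateLabel`, `label`, and `line` are hypothetical names, not Micrometer API):

```java
import java.util.Locale;

// Hypothetical helper for illustration; not Micrometer or OpenTelemetry API.
class ThreadStateLabel {

    // Assumption: the label is the enum name lowercased, with '_' replaced by '-',
    // which matches values such as "timed-waiting" in the sample output.
    static String label(Thread.State state) {
        return state.name().toLowerCase(Locale.ROOT).replace('_', '-');
    }

    /** Renders one sample line in the same shape as the output quoted above. */
    static String line(Thread.State state, int count) {
        return "jvm_threads_states_threads{state=\"" + label(state) + "\"} " + (double) count;
    }

    public static void main(String[] args) {
        for (Thread.State state : Thread.State.values()) {
            System.out.println(line(state, 0));
        }
    }
}
```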

@breedx-splk
Contributor

My $0.02 - I think this is a good idea. Having the state as a dimension provides quite a bit of insight.

@trask
Member

trask commented Aug 31, 2023

Discussed in Java SIG, consensus is to add thread.state (or maybe jvm.thread.state) attribute by default to the jvm.thread.count metric (prior to stability).

8 participants