Attempt to align and cleanup some jmx metrics #11621

SylvainJuge · 2024-06-18T13:21:56Z

This is an attempt to fix a few errors and inconsistencies that I've found in the JMX metrics captured with the JMX Metric Insight feature.

I have intentionally limited the scope to tomcat, jetty and wildfly, but similar changes might be applied to other systems as a follow-up.

- clarify the strategy for pre-defined metrics
- simplify the metric prefix for tomcat to "tomcat." for consistency with jetty, wildfly, ...
- fix tomcat busy/idle threads, as explained in this discussion.
- rename metrics to use the singular form, following the (experimental) metrics semconv recommendations.
- rename units to use the singular form, following the (stable) units semconv recommendations.
- move tomcat request-related metrics to the tomcat.request.* namespace for consistency with wildfly.request.*
- align tomcat with system.network.io for transferred bytes: tomcat.network.io
- align wildfly with system.network.io for transferred bytes: wildfly.network.io (only the direction attribute had to be changed).
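As a rough illustration of the prefix and network.io alignment, a JMX Metric Insight rule for Tomcat's transferred bytes could look like the sketch below. It assumes the Catalina:type=GlobalRequestProcessor,name=* MBean (and its Tomcat: domain variant) with the bytesReceived/bytesSent attributes. It reuses the semconv network.io.direction attribute with the receive/transmit values used by system.network.io. The exact bean patterns, descriptions and attribute set in this PR may differ.

```yaml
---
rules:
  # Sketch only: bean patterns and descriptions are assumptions, not the PR content.
  - beans:
      - Catalina:type=GlobalRequestProcessor,name=*
      - Tomcat:type=GlobalRequestProcessor,name=*
    metricAttribute:
      # Connector name taken from the "name" key of the ObjectName.
      name: param(name)
    # Simplified prefix, consistent with jetty and wildfly; yields tomcat.network.io.
    prefix: tomcat.
    type: counter
    # "By" (bytes) is the unit used by system.network.io.
    unit: By
    mapping:
      bytesReceived:
        metric: network.io
        desc: The number of bytes received
        metricAttribute:
          # Same attribute name and value as system.network.io.
          network.io.direction: const(receive)
      bytesSent:
        metric: network.io
        desc: The number of bytes transmitted
        metricAttribute:
          network.io.direction: const(transmit)
```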

For the overall strategy, I agree that covering every metric of every platform is neither possible nor something we aim for. For example, with wildfly the db pool exposes more than 50 attributes that could be captured as metrics.

I think one of the important things that could keep this type of mapping manageable over time is the following strategy for metric names and their attributes:

- use or align to semconv when it fits
- keep the MBean attribute name otherwise: this preserves the semantics of the observed system without having to redefine common metrics or deal with subtle implementation details.
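The "keep the MBean attribute name" case can be illustrated with WildFly's transaction statistics, for which no semconv metric exists. This minimal sketch assumes the jboss.as:subsystem=transactions MBean and its numberOfTransactions attribute. The prefix follows the wildfly.db.client.transaction.* name discussed in the checklist, and the unit, type and description are assumptions.

```yaml
---
rules:
  # Sketch only: no semconv metric covers transaction counts, so the
  # MBean attribute name is kept as-is after the prefix.
  - bean: jboss.as:subsystem=transactions
    prefix: wildfly.db.client.transaction.
    type: counter
    # Singular annotation, following the units semconv recommendation.
    unit: "{transaction}"
    mapping:
      numberOfTransactions:
        # Resulting name: wildfly.db.client.transaction.numberOfTransactions
        metric: numberOfTransactions
        desc: The total number of transactions (top-level and nested) created
```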

Checklist & follow-ups

- wildfly.db.client.connection: check with the implementation whether the state values (active/idle/wait) form a partition, in which case a single metric with an attribute would make sense; the wildfly documentation seems to imply that they don't.
- wildfly.db.client.connection should use the db.client.connections.state attribute from semconv for the connection state.
  - update: the plural form was removed from semconv in "Rename db.client.connections.* attributes to db.client.connection.*" (semantic-conventions#1125), to be released in 1.27.
- fix case: wildfly.db.client.transaction.NumberOfTransactions should probably use the MBean attribute name, i.e. numberOfTransactions.
- maybe try to fix the database semconv metric attributes to use the singular form; for example, db.client.connections.pool.name should probably be renamed to db.client.connection.pool.name.
  - update: already covered by semantic-conventions#1125.

PeterF778 · 2024-06-18T16:44:37Z

instrumentation/jmx-metrics/javaagent/src/main/resources/jmx/rules/wildfly.yaml

        metricAttribute:
-          state: const(used)
+          db.client.connection.state: const(used)


Unless I'm missing something, the metric will have a metric attribute which will always have the same constant value. What is really the point of having such an attribute?

You are correct: a constant metric attribute only makes sense when it provides a breakdown of the same metric. Also, occurrences of wildfly.db.client.connection.usage should be renamed to wildfly.db.client.connection.count. I am currently checking against the implementation whether the connection states overlap or form an effective partition (in which case we can provide a breakdown).

Looking at the implementation and the wildfly test cases, it seems that:

- "available" is close to the definition of "idle"
- "in use" is close to the definition of "busy"
- "active" is mostly the sum of "available" + "in use"
- for "wait count", I haven't found any proper definition besides the docs

https://github.com/wildfly/wildfly/blob/841fea771567a71d06490a0d7e9a398dc6fdf5c0/testsuite/integration/basic/src/test/java/org/jboss/as/test/integration/jca/statistics/DataSourcePoolClearStatisticsTestCase.java#L69

https://github.com/wildfly/wildfly/blob/841fea771567a71d06490a0d7e9a398dc6fdf5c0/testsuite/integration/basic/src/test/java/org/jboss/as/test/integration/jca/capacitypolicies/ResourceAdapterCapacityPoliciesTestCase.java#L143

Given that both InUseCount and IdleCount refer to physical connection states in the documentation, they effectively form a partition, so using a common metric with a constant attribute value of idle or used makes sense.

WaitCount is the number of logical connections waiting for a physical connection, so it would also make sense to report it with the same metric and a custom wait constant attribute value.

When aggregated across this attribute, the metric returns the total number of logical connections to the database pool; only the subset with an idle or used attribute value represents physical connections.
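The partition described above could be expressed as a single metric with a constant state attribute, roughly as follows. This sketch assumes the jboss.as:subsystem=datasources,data-source=*,statistics=pool MBean with its InUseCount, IdleCount and WaitCount attributes. It reuses the db.client.connection.state attribute name from the wildfly.yaml diff in this thread. The data_source attribute, unit and description are assumptions and may not match fb71fa1.

```yaml
---
rules:
  # Sketch only: one metric, partitioned by a constant state attribute.
  - bean: jboss.as:subsystem=datasources,data-source=*,statistics=pool
    metricAttribute:
      # Data source name taken from the "data-source" key of the ObjectName.
      data_source: param(data-source)
    type: updowncounter
    unit: "{connection}"
    mapping:
      InUseCount:
        metric: wildfly.db.client.connection.count
        desc: The number of connections
        metricAttribute:
          # Physical connections currently in use.
          db.client.connection.state: const(used)
      IdleCount:
        metric: wildfly.db.client.connection.count
        desc: The number of connections
        metricAttribute:
          # Physical connections available in the pool.
          db.client.connection.state: const(idle)
      WaitCount:
        metric: wildfly.db.client.connection.count
        desc: The number of connections
        metricAttribute:
          # Logical connections waiting for a physical connection.
          db.client.connection.state: const(wait)
```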

I have updated this PR to match this in fb71fa1

PeterF778 · 2024-06-18T18:12:39Z

Originally, JMX Metric Insight borrowed the metric definitions from JMX Metric Gatherer, and was bug-for-bug compatible. This was caused equally by our laziness as by the desire to allow users to transition smoothly to in-process metric collection.
I do not know how popular JMX Metric Insight is, but I know from experience that changes to metric names/attributes can sometimes be painful for the users. Perhaps it will be helpful for the customers if we keep the old metric configuration files around for some time as tomcat_old or tomcat_legacy etc.

SylvainJuge · 2024-06-19T09:20:44Z


I completely understand the duplication strategy here, but it's probably time to remove the duplication and simplify things:

- there are a couple of issues about this: "Consolidate/reuse JMX implementation from instrumentation" (opentelemetry-java-contrib#736) and "Unify metric collection between JMX Metrics Gatherer and JMX Metrics Insight" (#9765)
- the JMX gatherer supports more target systems than JMX insight
- changing anything requires two PRs in two repositories, for example "JMX metrics for Tomcat with 'Tomcat' JMX domain" (#10115) and "In Tomcat Version 10.1.19 Spring Boot 3.2.4 Tomcat Bean Name Is Tomca…" (opentelemetry-java-contrib#1269)
- it seems to me that the static YAML file definitions cover what is currently captured through Groovy scripts in the JMX gatherer (I could be completely wrong on this one)

Until this duplication is removed, we will have to backport such changes to the contrib repo. Implementation-wise, a common implementation would likely reside in the contrib repo and be included in the instrumentation agent (there are already similar dependencies for the aws and gcp resource providers).

Regarding compatibility, I really don't know what the best approach is here: all the JMX metrics depend heavily on implementation details, so giving them a formal definition and stability status in semconv is not possible. Maybe keeping previous iterations of the YAML files could provide this.

SylvainJuge · 2024-06-26T15:05:54Z

Status following the June 20th SIG meeting:

- we first need to align the implementations in contrib/instrumentation while preserving compatibility with the current metrics
- we can provide an updated version of the metrics on the instrumentation side, with an opt-in to use the new definitions (the current definitions stay the default for compatibility)
- switching to the new metrics could be aligned with the next major version, which should come around the time of the stable database semconv
- this PR will stay in draft until then; parts of it will of course be reused along the way.

SylvainJuge added 13 commits June 18, 2024 09:40

- tomcat prefix + fix busy/idle threads (02b5448)
- tomcat align network metric/attributes (cf15127)
- wildfly align 'sessions' in plural form (d450d9a)
- wildfly align network.io with semconv attributes (5c7688b)
- update documentation (4b966cc)
- clarify doc (2349720)
- doc again (cc85a16)
- reformat example (64e9cdd)
- jetty singular thread metrics (aa41343)
- tomcat move to 'request' namespace (3f9b71e)
- tomcat singular session (fe09251)
- singular units + tomcat fix (931032b)
- fix tomcat request reference + clarify a bit (313ed49)

github-actions bot requested a review from theletterf June 18, 2024 13:22

SylvainJuge added 3 commits June 18, 2024 15:26

- fix doc for tomcat.thread.* (545a066)
- wildfly align with upcoming semconv (cb27853)
- wildfly transaction name fits mbean attribute (286d784)

PeterF778 reviewed Jun 18, 2024


- simplify metrics for wildfly (fb71fa1)

SylvainJuge mentioned this pull request Jul 3, 2024

Refactor & merge JMX gatherer & insight implementations open-telemetry/opentelemetry-java-contrib#1362 (open, 24 tasks)

SylvainJuge mentioned this pull request Sep 3, 2024

JMX implementation : feature parity for target systems #12158 (open, 13 tasks)
