[Suggestion] - BlockList additions to Confluence JMX Exporter config #715

Open
1 task done
mattj-relativity opened this issue Nov 10, 2023 · 8 comments

Comments

@mattj-relativity

Suggestion

A default deny may be warranted here. However, in the interest of having more data for a root cause analysis, I'm going to start by suggesting some block list additions to the default JMX exporter config. I note we have some in JIRA under the now-deprecated blacklistObjectNames parameter (we might want to update that as well).

Confluence seems to have some fairly nasty issues in the beans the JMX exporter imports, particularly those under:

com.atlassian.confluence.metrics

Example:

Bean events with category01=logging.
These are also seen in other metrics namespaces such as OneMinuteRate and many more.

com_atlassian_confluence_metrics_OneMinuteRate{category00="confluence",category01="logging",

We are exporting logging events to the Prometheus exporter in the default jmx-config ConfigMap configuration.
As a result, the key cardinality of anything ingested from that endpoint grows without bound, and the endpoint takes longer and longer to scrape.

This isn't the only problematic metrics key. The category00=hazelcast keys also tend to create large numbers of unique series that grow without bound, again increasing key cardinality in Prometheus and steadily increasing the scrape time of the exporter endpoint.

At the moment the default config for the JMX exporter is dangerous to integrate directly into Prometheus. I began putting together a jmx-config ConfigMap to blocklist the offenders, but to be frank I am a bit out of my depth here and would love some help.

As a starting point I have this in my values.yaml:

jmxExporterCustomConfig:
    jmx-config:
      excludeObjectNames:
        - 'com.atlassian.confluence.metrics:category01=logging,*'
        - 'com.atlassian.confluence.metrics:category00=http,category01=rest,name=request,*'
        - 'com.atlassian.confluence.metrics:category00=bandana,*'
        - 'com.atlassian.confluence.metrics:category00=hazelcast,*'
        - 'com.atlassian.confluence.metrics.Value:category00=bandana,*'
        - 'com.atlassian.confluence.metrics.999thPercentile:category00=bandana,*'
        - 'com.atlassian.confluence.metrics.MeanRate:category00=bandana,*'
        - 'com.atlassian.confluence.metrics.OneMinuteRate:category00=bandana,*'
        - 'com.atlassian.confluence.metrics:category00=hazelcast,*'
        - 'com.atlassian.confluence.metrics.Value:category00=hazelcast,*'
        - 'com.atlassian.confluence.metrics.Count:category00=hazelcast,*'
        - 'com.atlassian.confluence.metrics.FifteenMinuteRate:category00=hazelcast,*'
        - 'com.atlassian.confluence.metrics.FiveMinuteRate:category00=hazelcast,*'
        - 'com.atlassian.confluence.metrics.OneMinuteRate:category00=hazelcast,*'
        - 'com.atlassian.confluence.metrics.999thPercentile:category00=hazelcast,*'
        - 'com.atlassian.confluence.metrics.MeanRate:category00=hazelcast,*'
      rules:
        - pattern: '(java.lang)<type=(\w+)><>(\w+):'
          name: java_lang_$2_$3
        - pattern: 'java.lang<type=Memory><HeapMemoryUsage>(\w+)'
          name: java_lang_Memory_HeapMemoryUsage_$1
        - pattern: 'java.lang<name=G1 (\w+) Generation, type=GarbageCollector><>(\w+)'
          name: java_lang_G1_$1_Generation_$2
        - pattern: '.*'

But this only seems to deal with my logging events. I have tried a bunch of variations for clearing the category00 entries without success. I suspect I am just wrong about the syntax and that this would be obvious to a Java developer.

Anyway, I open the floor to suggestions, proposals, and rebukes. I would like to collaboratively dig through the list of metrics Confluence makes available to the JMX exporter and block the ones that grow dynamically.

Product

Confluence

Code of Conduct

  • I agree to follow this project's Code of Conduct
@bianchi2
Collaborator

@mattj-relativity thanks for raising this issue. Indeed, metrics can be quite noisy, and high cardinality can definitely be an issue. Did I get you right: your suggested excludeObjectNames did not work and you still see those metrics produced? So it's rather a matter of finding the right syntax to exclude selected metrics?

@mattj-relativity
Author

mattj-relativity commented Nov 13, 2023

Regarding syntax issues

- 'com.atlassian.confluence.metrics:category01=logging,*'
This line worked, but the attempts to handle anything in category00 have not. Not sure why.

Regarding scope of block list

I have not finished figuring out all the items I will need to block list, and I would appreciate some feedback on how to approach that. At the moment I am basically working from the existing attributes used in the Grafana charts and from whatever looks suspect across successive exporter scrapes. I'm not even sure that's the right way to approach this.

Regarding severity of impact

I do want to be clear: this isn't just a performance bug. It effectively makes the JMX exporter unusable on Confluence. In the default configuration, under any load, the time it takes to scrape the exporter endpoint keeps growing until it exceeds any reasonable timeout in Prometheus. You can increase the timeouts, but eventually the scrape just times out every time. It will additionally generate such severe cardinality growth in Prometheus that Prometheus will OOM.
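
As a stopgap against the Prometheus OOM described above, a scrape job can be given a hard ceiling so that an oversized scrape fails instead of being ingested. This is only a sketch, not something verified against Confluence: the job name, target, and the numeric values are assumptions, while `scrape_timeout` and `sample_limit` are standard Prometheus scrape settings.

```yaml
scrape_configs:
  - job_name: confluence-jmx            # hypothetical job name
    scrape_interval: 60s
    scrape_timeout: 30s                 # must not exceed scrape_interval
    # Illustrative ceiling: if a scrape returns more samples than this
    # (counted after metric relabeling), the whole scrape is treated as failed.
    sample_limit: 50000
    static_configs:
      - targets: ["confluence:9999"]    # hypothetical host and exporter port
```

This protects Prometheus but not the exporter: the endpoint still grows, and once the limit is hit the target simply shows as failing.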

@bianchi2
Collaborator

@atl-mk I wonder if you can help here.

@bendeme

bendeme commented Jan 11, 2024

I want to bump this and add that, in its current state, the jmx-exporter config for Confluence is unusable and dangerous to any Prometheus setup that has to ingest it. Maybe there should be a warning that this is not intended for production use until a reasonable exclusion list is added to control the growth of the metric scrape.

@bianchi2
Collaborator

bianchi2 commented Jan 11, 2024

Asked a question here: prometheus/jmx_exporter#904

As a workaround you can turn off the Confluence-specific JMX beans by passing -Dconfluence.jmx.disabled=true in the JVM args. You will have JVM metrics only, though (which in many cases is good enough for basic monitoring).
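
In a Helm values file, the workaround above might look like the sketch below. It assumes the chart exposes a `confluence.additionalJvmArgs` list; that key name should be checked against the values.yaml of the chart version in use.

```yaml
# values.yaml fragment (sketch, key name is an assumption to verify)
confluence:
  additionalJvmArgs:
    # Disables the Confluence-specific JMX beans; plain JVM metrics remain.
    - "-Dconfluence.jmx.disabled=true"
```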

@bianchi2
Collaborator

Another possible workaround I can think of is metricRelabelings in a ServiceMonitor. https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/running-exporters.md

I haven't had a chance to verify it and see if it's possible to drop metrics by category.
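
For reference, an unverified sketch of what such a ServiceMonitor could look like. The resource name, the Service label, and the port name are assumptions; the label names and values (category00, category01, hazelcast, bandana, logging) come from this issue.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: confluence-jmx                       # hypothetical name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: confluence     # assumed Service label
  endpoints:
    - port: jmx                              # assumed name of the exporter port
      path: /metrics
      metricRelabelings:
        # Drop every series whose category00 label is hazelcast or bandana.
        - sourceLabels: [category00]
          regex: hazelcast|bandana
          action: drop
        # Drop the logging series reported in this issue.
        - sourceLabels: [category01]
          regex: logging
          action: drop
```

Note that metric relabeling is applied after the scrape has completed, so this would limit what Prometheus stores but would not shorten the time the exporter takes to respond.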

@ashishk1988

Hello Team, I have been working on this locally and found that the problem is the jmx-config key used in the configuration. The JMX exporter doesn't expect any such key. A syntax like the one below should work flawlessly:

jmxExporterCustomConfig:
  excludeObjectNames:
    - 'com.atlassian.confluence:category00=bandana,*'
    - 'com.atlassian.confluence:category00=hazelcast,*'
    - 'Confluence:name=CacheStatistics,*'
  rules:
    - pattern: '(java.lang)<type=(\w+)><>(\w+):'
      name: java_lang_$2_$3
    - pattern: 'java.lang<type=Memory><HeapMemoryUsage>(\w+)'
      name: java_lang_Memory_HeapMemoryUsage_$1
    - pattern: 'java.lang<name=G1 (\w+) Generation, type=GarbageCollector><>(\w+)'
      name: java_lang_G1_$1_Generation_$2
    - pattern: '.*'
  # rules:
  #   - pattern: ".*"

@bianchi2
Collaborator

bianchi2 commented Jul 4, 2024

@mattj-relativity isn't this something that you have already tried and that didn't work for you?
