Add a generic jmx/jolokia metricset to metricbeat #3051

Closed
wants to merge 14 commits into from

@vas78 vas78 commented Nov 22, 2016

This is a generic metricset for fetching data from jmx using jolokia.

@elasticsearch-release

Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run.

@ruflin ruflin added the Metricbeat Metricbeat label Nov 23, 2016
@ruflin
Member

ruflin commented Nov 28, 2016

@vas78 Thanks a lot for this contribution.

So far we have always predefined the data structure for each metricset in fields.yml, so we can guarantee that the mapping in Elasticsearch is correct. With JMX this becomes a little bit tricky. A few questions, as I'm not too familiar with the output coming from Jolokia / JMX:

  • Is there a base structure / common fields that are always the same?
  • The config is currently quite tricky to get right. Do you see an option to simplify this?

@ruflin ruflin added the in progress Pull request is currently in progress. label Nov 28, 2016
@vas78
Author

vas78 commented Nov 28, 2016

@ruflin I'm afraid that's exactly the point: it's up to the Beats user to decide which metrics he/she would like to collect with Jolokia. I could imagine adding some common fields like uptime, heap size, Java process CPU usage, etc., but I'm not quite sure it makes sense here.

To the second question: I could try converting the current solution with an embedded json in mapping into an array of values like "metric:alias", e.g.:

-
  host: localhost:4008
  mapping:
      "java.lang:type=Runtime:::Uptime": "uptime"
      "java.lang:type=Memory:::HeapMemoryUsage": "memory.heap_usage"
      "java.lang:type=Memory:::NonHeapMemoryUsage": "memory.non_heap_usage"
  application: application1
  instance: instance1

What do you think?
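
For illustration only, here is a minimal Go sketch of how a "metric:alias" key with the `:::` delimiter could be split into its MBean and attribute parts. The function name `splitMetric` and the error text are hypothetical and not taken from the PR; the sketch mainly shows that a custom delimiter has to be validated, because an ObjectName itself already contains colons.

```go
package main

import (
	"fmt"
	"strings"
)

// delimiter separates the MBean ObjectName from the attribute name,
// as in the "java.lang:type=Runtime:::Uptime" keys proposed above.
const delimiter = ":::"

// splitMetric is a hypothetical helper: it splits on the LAST occurrence
// of the delimiter, because the ObjectName itself contains colons.
func splitMetric(metric string) (mbean, attr string, err error) {
	i := strings.LastIndex(metric, delimiter)
	if i <= 0 || i+len(delimiter) >= len(metric) {
		return "", "", fmt.Errorf("expected <mbean>%s<attribute>, got %q", delimiter, metric)
	}
	return metric[:i], metric[i+len(delimiter):], nil
}

func main() {
	inputs := []string{
		"java.lang:type=Runtime:::Uptime",
		"java.lang:type=Memory::HeapMemoryUsage", // only two colons: rejected
	}
	for _, m := range inputs {
		mbean, attr, err := splitMetric(m)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("mbean=%s attr=%s\n", mbean, attr)
	}
}
```

A key with too few colons is rejected rather than silently misread, which is the kind of user error a delimiter-based format invites.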

@ruflin
Member

ruflin commented Dec 6, 2016

@vas78 Sorry for the late answer. Some more thoughts from my side:

We recently introduced raw fields in Metricbeat. These are for all event data whose type we do not know in advance or for which we didn't create a mapping in the index template. It would be nice if we could push some generic data for all events into a specific mapping and have all the other data under raw for the moment.

Looking at your events above, it seems that the data for a specific MBean is known in advance. So what if we split the above into two parts:

  • Generic JMX connector
  • Specific MBean Metricsets

In the above case we would have for example:

modules:
- module: jolokia
  metricsets: ["runtime", "memory"]

In your test cases you also have Cassandra data. So the cassandra module would use the jmx connection as a library, but I assume that for Cassandra we know in advance what the structure of the events is.

The custom mapping definition of the jmx module goes in a similar direction as the request here: #2987 There is still a lot of discussion ongoing on how best to make this happen in a generic way. Inputs are more than welcome.

@vas78 Do the above assumptions make sense for JMX?

@radoondas Mentioning you here as I know you were also looking into JMX in the past.

@vas78
Author

vas78 commented Dec 7, 2016

@ruflin Ok, makes sense. I will split up the module into some "standard" metricsets like memory, runtime, etc. and one "free-style" metricset that can be configured as desired.

@ruflin
Member

ruflin commented Dec 7, 2016

@vas78 Great. You could split it up into two PRs, because we will probably have some additional discussions on the free-style part, but that should not block the standard part from getting in.

@ruflin ruflin mentioned this pull request Dec 12, 2016
@ruflin
Member

ruflin commented Dec 12, 2016

@vas78 Here is a first idea on the example of mysql on how a "generic" metricset could work: #3170

@ruflin
Member

ruflin commented Dec 15, 2016

@vas78 Anything I could help here with? It would be really nice to get support for jolokia / jmx in metricbeat.

@vas78
Author

vas78 commented Dec 19, 2016

@ruflin Sorry, it's taking a bit longer than I thought, especially due to all the deadlines just before the holidays :( I will provide an update soon.

@ruflin
Member

ruflin commented Dec 20, 2016

@vas78 No worries. Looking forward to the changes.

@ruflin
Member

ruflin commented Jan 13, 2017

@vas78 Did you have any chance to work on this one recently?

@vas78
Author

vas78 commented Jan 14, 2017

@ruflin I've made two attempts to refactor the code and split up the module into a "standard" and a "custom" part, but did not really like the result. Yes, there are some MBean metrics that are always present in every JVM, but one of the most important of these "standard" metrics, GC, depends heavily on the GC strategy configured in the application. Hard-coding all of these strategies does not make much sense, since we already have a dynamic metrics mapping that solves this problem. The rest of these "standard" metrics are just some CPU and threading usage stats that are not that valuable on their own in this context.

Your proposal regarding a possible future use case for the custom jolokia module is to embed it into other application-specific modules, e.g. cassandra, kafka, etc., and hence provide a strict mapping for each application metricset. I still think it'll be pretty hard to guess which metrics the "common" user would pick for each application, and this could end up in a constant flow of requests to add this or that extra metric. So why not let the users decide and configure it for themselves from the very beginning? The only difficulty in that case is creating/adjusting the right index mapping in Elasticsearch, which is quite simple (I had to do that multiple times daily in the very beginning for every application we currently monitor with beats).

Having said all that, I would like to return to the original PR and focus on the dynamic jolokia module, rather than creating new static metricsets. Maybe I could in fact simplify the config and make it more user-friendly? Please let me know what you think.

@ruflin
Member

ruflin commented Jan 17, 2017

@vas78 Thanks for pushing forward on this one. The reason for the fixed mapping is that we try to have a good out-of-the-box experience whenever possible. But I see your point that with jolokia this can become quite tricky. So 👍 on moving forward with the initial proposal and figuring out some better options for how to configure it.

In the meantime we also have the prometheus/collector metricset, which builds the foundation of such a "dynamic" metricset: https://github.com/elastic/beats/tree/master/metricbeat/module/prometheus/collector The most important part here is that it can set the namespace: https://github.com/elastic/beats/blob/master/metricbeat/module/prometheus/collector/collector.go#L107 Could you update your PR to make the namespace configurable? I will have another look at your PR with the above in mind.

@ruflin
Member

ruflin commented Jan 17, 2017

I looked at the configuration again and here are some ideas on what we could do:

JSON based:

- host: localhost:4008
  mapping: '{"java.lang:type=Runtime:::Uptime":"uptime","java.lang:type=GarbageCollector,name=ConcurrentMarkSweep:::CollectionTime":"gc.cms_collection_time","java.lang:type=GarbageCollector,name=ConcurrentMarkSweep:::CollectionCount":"gc.cms_collection_count","java.lang:type=GarbageCollector,name=ParNew:::CollectionTime":"gc.parnew_collection_time","java.lang:type=GarbageCollector,name=ParNew:::CollectionCount":"gc.parnew_collection_count","java.lang:type=Memory:::HeapMemoryUsage":"memory.heap_usage","java.lang:type=Memory:::NonHeapMemoryUsage":"memory.non_heap_usage"}'

One line per metric:

- host: localhost:4008
  mappings:
    - "java.lang:type=Runtime:::Uptime": "uptime"
    - "java.lang:type=GarbageCollector,name=ConcurrentMarkSweep:::CollectionTime": "gc.cms_collection_time"
    - "java.lang:type=GarbageCollector,name=ConcurrentMarkSweep:::CollectionCount": "gc.cms_collection_count"
    - "java.lang:type=GarbageCollector,name=ParNew:::CollectionTime": "gc.parnew_collection_time"
    - "java.lang:type=GarbageCollector,name=ParNew:::CollectionCount": "gc.parnew_collection_count"
    - "java.lang:type=Memory:::HeapMemoryUsage": "memory.heap_usage"
    - "java.lang:type=Memory:::NonHeapMemoryUsage": "memory.non_heap_usage"

Abstracted to YAML:

- host: localhost:4008
  mappings:
    - type: Runtime
      attr: Uptime
      field: uptime
    - type: GarbageCollector
      name: ConcurrentMarkSweep
      attr: CollectionTime
      field: gc.cms_collection_time
    - type: GarbageCollector
      name: ConcurrentMarkSweep
      attr: CollectionCount
      field: gc.cms_collection_count
    - type: GarbageCollector
      name: ParNew
      attr: CollectionTime
      field: gc.parnew_collection_time

The above would potentially allow us to add the field type, from which we could create a template. So we could potentially create a template on the first run and push it to Elasticsearch.

- host: localhost:4008
  mappings:
    - type: Runtime
      attr: Uptime
      field: uptime
      field_type: integer

My favorite option is currently the Abstracted YAML one, because I think it is the most extensible and it does not require the user to get the strings exactly right, like the 3 colons (:::) etc.
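
As a rough sketch of the template idea above, the following Go snippet turns `field` / `field_type` pairs into a nested Elasticsearch-style `properties` object. The type `fieldMapping` and the functions `buildProperties` and `propertiesJSON` are hypothetical names chosen for this illustration, and the `long` type for the GC field is an assumption; nothing here is taken from the PR's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fieldMapping is a hypothetical pair of the "field" and "field_type"
// settings from the config proposal above.
type fieldMapping struct {
	Field     string
	FieldType string
}

// buildProperties nests dotted field names ("gc.cms_collection_time")
// into a "properties" tree as used in Elasticsearch mappings.
func buildProperties(mappings []fieldMapping) map[string]interface{} {
	root := map[string]interface{}{}
	for _, m := range mappings {
		props := root
		parts := strings.Split(m.Field, ".")
		for _, p := range parts[:len(parts)-1] {
			child, _ := props[p].(map[string]interface{})
			if child == nil {
				child = map[string]interface{}{}
				props[p] = child
			}
			sub, _ := child["properties"].(map[string]interface{})
			if sub == nil {
				sub = map[string]interface{}{}
				child["properties"] = sub
			}
			props = sub
		}
		props[parts[len(parts)-1]] = map[string]interface{}{"type": m.FieldType}
	}
	return root
}

// propertiesJSON renders the tree; encoding/json sorts map keys,
// so the output is stable.
func propertiesJSON(mappings []fieldMapping) (string, error) {
	out, err := json.Marshal(buildProperties(mappings))
	return string(out), err
}

func main() {
	out, err := propertiesJSON([]fieldMapping{
		{Field: "uptime", FieldType: "integer"},
		{Field: "gc.cms_collection_time", FieldType: "long"}, // type assumed
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The resulting object would still have to be wrapped in an index template before being pushed; that step is left out here.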

@ruflin
Member

ruflin commented Jan 17, 2017

@nfrankel Would be interesting to get your input on this one as you recently published https://blog.frankel.ch/feedback-on-feeding-spring-boot-metrics-to-elasticsearch/#gsc.tab=0 Which way would work best for you?

func marshalJSONRequest(this SliceSet) string {
	result := "["
	for mbean, attributes := range this {
		singleRequest := "{\"type\":\"read\",\"mbean\":\"" + mbean + "\",\"attribute\":["

You should be careful when inserting the MBean's ObjectName, as it can contain all kinds of weird characters, including quotes, which would make this request invalid JSON. You would need to escape included quotes with \

Attribute names are much more regular, but quotes are still allowed according to the JMX specs, so it wouldn't hurt to escape them, too.
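
One way to avoid manual escaping altogether is to let `encoding/json` build the request body. The sketch below is only an illustration: `readRequest` and `buildReadRequests` are hypothetical names, and a plain `map[string][]string` stands in for the PR's `SliceSet` type, whose definition is not shown here. The JSON keys match the ones in the reviewed snippet.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// readRequest mirrors one entry of the bulk read request that the
// reviewed snippet assembles by string concatenation.
type readRequest struct {
	Type      string   `json:"type"`
	MBean     string   `json:"mbean"`
	Attribute []string `json:"attribute"`
}

// buildReadRequests is a hypothetical alternative to marshalJSONRequest.
// json.Marshal escapes quotes and other special characters in MBean and
// attribute names, so the body stays valid JSON.
func buildReadRequests(attrsByMBean map[string][]string) (string, error) {
	mbeans := make([]string, 0, len(attrsByMBean))
	for mbean := range attrsByMBean {
		mbeans = append(mbeans, mbean)
	}
	sort.Strings(mbeans) // stable request order, independent of map iteration

	reqs := make([]readRequest, 0, len(mbeans))
	for _, mbean := range mbeans {
		reqs = append(reqs, readRequest{Type: "read", MBean: mbean, Attribute: attrsByMBean[mbean]})
	}
	body, err := json.Marshal(reqs)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	body, err := buildReadRequests(map[string][]string{
		`com.example:type=Cache,name="main"`: {"HitCount"}, // made-up MBean with quotes
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

The quoted `name="main"` part of the made-up ObjectName comes out as `name=\"main\"` in the body, which is the escaping asked for above.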

@vas78
Author

vas78 commented Jan 19, 2017

@ruflin Actually I liked the idea of the abstracted YAML, since it eliminates the need for a custom delimiter between MBean and attribute values that the "one line per metric" approach requires. And since we will still need not just the MBean object name itself but also the attribute and its mapping, what would you say to this one?

-
  host: localhost:4008
  mapping:
    - mbean: 'java.lang:type=Runtime'
      attributes:
        - attr: Uptime
          field: uptime
    - mbean: 'java.lang:type=GarbageCollector,name=ConcurrentMarkSweep'
      attributes:
        - attr: CollectionTime
          field: gc.cms_collection_time
        - attr: CollectionCount
          field: gc.cms_collection_count

This would allow lazy configurers like @nfrankel (and myself :)) to copy-paste the MBean name from jconsole on the one hand, and to group all attributes belonging to the same MBean, improving config readability and maintainability, on the other.
Later on we could also add an optional field_type element for dynamic template generation as originally proposed by @ruflin.
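
To make the grouping concrete, here is a minimal Go sketch of types that could mirror this nested config, plus a lookup from (MBean, attribute) to the target field. All names (`mbeanMapping`, `attribute`, `attrKey`, `buildLookup`) are hypothetical; the PR's actual config structs may look different.

```go
package main

import "fmt"

// attribute mirrors one "attr"/"field" pair of the proposed config.
type attribute struct {
	Attr  string
	Field string
}

// mbeanMapping mirrors one "mbean" entry with its grouped attributes.
type mbeanMapping struct {
	MBean      string
	Attributes []attribute
}

// attrKey identifies an attribute of an MBean. Using a struct as the map
// key means no string delimiter such as ":::" is needed.
type attrKey struct {
	MBean string
	Attr  string
}

// buildLookup flattens the nested config into a map from (MBean, attribute)
// to the event field name, rejecting duplicate entries.
func buildLookup(mapping []mbeanMapping) (map[attrKey]string, error) {
	lookup := map[attrKey]string{}
	for _, m := range mapping {
		for _, a := range m.Attributes {
			key := attrKey{MBean: m.MBean, Attr: a.Attr}
			if _, dup := lookup[key]; dup {
				return nil, fmt.Errorf("duplicate mapping for %s / %s", m.MBean, a.Attr)
			}
			lookup[key] = a.Field
		}
	}
	return lookup, nil
}

func main() {
	lookup, err := buildLookup([]mbeanMapping{
		{MBean: "java.lang:type=Runtime", Attributes: []attribute{{Attr: "Uptime", Field: "uptime"}}},
		{MBean: "java.lang:type=GarbageCollector,name=ConcurrentMarkSweep", Attributes: []attribute{
			{Attr: "CollectionTime", Field: "gc.cms_collection_time"},
			{Attr: "CollectionCount", Field: "gc.cms_collection_count"},
		}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(lookup[attrKey{MBean: "java.lang:type=Runtime", Attr: "Uptime"}])
}
```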

@ruflin
Member

ruflin commented Jan 19, 2017

@vas78 SGTM. You guys are the experts here :-)

Member

@ruflin ruflin left a comment

Sorry for the late review comments. Seems that Github never sent them (or I didn't click properly).

@@ -0,0 +1,6 @@
- module: jmx
Member

We should call the module jolokia (as this is the service we connect to) and the namespace will be dynamically defined as part of the config.

@@ -0,0 +1,6 @@
- module: jmx
  metricsets: ["jolokia"]
Member

We could also call the "dynamic" metricset collector, as we do in prometheus. Or do you have some better ideas here?

@ruflin
Member

ruflin commented Jan 23, 2017

@vas78 It seems some generated files are out of date. Try running make update and push the changes again.

Can you ping me when I should have a closer look again?

@nfrankel nfrankel left a comment

Why are generated files committed?

@vas78
Author

vas78 commented Jan 29, 2017

@ruflin Renamed the module and ran make update, but the "check" CI Target still fails, can you please take a look at it? Thanks!

@ruflin
Member

ruflin commented Jan 31, 2017

@vas78 For the check build part, make update should solve your problem. Can you run it again and make sure you don't get any local diffs left?

As for the other failing test: this could be an old flaky one. Any chance that you could rebase on master? This should also resolve the conflicts above (files were removed in a recent PR).

Member

@ruflin ruflin left a comment

Thanks for all the work on this. I left you some minor comments.

@@ -121,6 +121,14 @@ metricbeat.modules:
#period: 10s
#hosts: ["tcp://127.0.0.1:14567"]

#------------------------------- Jolokia Module ------------------------------
- module: jolokia
  metricsets: ["dynamic"]
Member

should we call it jmx? Not sure about the name :-)

Member

Ignore that and let's go for dynamic at the moment (unless you like jmx better).

  enabled: true
  period: 10s
  hosts: ["localhost"]
  namespace: "jolokia_metrics"
Member

Let's call this metrics, as jolokia will already be part of the namespace anyway.



[float]
=== jolokia.dynamic._namespace
Member

No need to document this one. This field will be removed and placed under metricset.namespace.


Namespace

[float]
Member

Do these 2 fields really exist in all events?

@@ -46,7 +46,13 @@ metricbeat.modules:
period: 10s
processes: ['.*']


#------------------------------- Jolokia Module ------------------------------
Member

If you use short_config: false in the module fields.yml, it will not show up here. See https://github.com/elastic/beats/blob/master/metricbeat/module/kafka/_meta/fields.yml#L7

@@ -0,0 +1,7 @@
- module: jolokia
Member

Currently we comment out the config by default so it is not enabled. This might change in the near future: https://github.com/elastic/beats/blob/master/metricbeat/module/kafka/_meta/config.yml

},
"jolokia": {
"dynamic": {
"gc": {
Member

This needs to be updated / removed to not cause confusion.

event := map[string]interface{}{}

if application != "" {
	event["application"] = application
Member

I see, this is really a predefined field. Except for putting it under a fixed namespace I don't have a good idea yet where we could put these and how to document it.

In docker we have a similar problem where several metricsets also have the container info. We use some tricks there to add this info to the event and then have it under the container metricset. Perhaps we should also introduce such a namespace here: jolokia.?.application. This way we can also document it, as it will not be under the dynamic namespace. Any suggestions for what we should replace the ? with?

title: "Jolokia"
description: >
Jolokia Module
fields:
Member

// New create a new instance of the MetricSet
func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
	// Additional configuration options
	config := struct {
Member

@ruflin
Member

ruflin commented Feb 8, 2017

@vas78 I was wondering whether we could help you get this PR into metricbeat. Let me know if we should take over from here or if we can help you in any way.

@vas78
Author

vas78 commented Feb 8, 2017

@ruflin Sorry, I was totally overloaded with some other projects lately and had no time to look into this one. Would be great if you can take it over from here, many thanks!

@ruflin
Member

ruflin commented Feb 9, 2017

@vas78 Great, will take over from here and ping you in case I have some questions :-)

@ruflin
Member

ruflin commented Feb 10, 2017

@vas78 Here is my WIP follow up PR: #3570

@ruflin
Member

ruflin commented Feb 10, 2017

@vas78 I'm currently working on the jmx module and discovered the following. We have mappings with an array of mapping under it in the config, each with a host. This kind of breaks the module principle, as the hosts are already configured on the module level. If someone needs different mapping types for different hosts, multiple modules should be defined. So I suggest simplifying the config as follows:

  hosts: ["localhost:8778"]
  mapping:
    - mbean: 'java.lang:type=Runtime'
      attributes:
        - attr: Uptime
          field: uptime
    - mbean: 'java.lang:type=GarbageCollector,name=ConcurrentMarkSweep'
      attributes:
        - attr: CollectionTime
          field: gc.cms_collection_time
        - attr: CollectionCount
          field: gc.cms_collection_count
    - mbean: 'java.lang:type=Memory'
      attributes:
        - attr: HeapMemoryUsage
          field: memory.heap_usage
        - attr: NonHeapMemoryUsage
          field: memory.non_heap_usage

Am I missing something here that requires the additional mappings?

@vas78
Author

vas78 commented Feb 10, 2017

@ruflin You got it all right, there's no other reason to have mappings. No objections from my side, let's keep the config simple.

@ruflin
Member

ruflin commented Feb 13, 2017

@vas78 Thanks. Another question: I'm trying to get my head around the application and instance config and fields. What exactly are these used for? Could these also just go under fields?

@vas78
Author

vas78 commented Feb 13, 2017

@ruflin These config items were useful in the first implementation with multiple hosts under mappings. This way it was possible to set an appropriate application and instance for each host and thus be able to distinguish between multiple apps/instances running on the same host but under different URLs (e.g. different Jolokia ports).
Now that we have decided to proceed with one host per config on the top level, these fields can indeed be extracted and moved over to the standard fields section.

@ruflin
Member

ruflin commented Feb 13, 2017

Thanks for the feedback. Will remove the fields then.

ruflin pushed a commit to ruflin/beats that referenced this pull request Feb 14, 2017
This is the implementation of a module for Jolokia which contains a dynamic jmx metricset.

An example configuration looks as follows:
```
- module: jolokia
  metricsets: ["jmx"]
  enabled: true
  period: 1s
  hosts: ["localhost:8778"]
  namespace: "metrics"
  jmx.mappings:
    - mbean: 'java.lang:type=Runtime'
      attributes:
        - attr: Uptime
          field: uptime
    - mbean: 'java.lang:type=GarbageCollector,name=ConcurrentMarkSweep'
      attributes:
        - attr: CollectionTime
          field: gc.cms_collection_time
        - attr: CollectionCount
          field: gc.cms_collection_count
    - mbean: 'java.lang:type=Memory'
      attributes:
        - attr: HeapMemoryUsage
          field: memory.heap_usage
        - attr: NonHeapMemoryUsage
          field: memory.non_heap_usage
```

For each MBean, the attributes which should be fetched can be defined. The field setting defines the field name under which the value is put in the event. The namespace setting defines the metricset namespace.

This PR replaces elastic#3051

Further changes:
* Added support for method and body to http helper
* Handle empty fields in generators. This happens for a module which only contains dynamic metricsets which is currently the case for jolokia.

TODO:
* [x] Add system tests
* [x] Check documentation
* [x] Add integration test
* [ ] Open issue for metricset which contains basic memory info
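
As a rough illustration of how the `field` and `namespace` settings described in the commit message could shape an event, here is a small Go sketch. It assumes the values end up under `jolokia.<namespace>.<field>`; that layout, and the names `put`, `buildEvent` and `eventJSON`, are assumptions made for this example and not the merged implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// put stores value under a dotted key, creating nested maps as needed.
func put(event map[string]interface{}, dotted string, value interface{}) {
	parts := strings.Split(dotted, ".")
	node := event
	for _, p := range parts[:len(parts)-1] {
		child, ok := node[p].(map[string]interface{})
		if !ok {
			child = map[string]interface{}{}
			node[p] = child
		}
		node = child
	}
	node[parts[len(parts)-1]] = value
}

// buildEvent places every fetched value under jolokia.<namespace>.<field>.
// The "jolokia" prefix is an assumption for this sketch. Field names where
// one is a prefix of another (e.g. "gc" and "gc.count") are not handled.
func buildEvent(namespace string, values map[string]interface{}) map[string]interface{} {
	event := map[string]interface{}{}
	for field, v := range values {
		put(event, "jolokia."+namespace+"."+field, v)
	}
	return event
}

// eventJSON renders the event; encoding/json sorts map keys.
func eventJSON(namespace string, values map[string]interface{}) (string, error) {
	out, err := json.Marshal(buildEvent(namespace, values))
	return string(out), err
}

func main() {
	out, err := eventJSON("metrics", map[string]interface{}{
		"uptime":                  42, // made-up sample values
		"gc.cms_collection_count": 7,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```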
tsg pushed a commit that referenced this pull request Feb 14, 2017
@ruflin
Member

ruflin commented Feb 14, 2017

@vas78 #3570 just got merged. Thanks for all the work you have done on this one. We are very happy to have jolokia / jmx support in Metricbeat. Closing this one.

@ruflin ruflin closed this Feb 14, 2017
@ruflin
Member

ruflin commented Feb 14, 2017

@vas78 @nfrankel @rhuss As discussed previously, as a next step I would also like to have some structured metricsets for Jolokia. For this I opened the issue here: #3585 But I will need your help with this one, as I need your insights into which metrics exist for most installations. If you could provide me with some configurations for the jmx metricset that should become a metricset of its own, I would be more than happy to implement them. Let's continue the discussion in #3585

missmaggiemo pushed a commit to tubular/beats that referenced this pull request Mar 28, 2017