Add support for dropwizard format #2846

atzoum · 2017-05-23T15:59:06Z

Required for all PRs:

Signed CLA.
Associated README.md updated.
Has appropriate unit tests.

atzoum · 2017-05-24T07:59:18Z

docker image available here

nhaugo · 2017-06-01T21:02:02Z

Can you please provide additional information as to what dropwizard is, including documentation.

atzoum · 2017-06-01T22:12:51Z

Dropwizard metrics, previously known as codehale or yammer metrics, is a popular java library for capturing jvm and application level metrics. Apache Kafka for instance is a well known open source project which is internally using the dropwizard (yammer at the time) library to collect and report its own metrics.
Dropwizard has support for a number of different metric types such as gauges, meters, histograms, etc. and provides a set of ready-to-use metric reporters including the well-known jmx reporter.
However, dropwizard metrics also provide a "standard" json format for representing its metrics registry and this is what this pull request is all about:
The implemented telegraf dropwizard format is able to parse a full dropwizard metric registry represented in the "standard" json format at once (eg. using a kafka or http input) in a much more efficient manner than scraping custom jmx attributes through telegraf's jmx input plugin.
Finally, it is worth noting that the current implementation is already used in our production environment for consuming dropwizard metrics stored in json format from kafka topics with much success.

danielnelson · 2017-06-08T00:08:59Z

@atzoum I have some config changes I'm working on that will improve the way we do Parsers/Serializers and allow us to add new ones more easily, without modifying the code in internal. Lets revisit this once this PR once I finish that.

atzoum · 2017-06-08T07:55:03Z

@danielnelson no problem, do you have a PR already that I can track?

danielnelson · 2017-06-08T18:46:36Z

No code yet, still in early stages. The issue for it is #272, I'll try to remember to update here as well otherwise ping me again in about a month.

danielnelson · 2017-10-03T19:36:18Z

@atzoum Thanks for your patience, due to how long it's taking me to finish the config changes I think we should go ahead and work on getting this included. I noticed that #2369 and #3292 could both make use of this Parser, in addition to it's use with inputs supporting data_format. Before I look over the code though, I'd like to propose a few changes to the schema.

One issue with the hierarchical measurements dropwizard uses is that they can be difficult to query because all of the information is embedded in the measurement name. We should try to normalize information out of the measurement name into other areas of the metric to simplify queries.

To start off with I would move the metric type into a tag:

- counter.measurement,tag1=green,tag2=yellow count=1 1487766783662000000
+ measurement,tag1=green,tag2=yellow,metric_type=counter count=1 1487766783662000000

Another change would be similar to the graphite parser
where we use a template system to translate the metrics. I believe we could use this with few modifications. This would allow us to make translations like:

- vm.memory.pools.Compressed-Class-Space.max value=0
- vm.memory.pools.Compressed-Class-Space.used value=1
+ vm,memory_pool=Compressed-Class-Space max=0 used=1

atzoum · 2017-10-04T09:40:35Z

@danielnelson indeed moving the metric type into a tag is an improvement and I can do this asap.

With regards to using a template system similar to the one that graphite parser is using, I am afraid I don't understand how this should work in practice since in dropwizard, with the exception of counters and gauges, all metric types have multiple fields. Thus, I don't think that we can really aggregate multiple dropwizard measurements into a single telegraf/influx measurement. I guess we could benefit from a templating mechanism for extracting tags from dropwizard's measurement names, however the current behavior described in the documentation seems to at least cover all the use cases that we have used it for

Tags are parsed from metric names as if they were actual influxdb line protocol keys (measurement<,tag_set>)

danielnelson · 2017-10-04T22:09:11Z

So, if I recall correctly, all the segments that you classify as field are concatenated together using the separator and if there are multiple named fields they are appended as well.

separator = "_"
templates = [
    "measurement.measurement.field.field"
]

{
    "metrics": {
        "counters": {
            "jenkins.job.building.duration": {
                "max": 75.0,
                "mean": 42.0
            }
        }
    }
}

jenkins_job,metric_type=timer building_duration_mean=42.0 building_duration_max=75.0

I missed the "measurement,tag1=green" style format on my first read through. This is a pretty nice way to encode tags and is similar to our extensions to statsd format, but I think this style is probably very rare when working with existing services where you don't have full control over the measurement names.

ctrlok · 2017-10-05T06:40:16Z

I'm not sure that we can do measurement like this from dropwizzard, because "jenkins.job.building.duration" is a full metric name and can be anything like just "duration" or "pokemons". So, from dropwizzard you can only set full metric name as measurement. I.E.

jenkins_job_building_duration,metric_type=timer mean=42.0 max=75.0

BTW, I can do what you want in my Jenkins plugin without using dropwizzard input parser.

atzoum · 2017-10-05T07:44:05Z

@ctrlok is correct, field names are readily available in the dropwizard format, I don't see any reason why we should do anything special in order to parse them from the measurement name.
The only place where a template could be valuable would be to extract tags from the measurement name, if measurement names don't already adhere to the default line protocol format.

danielnelson · 2017-10-05T19:03:36Z

I agree that you can't apply a complex template in general, but that's why the template is an option that can be specialized for particular services. The default template would be "measurement*" which would give the metric like above.

My Jenkins example is bad perhaps, but with the Jenkins input wouldn't you rather, at the very least, have the data categorized into measurement names such as jenkins, http, and vm just for organization? I worry that have so many measurement names will make using the measurements difficult especially in GUIs like Chronograf or Grafana.

jenkins,metric=job_building_duration,metric_type=timer mean=42.0 max=75.0

In addition to organization, if a user can converge multiple counters and gauges into a fewer series it will decrease the cost of storage and memory use on the database and you can only use functions and operators easily with fields on the same series.

Here is an example in the Jenkins input where several series could be reduced easily:

vm.memory.heap.committed.value
vm.memory.heap.init.value
vm.memory.heap.max.value
vm.memory.heap.usage.value
vm.memory.heap.used.value
vm.memory.non-heap.committed.value
vm.memory.non-heap.init.value
vm.memory.non-heap.max.value
vm.memory.non-heap.usage.value
vm.memory.non-heap.used.value

vm_memory,pool=heap committed=1,init=2,max=3,usage=4,used=5
vm_memory,pool=non-heap committed=1,init=2,max=3,usage=4,used=5

atzoum · 2017-10-06T17:30:23Z

ok, I haven't seen the templating code in graphite yet, do we want to reuse, is this easy, or simply duplicate and modify appropriately?

danielnelson · 2017-10-06T21:57:19Z

Honestly, it is probably not trivial. The code is in plugins/parsers/graphite/parser.go, perhaps we can extract the matcher and template types into internal/templates.go and share it. Might also want to look at plugins/inputs/statsd/statsd.go.

grange74 · 2017-12-01T05:05:00Z

Sometimes a Dropwizard app returns a metric float value which is a very small exponential number (e.g 5.647645996854652E-23). It is also discussed here dropwizard/metrics#1106.

It would be nice if it would possible to format the float values (or atleast be able to specify the number of decimal places).

danielnelson · 2017-12-01T18:50:11Z

@grange74 This is annoying, and I know with the line protocol serializer we currently will output it in non scientific notation which can result in very long lines. I think this should probably be the job of the output plugin to handle. What outputs/data_formats are you using where this is a problem?

grange74 · 2017-12-04T03:26:42Z

@danielnelson we are only currently using the influxdb output so we aren't getting any problems like the cloudwatch ones referenced in dropwizard/metrics#1106.

It would just be nice to be able to format the float values to reduce the payloads that we send to influxdb as in most cases we only need 3 decimal places instead of the 15-18 that are often returned.

Here is an example from a basic dropwizard app:

meters" : {
    "ch.qos.logback.core.Appender.all" : {
      "count" : 16,
      "m15_rate" : 3.112334326772315,
      "m1_rate" : 2.109570016641421,
      "m5_rate" : 2.9441421268138344,
      "mean_rate" : 0.5247118152721372,
      "units" : "events/second"
    }
}

"timers" : {
    "io.dropwizard.jetty.MutableServletContextHandler.get-requests" : {
      "count" : 2,
      "max" : 0.10200000000000001,
      "mean" : 0.059500000000000004,
      "min" : 0.017,
      "p50" : 0.10200000000000001,
      "p75" : 0.10200000000000001,
      "p95" : 0.10200000000000001,
      "p98" : 0.10200000000000001,
      "p99" : 0.10200000000000001,
      "p999" : 0.10200000000000001,
      "stddev" : 0.0425,
      "m15_rate" : 0.002191574188921952,
      "m1_rate" : 0.027072376727683665,
      "m5_rate" : 0.006394670392516757,
      "mean_rate" : 0.0677765092949584,
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    }
}

danielnelson · 2017-12-04T20:28:55Z

We could add an option to control the max precision of the line protocol serializer, but I'm not sure if it is useful enough to warrant extending the interface. What would be interesting check is if we took a batch of 1000 average lines, and compared the size with with 3 digits of precision vs 15. We should compare the size with gzip compression (make sure you are using this if you are concerned about payload size).

atzoum · 2018-01-04T17:45:08Z

@danielnelson I managed to find some time to work on the agreed changes:

Move the metric type into a tag.
Extract the template matching engine from Graphite and move it to internal/templating.
Reuse the extracted templating engine in Dropwizard parser.
Update the documentation accordingly.

Please review

…arators

danielnelson · 2018-01-05T22:37:28Z

.gitignore

@@ -5,3 +5,4 @@ tivan
 .idea
 *~
 *#
+.vscode


Can you remove this, you could put it into a ~/.gitignore file and add an excludesfile to your ~/.gitconfig. I'm going to remove some of the other entries in here that don't belong.

danielnelson · 2018-01-05T22:39:08Z

plugins/parsers/dropwizard/parser.go

+		if timeFormat == "" {
+			timeFormat = time.RFC3339
+		}
+		time, err := time.Parse(timeFormat, gjson.GetBytes(buf, p.TimePath).String())


I think it should be an error if the time path is not found, or if the time is wrong format.

danielnelson · 2018-01-05T22:39:44Z

plugins/parsers/dropwizard/parser.go

+	var jsonOut map[string]interface{}
+	err := json.Unmarshal(registryBytes, &jsonOut)
+	if err != nil {
+		err = fmt.Errorf("unable to parse dropwizard metric registry from JSON document, %s", err)


I would like to see a more precise error message if the path is not found.

danielnelson · 2018-01-05T22:47:08Z

docs/DATA_FORMATS_INPUT.md

+dropwizard_time_format = "2006-01-02T15:04:05Z07:00"
+dropwizard_tags_path = "tags"
+## tag paths per tag are supported too, eg.
+#[inputs.yourinput.dropwizard_tag_paths]


Is it common for tags to be distributed throughout the document? We may want to remove this functionality and use the measurement filtering options (taginclude/tagexclude) if all we need to filter the tags.

No, this is not about filtering tags. The purpose of this optional configuration option is to allow the user to add tags which are not included in the measurement name but are contained in another property within the same json file.
I have an example in test case TestParseValidEmbeddedCounterJSON where the following json contains both arbitraty tags and a dropwizard metric registry like below

{ "time" : "2017-02-22T14:33:03.662+02:00", "tags" : { "tag1" : "green", "tag2" : "yellow" }, "metrics" : { "counters" : { "measurement" : { "count" : 1 } }, "meters" : {}, "gauges" : {}, "histograms" : {}, "timers" : {} } }

We are already using this functionality in our prod environment and it is necessary for our use cases. I don't expect it to be frequently used by other users, however it is a nice addition and I don't see any harm in keeping it.

This example could be done with dropwizard_tags_path = "tags", do you have an example where you need dropwizard_tag_paths?

Lets say we want the value inside tag1 but want to rename the tag to mytag:

[inputs.yourinput.dropwizard_tag_paths] mytag = "tags.tag1"

The above config would add a tag mytag=green in all measurements

Do you see it being used outside of renaming? Is this something you are using right now?

As an aside, we will probably want to have a general purpose renaming method in the future, probably as a processor plugin, but no ETA on that.

Given that filtering can be done already through taginclude and tagexclude, renaming is the only use case for it. Yes we are using this on two occasions to consolidate metrics from different sources (produced by different teams).
Indeed, tag renaming is a cross-cutting concern.

danielnelson added the triage label May 23, 2017

atzoum force-pushed the dropwizard branch from 9dd89f2 to 5a07f9d Compare May 24, 2017 07:15

danielnelson added feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin and removed triage labels Sep 13, 2017

atzoum mentioned this pull request Sep 28, 2017

Add integration for service discovery & kv config stores (dynamic config) #272

Open

danielnelson mentioned this pull request Oct 3, 2017

init jenkins input plugin #3292

Closed

danielnelson mentioned this pull request Dec 1, 2017

add support for pulling metrics from a dropwizard application #3531

Closed

3 tasks

atzoum force-pushed the dropwizard branch 5 times, most recently from d5bddb4 to 5f54499 Compare January 4, 2018 17:32

Aris Tzoumas (TRASYS) and others added 3 commits January 5, 2018 10:56

Add support for dropwizard format

f2fb569

Move the metric type into a tag in Dropwizard parser

faab7a3

Extract templating system from graphite and allow arbitrary split sep…

46412c2

…arators

atzoum force-pushed the dropwizard branch 2 times, most recently from e221c75 to a173866 Compare January 5, 2018 09:11

Opt-in support for measurement and tag templates in Dropwizard parser

8f7c01e

atzoum force-pushed the dropwizard branch from a173866 to 8f7c01e Compare January 5, 2018 09:25

danielnelson reviewed Jan 5, 2018

View reviewed changes

danielnelson added this to the 1.6.0 milestone Jan 5, 2018

atzoum added 2 commits January 6, 2018 15:22

remove .vscode from .gitignore

afebe89

More precise error messages

1f9135c

danielnelson merged commit 317de40 into influxdata:master Jan 8, 2018

atzoum deleted the dropwizard branch January 9, 2018 07:22

rickard-von-essen mentioned this pull request Jan 12, 2018

Parse string type values in Dropwizard parser #3662

Closed

3 tasks

maxunt pushed a commit that referenced this pull request Jun 26, 2018

Add support for dropwizard input format (#2846)

05d691a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for dropwizard format #2846

Add support for dropwizard format #2846

atzoum commented May 23, 2017

atzoum commented May 24, 2017

nhaugo commented Jun 1, 2017

atzoum commented Jun 1, 2017 •

edited

Loading

danielnelson commented Jun 8, 2017

atzoum commented Jun 8, 2017

danielnelson commented Jun 8, 2017

danielnelson commented Oct 3, 2017

atzoum commented Oct 4, 2017

danielnelson commented Oct 4, 2017

ctrlok commented Oct 5, 2017

atzoum commented Oct 5, 2017 •

edited

Loading

danielnelson commented Oct 5, 2017

atzoum commented Oct 6, 2017

danielnelson commented Oct 6, 2017

grange74 commented Dec 1, 2017 •

edited

Loading

danielnelson commented Dec 1, 2017

grange74 commented Dec 4, 2017

danielnelson commented Dec 4, 2017

atzoum commented Jan 4, 2018

danielnelson Jan 5, 2018

danielnelson Jan 5, 2018

atzoum Jan 6, 2018

danielnelson Jan 5, 2018

atzoum Jan 6, 2018

danielnelson Jan 5, 2018

atzoum Jan 6, 2018 •

edited

Loading

danielnelson Jan 8, 2018

atzoum Jan 8, 2018

danielnelson Jan 8, 2018

atzoum Jan 8, 2018 •

edited

Loading

@@ @@ -5,3 +5,4 @@ tivan @@
               .idea
               *~
               *#
+              .vscode

Add support for dropwizard format #2846

Add support for dropwizard format #2846

Conversation

atzoum commented May 23, 2017

Required for all PRs:

atzoum commented May 24, 2017

nhaugo commented Jun 1, 2017

atzoum commented Jun 1, 2017 • edited Loading

danielnelson commented Jun 8, 2017

atzoum commented Jun 8, 2017

danielnelson commented Jun 8, 2017

danielnelson commented Oct 3, 2017

atzoum commented Oct 4, 2017

danielnelson commented Oct 4, 2017

ctrlok commented Oct 5, 2017

atzoum commented Oct 5, 2017 • edited Loading

danielnelson commented Oct 5, 2017

atzoum commented Oct 6, 2017

danielnelson commented Oct 6, 2017

grange74 commented Dec 1, 2017 • edited Loading

danielnelson commented Dec 1, 2017

grange74 commented Dec 4, 2017

danielnelson commented Dec 4, 2017

atzoum commented Jan 4, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

atzoum Jan 6, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

atzoum Jan 8, 2018 • edited Loading

Choose a reason for hiding this comment

atzoum commented Jun 1, 2017 •

edited

Loading

atzoum commented Oct 5, 2017 •

edited

Loading

grange74 commented Dec 1, 2017 •

edited

Loading

atzoum Jan 6, 2018 •

edited

Loading

atzoum Jan 8, 2018 •

edited

Loading