-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for dropwizard format #2846
Conversation
docker image available here |
Can you please provide additional information as to what dropwizard is, including documentation. |
Dropwizard metrics, previously known as codehale or yammer metrics, is a popular java library for capturing jvm and application level metrics. Apache Kafka for instance is a well known open source project which is internally using the dropwizard (yammer at the time) library to collect and report its own metrics. |
@atzoum I have some config changes I'm working on that will improve the way we do Parsers/Serializers and allow us to add new ones more easily, without modifying the code in internal. Lets revisit this once this PR once I finish that. |
@danielnelson no problem, do you have a PR already that I can track? |
No code yet, still in early stages. The issue for it is #272, I'll try to remember to update here as well otherwise ping me again in about a month. |
@atzoum Thanks for your patience, due to how long it's taking me to finish the config changes I think we should go ahead and work on getting this included. I noticed that #2369 and #3292 could both make use of this Parser, in addition to it's use with inputs supporting One issue with the hierarchical measurements dropwizard uses is that they can be difficult to query because all of the information is embedded in the measurement name. We should try to normalize information out of the measurement name into other areas of the metric to simplify queries. To start off with I would move the metric type into a tag: - counter.measurement,tag1=green,tag2=yellow count=1 1487766783662000000
+ measurement,tag1=green,tag2=yellow,metric_type=counter count=1 1487766783662000000 Another change would be similar to the graphite parser - vm.memory.pools.Compressed-Class-Space.max value=0
- vm.memory.pools.Compressed-Class-Space.used value=1
+ vm,memory_pool=Compressed-Class-Space max=0 used=1 |
@danielnelson indeed moving the metric type into a tag is an improvement and I can do this asap. With regards to using a template system similar to the one that graphite parser is using, I am afraid I don't understand how this should work in practice since in dropwizard, with the exception of counters and gauges, all metric types have multiple fields. Thus, I don't think that we can really aggregate multiple dropwizard measurements into a single telegraf/influx measurement. I guess we could benefit from a templating mechanism for extracting tags from dropwizard's measurement names, however the current behavior described in the documentation seems to at least cover all the use cases that we have used it for
|
So, if I recall correctly, all the segments that you classify as separator = "_"
templates = [
"measurement.measurement.field.field"
] {
"metrics": {
"counters": {
"jenkins.job.building.duration": {
"max": 75.0,
"mean": 42.0
}
}
}
}
I missed the |
I'm not sure that we can do measurement like this from dropwizzard, because "jenkins.job.building.duration" is a full metric name and can be anything like just "duration" or "pokemons". So, from dropwizzard you can only set full metric name as measurement. I.E.
BTW, I can do what you want in my Jenkins plugin without using dropwizzard input parser. |
@ctrlok is correct, field names are readily available in the dropwizard format, I don't see any reason why we should do anything special in order to parse them from the measurement name. |
I agree that you can't apply a complex template in general, but that's why the template is an option that can be specialized for particular services. The default template would be "measurement*" which would give the metric like above. My Jenkins example is bad perhaps, but with the Jenkins input wouldn't you rather, at the very least, have the data categorized into measurement names such as
In addition to organization, if a user can converge multiple counters and gauges into a fewer series it will decrease the cost of storage and memory use on the database and you can only use functions and operators easily with fields on the same series. Here is an example in the Jenkins input where several series could be reduced easily:
|
ok, I haven't seen the templating code in graphite yet, do we want to reuse, is this easy, or simply duplicate and modify appropriately? |
Honestly, it is probably not trivial. The code is in |
Sometimes a Dropwizard app returns a metric float value which is a very small exponential number (e.g 5.647645996854652E-23). It is also discussed here dropwizard/metrics#1106. It would be nice if it would possible to format the float values (or atleast be able to specify the number of decimal places). |
@grange74 This is annoying, and I know with the line protocol serializer we currently will output it in non scientific notation which can result in very long lines. I think this should probably be the job of the output plugin to handle. What outputs/data_formats are you using where this is a problem? |
@danielnelson we are only currently using the influxdb output so we aren't getting any problems like the cloudwatch ones referenced in dropwizard/metrics#1106. It would just be nice to be able to format the float values to reduce the payloads that we send to influxdb as in most cases we only need 3 decimal places instead of the 15-18 that are often returned. Here is an example from a basic dropwizard app:
|
We could add an option to control the max precision of the line protocol serializer, but I'm not sure if it is useful enough to warrant extending the interface. What would be interesting check is if we took a batch of 1000 average lines, and compared the size with with 3 digits of precision vs 15. We should compare the size with gzip compression (make sure you are using this if you are concerned about payload size). |
d5bddb4
to
5f54499
Compare
@danielnelson I managed to find some time to work on the agreed changes:
Please review |
e221c75
to
a173866
Compare
.gitignore
Outdated
@@ -5,3 +5,4 @@ tivan | |||
.idea | |||
*~ | |||
*# | |||
.vscode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this, you could put it into a ~/.gitignore
file and add an excludesfile to your ~/.gitconfig
. I'm going to remove some of the other entries in here that don't belong.
plugins/parsers/dropwizard/parser.go
Outdated
if timeFormat == "" { | ||
timeFormat = time.RFC3339 | ||
} | ||
time, err := time.Parse(timeFormat, gjson.GetBytes(buf, p.TimePath).String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be an error if the time path is not found, or if the time is wrong format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
var jsonOut map[string]interface{} | ||
err := json.Unmarshal(registryBytes, &jsonOut) | ||
if err != nil { | ||
err = fmt.Errorf("unable to parse dropwizard metric registry from JSON document, %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see a more precise error message if the path is not found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dropwizard_time_format = "2006-01-02T15:04:05Z07:00" | ||
dropwizard_tags_path = "tags" | ||
## tag paths per tag are supported too, eg. | ||
#[inputs.yourinput.dropwizard_tag_paths] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it common for tags to be distributed throughout the document? We may want to remove this functionality and use the measurement filtering options (taginclude/tagexclude) if all we need to filter the tags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is not about filtering tags. The purpose of this optional configuration option is to allow the user to add tags which are not included in the measurement name but are contained in another property within the same json file.
I have an example in test case TestParseValidEmbeddedCounterJSON
where the following json contains both arbitraty tags and a dropwizard metric registry like below
{
"time" : "2017-02-22T14:33:03.662+02:00",
"tags" : {
"tag1" : "green",
"tag2" : "yellow"
},
"metrics" : {
"counters" : {
"measurement" : {
"count" : 1
}
},
"meters" : {},
"gauges" : {},
"histograms" : {},
"timers" : {}
}
}
We are already using this functionality in our prod environment and it is necessary for our use cases. I don't expect it to be frequently used by other users, however it is a nice addition and I don't see any harm in keeping it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example could be done with dropwizard_tags_path = "tags"
, do you have an example where you need dropwizard_tag_paths
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets say we want the value inside tag1
but want to rename the tag to mytag
:
[inputs.yourinput.dropwizard_tag_paths]
mytag = "tags.tag1"
The above config would add a tag mytag=green
in all measurements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you see it being used outside of renaming? Is this something you are using right now?
As an aside, we will probably want to have a general purpose renaming method in the future, probably as a processor plugin, but no ETA on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that filtering can be done already through taginclude
and tagexclude
, renaming is the only use case for it. Yes we are using this on two occasions to consolidate metrics from different sources (produced by different teams).
Indeed, tag renaming is a cross-cutting concern.
Required for all PRs: