Add support for dropwizard format #2846

Merged: 6 commits, Jan 8, 2018
2 changes: 2 additions & 0 deletions Godeps
@@ -72,6 +72,8 @@ github.com/StackExchange/wmi f3e2bae1e0cb5aef83e319133eabfee30013a4a5
github.com/streadway/amqp 63795daa9a446c920826655f26ba31c81c860fd6
github.com/stretchr/objx 1a9d0bb9f541897e62256577b352fdbc1fb4fd94
github.com/stretchr/testify 4d4bfba8f1d1027c4fdbe371823030df51419987
github.com/tidwall/gjson 0623bd8fbdbf97cc62b98d15108832851a658e59
github.com/tidwall/match 173748da739a410c5b0b813b956f89ff94730b4c
github.com/vjeantet/grok d73e972b60935c7fec0b4ffbc904ed39ecaf7efe
github.com/wvanbergen/kafka bc265fedb9ff5b5c5d3c0fdcef4a819b3523d3ee
github.com/wvanbergen/kazoo-go 968957352185472eacb69215fa3dbfcfdbac1096
1 change: 1 addition & 0 deletions README.md
@@ -256,6 +256,7 @@ formats may be used with input plugins supporting the `data_format` option:
* [Value](./docs/DATA_FORMATS_INPUT.md#value)
* [Nagios](./docs/DATA_FORMATS_INPUT.md#nagios)
* [Collectd](./docs/DATA_FORMATS_INPUT.md#collectd)
* [Dropwizard](./docs/DATA_FORMATS_INPUT.md#dropwizard)

## Processor Plugins

174 changes: 174 additions & 0 deletions docs/DATA_FORMATS_INPUT.md
@@ -8,6 +8,7 @@ Telegraf is able to parse the following input data formats into metrics:
1. [Value](https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#value), i.e. 45 or "booyah"
1. [Nagios](https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#nagios) (exec input only)
1. [Collectd](https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#collectd)
1. [Dropwizard](https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#dropwizard)

Telegraf metrics, like InfluxDB
[points](https://docs.influxdata.com/influxdb/v0.10/write_protocols/line/),
@@ -479,3 +480,176 @@ You can also change the path to the typesdb or add additional typesdb using
## Path to TypesDB specifications
collectd_typesdb = ["/usr/share/collectd/types.db"]
```

# Dropwizard:

The dropwizard format can parse the JSON representation of a single dropwizard metric registry. By default, tags are parsed from metric names as if they were InfluxDB line protocol keys (`measurement<,tag_set>`); this can be overridden by defining custom [measurement & tag templates](./DATA_FORMATS_INPUT.md#measurement--tag-templates). All field values are collected as float64 fields.
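To make the default naming rule concrete, here is a minimal Go sketch that splits a metric name such as `measurement,tag1=green` into a measurement and a tag set. `parseKey` is a hypothetical helper written for illustration, not the parser's actual code, and it assumes names contain no escaped commas, spaces or equals signs.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKey splits a metric name such as "measurement,tag1=green" into the
// measurement name and a tag map, treating the name as a line protocol key.
// Hypothetical helper: escaped characters (\, \= \space) are NOT handled.
func parseKey(name string) (string, map[string]string) {
	parts := strings.Split(name, ",")
	tags := make(map[string]string)
	for _, kv := range parts[1:] {
		// Each tag must look like key=value; malformed pairs are skipped.
		if k, v, ok := strings.Cut(kv, "="); ok && k != "" {
			tags[k] = v
		}
	}
	return parts[0], tags
}

func main() {
	m, tags := parseKey("measurement,tag1=green")
	fmt.Println(m, tags) // prints: measurement map[tag1:green]
}
```

A name without commas, such as `measurement`, yields the name itself and an empty tag set.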

A typical JSON representation of a dropwizard metric registry:

```json
{
  "version": "3.0.0",
  "counters" : {
    "measurement,tag1=green" : {
      "count" : 1
    }
  },
  "meters" : {
    "measurement" : {
      "count" : 1,
      "m15_rate" : 1.0,
      "m1_rate" : 1.0,
      "m5_rate" : 1.0,
      "mean_rate" : 1.0,
      "units" : "events/second"
    }
  },
  "gauges" : {
    "measurement" : {
      "value" : 1
    }
  },
  "histograms" : {
    "measurement" : {
      "count" : 1,
      "max" : 1.0,
      "mean" : 1.0,
      "min" : 1.0,
      "p50" : 1.0,
      "p75" : 1.0,
      "p95" : 1.0,
      "p98" : 1.0,
      "p99" : 1.0,
      "p999" : 1.0,
      "stddev" : 1.0
    }
  },
  "timers" : {
    "measurement" : {
      "count" : 1,
      "max" : 1.0,
      "mean" : 1.0,
      "min" : 1.0,
      "p50" : 1.0,
      "p75" : 1.0,
      "p95" : 1.0,
      "p98" : 1.0,
      "p99" : 1.0,
      "p999" : 1.0,
      "stddev" : 1.0,
      "m15_rate" : 1.0,
      "m1_rate" : 1.0,
      "m5_rate" : 1.0,
      "mean_rate" : 1.0,
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    }
  }
}
```

This registry would be translated into five metrics, one per metric type:

```
measurement,metric_type=counter,tag1=green count=1
measurement,metric_type=meter count=1,m15_rate=1.0,m1_rate=1.0,m5_rate=1.0,mean_rate=1.0
measurement,metric_type=gauge value=1
measurement,metric_type=histogram count=1,max=1.0,mean=1.0,min=1.0,p50=1.0,p75=1.0,p95=1.0,p98=1.0,p99=1.0,p999=1.0,stddev=1.0
measurement,metric_type=timer count=1,max=1.0,mean=1.0,min=1.0,p50=1.0,p75=1.0,p95=1.0,p98=1.0,p99=1.0,p999=1.0,stddev=1.0,m15_rate=1.0,m1_rate=1.0,m5_rate=1.0,mean_rate=1.0
```

You may also parse a dropwizard registry embedded in an inner field of any JSON document.
For example, to parse the following JSON document:

```json
{
  "time" : "2017-02-22T14:33:03.662+02:00",
  "tags" : {
    "tag1" : "green",
    "tag2" : "yellow"
  },
  "metrics" : {
    "counters" : {
      "measurement" : {
        "count" : 1
      }
    },
    "meters" : {},
    "gauges" : {},
    "histograms" : {},
    "timers" : {}
  }
}
```
and translate it into:

```
measurement,metric_type=counter,tag1=green,tag2=yellow count=1 1487766783662000000
```

you simply need to use the following additional configuration properties:

```toml
dropwizard_metric_registry_path = "metrics"
dropwizard_time_path = "time"
dropwizard_time_format = "2006-01-02T15:04:05Z07:00"
dropwizard_tags_path = "tags"
## tag paths per tag are supported too, e.g.
#[inputs.yourinput.dropwizard_tag_paths]
# tag1 = "tags.tag1"
# tag2 = "tags.tag2"
```

Review thread on the `dropwizard_tag_paths` lines:

Contributor: Is it common for tags to be distributed throughout the document? We may want to remove this functionality and use the measurement filtering options (taginclude/tagexclude) if all we need is to filter the tags.

atzoum (Contributor Author), Jan 6, 2018: No, this is not about filtering tags. The purpose of this optional configuration option is to let the user add tags that are not included in the measurement name but are contained in another property within the same JSON document. There is an example in test case TestParseValidEmbeddedCounterJSON, where the following JSON contains both arbitrary tags and a dropwizard metric registry:

    {
      "time" : "2017-02-22T14:33:03.662+02:00",
      "tags" : {
        "tag1" : "green",
        "tag2" : "yellow"
      },
      "metrics" : {
        "counters" : {
          "measurement" : {
            "count" : 1
          }
        },
        "meters" : {},
        "gauges" : {},
        "histograms" : {},
        "timers" : {}
      }
    }

We are already using this functionality in our production environment and it is necessary for our use cases. I don't expect it to be used frequently by other users; however, it is a nice addition and I don't see any harm in keeping it.

Contributor: This example could be handled with dropwizard_tags_path = "tags". Do you have an example where you need dropwizard_tag_paths?

Contributor Author: Let's say we want the value inside tag1 but want to rename the tag to mytag:

    [inputs.yourinput.dropwizard_tag_paths]
      mytag = "tags.tag1"

The above config would add a tag mytag=green to all measurements.

Contributor: Do you see it being used outside of renaming? Is this something you are using right now?

As an aside, we will probably want a general-purpose renaming method in the future, probably as a processor plugin, but there is no ETA on that.

atzoum (Contributor Author), Jan 8, 2018: Given that filtering can already be done through taginclude and tagexclude, renaming is the only use case for it. Yes, we are using it on two occasions to consolidate metrics from different sources (produced by different teams). Indeed, tag renaming is a cross-cutting concern.


For more information about the dropwizard JSON format, see the
[Dropwizard Metrics JSON manual](http://metrics.dropwizard.io/3.1.0/manual/json/).
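The following sketch shows what the embedded-registry options (`dropwizard_metric_registry_path`, `dropwizard_time_path`, `dropwizard_time_format`, `dropwizard_tags_path`, `dropwizard_tag_paths`) do, using only the Go standard library. It makes several assumptions: `lookup` understands plain dot-separated keys only (real gjson paths support much more), `extract` is a hypothetical name, and falling back to the current time when no time path matches is assumed rather than taken from the parser. The per-tag map reproduces the renaming case from the review thread (`mytag = "tags.tag1"`).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// lookup walks a decoded JSON document along a dot-separated path such as
// "tags.tag1". Hypothetical stand-in: gjson paths also support wildcards,
// array indexes and escaping, none of which are handled here.
func lookup(doc interface{}, path string) (interface{}, bool) {
	cur := doc
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		if cur, ok = m[key]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// extract returns the registry, timestamp and tags selected by the paths.
func extract(raw []byte, registryPath, timePath, timeFormat, tagsPath string,
	tagPaths map[string]string) (map[string]interface{}, time.Time, map[string]string, error) {
	var doc interface{}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, time.Time{}, nil, err
	}
	// With no registry path, the whole document is treated as the registry.
	registry := doc
	if registryPath != "" {
		registry, _ = lookup(doc, registryPath)
	}
	regMap, ok := registry.(map[string]interface{})
	if !ok {
		return nil, time.Time{}, nil, fmt.Errorf("no metric registry found at %q", registryPath)
	}
	ts := time.Now() // assumed fallback when no time is found
	if v, ok := lookup(doc, timePath); ok {
		if s, ok := v.(string); ok {
			t, err := time.Parse(timeFormat, s)
			if err != nil {
				return nil, time.Time{}, nil, err
			}
			ts = t
		}
	}
	tags := make(map[string]string)
	if v, ok := lookup(doc, tagsPath); ok {
		if m, ok := v.(map[string]interface{}); ok {
			for k, val := range m {
				if s, ok := val.(string); ok {
					tags[k] = s
				}
			}
		}
	}
	// Per-tag paths add (or rename) individual tags.
	for name, path := range tagPaths {
		if v, ok := lookup(doc, path); ok {
			if s, ok := v.(string); ok {
				tags[name] = s
			}
		}
	}
	return regMap, ts, tags, nil
}

func main() {
	raw := []byte(`{"time":"2017-02-22T14:33:03.662+02:00",
	"tags":{"tag1":"green","tag2":"yellow"},
	"metrics":{"counters":{"measurement":{"count":1}}}}`)
	reg, ts, tags, err := extract(raw, "metrics", "time", "2006-01-02T15:04:05Z07:00",
		"tags", map[string]string{"mytag": "tags.tag1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(reg), ts.UnixNano(), tags)
}
```

Go's `time.Parse` accepts the fractional seconds in `14:33:03.662` even though the layout omits them, so the timestamp's `UnixNano()` is `1487766783662000000`, matching the expected output in the embedded-registry example.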

#### Dropwizard Configuration:

```toml
[[inputs.exec]]
## Commands array
commands = ["curl http://localhost:8080/sys/metrics"]
timeout = "5s"

## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "dropwizard"

## Used by the templating engine to join matched values when cardinality is > 1
separator = "_"

## Each template line requires a template pattern. It can have an optional
## filter before the template, separated from it by spaces. It can also have optional extra
## tags following the template. Multiple tags should be separated by commas with no spaces,
## as in the line protocol format. There can be only one default template.
## Templates support the following formats:
## 1. filter + template
## 2. filter + template + extra tag(s)
## 3. filter + template with field key
## 4. default template
## By providing an empty template array, templating is disabled and measurements are parsed as InfluxDB line protocol keys (measurement<,tag_set>)
templates = []

## You may use an appropriate [gjson path](https://github.com/tidwall/gjson#path-syntax)
## to locate the metric registry within the JSON document
# dropwizard_metric_registry_path = "metrics"

## You may use an appropriate [gjson path](https://github.com/tidwall/gjson#path-syntax)
## to locate the default time of the measurements within the JSON document
# dropwizard_time_path = "time"
# dropwizard_time_format = "2006-01-02T15:04:05Z07:00"

## You may use an appropriate [gjson path](https://github.com/tidwall/gjson#path-syntax)
## to locate the tags map within the JSON document
# dropwizard_tags_path = "tags"

## You may even use tag paths per tag
# [inputs.exec.dropwizard_tag_paths]
# tag1 = "tags.tag1"
# tag2 = "tags.tag2"

```
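The `separator` option only matters when several template parts share a role. The sketch below is a heavily simplified, hypothetical `applyTemplate` that shows the joining behaviour for a template such as `measurement.measurement.host.field`; it ignores filters, extra tags and wildcard parts (`measurement*`, `field*`), and it is not the templating engine's code.

```go
package main

import (
	"fmt"
	"strings"
)

// applyTemplate splits the metric name and the template on ".", treats each
// template part as the role of the matching name part, and joins parts that
// share a role with joiner (the `separator` option). Hypothetical helper.
func applyTemplate(name, template, joiner string) (string, map[string]string, string) {
	nameParts := strings.Split(name, ".")
	var measurement, field []string
	tagParts := make(map[string][]string)
	for i, role := range strings.Split(template, ".") {
		if i >= len(nameParts) {
			break
		}
		switch role {
		case "measurement":
			measurement = append(measurement, nameParts[i])
		case "field":
			field = append(field, nameParts[i])
		case "":
			// An empty template part skips the corresponding name part.
		default:
			tagParts[role] = append(tagParts[role], nameParts[i])
		}
	}
	tags := make(map[string]string, len(tagParts))
	for k, v := range tagParts {
		tags[k] = strings.Join(v, joiner)
	}
	return strings.Join(measurement, joiner), tags, strings.Join(field, joiner)
}

func main() {
	m, tags, field := applyTemplate("jvm.memory.host1.used", "measurement.measurement.host.field", "_")
	fmt.Println(m, tags, field) // prints: jvm_memory map[host:host1] used
}
```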
46 changes: 46 additions & 0 deletions internal/config/config.go
@@ -1272,6 +1272,47 @@ func buildParser(name string, tbl *ast.Table) (parsers.Parser, error) {
		}
	}

	if node, ok := tbl.Fields["dropwizard_metric_registry_path"]; ok {
		if kv, ok := node.(*ast.KeyValue); ok {
			if str, ok := kv.Value.(*ast.String); ok {
				c.DropwizardMetricRegistryPath = str.Value
			}
		}
	}
	if node, ok := tbl.Fields["dropwizard_time_path"]; ok {
		if kv, ok := node.(*ast.KeyValue); ok {
			if str, ok := kv.Value.(*ast.String); ok {
				c.DropwizardTimePath = str.Value
			}
		}
	}
	if node, ok := tbl.Fields["dropwizard_time_format"]; ok {
		if kv, ok := node.(*ast.KeyValue); ok {
			if str, ok := kv.Value.(*ast.String); ok {
				c.DropwizardTimeFormat = str.Value
			}
		}
	}
	if node, ok := tbl.Fields["dropwizard_tags_path"]; ok {
		if kv, ok := node.(*ast.KeyValue); ok {
			if str, ok := kv.Value.(*ast.String); ok {
				c.DropwizardTagsPath = str.Value
			}
		}
	}
	c.DropwizardTagPathsMap = make(map[string]string)
	if node, ok := tbl.Fields["dropwizard_tag_paths"]; ok {
		if subtbl, ok := node.(*ast.Table); ok {
			for name, val := range subtbl.Fields {
				if kv, ok := val.(*ast.KeyValue); ok {
					if str, ok := kv.Value.(*ast.String); ok {
						c.DropwizardTagPathsMap[name] = str.Value
					}
				}
			}
		}
	}

	c.MetricName = name

	delete(tbl.Fields, "data_format")
@@ -1282,6 +1323,11 @@ func buildParser(name string, tbl *ast.Table) (parsers.Parser, error) {
	delete(tbl.Fields, "collectd_auth_file")
	delete(tbl.Fields, "collectd_security_level")
	delete(tbl.Fields, "collectd_typesdb")
	delete(tbl.Fields, "dropwizard_metric_registry_path")
	delete(tbl.Fields, "dropwizard_time_path")
	delete(tbl.Fields, "dropwizard_time_format")
	delete(tbl.Fields, "dropwizard_tags_path")
	delete(tbl.Fields, "dropwizard_tag_paths")

	return parsers.NewParser(c)
}
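Each dropwizard option in `buildParser` repeats the same `KeyValue`/`String` unwrapping. One possible way to factor it out is sketched below. `String`, `KeyValue` and `Table` are minimal hypothetical stand-ins for the TOML AST types (the real types carry more fields, such as positions), and `stringField`/`stringMapField` are illustrative names, not existing Telegraf helpers.

```go
package main

import "fmt"

// Minimal hypothetical stand-ins for the TOML AST node types.
type String struct{ Value string }
type KeyValue struct{ Value interface{} }
type Table struct{ Fields map[string]interface{} }

// stringField returns the string value stored under key, if there is one.
// A missing key yields a nil interface, which fails the type assertion safely.
func stringField(tbl *Table, key string) (string, bool) {
	kv, ok := tbl.Fields[key].(*KeyValue)
	if !ok {
		return "", false
	}
	str, ok := kv.Value.(*String)
	if !ok {
		return "", false
	}
	return str.Value, true
}

// stringMapField collects the string entries of a sub-table such as
// [inputs.exec.dropwizard_tag_paths]; non-string entries are skipped.
func stringMapField(tbl *Table, key string) map[string]string {
	out := make(map[string]string)
	sub, ok := tbl.Fields[key].(*Table)
	if !ok {
		return out
	}
	for name := range sub.Fields {
		if s, ok := stringField(sub, name); ok {
			out[name] = s
		}
	}
	return out
}

func main() {
	tbl := &Table{Fields: map[string]interface{}{
		"dropwizard_time_path": &KeyValue{Value: &String{Value: "time"}},
		"dropwizard_tag_paths": &Table{Fields: map[string]interface{}{
			"mytag": &KeyValue{Value: &String{Value: "tags.tag1"}},
		}},
	}}
	timePath, _ := stringField(tbl, "dropwizard_time_path")
	fmt.Println(timePath, stringMapField(tbl, "dropwizard_tag_paths"))
}
```

With helpers like these, each option reduces to a single `if s, ok := stringField(tbl, "..."); ok { ... }` line, at the cost of diverging from the nested style used for the other parsers in the same function.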
86 changes: 86 additions & 0 deletions internal/templating/engine.go
@@ -0,0 +1,86 @@
package templating

import (
	"sort"
	"strings"
)

const (
	// DefaultSeparator is the default separation character to use when separating template parts.
	DefaultSeparator = "."
)

// Engine uses a Matcher to retrieve the appropriate template and applies the template
// to the input string
type Engine struct {
	joiner  string
	matcher *matcher
}

// Apply extracts the template fields from the given line and returns the measurement
// name, tags and field name
func (e *Engine) Apply(line string) (string, map[string]string, string, error) {
	return e.matcher.match(line).Apply(line, e.joiner)
}

// NewEngine creates a new templating engine
func NewEngine(joiner string, defaultTemplate *Template, templates []string) (*Engine, error) {
	engine := Engine{
		joiner:  joiner,
		matcher: newMatcher(defaultTemplate),
	}
	templateSpecs := parseTemplateSpecs(templates)

	for _, templateSpec := range templateSpecs {
		if err := engine.matcher.addSpec(templateSpec); err != nil {
			return nil, err
		}
	}

	return &engine, nil
}

func parseTemplateSpecs(templates []string) templateSpecs {
	tmplts := templateSpecs{}
	for _, pattern := range templates {
		tmplt := templateSpec{
			separator: DefaultSeparator,
		}

		// Format is [separator] [filter] <template> [tag1=value1,tag2=value2]
		parts := strings.Fields(pattern)
		partsLength := len(parts)
		if partsLength < 1 {
			// ignore
			continue
		}
		if partsLength == 1 {
			tmplt.template = pattern
		} else if partsLength == 4 {
			tmplt.separator = parts[0]
			tmplt.filter = parts[1]
			tmplt.template = parts[2]
			tmplt.tagstring = parts[3]
		} else {
			hasTagstring := strings.Contains(parts[partsLength-1], "=")
			if hasTagstring {
				tmplt.tagstring = parts[partsLength-1]
				tmplt.template = parts[partsLength-2]
				if partsLength == 3 {
					tmplt.filter = parts[0]
				}
			} else {
				tmplt.template = parts[partsLength-1]
				if partsLength == 2 {
					tmplt.filter = parts[0]
				} else { // length == 3
					tmplt.separator = parts[0]
					tmplt.filter = parts[1]
				}
			}
		}
		tmplts = append(tmplts, tmplt)
	}
	sort.Sort(tmplts)
	return tmplts
}
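To see how `parseTemplateSpecs` interprets each template line, the following standalone sketch restates its field-count rules in a `classify` function and runs a few sample patterns through it. It is a copy for illustration, not part of the package: it returns a local `spec` struct, omits sorting and the matcher, and uses `parts[0]` instead of the raw pattern for one-field lines, so surrounding whitespace is trimmed.

```go
package main

import (
	"fmt"
	"strings"
)

// spec mirrors the four string fields that parseTemplateSpecs fills in.
type spec struct{ separator, filter, template, tagstring string }

// classify applies the same rules as parseTemplateSpecs to one pattern:
// [separator] [filter] <template> [tag1=value1,tag2=value2]
func classify(pattern string) (spec, bool) {
	s := spec{separator: "."}
	parts := strings.Fields(pattern)
	n := len(parts)
	switch {
	case n == 0:
		return s, false // blank lines are ignored
	case n == 1:
		s.template = parts[0]
	case n == 4:
		s.separator, s.filter, s.template, s.tagstring = parts[0], parts[1], parts[2], parts[3]
	case strings.Contains(parts[n-1], "="):
		// The last field is a tag string; the template precedes it.
		s.tagstring, s.template = parts[n-1], parts[n-2]
		if n == 3 {
			s.filter = parts[0]
		}
	default:
		s.template = parts[n-1]
		if n == 2 {
			s.filter = parts[0]
		} else {
			s.separator, s.filter = parts[0], parts[1]
		}
	}
	return s, true
}

func main() {
	for _, p := range []string{
		"measurement.field",
		"jvm.* measurement.measurement.field",
		"jvm.* measurement.field env=prod",
		"_ jvm_* measurement_field",
		"_ jvm_* measurement_field env=prod",
	} {
		s, _ := classify(p)
		fmt.Printf("%q -> %+v\n", p, s)
	}
}
```

Note that a three-field line is read as `separator filter template` only when its last field contains no `=`; otherwise it is `filter template tags`.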