Implementing generic parser plugins and documentation
This constitutes a large change in how we will parse different data
formats going forward (for the plugins that support it)

This is working off @henrypfhu's changes.
sparrc committed Feb 9, 2016
1 parent 1449c8b commit 0064cc2
Showing 32 changed files with 1,947 additions and 515 deletions.
46 changes: 46 additions & 0 deletions CONTRIBUTING.md
@@ -129,6 +129,52 @@ func init() {
}
```

## Input Plugins Accepting Arbitrary Data Formats

Some input plugins (such as
[exec](https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec))
accept arbitrary input data formats. An overview of these data formats can
be found
[here](https://github.com/influxdata/telegraf/blob/master/DATA_FORMATS_INPUT.md).

To enable this, you must define a `SetParser(parser parsers.Parser)` method
on the plugin object (see the exec plugin for an example), as well as a
`parser` field on the object.

You can then utilize the parser internally in your plugin, parsing data as you
see fit. Telegraf's configuration layer takes care of instantiating the
`Parser` object for you.
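
For illustration, here is a minimal sketch of a hypothetical plugin wired up
this way (the `MyInput` struct and the `Gather` body are illustrative, not the
actual exec plugin, and assume the `Accumulator.AddFields` signature shown):

```go
package myinput

import (
	"github.com/influxdata/telegraf"
	"github.com/influxdata/telegraf/plugins/parsers"
)

// MyInput is a hypothetical input plugin that accepts arbitrary data formats.
type MyInput struct {
	parser parsers.Parser
}

// SetParser is called by Telegraf's configuration layer with a Parser
// built from the plugin's data_format configuration.
func (m *MyInput) SetParser(parser parsers.Parser) {
	m.parser = parser
}

// Gather hands whatever raw bytes the plugin collected to the parser and
// adds the resulting metrics to the accumulator.
func (m *MyInput) Gather(acc telegraf.Accumulator) error {
	raw := []byte("cpu.usage.idle 90\ncpu.usage.busy 10") // collected data
	metrics, err := m.parser.Parse(raw)
	if err != nil {
		return err
	}
	for _, metric := range metrics {
		acc.AddFields(metric.Name(), metric.Fields(), metric.Tags(), metric.Time())
	}
	return nil
}
```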

You should also add the following to your `SampleConfig()` return:

```toml
### Data format to consume. This can be "json", "influx" or "graphite"
### Each data format has its own unique set of configuration options, read
### more about them here:
### https://github.com/influxdata/telegraf/blob/master/DATA_FORMATS.md
data_format = "influx"
```

Below is the `Parser` interface.

```go
// Parser is an interface defining functions that a parser plugin must satisfy.
type Parser interface {
// Parse takes a byte buffer separated by newlines
// ie, `cpu.usage.idle 90\ncpu.usage.busy 10`
// and parses it into telegraf metrics
Parse(buf []byte) ([]telegraf.Metric, error)

// ParseLine takes a single string metric
// ie, "cpu.usage.idle 90"
// and parses it into a telegraf metric.
ParseLine(line string) (telegraf.Metric, error)
}
```

You can view the parser registry code
[here](https://github.com/influxdata/telegraf/blob/henrypfhu-master/plugins/parsers/registry.go).
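
Roughly, `Parse` handles a whole buffer of data while `ParseLine` handles a
single metric. A hypothetical helper illustrating the difference (the
graphite-style input strings here are just examples):

```go
package example

import (
	"log"

	"github.com/influxdata/telegraf/plugins/parsers"
)

// parseExamples is a hypothetical helper; `parser` can be any value
// satisfying the Parser interface above.
func parseExamples(parser parsers.Parser) error {
	// Parse handles a whole (possibly multi-line) buffer of metrics.
	metrics, err := parser.Parse([]byte("cpu.usage.idle 90\ncpu.usage.busy 10"))
	if err != nil {
		return err
	}
	log.Printf("parsed %d metrics", len(metrics))

	// ParseLine handles exactly one metric, which suits plugins that
	// receive data line-by-line (e.g. from a socket or a message queue).
	metric, err := parser.ParseLine("cpu.usage.idle 90")
	if err != nil {
		return err
	}
	log.Printf("single metric: %s", metric.Name())
	return nil
}
```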

## Service Input Plugins

This section is for developers who want to create new "service" collection
257 changes: 257 additions & 0 deletions DATA_FORMATS_INPUT.md
@@ -0,0 +1,257 @@
# Telegraf Input Data Formats

Telegraf metrics, like InfluxDB
[points](https://docs.influxdata.com/influxdb/v0.10/write_protocols/line/),
are a combination of four basic parts:

1. Measurement Name
1. Tags
1. Fields
1. Timestamp

These four parts are easily defined when using InfluxDB line-protocol as a
data format. But there are other data formats that users may want to use which
require more advanced configuration to create usable Telegraf metrics.

Plugins such as `exec` and `kafka_consumer` parse textual data. Until now,
these plugins were statically configured to parse a single data format:
`exec` mostly supported only JSON, and `kafka_consumer` only supported data in
InfluxDB line-protocol.

But now we are normalizing the parsing of various data formats across all
plugins that can support it. You will be able to identify a plugin that supports
different data formats by the presence of a `data_format` config option, for
example, in the exec plugin:

```toml
[[inputs.exec]]
### Commands array
commands = ["/tmp/test.sh", "/usr/bin/mycollector --foo=bar"]

### measurement name suffix (for separating different commands)
name_suffix = "_mycollector"

### Data format to consume. This can be "json", "influx" or "graphite"
### Each data format has its own unique set of configuration options, read
### more about them here:
### https://github.com/influxdata/telegraf/blob/master/DATA_FORMATS.md
data_format = "json"

### Additional configuration options go here
```

Each data_format has an additional set of configuration options available,
which are described below.

## Influx:

There are no additional configuration options for InfluxDB line-protocol. The
metrics are parsed directly into Telegraf metrics.

## JSON:

The JSON data format flattens JSON into metric _fields_. For example, this JSON:

```json
{
"a": 5,
"b": {
"c": 6
}
}
```

Would get translated into _fields_ of a measurement:

```
myjsonmetric a=5,b_c=6
```

The _measurement name_ is usually the name of the plugin, but it can be
overridden using the `name_override` config option.

#### Configuration:

The JSON data format supports specifying "tag keys". If specified, keys
will be searched for at the root level of the JSON blob. If the key(s) exist,
they will be applied as tags to the Telegraf metrics.

For example, if you had this configuration:

```toml
[[inputs.exec]]
### Commands array
commands = ["/tmp/test.sh", "/usr/bin/mycollector --foo=bar"]

### measurement name suffix (for separating different commands)
name_suffix = "_mycollector"

### Data format to consume. This can be "json", "influx" or "graphite"
### Each data format has its own unique set of configuration options, read
### more about them here:
### https://github.com/influxdata/telegraf/blob/master/DATA_FORMATS.md
data_format = "json"

### List of tag names to extract from top-level of JSON server response
tag_keys = [
"my_tag_1",
"my_tag_2"
]
```

with this JSON output from a command:

```json
{
"a": 5,
"b": {
"c": 6
},
"my_tag_1": "foo"
}
```

Your Telegraf metrics would get tagged with "my_tag_1":

```
exec_mycollector,my_tag_1=foo a=5,b_c=6
```
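
For plugin or test authors, the same behavior can be reproduced against the
parser registry directly. A rough sketch, assuming a constructor along the
lines of `parsers.NewJSONParser(metricName, tagKeys, defaultTags)` in
`plugins/parsers/registry.go` (check that file for the exact constructors and
signatures):

```go
package main

import (
	"fmt"
	"log"

	"github.com/influxdata/telegraf/plugins/parsers"
)

func main() {
	// Name the metrics after the plugin and extract two root-level tag keys.
	parser, err := parsers.NewJSONParser(
		"exec_mycollector",
		[]string{"my_tag_1", "my_tag_2"},
		nil, // no default tags
	)
	if err != nil {
		log.Fatal(err)
	}

	metrics, err := parser.Parse([]byte(`{"a": 5, "b": {"c": 6}, "my_tag_1": "foo"}`))
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range metrics {
		// Expected (roughly): exec_mycollector map[my_tag_1:foo] map[a:5 b_c:6]
		fmt.Println(m.Name(), m.Tags(), m.Fields())
	}
}
```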

## Graphite:

The Graphite data format translates graphite _dot_ buckets directly into
Telegraf measurement names, with a single value field, and without any tags. For
more advanced options, Telegraf supports specifying "templates" to translate
graphite buckets into Telegraf metrics.

#### Separator:

You can specify a separator to use for the parsed metrics.
By default, it will leave the metrics with a "." separator.
Setting `separator = "_"` will translate:

```
cpu.usage.idle 99
=> cpu_usage_idle value=99
```

#### Measurement/Tag Templates:

The most basic template specifies a single transformation to apply to all
incoming metrics. _measurement_ is a special keyword that tells Telegraf which
parts of the graphite bucket to combine into the measurement name. It can have a
trailing `*` to indicate that the remainder of the metric should be used.
Other words are considered tag keys. So the following template:

```toml
templates = [
"region.measurement*"
]
```

would result in the following Graphite -> Telegraf transformation.

```
us-west.cpu.load 100
=> cpu.load,region=us-west value=100
```

#### Field Templates:

There is also a _field_ keyword, which can only be specified once.
The field keyword tells Telegraf which part of the graphite bucket to use as
the field name (key).
So the following template:

```toml
templates = [
"measurement.measurement.field.region"
]
```

would result in the following Graphite -> Telegraf transformation.

```
cpu.usage.idle.us-west 100
=> cpu_usage,region=us-west idle=100
```

#### Filter Templates:

Users can also filter the template(s) to use based on the name of the bucket,
using glob matching, like so:

```toml
templates = [
"cpu.* measurement.measurement.region",
"mem.* measurement.measurement.host"
]
```

which would result in the following transformation:

```
cpu.load.us-west 100
=> cpu_load,region=us-west value=100
mem.cached.localhost 256
=> mem_cached,host=localhost value=256
```

#### Adding Tags:

Additional tags that do not exist on the received metric can be added by
specifying them after the template pattern. Tags have the same format as in
the line protocol, and multiple tags are separated by commas.

```toml
templates = [
"measurement.measurement.field.region datacenter=1a"
]
```

would result in the following Graphite -> Telegraf transformation.

```
cpu.usage.idle.us-west 100
=> cpu_usage,region=us-west,datacenter=1a idle=100
```
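
To experiment with a template outside of a running agent, a graphite parser
can be constructed directly. A rough sketch, assuming a constructor along the
lines of `parsers.NewGraphiteParser(separator, templates, defaultTags)` in
`plugins/parsers/registry.go` (check that file for the exact signature):

```go
package main

import (
	"fmt"
	"log"

	"github.com/influxdata/telegraf/plugins/parsers"
)

func main() {
	templates := []string{
		"measurement.measurement.field.region datacenter=1a",
	}

	// "_" joins the matched measurement parts; no default tags are added.
	parser, err := parsers.NewGraphiteParser("_", templates, nil)
	if err != nil {
		log.Fatal(err)
	}

	m, err := parser.ParseLine("cpu.usage.idle.us-west 100")
	if err != nil {
		log.Fatal(err)
	}
	// Expected (roughly): cpu_usage map[region:us-west datacenter:1a] map[idle:100]
	fmt.Println(m.Name(), m.Tags(), m.Fields())
}
```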

There are many more options available;
[more details can be found here](https://github.com/influxdata/influxdb/tree/master/services/graphite#templates).

#### Configuration:

```toml
[[inputs.exec]]
### Commands array
commands = ["/tmp/test.sh", "/usr/bin/mycollector --foo=bar"]

### measurement name suffix (for separating different commands)
name_suffix = "_mycollector"

### Data format to consume. This can be "json", "influx" or "graphite" (line-protocol)
### Each data format has its own unique set of configuration options, read
### more about them here:
### https://github.com/influxdata/telegraf/blob/master/DATA_FORMATS.md
data_format = "graphite"

### This string will be used to join the matched values.
separator = "_"

### Each template line requires a template pattern. It can have an optional
### filter before the template, separated by spaces. It can also have optional
### extra tags following the template. Multiple tags should be separated by
### commas with no spaces, as in the line protocol format. There can be only
### one default template.
### Templates support the following formats:
### 1. filter + template
### 2. filter + template + extra tag
### 3. filter + template with field key
### 4. default template
templates = [
"*.app env.service.resource.measurement",
"stats.* .host.measurement* region=us-west,agent=sensu",
"stats2.* .host.measurement.field",
"measurement*"
]
```
8 changes: 1 addition & 7 deletions Godeps
@@ -2,10 +2,8 @@ git.eclipse.org/gitroot/paho/org.eclipse.paho.mqtt.golang.git dbd8d5c40a582eb9ad
github.com/Shopify/sarama d37c73f2b2bce85f7fa16b6a550d26c5372892ef
github.com/Sirupsen/logrus f7f79f729e0fbe2fcc061db48a9ba0263f588252
github.com/amir/raidman 6a8e089bbe32e6b907feae5ba688841974b3c339
github.com/armon/go-metrics 345426c77237ece5dab0e1605c3e4b35c3f54757
github.com/aws/aws-sdk-go 87b1e60a50b09e4812dee560b33a238f67305804
github.com/beorn7/perks b965b613227fddccbfffe13eae360ed3fa822f8d
github.com/boltdb/bolt ee4a0888a9abe7eefe5a0992ca4cb06864839873
github.com/cenkalti/backoff 4dc77674aceaabba2c7e3da25d4c823edfb73f99
github.com/dancannon/gorethink 6f088135ff288deb9d5546f4c71919207f891a70
github.com/davecgh/go-spew 5215b55f46b2b919f50a1df0eaa5886afe4e3b3d
@@ -14,16 +12,12 @@ github.com/eapache/queue ded5959c0d4e360646dc9e9908cff48666781367
github.com/fsouza/go-dockerclient 7b651349f9479f5114913eefbfd3c4eeddd79ab4
github.com/go-ini/ini afbd495e5aaea13597b5e14fe514ddeaa4d76fc3
github.com/go-sql-driver/mysql 7c7f556282622f94213bc028b4d0a7b6151ba239
github.com/gogo/protobuf e8904f58e872a473a5b91bc9bf3377d223555263
github.com/golang/protobuf 6aaa8d47701fa6cf07e914ec01fde3d4a1fe79c3
github.com/golang/snappy 723cc1e459b8eea2dea4583200fd60757d40097a
github.com/gonuts/go-shellquote e842a11b24c6abfb3dd27af69a17f482e4b483c2
github.com/gorilla/context 1c83b3eabd45b6d76072b66b746c20815fb2872d
github.com/gorilla/mux 26a6070f849969ba72b72256e9f14cf519751690
github.com/hailocab/go-hostpool e80d13ce29ede4452c43dea11e79b9bc8a15b478
github.com/hashicorp/go-msgpack fa3f63826f7c23912c15263591e65d54d080b458
github.com/hashicorp/raft 057b893fd996696719e98b6c44649ea14968c811
github.com/hashicorp/raft-boltdb d1e82c1ec3f15ee991f7cc7ffd5b67ff6f5bbaee
github.com/influxdata/config bae7cb98197d842374d3b8403905924094930f24
github.com/influxdata/influxdb 697f48b4e62e514e701ffec39978b864a3c666e6
github.com/influxdb/influxdb 697f48b4e62e514e701ffec39978b864a3c666e6
@@ -56,4 +50,4 @@ golang.org/x/text 6d3c22c4525a4da167968fa2479be5524d2e8bd0
gopkg.in/dancannon/gorethink.v1 6f088135ff288deb9d5546f4c71919207f891a70
gopkg.in/fatih/pool.v2 cba550ebf9bce999a02e963296d4bc7a486cb715
gopkg.in/mgo.v2 03c9f3ee4c14c8e51ee521a6a7d0425658dd6f64
gopkg.in/yaml.v2 f7716cbe52baa25d2e9b0d0da546fcf909fc16b4
gopkg.in/yaml.v2 f7716cbe52baa25d2e9b0d0da546fcf909fc16b4