Implementing generic parser plugins and documentation
This constitutes a large change in how we will parse different data
formats going forward (for the plugins that support it)

This is working off @henrypfhu's changes.
sparrc committed Feb 9, 2016
1 parent 1449c8b commit 0064cc2
Showing 32 changed files with 1,947 additions and 515 deletions.
46 changes: 46 additions & 0 deletions CONTRIBUTING.md
@@ -129,6 +129,52 @@ func init() {
}
```

## Input Plugins Accepting Arbitrary Data Formats

Some input plugins (such as
[exec](https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec))
accept arbitrary input data formats. An overview of these data formats can
be found
[here](https://github.com/influxdata/telegraf/blob/master/DATA_FORMATS_INPUT.md).

To enable this, you must define a `SetParser(parser parsers.Parser)` method
on the plugin object (see the exec plugin for an example), as well as a
`parser` field on the object.

You can then utilize the parser internally in your plugin, parsing data as you
see fit. Telegraf's configuration layer takes care of instantiating the
`Parser` object for you.
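
For illustration, here is a minimal sketch of a hypothetical plugin wired up
this way (the `MyInput` struct and the `Gather` body are illustrative, not the
actual exec plugin, and assume the `Accumulator.AddFields` signature shown):

```go
package myinput

import (
	"github.com/influxdata/telegraf"
	"github.com/influxdata/telegraf/plugins/parsers"
)

// MyInput is a hypothetical input plugin that accepts arbitrary data formats.
type MyInput struct {
	parser parsers.Parser
}

// SetParser is called by Telegraf's configuration layer with a Parser
// built from the plugin's data_format configuration.
func (m *MyInput) SetParser(parser parsers.Parser) {
	m.parser = parser
}

// Gather hands whatever raw bytes the plugin collected to the parser and
// adds the resulting metrics to the accumulator.
func (m *MyInput) Gather(acc telegraf.Accumulator) error {
	raw := []byte("cpu.usage.idle 90\ncpu.usage.busy 10") // collected data
	metrics, err := m.parser.Parse(raw)
	if err != nil {
		return err
	}
	for _, metric := range metrics {
		acc.AddFields(metric.Name(), metric.Fields(), metric.Tags(), metric.Time())
	}
	return nil
}
```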

You should also add the following to your `SampleConfig()` return:

```toml
### Data format to consume. This can be "json", "influx" or "graphite"
### Each data format has its own unique set of configuration options, read
### more about them here:
### https://github.com/influxdata/telegraf/blob/master/DATA_FORMATS.md
data_format = "influx"
```

Below is the `Parser` interface.

```go
// Parser is an interface defining functions that a parser plugin must satisfy.
type Parser interface {
// Parse takes a byte buffer separated by newlines
// ie, `cpu.usage.idle 90\ncpu.usage.busy 10`
// and parses it into telegraf metrics
Parse(buf []byte) ([]telegraf.Metric, error)

// ParseLine takes a single string metric
// ie, "cpu.usage.idle 90"
// and parses it into a telegraf metric.
ParseLine(line string) (telegraf.Metric, error)
}
```

You can view the parser registry code
[here](https://github.com/influxdata/telegraf/blob/henrypfhu-master/plugins/parsers/registry.go).
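
Roughly, `Parse` handles a whole buffer of data while `ParseLine` handles a
single metric. A hypothetical helper illustrating the difference (the
graphite-style input strings here are just examples):

```go
package example

import (
	"log"

	"github.com/influxdata/telegraf/plugins/parsers"
)

// parseExamples is a hypothetical helper; `parser` can be any value
// satisfying the Parser interface above.
func parseExamples(parser parsers.Parser) error {
	// Parse handles a whole (possibly multi-line) buffer of metrics.
	metrics, err := parser.Parse([]byte("cpu.usage.idle 90\ncpu.usage.busy 10"))
	if err != nil {
		return err
	}
	log.Printf("parsed %d metrics", len(metrics))

	// ParseLine handles exactly one metric, which suits plugins that
	// receive data line-by-line (e.g. from a socket or a message queue).
	metric, err := parser.ParseLine("cpu.usage.idle 90")
	if err != nil {
		return err
	}
	log.Printf("single metric: %s", metric.Name())
	return nil
}
```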

## Service Input Plugins

This section is for developers who want to create new "service" collection
257 changes: 257 additions & 0 deletions DATA_FORMATS_INPUT.md
@@ -0,0 +1,257 @@
# Telegraf Input Data Formats

Telegraf metrics, like InfluxDB
[points](https://docs.influxdata.com/influxdb/v0.10/write_protocols/line/),
are a combination of four basic parts:

1. Measurement Name
1. Tags
1. Fields
1. Timestamp

These four parts are easily defined when using InfluxDB line-protocol as a
data format. But there are other data formats that users may want to use which
require more advanced configuration to create usable Telegraf metrics.

Plugins such as `exec` and `kafka_consumer` parse textual data. Until now,
these plugins were statically configured to parse a single data format:
`exec` mostly supported only JSON, and `kafka_consumer` only supported data in
InfluxDB line-protocol.

But now we are normalizing the parsing of various data formats across all
plugins that can support it. You will be able to identify a plugin that supports
different data formats by the presence of a `data_format` config option, for
example, in the exec plugin:

```toml
[[inputs.exec]]
### Commands array
commands = ["/tmp/test.sh", "/usr/bin/mycollector --foo=bar"]

### measurement name suffix (for separating different commands)
name_suffix = "_mycollector"

### Data format to consume. This can be "json", "influx" or "graphite"
### Each data format has its own unique set of configuration options, read
### more about them here:
### https://github.com/influxdata/telegraf/blob/master/DATA_FORMATS.md
data_format = "json"

### Additional configuration options go here
```

Each data_format has an additional set of configuration options available,
which are described below.

## Influx:

There are no additional configuration options for InfluxDB line-protocol. The
metrics are parsed directly into Telegraf metrics.

## JSON:

The JSON data format flattens JSON into metric _fields_. For example, this JSON:

```json
{
"a": 5,
"b": {
"c": 6
}
}
```

Would get translated into _fields_ of a measurement:

```
myjsonmetric a=5,b_c=6
```

The _measurement name_ is usually the name of the plugin, but it can be
overridden using the `name_override` config option.

#### Configuration:

The JSON data format supports specifying "tag keys". If specified, keys
will be searched for at the root level of the JSON blob. If the key(s) exist,
they will be applied as tags to the Telegraf metrics.

For example, if you had this configuration:

```toml
[[inputs.exec]]
### Commands array
commands = ["/tmp/test.sh", "/usr/bin/mycollector --foo=bar"]

### measurement name suffix (for separating different commands)
name_suffix = "_mycollector"

### Data format to consume. This can be "json", "influx" or "graphite"
### Each data format has its own unique set of configuration options, read
### more about them here:
### https://github.com/influxdata/telegraf/blob/master/DATA_FORMATS.md
data_format = "json"

### List of tag names to extract from top-level of JSON server response
tag_keys = [
"my_tag_1",
"my_tag_2"
]
```

with this JSON output from a command:

```json
{
"a": 5,
"b": {
"c": 6
},
"my_tag_1": "foo"
}
```

Your Telegraf metrics would get tagged with "my_tag_1":

```
exec_mycollector,my_tag_1=foo a=5,b_c=6
```
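
For plugin or test authors, the same behavior can be reproduced against the
parser registry directly. A rough sketch, assuming a constructor along the
lines of `parsers.NewJSONParser(metricName, tagKeys, defaultTags)` in
`plugins/parsers/registry.go` (check that file for the exact constructors and
signatures):

```go
package main

import (
	"fmt"
	"log"

	"github.com/influxdata/telegraf/plugins/parsers"
)

func main() {
	// Name the metrics after the plugin and extract two root-level tag keys.
	parser, err := parsers.NewJSONParser(
		"exec_mycollector",
		[]string{"my_tag_1", "my_tag_2"},
		nil, // no default tags
	)
	if err != nil {
		log.Fatal(err)
	}

	metrics, err := parser.Parse([]byte(`{"a": 5, "b": {"c": 6}, "my_tag_1": "foo"}`))
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range metrics {
		// Expected (roughly): exec_mycollector map[my_tag_1:foo] map[a:5 b_c:6]
		fmt.Println(m.Name(), m.Tags(), m.Fields())
	}
}
```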

## Graphite:

The Graphite data format translates graphite _dot_ buckets directly into
Telegraf measurement names, with a single value field, and without any tags. For
more advanced options, Telegraf supports specifying "templates" to translate
graphite buckets into Telegraf metrics.

#### Separator:

You can specify a separator to use for the parsed metrics.
By default, it will leave the metrics with a "." separator.
Setting `separator = "_"` will translate:

```
cpu.usage.idle 99
=> cpu_usage_idle value=99
```

#### Measurement/Tag Templates:

The most basic template specifies a single transformation to apply to all
incoming metrics. _measurement_ is a special keyword that tells Telegraf which
parts of the graphite bucket to combine into the measurement name. It can have a
trailing `*` to indicate that the remainder of the metric should be used.
Other words are considered tag keys. So the following template:

```toml
templates = [
"region.measurement*"
]
```

would result in the following Graphite -> Telegraf transformation.

```
us-west.cpu.load 100
=> cpu.load,region=us-west value=100
```

#### Field Templates:

There is also a _field_ keyword, which can only be specified once.
The field keyword tells Telegraf which part of the graphite bucket to use as
the field name (key).
So the following template:

```toml
templates = [
"measurement.measurement.field.region"
]
```

would result in the following Graphite -> Telegraf transformation.

```
cpu.usage.idle.us-west 100
=> cpu_usage,region=us-west idle=100
```

#### Filter Templates:

Users can also filter the template(s) to use based on the name of the bucket,
using glob matching, like so:

```toml
templates = [
"cpu.* measurement.measurement.region",
"mem.* measurement.measurement.host"
]
```

which would result in the following transformation:

```
cpu.load.us-west 100
=> cpu_load,region=us-west value=100
mem.cached.localhost 256
=> mem_cached,host=localhost value=256
```

#### Adding Tags:

Additional tags that do not exist on the received metric can be added by
specifying them after the template pattern. Tags have the same format as in
the line protocol, and multiple tags are separated by commas.

```toml
templates = [
"measurement.measurement.field.region datacenter=1a"
]
```

would result in the following Graphite -> Telegraf transformation.

```
cpu.usage.idle.us-west 100
=> cpu_usage,region=us-west,datacenter=1a idle=100
```
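
To experiment with a template outside of a running agent, a graphite parser
can be constructed directly. A rough sketch, assuming a constructor along the
lines of `parsers.NewGraphiteParser(separator, templates, defaultTags)` in
`plugins/parsers/registry.go` (check that file for the exact signature):

```go
package main

import (
	"fmt"
	"log"

	"github.com/influxdata/telegraf/plugins/parsers"
)

func main() {
	templates := []string{
		"measurement.measurement.field.region datacenter=1a",
	}

	// "_" joins the matched measurement parts; no default tags are added.
	parser, err := parsers.NewGraphiteParser("_", templates, nil)
	if err != nil {
		log.Fatal(err)
	}

	m, err := parser.ParseLine("cpu.usage.idle.us-west 100")
	if err != nil {
		log.Fatal(err)
	}
	// Expected (roughly): cpu_usage map[region:us-west datacenter:1a] map[idle:100]
	fmt.Println(m.Name(), m.Tags(), m.Fields())
}
```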

There are many more options available;
[more details can be found here](https://github.com/influxdata/influxdb/tree/master/services/graphite#templates).

#### Configuration:

```toml
[[inputs.exec]]
### Commands array
commands = ["/tmp/test.sh", "/usr/bin/mycollector --foo=bar"]

### measurement name suffix (for separating different commands)
name_suffix = "_mycollector"

### Data format to consume. This can be "json", "influx" or "graphite" (line-protocol)
### Each data format has its own unique set of configuration options, read
### more about them here:
### https://github.com/influxdata/telegraf/blob/master/DATA_FORMATS.md
data_format = "graphite"

### This string will be used to join the matched values.
separator = "_"

### Each template line requires a template pattern. It can have an optional
### filter before the template, separated by spaces. It can also have optional
### extra tags following the template. Multiple tags should be separated by
### commas with no spaces, as in the line protocol format. There can be only
### one default template.
### Templates support the following formats:
### 1. filter + template
### 2. filter + template + extra tag
### 3. filter + template with field key
### 4. default template
templates = [
"*.app env.service.resource.measurement",
"stats.* .host.measurement* region=us-west,agent=sensu",
"stats2.* .host.measurement.field",
"measurement*"
]
```
8 changes: 1 addition & 7 deletions Godeps
@@ -2,10 +2,8 @@ git.eclipse.org/gitroot/paho/org.eclipse.paho.mqtt.golang.git dbd8d5c40a582eb9ad
github.com/Shopify/sarama d37c73f2b2bce85f7fa16b6a550d26c5372892ef
github.com/Sirupsen/logrus f7f79f729e0fbe2fcc061db48a9ba0263f588252
github.com/amir/raidman 6a8e089bbe32e6b907feae5ba688841974b3c339
github.com/armon/go-metrics 345426c77237ece5dab0e1605c3e4b35c3f54757
github.com/aws/aws-sdk-go 87b1e60a50b09e4812dee560b33a238f67305804
github.com/beorn7/perks b965b613227fddccbfffe13eae360ed3fa822f8d
github.com/boltdb/bolt ee4a0888a9abe7eefe5a0992ca4cb06864839873
github.com/cenkalti/backoff 4dc77674aceaabba2c7e3da25d4c823edfb73f99
github.com/dancannon/gorethink 6f088135ff288deb9d5546f4c71919207f891a70
github.com/davecgh/go-spew 5215b55f46b2b919f50a1df0eaa5886afe4e3b3d
@@ -14,16 +12,12 @@ github.com/eapache/queue ded5959c0d4e360646dc9e9908cff48666781367
github.com/fsouza/go-dockerclient 7b651349f9479f5114913eefbfd3c4eeddd79ab4
github.com/go-ini/ini afbd495e5aaea13597b5e14fe514ddeaa4d76fc3
github.com/go-sql-driver/mysql 7c7f556282622f94213bc028b4d0a7b6151ba239
github.com/gogo/protobuf e8904f58e872a473a5b91bc9bf3377d223555263
github.com/golang/protobuf 6aaa8d47701fa6cf07e914ec01fde3d4a1fe79c3
github.com/golang/snappy 723cc1e459b8eea2dea4583200fd60757d40097a
github.com/gonuts/go-shellquote e842a11b24c6abfb3dd27af69a17f482e4b483c2
github.com/gorilla/context 1c83b3eabd45b6d76072b66b746c20815fb2872d
github.com/gorilla/mux 26a6070f849969ba72b72256e9f14cf519751690
github.com/hailocab/go-hostpool e80d13ce29ede4452c43dea11e79b9bc8a15b478
github.com/hashicorp/go-msgpack fa3f63826f7c23912c15263591e65d54d080b458
github.com/hashicorp/raft 057b893fd996696719e98b6c44649ea14968c811
github.com/hashicorp/raft-boltdb d1e82c1ec3f15ee991f7cc7ffd5b67ff6f5bbaee
github.com/influxdata/config bae7cb98197d842374d3b8403905924094930f24
github.com/influxdata/influxdb 697f48b4e62e514e701ffec39978b864a3c666e6
github.com/influxdb/influxdb 697f48b4e62e514e701ffec39978b864a3c666e6
@@ -56,4 +50,4 @@ golang.org/x/text 6d3c22c4525a4da167968fa2479be5524d2e8bd0
gopkg.in/dancannon/gorethink.v1 6f088135ff288deb9d5546f4c71919207f891a70
gopkg.in/fatih/pool.v2 cba550ebf9bce999a02e963296d4bc7a486cb715
gopkg.in/mgo.v2 03c9f3ee4c14c8e51ee521a6a7d0425658dd6f64
gopkg.in/yaml.v2 f7716cbe52baa25d2e9b0d0da546fcf909fc16b4
gopkg.in/yaml.v2 f7716cbe52baa25d2e9b0d0da546fcf909fc16b4