Implementing generic parser plugins and documentation

This constitutes a large change in how we will parse different data formats going forward (for the plugins that support it). This is working off @henrypfhu's changes.

# Telegraf Input Data Formats

Telegraf metrics, like InfluxDB
[points](https://docs.influxdata.com/influxdb/v0.10/write_protocols/line/),
are a combination of four basic parts:

1. Measurement Name
1. Tags
1. Fields
1. Timestamp

These four parts are easily defined when using InfluxDB line-protocol as a
data format. But there are other data formats that users may want to use which
require more advanced configuration to create usable Telegraf metrics.
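
To make the four parts concrete, here is a minimal Python sketch that splits a single line-protocol line into measurement name, tags, fields and timestamp. It is an illustration only: it assumes no escaped characters, no quoted string values and plain float field values (no `i` integer suffix), and `split_line` is a hypothetical helper, not part of Telegraf or InfluxDB.

```python
from typing import Dict, Optional, Tuple


def split_line(line: str) -> Tuple[str, Dict[str, str], Dict[str, float], Optional[int]]:
    """Split a simple line-protocol line into its four parts.

    Assumes no escaping, no quoted strings and float-only field values.
    """
    series, field_set, *rest = line.split(" ")
    # The timestamp is optional; when present it is the third element.
    timestamp = int(rest[0]) if rest else None

    # "measurement,tag1=a,tag2=b" -> measurement name plus tag pairs.
    measurement, *tag_pairs = series.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)

    # "f1=1,f2=2.5" -> field pairs, parsed as floats for simplicity.
    fields = {k: float(v) for k, v in (p.split("=", 1) for p in field_set.split(","))}
    return measurement, tags, fields, timestamp


print(split_line("cpu,host=server01,region=us-west usage_idle=99.5 1434055562000000000"))
```

Real line-protocol parsing also has to handle escaping and typed values, which is why the formats below need their own configuration.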

Plugins such as `exec` and `kafka_consumer` parse textual data. Up until now,
these plugins were statically configured to parse just a single
data format: `exec` mostly supported only JSON, and `kafka_consumer` only
supported data in InfluxDB line-protocol.

But now we are normalizing the parsing of various data formats across all
plugins that can support it. You can identify a plugin that supports
different data formats by the presence of a `data_format` config option, for
example, in the exec plugin:

```toml
[[inputs.exec]]
### Commands array
commands = ["/tmp/test.sh", "/usr/bin/mycollector --foo=bar"]

### measurement name suffix (for separating different commands)
name_suffix = "_mycollector"

### Data format to consume. This can be "json", "influx" or "graphite"
### Each data format has its own unique set of configuration options, read
### more about them here:
### https://github.com/influxdata/telegraf/blob/master/DATA_FORMATS.md
data_format = "json"

### Additional configuration options go here
```
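
The idea behind a shared `data_format` option is that a plugin only reads raw bytes and hands them to whichever parser the option names. A minimal Python sketch of that dispatch follows; the registry, `new_parser` and the two toy parsers are hypothetical and far simpler than Telegraf's own Go parsers (the `influx` stand-in only splits lines, and `graphite` is left out of the toy registry).

```python
import json
from typing import Callable, Dict, List

Parser = Callable[[bytes], List[object]]


def parse_influx(data: bytes) -> List[object]:
    # Toy stand-in: return each non-empty line-protocol line unparsed.
    return [line for line in data.decode().splitlines() if line.strip()]


def parse_json(data: bytes) -> List[object]:
    # Toy stand-in: one decoded JSON document per call.
    return [json.loads(data)]


# Hypothetical registry keyed by the value of `data_format`.
PARSERS: Dict[str, Parser] = {
    "influx": parse_influx,
    "json": parse_json,
}


def new_parser(data_format: str) -> Parser:
    """Look up the parser for a data_format, rejecting unknown names."""
    if data_format not in PARSERS:
        raise ValueError(f"unknown data_format: {data_format!r}")
    return PARSERS[data_format]


# A plugin such as exec would call the same function regardless of format.
parse = new_parser("json")
print(parse(b'{"a": 5, "b": {"c": 6}}'))
```

Keeping the format choice in one lookup is what lets every text-reading plugin gain a new format at once.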

Each `data_format` has an additional set of configuration options, which
are described below.

## Influx:

There are no additional configuration options for InfluxDB line-protocol. The
metrics are parsed directly into Telegraf metrics.

## JSON:

The JSON data format flattens JSON into metric _fields_, joining nested key
names with an underscore (`_`). For example, this JSON:

```json
{
  "a": 5,
  "b": {
    "c": 6
  }
}
```

would get translated into _fields_ of a measurement:

```
myjsonmetric a=5,b_c=6
```

The _measurement_ _name_ is usually the name of the plugin,
but can be overridden using the `name_override` config option.
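
The flattening rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Telegraf's Go implementation: nested objects are joined with `_`, only numeric values become fields, and strings, booleans, nulls and arrays are simply skipped (how Telegraf itself treats those is not covered here). `flatten_json` is a hypothetical name.

```python
import json
from typing import Any, Dict


def flatten_json(obj: Dict[str, Any], prefix: str = "") -> Dict[str, float]:
    """Flatten nested JSON objects into "_"-joined field names.

    Assumption of this sketch: only int/float values are kept as fields.
    """
    fields: Dict[str, float] = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the parent key as a prefix: b -> b_c.
            fields.update(flatten_json(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            fields[name] = value
    return fields


fields = flatten_json(json.loads('{"a": 5, "b": {"c": 6}}'))
# The measurement name comes from the plugin (or name_override), not the JSON.
print("myjsonmetric " + ",".join(f"{k}={v}" for k, v in sorted(fields.items())))
# prints: myjsonmetric a=5,b_c=6
```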

#### Configuration:

The JSON data format supports specifying "tag keys". If specified, keys
will be searched for in the root level of the JSON blob. If the key(s) exist,
they will be applied as tags to the Telegraf metrics.

For example, if you had this configuration:

```toml
[[inputs.exec]]
### Commands array
commands = ["/tmp/test.sh", "/usr/bin/mycollector --foo=bar"]

### measurement name suffix (for separating different commands)
name_suffix = "_mycollector"

### Data format to consume. This can be "json", "influx" or "graphite"
### Each data format has its own unique set of configuration options, read
### more about them here:
### https://github.com/influxdata/telegraf/blob/master/DATA_FORMATS.md
data_format = "json"

### List of tag names to extract from top-level of JSON server response
tag_keys = [
  "my_tag_1",
  "my_tag_2"
]
```

with this JSON output from a command:

```json
{
  "a": 5,
  "b": {
    "c": 6
  },
  "my_tag_1": "foo"
}
```

your Telegraf metrics would get tagged with "my_tag_1":

```
exec_mycollector,my_tag_1=foo a=5,b_c=6
```
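
Tag extraction can be pictured as a step that runs before flattening. A minimal Python sketch, assuming (as the example output suggests) that a matched key is removed from the object so it does not also become a field, and that missing keys such as `my_tag_2` are silently ignored; `extract_tags` is a hypothetical helper:

```python
import json
from typing import Any, Dict, List, Tuple


def extract_tags(obj: Dict[str, Any], tag_keys: List[str]) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Pull root-level tag_keys out of a decoded JSON object.

    Returns (tags, remaining object). Nested keys are never searched.
    """
    remaining = dict(obj)  # shallow copy so the caller's dict is untouched
    tags: Dict[str, str] = {}
    for key in tag_keys:
        if key in remaining:
            # Tag values are strings in line protocol.
            tags[key] = str(remaining.pop(key))
    return tags, remaining


raw = '{"a": 5, "b": {"c": 6}, "my_tag_1": "foo"}'
tags, rest = extract_tags(json.loads(raw), ["my_tag_1", "my_tag_2"])
print(tags)  # {'my_tag_1': 'foo'}
print(rest)  # {'a': 5, 'b': {'c': 6}}
```

The remaining object is what would then be flattened into the `a=5,b_c=6` fields.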

## Graphite:

The Graphite data format translates Graphite _dot_ buckets directly into
Telegraf measurement names, with a single field named `value`, and without any
tags. For more advanced options, Telegraf supports specifying "templates" to
translate Graphite buckets into Telegraf metrics.

#### Separator:

You can specify a separator to use for the parsed metrics.
By default, the metrics keep the "." separator.
Setting `separator = "_"` will translate:

```
cpu.usage.idle 99
=> cpu_usage_idle value=99
```
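
With no templates configured, the behaviour above amounts to re-joining the bucket with the chosen separator and storing the number in a field named `value`. A minimal Python sketch under that reading; `parse_plain` is hypothetical and ignores the optional Graphite timestamp:

```python
from typing import Dict, Tuple


def parse_plain(line: str, separator: str = ".") -> Tuple[str, Dict[str, float]]:
    """Parse "bucket value [timestamp]" with no templates.

    The whole bucket becomes the measurement name, re-joined with
    `separator`; the timestamp, if any, is ignored in this sketch.
    """
    bucket, value, *_ = line.split()
    measurement = separator.join(bucket.split("."))
    return measurement, {"value": float(value)}


print(parse_plain("cpu.usage.idle 99", separator="_"))
# ('cpu_usage_idle', {'value': 99.0})
```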

#### Measurement/Tag Templates:

The most basic template is to specify a single transformation to apply to all
incoming metrics. _measurement_ is a special keyword that tells Telegraf which
parts of the Graphite bucket to combine into the measurement name. It can have a
trailing `*` to indicate that the remainder of the bucket should be used.
Other words are considered tag keys. So the following template:

```toml
templates = [
  "region.measurement*"
]
```

would result in the following Graphite -> Telegraf transformation (using the
default "." separator):

```
us-west.cpu.load 100
=> cpu.load,region=us-west value=100
```
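
A minimal Python sketch of how such a template could be applied part by part; `apply_template` is a hypothetical helper. It handles `measurement`, `measurement*` and tag words, and treats an empty word (as produced by a leading `.` in the configuration example at the end of this section) as "skip this part", which is an assumption of the sketch:

```python
from typing import Dict, List, Tuple


def apply_template(bucket: str, template: str, separator: str = ".") -> Tuple[str, Dict[str, str]]:
    """Map the dot-separated parts of `bucket` onto the words of `template`."""
    parts = bucket.split(".")
    measurement: List[str] = []
    tags: Dict[str, str] = {}
    for i, word in enumerate(template.split(".")):
        if i >= len(parts):
            break  # template is longer than the bucket
        if word == "measurement":
            measurement.append(parts[i])
        elif word == "measurement*":
            # Trailing '*': the remainder of the bucket is the measurement.
            measurement.extend(parts[i:])
            break
        elif word:
            # Any other non-empty word is a tag key.
            tags[word] = parts[i]
    return separator.join(measurement), tags


print(apply_template("us-west.cpu.load", "region.measurement*"))
# ('cpu.load', {'region': 'us-west'})
```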

#### Field Templates:

There is also a _field_ keyword, which can only be specified once.
The field keyword tells Telegraf to use that part of the bucket as the field name.
So the following template:

```toml
templates = [
  "measurement.measurement.field.region"
]
```

would result in the following Graphite -> Telegraf transformation (with
`separator = "_"`):

```
cpu.usage.idle.us-west 100
=> cpu_usage,region=us-west idle=100
```
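
Extending the same idea with the _field_ keyword: the matched part names the field, and when no _field_ word is present the field falls back to `value`, as in the earlier examples. A minimal Python sketch with a hypothetical `apply_field_template`; for brevity it omits `measurement*`:

```python
from typing import Dict, List, Tuple


def apply_field_template(
    bucket: str, value: float, template: str, separator: str = "_"
) -> Tuple[str, Dict[str, str], Dict[str, float]]:
    """Apply a template that may contain a single `field` word."""
    words = template.split(".")
    if words.count("field") > 1:
        raise ValueError("'field' can only be specified once per template")

    measurement: List[str] = []
    tags: Dict[str, str] = {}
    field = "value"  # used when the template has no `field` word
    # zip stops at the shorter of the two, so extra words are ignored.
    for word, part in zip(words, bucket.split(".")):
        if word == "measurement":
            measurement.append(part)
        elif word == "field":
            field = part
        elif word:
            tags[word] = part
    return separator.join(measurement), tags, {field: value}


print(apply_field_template("cpu.usage.idle.us-west", 100, "measurement.measurement.field.region"))
# ('cpu_usage', {'region': 'us-west'}, {'idle': 100})
```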

#### Filter Templates:

Users can also filter the template(s) to use based on the name of the bucket,
using glob matching, like so:

```toml
templates = [
  "cpu.* measurement.measurement.region",
  "mem.* measurement.measurement.host"
]
```

which would result in the following transformation (with `separator = "_"`):

```
cpu.load.us-west 100
=> cpu_load,region=us-west value=100
mem.cached.localhost 256
=> mem_cached,host=localhost value=256
```
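
Template selection can be approximated with Python's `fnmatch` module. This is a simplification: here `*` may match across dots and the first matching filter wins, and the real matching rules may differ (for instance, in which template is chosen when several filters match; see the InfluxDB link below for details). `select_template` and `parse_bucket` are hypothetical names, and the fallback for unmatched buckets (join every part into the measurement name) is an assumption of this sketch.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple


def select_template(bucket: str, entries: List[str]) -> Optional[str]:
    """Return the template of the first "filter template" entry that matches."""
    for entry in entries:
        filt, template = entry.split()
        if fnmatchcase(bucket, filt):
            return template
    return None


def parse_bucket(bucket: str, entries: List[str], separator: str = "_") -> Tuple[str, Dict[str, str]]:
    template = select_template(bucket, entries)
    parts = bucket.split(".")
    if template is None:
        # No filter matched: the whole bucket becomes the measurement.
        return separator.join(parts), {}
    measurement: List[str] = []
    tags: Dict[str, str] = {}
    for word, part in zip(template.split("."), parts):
        if word == "measurement":
            measurement.append(part)
        elif word:
            tags[word] = part
    return separator.join(measurement), tags


templates = [
    "cpu.* measurement.measurement.region",
    "mem.* measurement.measurement.host",
]
print(parse_bucket("cpu.load.us-west", templates))      # ('cpu_load', {'region': 'us-west'})
print(parse_bucket("mem.cached.localhost", templates))  # ('mem_cached', {'host': 'localhost'})
```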

#### Adding Tags:

Tags that don't exist on the received metric can be added by specifying them
after the pattern. Tags have the same format as in the line protocol, and
multiple tags are separated by commas. For example, the following template:

```toml
templates = [
  "measurement.measurement.field.region datacenter=1a"
]
```

would result in the following Graphite -> Telegraf transformation (with
`separator = "_"`):

```
cpu.usage.idle.us-west 100
=> cpu_usage,region=us-west,datacenter=1a idle=100
```
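
A minimal Python sketch of a template entry carrying extra tags, using a hypothetical `apply_with_tags` helper. It assumes the tag set is the second space-separated word, and it lets a tag taken from the bucket override an extra tag with the same key; both are assumptions of this sketch rather than documented behaviour.

```python
from typing import Dict, List, Tuple


def apply_with_tags(
    bucket: str, value: float, entry: str, separator: str = "_"
) -> Tuple[str, Dict[str, str], Dict[str, float]]:
    """Apply an entry of the form "template k1=v1,k2=v2" (no filter)."""
    template, *extra = entry.split()
    # Start from the extra tags; parts matched from the bucket are added after.
    tags: Dict[str, str] = {}
    if extra:
        for pair in extra[0].split(","):
            key, val = pair.split("=", 1)
            tags[key] = val

    measurement: List[str] = []
    field = "value"
    for word, part in zip(template.split("."), bucket.split(".")):
        if word == "measurement":
            measurement.append(part)
        elif word == "field":
            field = part
        elif word:
            tags[word] = part
    return separator.join(measurement), tags, {field: value}


print(apply_with_tags("cpu.usage.idle.us-west", 100, "measurement.measurement.field.region datacenter=1a"))
# ('cpu_usage', {'datacenter': '1a', 'region': 'us-west'}, {'idle': 100})
```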

There are many more options available;
[more details can be found here](https://github.com/influxdata/influxdb/tree/master/services/graphite#templates).

#### Configuration:

```toml
[[inputs.exec]]
### Commands array
commands = ["/tmp/test.sh", "/usr/bin/mycollector --foo=bar"]

### measurement name suffix (for separating different commands)
name_suffix = "_mycollector"

### Data format to consume. This can be "json", "influx" or "graphite" (line-protocol)
### Each data format has its own unique set of configuration options, read
### more about them here:
### https://github.com/influxdata/telegraf/blob/master/DATA_FORMATS.md
data_format = "graphite"

### This string will be used to join the matched values.
separator = "_"

### Each template line requires a template pattern. It can have an optional
### filter before the template, separated by spaces. It can also have optional extra
### tags following the template. Multiple tags should be separated by commas and no spaces,
### similar to the line protocol format. There can be only one default template.
### Templates support the following formats:
### 1. filter + template
### 2. filter + template + extra tags
### 3. filter + template with field key
### 4. default template
templates = [
  "*.app env.service.resource.measurement",
  "stats.* .host.measurement* region=us-west,agent=sensu",
  "stats2.* .host.measurement.field",
  "measurement*"
]
```
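
The four template forms listed in the comments above can be told apart by word count and by whether the last word looks like a tag set. A minimal Python sketch assuming exactly that rule; `parse_entry` and `check_templates` are hypothetical, and the one-default-template check mirrors the comment rather than Telegraf's code.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[Optional[str], str, Dict[str, str]]


def parse_entry(entry: str) -> Entry:
    """Split "[filter] template [k=v,...]" into (filter, template, tags)."""
    words = entry.split()
    tags: Dict[str, str] = {}
    # A trailing word containing '=' is the extra-tag set.
    if len(words) > 1 and "=" in words[-1]:
        tags = dict(pair.split("=", 1) for pair in words.pop().split(","))
    if len(words) == 2:
        return words[0], words[1], tags  # filter + template
    if len(words) == 1:
        return None, words[0], tags  # no filter: a default template
    raise ValueError(f"invalid template entry: {entry!r}")


def check_templates(entries: List[str]) -> List[Entry]:
    """Parse every entry and reject more than one default template."""
    parsed = [parse_entry(e) for e in entries]
    defaults = [p for p in parsed if p[0] is None]
    if len(defaults) > 1:
        raise ValueError("there can be only one default template")
    return parsed


for e in check_templates([
    "*.app env.service.resource.measurement",
    "stats.* .host.measurement* region=us-west,agent=sensu",
    "stats2.* .host.measurement.field",
    "measurement*",
]):
    print(e)
```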