Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugs (and design issue) with Riemann output plugin #642

Closed
dhruvbansal opened this issue Feb 3, 2016 · 2 comments · Fixed by #669
Closed

Bugs (and design issue) with Riemann output plugin #642

dhruvbansal opened this issue Feb 3, 2016 · 2 comments · Fixed by #669

Comments

@dhruvbansal
Copy link

Getting an error sending inputs with string values to a Riemann server. To replicate, install telegraf (worked on release v0.10.1 & git v0.10.1-26-gb941d27) with a Riemann server output and make sure to include the system input (though this same bug should manifest with any input having a string value, see below):

$ cat /tmp/telegraf.conf 
# Telegraf configuration

[tags]

# Configuration for telegraf agent
[agent]
    interval = "10s"
    debug = false
    hostname = "my-host"
    round_interval = true
    flush_interval = "10s"
    flush_jitter = "0s"

###############################################################################
#                                  OUTPUTS                                    #
###############################################################################

[[outputs.riemann]]
    url = "localhost:5555"
    transport = "tcp"

###############################################################################
#                                  INPUTS                                     #
###############################################################################

[[inputs.system]]

Output of -test:

$ telegraf -config /tmp/telegraf.conf -test
* Plugin: system, Collection 1
> system load1=0.2,load15=0.75,load5=0.3,uptime=331935i,uptime_format="3 days, 20:12" 1454537455699667917

Now try sending this to Riemann:

$ telegraf -config /tmp/telegraf.conf
2016/02/03 22:14:18 Starting Telegraf (version 0.10.1)
2016/02/03 22:14:18 Loaded outputs: riemann
2016/02/03 22:14:18 Loaded inputs: system
2016/02/03 22:14:18 Tags enabled: host=my-host
2016/02/03 22:14:18 Agent Config: Interval:10s, Debug:false, Quiet:false, Hostname:"my-host", Flush Interval:10s
2016/02/03 22:14:20 Gathered metrics, (10s interval), from 1 inputs in 788.652µs
2016/02/03 22:14:30 Gathered metrics, (10s interval), from 1 inputs in 668.949µs
2016/02/03 22:14:30 Error writing to output [riemann]: FAILED to send riemann message: Metric of invalid type (type string)

The problem is the uptime_format field in the system input is a string, not a numeric value, but is passed to the Riemann client library as a Metric anyway. Looking at the way the client library processes events, if an input's value is a string, then it should be passed to the client as a State, not a Metric. This matches with how Riemann thinks about events (see the table of event properties). This bug will occur when writing to Riemann for any input with a value that is a string.

This isn't caught by the integration tests because, as far as I can see, the tests only test with numeric values, not with strings.

There is a design issue when writing metrics to Riemann as well. Inputs in Telegraf have a name and a set of tags which identify them uniquely and can contain many metrics (fields). When writing to Riemann, these multi-metric inputs are turned into single-metric Riemann events (as required by Riemann). Unfortunately, of the name and tags that uniquely identify this input, only the name is currently transmitted to Riemann. So the Telegraf input

disk,fstype=ext4,path=/ free=15179063296i,inodes_free=1154148i,inodes_total=1310720i,inodes_used=156572i,total=20988547072i,used=4848979968i,used_percent=24.210952133880912 145$

becomes the Riemann events

Event{:host my-host, :service disk_free, :metric 104857600, :time 1.454538690201E9, :ttl 60}
Event{:host my-host, :service disk_inodes_free, :metric 2054140, :time 1.454538690201E9, :ttl 60}
Event{:host my-host, :service disk_inodes_total, :metric 2054141, :time 1.454538690201E9, :ttl 60}
...

Without the missing tag information (most importantly, path=/), it's impossible on the Riemann side to handle this event stream.

I wrote an (ugly) patch for these problems but I'm hoping someone else can do better now that I've raised this issue.

@dhruvbansal
Copy link
Author

BTW bool values from an input can cause the same problem, see my latest commits...

@sparrc
Copy link
Contributor

sparrc commented Feb 7, 2016

@dhruvbansal I don't particularly see anything wrong with the commit you've linked. I would appreciate if you could put it up in a PR. I'm sure you have a better perspective to be doing this as I have very little hands-on experience with riemann.

sparrc pushed a commit that referenced this issue Feb 9, 2016
* Customizable 'separator' option instead of hard-coded '_'

* String values are sent as "State" instead of "Metric", preventing
  Riemann from rejecting them

* Riemann service name is set to an (ugly) combination of input name &
  (sorted) tags' values...this allows connecting different events for
  the same input together on the Riemann side

closes #642
sparrc pushed a commit that referenced this issue Feb 9, 2016
* Customizable 'separator' option instead of hard-coded '_'

* String values are sent as "State" instead of "Metric", preventing
  Riemann from rejecting them

* Riemann service name is set to an (ugly) combination of input name &
  (sorted) tags' values...this allows connecting different events for
  the same input together on the Riemann side

closes #642
sparrc pushed a commit that referenced this issue Feb 9, 2016
* Customizable 'separator' option instead of hard-coded '_'

* String values are sent as "State" instead of "Metric", preventing
  Riemann from rejecting them

* Riemann service name is set to an (ugly) combination of input name &
  (sorted) tags' values...this allows connecting different events for
  the same input together on the Riemann side

closes #642
geodimm pushed a commit to miketonks/telegraf that referenced this issue Mar 10, 2016
* Customizable 'separator' option instead of hard-coded '_'

* String values are sent as "State" instead of "Metric", preventing
  Riemann from rejecting them

* Riemann service name is set to an (ugly) combination of input name &
  (sorted) tags' values...this allows connecting different events for
  the same input together on the Riemann side

closes influxdata#642
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants