-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IPVS input plugin #4890
IPVS input plugin #4890
Conversation
Subsequent commits will use this library to report IPVS stats from the linux kernel. This library was chosen since it uses vishvananda/netlink as the library to make netlink socket calls (and is the preferred way of performing netlink calls in telegraf).
This is a first pass at adding an IPVS input plugin. It is quite rudimentary and has the following limitations: * does not support namespaces, reports stats for all namespaces * does not support filtering, reports stats for all servers it sees * only reports virtual server stats (not real servers) These limitations will be addressed in subsequent commits.
The IPVS input plugin imports `docker/libnetwork` which in turn vendors an older version of `vishvananda/netlink`. When using `dep import` to pull in `docker/libnetwork` the dep tool is able to see this transitive dependency but it fails to detect the version and ends up bringing in a newer `vishvananda/netlink` that is incompatible with `docker/libnetwork`. This patch pegs the `vishvananda/netlink` lib to the older one that `docker/netlink` expects. We'll also work with docker to bump up its dependency version so that we can remove this constraint in the near future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
plugins/inputs/ipvs/README.md
Outdated
|
||
``` | ||
{"fields":{"bytes_in":0,"bytes_out":0,"connections":0,"cps":0,"pkts_in":0,"pkts_out":0,"pps_in":0,"pps_out":0},"name":"ipvs_virtual_server","tags":{"address_family":"inet","netmask":"32","sched":"rr","address":"172.18.64.234","port":"9000","protocol":"tcp"},"timestamp":1539810710} | ||
{"fields":{"bytes_in":0,"bytes_out":0,"connections":0,"cps":0,"pkts_in":0,"pkts_out":0,"pps_in":0,"pps_out":0},"name":"ipvs_virtual_server","tags":{"address_family":"inet","netmask":"32","sched":"rr","fwmark":"47"},"timestamp":1539810710} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you convert this to line protocol, I usually use the output of ./telegraf --input-filter ipvs --test
as a starting point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
// helper: given a Service, return tags that identify it | ||
func serviceTags(s *ipvs.Service) map[string]string { | ||
ret := map[string]string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a tag source
and set it to os.Hostname()
, this probably seems somewhat pointless, but it is for #4413 and will be nice when we get all plugins supporting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are not metrics from a remote service (as described in #4413 ), I dont think this plugin should emit that tag.
As I understand that issue - the problem arises when metrics from a remote host (not the same host that telegraf is running on) are received and then forwarded by telegraf, we want to have a way to know what data source
the metrics originated on.
However in this case, the metrics are from the host that telegraf is running on (like, cpu/mem/network). So the agent.omit_hostname
(and related) config knobs should already provide this.
Please correct me if I'm missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case the source
and host
tag have the same value, but still would have different meanings. Adding it in this case would just be for uniformity with other plugins with the goal that eventually all metrics would contain both of these tags.
With both tags you will be able to query either by the Telegraf system that reported the metric or by the system that the metric is about, without needing to check which is supported for each plugin, and will be able to list all metrics for a hostname more easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielnelson - I think I understand.
I think this conversation is probably to be had on #4413, but I'll continue here for now...
I'd like to point out that the nice thing about the host
tag is/was that it defaulted to os.Hostname
but was overridable via the config file. This thoughtful knob was immensely useful in situations where we deploy telegraf to large deployments where the logical name is friendlier/recognizable to developers than the actual dns/hostname given to the system during provisioning.
After reading the conversation on #4413 (which wandered in the direction of selecting an appropriate name) I still feel like the source
tag should be added by telegraf when one is not provided by the input plugin (using whatever value it uses for the host
tag). This makes a lot of sense for all the metrics that originate on the host itself and frees the plugin author from needing to think about this.
I'd argue that the majority of the metrics collected by the input plugins are from the local host itself. There are fewer plugins that collect metrics from over the network (statsd, wavefront, etc).
If we proceed in this direction, we potentially run into a situation where every input plugin can now handle this tag with subtle differences. In one case, as a plugin author, I may choose to accept this as a config element so as to allow users to specify an alternate value, in other cases the user may override this using [[inputs.foobar.tags]]
, thus making this redundant. In other cases, the burden to add a tagdrop
now falls on the user (where she doesn't want this tag). We're now beginning to fray one of telegrafs greatest strengths - its simplicity.
I'd hope that the burden of doing this falls only on the input plugins who regularly collect metrics from over the network (statsd, wavefront, etc).
Has a final decision been reached on this? If so, then I'll go ahead and add this as you suggest. If not, I'd like to participate in that discussion before I add this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There hasn't been any discussion about automatically adding the tag, so I would comment on 4413 about the idea and any details. I'm not opposed to the idea, but it seems that if we automatically add the tag we need to go through all the remote plugins in one pass and add in the right value. Otherwise on top of it being incorrect I think it will cause confusion about what the tag means. I'd say it's about a 50/50 split between localhost only and plugins that can support a remote host on the input plugins.
For plugins that can collect remotely it will have to be the plugin who sets the value, as we don't have a uniform way to determine it from looking at the config and sometimes hostnames are dynamic. Uniformity in handling will just have to be enforced manually for this reason, and that means we can't totally solve the issue by adding it automatically.
Filtering it out is pretty easy, just add a tagexclude on the outputs, but I think this is something that very few will want. In general adding a tag like this doesn't increase series cardinality because it either doesn't vary with respect to the the host or there already is identifying information on the metric that it doesn't vary against.
We aren't very far along on adding this tag, so we can leave it out of this plugin for now. Once we have proper documentation in place for how it should work then we will begin requiring it in new plugins.
Replaces #4792 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the story/eta on tests? Otherwise looks good, thanks!
plugins/inputs/ipvs/ipvs.go
Outdated
default: | ||
return fmt.Sprintf("%d", af) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: remove this empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
plugins/inputs/ipvs/ipvs.go
Outdated
|
||
// Gather gathers the stats | ||
func (i *IPVS) Gather(acc telegraf.Accumulator) error { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: remove this empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
A previous commit added a Gopkg.toml contraint to force vishvananda/netlink to a particular commit. After that change, running `dep ensure` results in this Gopkg.lock file.
@glinton - not sure what tests would be appropriate here. As you can see, the Thoughts? |
good point, they really wouldn't add too much value in this case. |
Not required, but my ideas on what tests could be added. I would say the main thing you would want to test here is the state machine in Gather and how it interacts with |
@danielnelson I'll update #4413 with my thoughts/ideas on where we could go with this tag. Thanks for approving this, should we go ahead and merge this right now? Or should I add the |
Let's do the rest of the work in a follow up PR. |
issue #4757
Required for all PRs: