Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPVS input plugin #4890

Merged
merged 8 commits into from
Oct 25, 2018
Merged

Conversation

amoghe
Copy link
Contributor

@amoghe amoghe commented Oct 19, 2018

issue #4757

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

Subsequent commits will use this library to report IPVS stats from the
linux kernel. This library was chosen since it uses vishvananda/netlink
as the library to make netlink socket calls (and is the preferred way of
performing netlink calls in telegraf).
This is a first pass at adding an IPVS input plugin. It is quite
rudimentary and has the following limitations:

* does not support namespaces, reports stats for all namespaces
* does not support filtering, reports stats for all servers it sees
* only reports virtual server stats (not real servers)

These limitations will be addressed in subsequent commits.
The IPVS input plugin imports `docker/libnetwork` which in turn vendors
an older version of `vishvananda/netlink`. When using `dep import` to
pull in `docker/libnetwork` the dep tool is able to see this transitive
dependency but it fails to detect the version and ends up bringing in
a newer `vishvananda/netlink` that is incompatible with `docker/libnetwork`.

This patch pegs the `vishvananda/netlink` lib to the older one that
`docker/netlink` expects. We'll also work with docker to bump up its
dependency version so that we can remove this constraint in the near
future.
Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!


```
{"fields":{"bytes_in":0,"bytes_out":0,"connections":0,"cps":0,"pkts_in":0,"pkts_out":0,"pps_in":0,"pps_out":0},"name":"ipvs_virtual_server","tags":{"address_family":"inet","netmask":"32","sched":"rr","address":"172.18.64.234","port":"9000","protocol":"tcp"},"timestamp":1539810710}
{"fields":{"bytes_in":0,"bytes_out":0,"connections":0,"cps":0,"pkts_in":0,"pkts_out":0,"pps_in":0,"pps_out":0},"name":"ipvs_virtual_server","tags":{"address_family":"inet","netmask":"32","sched":"rr","fwmark":"47"},"timestamp":1539810710}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you convert this to line protocol, I usually use the output of ./telegraf --input-filter ipvs --test as a starting point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

plugins/inputs/ipvs/README.md Outdated Show resolved Hide resolved

// helper: given a Service, return tags that identify it
func serviceTags(s *ipvs.Service) map[string]string {
ret := map[string]string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a tag source and set it to os.Hostname(), this probably seems somewhat pointless, but it is for #4413 and will be nice when we get all plugins supporting it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are not metrics from a remote service (as described in #4413 ), I dont think this plugin should emit that tag.

As I understand that issue - the problem arises when metrics from a remote host (not the same host that telegraf is running on) are received and then forwarded by telegraf, we want to have a way to know what data source the metrics originated on.

However in this case, the metrics are from the host that telegraf is running on (like, cpu/mem/network). So the agent.omit_hostname (and related) config knobs should already provide this.

Please correct me if I'm missing something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the source and host tag have the same value, but still would have different meanings. Adding it in this case would just be for uniformity with other plugins with the goal that eventually all metrics would contain both of these tags.

With both tags you will be able to query either by the Telegraf system that reported the metric or by the system that the metric is about, without needing to check which is supported for each plugin, and will be able to list all metrics for a hostname more easily.

Copy link
Contributor Author

@amoghe amoghe Oct 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielnelson - I think I understand.

I think this conversation is probably to be had on #4413, but I'll continue here for now...

I'd like to point out that the nice thing about the host tag is/was that it defaulted to os.Hostname but was overridable via the config file. This thoughtful knob was immensely useful in situations where we deploy telegraf to large deployments where the logical name is friendlier/recognizable to developers than the actual dns/hostname given to the system during provisioning.

After reading the conversation on #4413 (which wandered in the direction of selecting an appropriate name) I still feel like the source tag should be added by telegraf when one is not provided by the input plugin (using whatever value it uses for the host tag). This makes a lot of sense for all the metrics that originate on the host itself and frees the plugin author from needing to think about this.

I'd argue that the majority of the metrics collected by the input plugins are from the local host itself. There are fewer plugins that collect metrics from over the network (statsd, wavefront, etc).

If we proceed in this direction, we potentially run into a situation where every input plugin can now handle this tag with subtle differences. In one case, as a plugin author, I may choose to accept this as a config element so as to allow users to specify an alternate value, in other cases the user may override this using [[inputs.foobar.tags]], thus making this redundant. In other cases, the burden to add a tagdrop now falls on the user (where she doesn't want this tag). We're now beginning to fray one of telegrafs greatest strengths - its simplicity.

I'd hope that the burden of doing this falls only on the input plugins who regularly collect metrics from over the network (statsd, wavefront, etc).

Has a final decision been reached on this? If so, then I'll go ahead and add this as you suggest. If not, I'd like to participate in that discussion before I add this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There hasn't been any discussion about automatically adding the tag, so I would comment on 4413 about the idea and any details. I'm not opposed to the idea, but it seems that if we automatically add the tag we need to go through all the remote plugins in one pass and add in the right value. Otherwise on top of it being incorrect I think it will cause confusion about what the tag means. I'd say it's about a 50/50 split between localhost only and plugins that can support a remote host on the input plugins.

For plugins that can collect remotely it will have to be the plugin who sets the value, as we don't have a uniform way to determine it from looking at the config and sometimes hostnames are dynamic. Uniformity in handling will just have to be enforced manually for this reason, and that means we can't totally solve the issue by adding it automatically.

Filtering it out is pretty easy, just add a tagexclude on the outputs, but I think this is something that very few will want. In general adding a tag like this doesn't increase series cardinality because it either doesn't vary with respect to the the host or there already is identifying information on the metric that it doesn't vary against.

We aren't very far along on adding this tag, so we can leave it out of this plugin for now. Once we have proper documentation in place for how it should work then we will begin requiring it in new plugins.

Gopkg.toml Outdated Show resolved Hide resolved
@danielnelson danielnelson added this to the 1.9.0 milestone Oct 19, 2018
@glinton glinton self-requested a review October 22, 2018 15:32
@glinton
Copy link
Contributor

glinton commented Oct 24, 2018

Replaces #4792

Copy link
Contributor

@glinton glinton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the story/eta on tests? Otherwise looks good, thanks!

default:
return fmt.Sprintf("%d", af)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: remove this empty line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// Gather gathers the stats
func (i *IPVS) Gather(acc telegraf.Accumulator) error {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: remove this empty line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

A previous commit added a Gopkg.toml contraint to force vishvananda/netlink
to a particular commit. After that change, running `dep ensure` results in
this Gopkg.lock file.
@amoghe
Copy link
Contributor Author

amoghe commented Oct 25, 2018

@glinton - not sure what tests would be appropriate here. As you can see, the Gather mainly copies data between the response from docker/libnetwork struct into the telegraf.Accumulator. There are some helper funcs to translate uint16s to strings, but I'm not sure those tests would add much value.

Thoughts?

@glinton
Copy link
Contributor

glinton commented Oct 25, 2018

good point, they really wouldn't add too much value in this case.

@danielnelson
Copy link
Contributor

Not required, but my ideas on what tests could be added. I would say the main thing you would want to test here is the state machine in Gather and how it interacts with ipvs.Handle. Adding a happy path test set of data returned from services could be useful as both an example and if bugs are found we will have the basic structure available to add additional cases.

@amoghe
Copy link
Contributor Author

amoghe commented Oct 25, 2018

@danielnelson I'll update #4413 with my thoughts/ideas on where we could go with this tag.

Thanks for approving this, should we go ahead and merge this right now? Or should I add the real_server metrics and then we merge after that? Originally I had intended to do this in 2 (or 3) distinct phases, and this the first.

@danielnelson danielnelson merged commit b88436c into influxdata:master Oct 25, 2018
@danielnelson
Copy link
Contributor

Let's do the rest of the work in a follow up PR.

otherpirate pushed a commit to otherpirate/telegraf that referenced this pull request Mar 15, 2019
otherpirate pushed a commit to otherpirate/telegraf that referenced this pull request Mar 15, 2019
dupondje pushed a commit to dupondje/telegraf that referenced this pull request Apr 22, 2019
athoune pushed a commit to bearstech/telegraf that referenced this pull request Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants