Support IPVS metrics #4757

Closed

amoghe opened this issue Sep 27, 2018 · 20 comments

Labels: area/k8s, feature request (Requests for new plugin and for new features to existing plugins)
Milestone: 1.9.0

@amoghe (Contributor) commented Sep 27, 2018

Feature Request

I'd like to implement an input plugin for reporting IPVS metrics (for "virtual_server" and "real_server").

Proposal:

The proposal is to add this in phases.

Phase 1

Add metrics to track virtual services. They are namespaced under "virtual_server" so that we can distinguish them from the "real_server" metrics (see Phase 2).

These are the readily available metrics:

  • Connections (virtual_server.connections)
  • PacketsIn (virtual_server.pkts_in)
  • PacketsOut (virtual_server.pkts_out)
  • BytesIn (virtual_server.bytes_in)
  • BytesOut (virtual_server.bytes_out)
  • CPS (virtual_server.cps)
  • PPSIn (virtual_server.pps_in)
  • PPSOut (virtual_server.pps_out)
  • BPSIn (virtual_server.bps_in)
  • BPSOut (virtual_server.bps_out)

Each metric has the following tags:

  • address
  • protocol
  • port
  • fwmark (if it is non-zero)

Phase 2

Add the same set of metrics, this time for each "real_server" that backs the "virtual_server"s above.

Each "destination"'s metrics will need to carry an additional tag indicating which "service" it is backing. For this we add (see the sketch after this list):

  • service: concat(protocol, address, port, fwmark) with some separator ("-")
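
For illustration, a minimal Go sketch of how such a tag value could be assembled; the helper name and exact formatting are assumptions, not part of the proposal:

package main

import "fmt"

// buildServiceTag is a hypothetical helper: it joins the fields identifying
// the virtual server that a real server backs, e.g. "tcp-10.1.2.3-443-0".
func buildServiceTag(protocol, address string, port uint16, fwmark uint32) string {
	return fmt.Sprintf("%s-%s-%d-%d", protocol, address, port, fwmark)
}

func main() {
	fmt.Println(buildServiceTag("tcp", "10.1.2.3", 443, 0))
}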

Use case

Helps with monitoring Linux Virtual Servers.

glinton added the "feature request" label Sep 27, 2018
@danielnelson (Contributor)

Sounds good. I don't have any experience with IPVS, so I have a couple questions:

  • Can you explain or link to some documentation that explains how the real server/destination differs from the service? Would the address/protocol/port/fwmark tags not define the real server?
  • I assume a fwmark of 0 means that there is no fwmark?
  • Implementation related, but would this data come from /proc?

@amoghe (Contributor, Author) commented Sep 27, 2018

The terminology is different depending on where you look. The golang libraries call them "Services" and "Destinations", whereas the IPVS terminology is "Virtual Server" and "Real Server".

A virtual server is configured to be backed by one or more real servers (that can actually handle the traffic). The virtual server is just a load balancer (the "VS" part of IPVS).

The virtual server can be configured to load balance connections based on proto/addr/port/fwmark (addr+port being commonly used).

These metrics are exposed by the kernel over a netlink interface. There is a pure-Go library for dealing with netlink sockets and parsing these messages here: http://github.com/mqliang/libipvs. However, I've forked it to add a Close() method (to allow for graceful error handling and reopen/retry) and to make some documentation improvements; that fork is available at https://github.com/amoghe/libipvs. There is a PR open to merge those changes back; once that happens we can switch to the upstream library, but I have not seen any movement on it thus far.
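
As a rough illustration of why a Close() method matters here, a minimal sketch of the reopen/retry pattern; the handle interface and its GetServices/Close method names below are assumptions, not the actual libipvs API:

package ipvs

// ipvsHandle is a hypothetical stand-in for the forked library's handle.
type ipvsHandle interface {
	GetServices() ([]interface{}, error)
	Close()
}

// collector illustrates the reopen/retry pattern a Close() method enables:
// on a netlink error the socket is dropped and reopened on the next
// collection interval instead of being reused in a broken state.
type collector struct {
	open func() (ipvsHandle, error) // e.g. the library's New() constructor
	h    ipvsHandle
}

func (c *collector) gather() error {
	if c.h == nil {
		h, err := c.open()
		if err != nil {
			return err
		}
		c.h = h
	}
	services, err := c.h.GetServices()
	if err != nil {
		c.h.Close()
		c.h = nil
		return err
	}
	_ = services // convert to fields/tags and hand them to the accumulator
	return nil
}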

@amoghe (Contributor, Author) commented Sep 27, 2018

I'll switch to using virtual_server and real_server in the code and documentation. I think it is better to align the terminology with the IPVS one rather than the one used by the libraries.

@danielnelson (Contributor)

Looks like libipvs uses github.com/hkwi/nlgo for netlink. I know we have some other netlink work in progress; @fntlnz, is this the same library your work is using?

In phase 2 the sum of the values for the real servers would be equal to the value for the virtual server? Or perhaps only approximately due to network issues or unavailability?

@amoghe (Contributor, Author) commented Sep 27, 2018

While there are other libraries available for manipulating IPVS from Go, including one that ships with Docker (github.com/docker/libnetwork/ipvs), this library (libipvs) is the only one that seems to handle deserialization of the stats block correctly (which is what we're interested in). The other libraries either don't support it or don't handle it well.

Another promising one is github.com/google/seesaw/ipvs, but that uses a Go interface to netlink that goes via the C interface (and needs libnl to be installed).

@amoghe (Contributor, Author) commented Sep 27, 2018

in phase 2 the sum of the values for the real servers would be equal to the value for the virtual server?

Yes, from my experimentation that seems to be the case, but I wouldn't be surprised if it wasn't; they are reported separately by the kernel.

@amoghe (Contributor, Author) commented Sep 27, 2018

I assume a fwmark of 0 means that there is no fwmark?

Yes, from what I can tell, fwmark 0 means that this virtual server is fronting some real servers based on a rule that utilizes the other three (addr, port, proto). So a rule can be set using addr/port/proto or fwmark, but not both.

@danielnelson (Contributor)

I would use the same tags on the real_server as with virtual_server: protocol, address, port, fwmark. You could vary the measurement name to make it easier to aggregate across only real_servers or virtual_servers.

ipvs_virtual_server,addr=10.1.2.3,port=443,proto=tcp value=42
ipvs_real_server,addr=10.1.2.3,port=443,proto=tcp value=42

Or another way to do the same would be using tags:

ipvs,type=virtual_server,addr=10.1.2.3,port=443,proto=tcp value=42
ipvs,type=real_server,addr=10.1.2.3,port=443,proto=tcp value=42

We are starting to standardize on using the source tag to represent the hostname of the system that the metrics are about: #4413. In this case it is a little funny because you could argue that all the metrics are about the virtual_server, but I can also see how the source would be the real_server.

@amoghe (Contributor, Author) commented Sep 28, 2018

Updated the Feature Request comment to reflect the updated proposal (naming scheme of the metrics).

@amoghe (Contributor, Author) commented Sep 28, 2018

Here is a similar naming scheme, but first some background...

Typically a single virtual_server is backed by several real_servers. So the output from ipvsadm can look like this:

TCP  172.18.64.234:9001             352665  3537412        0  936807K        0
  -> 172.18.64.201:9001              17839   179535        0 47392524        0
  -> 172.18.64.202:9001              18045   180643        0 47660176        0
  -> 172.18.64.203:9001              16910   170472        0 45050422        0
  -> 172.18.64.204:9001              17551   176566        0 46663660        0
  -> 172.18.64.205:9001              17113   172243        0 45421224        0
  -> 172.18.64.206:9001              17437   175234        0 46233126        0
  -> 172.18.64.207:9001              17752   179486        0 47390798        0
  -> 172.18.64.208:9001              17268   173417        0 45836000        0

Here, the first line (with the prefix TCP) indicates that this is a section for a virtual server, and the lines that follow are its real servers.

Now, a virtual server is defined by one of

  • proto + ip + port
  • fwmark

So in the above pasted output we can tell that the virtual server is using proto + ip + port. However, the following are also possible:

UDP  172.18.64.234:9003                  0        0        0        0        0
  -> 172.18.64.201:9003                  0        0        0        0        0
  -> 172.18.64.202:9003                  0        0        0        0        0
  -> 172.18.64.203:9003                  0        0        0        0        0
  -> 172.18.64.204:9003                  0        0        0        0        0

... or ...

FWM  47                                  0        0        0        0        0
  -> 172.18.64.201:9000                  0        0        0        0        0
  -> 172.18.64.202:9000                  0        0        0        0        0
  -> 172.18.64.203:9000                  0        0        0        0        0
  -> 172.18.64.204:9000                  0        0        0        0        0
  -> 172.18.64.205:9000                  0        0        0        0        0

In the example above, we see that fwmark is being used to determine what traffic gets forwarded to these real servers.

When we're tracking the virtual_server metrics, we're really tracking the metrics for only the first line (in each of the above examples).

So, with this as background, a virtual server's metrics (when using proto+ip+port) may look something like this:

ipvs.virtual_server.bytes_in,addr=10.1.2.3,port=443,proto=tcp value=42 

Or when using fwmark, they may be like this:

ipvs.virtual_server.bytes_in,fwmark=99 value=42

Since these 2 schemes are mutually exclusive, this plugin will emit one of the above.
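
As a sketch of that mutually exclusive tagging, assuming a hypothetical virtualServer struct standing in for whatever the netlink library returns:

package main

import (
	"fmt"
	"strconv"
)

// virtualServer is a hypothetical representation of one IPVS virtual server.
type virtualServer struct {
	Protocol string
	Address  string
	Port     uint16
	FWMark   uint32
}

// serviceTags emits either the fwmark tag or the proto/addr/port tags,
// mirroring the two mutually exclusive schemes described above.
func serviceTags(vs virtualServer) map[string]string {
	if vs.FWMark != 0 {
		return map[string]string{"fwmark": strconv.FormatUint(uint64(vs.FWMark), 10)}
	}
	return map[string]string{
		"proto": vs.Protocol,
		"addr":  vs.Address,
		"port":  strconv.Itoa(int(vs.Port)),
	}
}

func main() {
	fmt.Println(serviceTags(virtualServer{Protocol: "tcp", Address: "10.1.2.3", Port: 443}))
	fmt.Println(serviceTags(virtualServer{FWMark: 99}))
}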

We can discuss the real_server separately (there, the addr/port is important to identify "which" real server, and some indicator is needed to tie them back to the "virtual server" they are servicing).

So for the case of virtual_servers, does the above proposal resonate?

@danielnelson (Contributor)

Sounds good, in line protocol it would look something like:

ipvs_virtual_server,addr=10.1.2.3,port=443,proto=tcp bytes_in=42,bytes_out=42,connections=42
ipvs_virtual_server,fwmark=99 bytes_in=42,bytes_out=42,connections=42

I see now that we need the additional tags to tie the real server to the virtual server, I would probably split out the tags though:

ipvs_virtual_server,addr=10.1.2.3,port=443,proto=tcp bytes_in=42,bytes_out=42,connections=42
ipvs_real_server,virtual_addr=10.1.2.3,virtual_port=443,proto=tcp,real_addr=10.1.2.4,real_port=443 bytes_in=42,bytes_out=42,connections=42

On the netlink side, @fntlnz let me know he is using github.com/vishvananda/netlink. Ideally, we would share underlying libraries between plugins so we don't need multiple dependencies. I also don't see much activity on the github.com/hkwi/nlgo side.

@amoghe (Contributor, Author) commented Sep 28, 2018

in line protocol it would look something like

Yes, I think we're in agreement on the line protocol format.

Ideally, we would share underlying libraries between plugins so we don't need multiple dependencies.

Let me poke around and see how easy it is to get IPVS support using that (vishvananda/netlink) as the underlying netlink lib. I agree that using just one dependency is desirable, but I'd hate for that to be a gating factor (since both of these are pure-Go libs). I'll update this thread with what I find.

On a related note, I find the docs to be lacking on the following topics:

  1. explain how the SampleConfig gets translated to members on the struct (camel to snake case, who sets the members?). From the example in CONTRIBUTING.md it isn't clear (to me) how the config key/values get set on the struct that I instantiate in the Creator.
  2. some guidance on adding info/error/debug logs. I could look at how other plugins do it, but I think it should be captured somewhere in the CONTRIBUTING.md (under the input plugins guidelines section).

@danielnelson (Contributor)

IPVS is becoming popular for use with kube proxy, so adding k8s tag.

@danielnelson (Contributor)

I'll update the docs with answers to your questions, but the short answer is you should use a struct tag for each field instead of relying on the default camelcase conversion:

type IPVS struct {
	FooBar []string `toml:"foo_bar"`
}

For errors fatal to the gather attempt, return an error out of Gather; if the gather should continue, use:

log.Printf("E! Error in plugin [inputs.ipvs]: %v", err)

You can use I!, D!, or E! for the info, debug, and error levels.
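
To make the config mapping concrete, a small sketch under the assumption of an illustrative foo_bar option: Telegraf's TOML loader fills the tagged struct fields from the user's config file, while SampleConfig only supplies the commented example text shown in generated configs:

package ipvs

type IPVS struct {
	FooBar []string `toml:"foo_bar"`
}

// SampleConfig returns the example snippet shown in generated config files;
// the foo_bar option here is purely illustrative.
func (i *IPVS) SampleConfig() string {
	return `
  ## Illustrative option; it maps onto the FooBar field via its toml tag.
  # foo_bar = ["a", "b"]
`
}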

@amoghe (Contributor, Author) commented Oct 2, 2018

@danielnelson, can we take on the lib change as a "Phase 3" item? I'd like to proceed using the existing library (which is pure Go, so not a burning issue, imo). Later we can switch to the vishvananda/netlink lib once I contribute the code there.

@amoghe (Contributor, Author) commented Oct 2, 2018

PR #4792

amoghe mentioned this issue Oct 25, 2018
danielnelson added this to the 1.9.0 milestone Oct 25, 2018
@danielnelson (Contributor)

The items under Phase 1 above are completed for 1.9.0.

@danielnelson (Contributor)

Closing, all items completed.

@amoghe (Contributor, Author) commented Nov 5, 2018

@danielnelson, are there specific tests to run for this plugin during the RC phase of 1.9?

@danielnelson (Contributor)

If you could just double-check that everything is working on a system with IPVS when the RC is released, that is probably all that is needed.
