Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mountpoint label and port label to mounstats collector (#993) #994

Closed
wants to merge 1 commit into from
Closed

Conversation

0xArsen
Copy link

@0xArsen 0xArsen commented Jul 9, 2018

This method takes the original way of removing duplicates from the mountstat collector and replaces it with consideration for mountpoint and port number.

Signed-off-by: Arsen Akishev [email protected]

@SuperQ
Copy link
Member

SuperQ commented Jul 11, 2018

The fixture collector/fixtures/e2e-64k-page-output.txt needs to be updated to match in order to make the buildkite happy.

@@ -111,8 +138,8 @@ func NewMountStatsCollector() (Collector, error) {
)

var (
labels = []string{"export"}
opLabels = []string{"export", "operation"}
labels = []string{"export", "mountpoint", "xprt"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to this doc, the first column of the xprt line is srcport, so the label should be srcport not xprt.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. I will update this to srcport. I will also update collector/fixtures/e2e-64k-page-output.txt.

@SuperQ
Copy link
Member

SuperQ commented Jul 11, 2018

I'm still thinking we should de-dupe on mountpoint, but if there are mounts with different version, we should include that as a label.

I'm also thinking the source port label isn't very useful, really what we probably want in order to correctly expose data is export, mountpoint, and proto

@0xArsen
Copy link
Author

0xArsen commented Jul 11, 2018

NFS has the option to let clients communicate using non-privileged ports. This makes it possible for the nfs client to communicate with the same export using a privileged and a non-privileged port.

@discordianfish
Copy link
Member

Don't feel strongly about the source port but in general LGTM! 👍

@0xArsen
Copy link
Author

0xArsen commented Jul 12, 2018

Could you tell me why buildkite is breaking? Thank you

@@ -2198,7 +2267,7 @@ node_sockstat_TCP_inuse 4
node_sockstat_TCP_mem 1
# HELP node_sockstat_TCP_mem_bytes Number of TCP sockets in state mem_bytes.
# TYPE node_sockstat_TCP_mem_bytes gauge
node_sockstat_TCP_mem_bytes 65536
node_sockstat_TCP_mem_bytes 4096
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks unrelated to this PR.

… and port

Add updated e2e tests

Change xprt to srcport

Change e2e-output.txt to support new label srcport

Change e2e-64k-page-output.txt to support srcport label

Change TCP_mem_bytes back to 65536 from 4096

Signed-off-by: Arsen Akishev <[email protected]>
@hakoerber
Copy link
Contributor

hakoerber commented Jul 12, 2018

So, to sum my thoughts on the discussion spread over #993, #996 and this merge request:

We have a few different properties for every NFS mount:

  • version (NFSv3 or NFSv4)
  • mountpoint
  • protocol (UDP or TCP)
  • export (server:/path)
  • source port

I think it makes sense to deduplicate NFS mounts by grouping by the tuple (export, version, protocol). mountpoint should be left out, because you will get the same statistics when mounting the same export on two different mountpoints. I will have to confirm that later.

This means that when you e.g. mount the same export with the same protocol but different NFS versions on the same mountpoint, you will get two sets of metrics. I think that this makes sense. Otherwise, you would lose existing metrics when mounting over an existing mount.

I would definitely not use srcport as a label, as suggested here. The source port can change with a remount, which would invalidate our existing data series and create a new one, even though all other parameters stayed the same.

I think having a mountpoint label is a very good idea to help discoverability server-side.

For #996, I already updated the duplicate detection logic. Maybe I can open another merge request so we can compare the approaches and arrive at a good solution?

About the NFS version label in prometheus/procfs: It looks a bit more involved to do. The opts: line in /proc/self/mountstats is currently not parsed at all. If you @arsenakishev need that metric, maybe we can work together to get the required code into procfs.

EDIT

I tested it locally: Mounting the same export on two different mountpoints using the same parameters (protocol, version) yields the exact same metrics in /proc/self/mountstats for the two mounts. So grouping without considering mountpoint makes sense I think.

@hakoerber hakoerber mentioned this pull request Jul 12, 2018
@0xArsen
Copy link
Author

0xArsen commented Jul 12, 2018

@hakoerber I think taking mountpoint out of the tuple makes sense. But srcport may be necessary. And yes we should find a way to add a parser for the opts line in procfs.

In my case I am using Dell EMC Isilon, to pool the ip addresses to the same domain name. This creates a situation where NFS lets me mount with the same export and mountpoint multiple times without any issues; this happens because the domain name resolves to a different ip. Every mount gets recorded separately in /proc/self/mountstats

device replaced_domain:/ifs mounted on /mnt/nfs with fstype nfs statvers=1.1
opts:ro,vers=3,rsize=131072,wsize=524288,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=xx.xx.xx.81,mountvers=3,mountport=300,mountproto=tcp,local_lock=none
    age:    1721572
    caps:    caps=0x3fc7,wtmult=512,dtsize=32768,bsize=0,namlen=255
    sec:    flavor=1,pseudoflavor=1
    events:    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    bytes:    0 0 0 0 0 0 0 0
    RPC iostats version: 1.0  p/v: 100003/3 (nfs)
    xprt:    tcp 816 1 1 0 0 13 13 0 13 0 2 0 0
    per-op statistics
            NULL: 0 0 0 0 0 0 0 0
device replaced_domain:/ifs mounted on /mnt/nfs with fstype nfs statvers=1.1
    opts:    ro,vers=3,rsize=131072,wsize=524288,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=xx.xx.xx.239,mountvers=3,mountport=300,mountproto=tcp,local_lock=none
    age:    1721572
    caps:    caps=0x3fc7,wtmult=512,dtsize=32768,bsize=0,namlen=255
    sec:    flavor=1,pseudoflavor=1
    events:    32666926 2997010980 553 16807 26123539 472085 3005222710 0 502 5014043 0 11767601 35077 0 25951790 300 600 25827155 0 0 0 0 0 0 0 0 0
    bytes:    4790124582217 0 2214397952 0 1230298670616 0 301853154 0
    RPC iostats version: 1.0  p/v: 100003/3 (nfs)
    xprt:    tcp 885 1 1 0 0 59002022 59002021 1 139390883 0 499 870786 60980420
    per-op statistics
            NULL: 0 0 0 0 0 0 0 0

It may be possible that other users will have an equivalent setup whether its through Isilon or not, where NFS will let them mount exports to the same mountpoint because the domain name will resolve differently. I can't think of a better way to dedup for this issue. Suggestions would be welcome.

@hakoerber
Copy link
Contributor

hakoerber commented Jul 12, 2018

Wow, did not know that this was possible. The xprt statistics for the first mount are quite low numbers, does that mean that the second one is used primarily and the first one is for backup?

The problem I see with srcport as a label is that the source port changes. I just tested it, when you do a remount (mount -o remount), the srcport does not change. With a mount && umount, it does.

This means that when I reboot a server, afterwards, the time series will be different because the srcport changed.

Is there anything else that differs between those two mounts that is static and can be used to tell them apart and used as a label?

@0xArsen
Copy link
Author

0xArsen commented Jul 13, 2018

The mounts can act as backups or just alternatives for access. As for the issue with srcport, thank you for pointing out the time series issue I have been convinced that excluding srcport is a better approach.

I do not think there is anything within proc/self/mountstats that can act as a static differentiator. Perhaps proc/self/mountinfo as mentioned in #600 has the necessary info but it seems that implementing that may be beyond the mountstats collector.

I would still like to have a protocol and nfs version exported as a label. I can raise an issue in github.com/prometheus/procfs for the nfs versioning and start working on a parser for the opts line. If you can help me with that it would be great.
As for this pr, I guess we can have it expose data forexport and mountpoint but I am not sure how much value that would provide.

@discordianfish
Copy link
Member

@hakoerber @arsenakishev How do you want to split the work? Let's either finish one by one, starting with the earliest (this) or moving the changes from #998 here. Up to you!

@0xArsen
Copy link
Author

0xArsen commented Jul 16, 2018 via email

@discordianfish
Copy link
Member

Now with #998 merged, are there still changes from this PR required or can we close it?

@0xArsen
Copy link
Author

0xArsen commented Jul 24, 2018

It's fine to close this. If I come up with a better solution I can open another PR.

@discordianfish
Copy link
Member

Okay great, thanks everybody involved!

@mknapphrt
Copy link
Contributor

I don't believe there is any way to deterministically distinguish between duplicate mounts in a way that won't create new time series, but this is still an issue that I think deserves some attention. The source port will change each time you remount, which leaves undesirable effects for a lot of people. However, I think that the mountaddr would isolate this effect to those that are using pooled ip addresses. Anyone that isn't would still see the same ip address each time instead of getting a new port on each restart.

Mount address isn't currently exposed from procfs (unless I'm mistaken) but it doesn't seem like a big change to expose it. Given a pool of ip addresses you will get new time series on restarts, but personally I think that seeing the duplicate mounts is more beneficial than having new time series. In general, I don't see many situations that anyone would be filtering by the ip address instead of the mount point or export, especially when they know they have a pool of ip addresses.

Does anyone have any opinions on reviving this with source port changed to mount address?

@mknapphrt
Copy link
Contributor

@discordianfish @SuperQ @hakoerber Any opinions on the above?

@SuperQ
Copy link
Member

SuperQ commented Dec 12, 2018

I need to look over this some more, but I don't have any time in the next week. What about just including the extra mount information as a separate label. It may generate new series, but this can be aggregated away at query time.

@mknapphrt
Copy link
Contributor

All good, my idea was basically to just include a mount address label onto the mountstats metrics. Possibly also adding mount point but I don't really think that you should need the mount point.
ie: node_mountstats_nfs_read_bytes_total{export="....",instance="...",protocol="tcp",mountaddr="xxx.xxx.xxx.xxx"}

@mknapphrt
Copy link
Contributor

@SuperQ I know the holidays are probably keeping everyone busy, just curious if you had a chance to look at this.

@discordianfish
Copy link
Member

I think that's fine. The address dimension should be fairly bounded.

@dipack95
Copy link
Contributor

dipack95 commented Jun 4, 2019

@discordianfish I've been working on distinguishing between multiple machines mounted to the same mountpoint, and my method involves depending on the mountaddr field in the opts in /proc/self/mountstats.

During testing, I've discovered that the mountaddr basically never shows up, while mounting NFS shares, which makes it harder to monitor all the source machines. So instead, I've been using the Device string to populate mountaddr value in the metrics, but that string is finicky as well, as it does not always contain an IP address instead of a hostname. Do you have any pointers on how to proceed?

@dipack95
Copy link
Contributor

dipack95 commented Jun 5, 2019

After some research, I've found that the information I require is present in /proc/self/mountinfo, and the mount package by Docker parses it as required, as shown here.

Can I modify the Update function for the mountStatsCollector to use the parser written by Docker, and submit a PR for the change?

@SuperQ
Copy link
Member

SuperQ commented Jun 5, 2019

It would be good to add the mountinfo parsing to our procfs library: https://github.com/prometheus/procfs

Can you open a new issue there with the details:

https://github.com/prometheus/procfs/issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants