
Windows shared drive monitoring #1668

Open
dkrdj opened this issue Oct 4, 2024 · 24 comments

Comments

@dkrdj

dkrdj commented Oct 4, 2024

Problem Statement

After reviewing comment #507, I found a way to get shared drive statistics (total_space, free_space).

Proposed Solution

It can be solved by using the WMI class Win32_LogicalDisk in logical_disk.go.

Additional information

If you allow me, I would like to contribute.

Acceptance Criteria

No response

@jkroepke
Member

jkroepke commented Oct 4, 2024

Hi @dkrdj

I'm currently removing all dependencies on WMI where possible.

Is there a Win32 API replacement?

And I'm currently not sure which collector should do it.

@dkrdj
Author

dkrdj commented Oct 8, 2024

Hi @jkroepke

I can't find a performance counter for shared drives either.

But I found the procedure GetDiskFreeSpaceExW in kernel32.dll.

It can also get the free_space and total_space of a shared drive.

If the reason for removing WMI classes is performance, it can replace the WMI class (Win32_LogicalDisk).
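For illustration only: GetDiskFreeSpaceExW fills three out-parameters, namely the free bytes available to the calling user, the total bytes available to that user (both can be reduced by per-user quotas), and the total free bytes on the volume. The Go sketch below assumes a hypothetical `spaceFunc` hook; on Windows it could wrap `windows.GetDiskFreeSpaceEx` from `golang.org/x/sys/windows`, but here it is injected with a fake so the logic builds and runs on any OS. `collect` and `shareStats` are illustrative names, not windows_exporter code.

```go
package main

import (
	"errors"
	"fmt"
)

// spaceFunc mirrors the three out-parameters of GetDiskFreeSpaceExW.
// Hypothetical hook: on Windows it could wrap windows.GetDiskFreeSpaceEx
// (golang.org/x/sys/windows); it is injected so the sketch builds anywhere.
type spaceFunc func(path string) (freeToCaller, total, totalFree uint64, err error)

// shareStats is an illustrative per-path result, not a windows_exporter type.
type shareStats struct {
	Path       string
	TotalBytes uint64
	FreeBytes  uint64
	UsedBytes  uint64
}

// collect queries each path; failures are returned separately so one
// unreachable share does not hide the others.
func collect(paths []string, query spaceFunc) ([]shareStats, []error) {
	var stats []shareStats
	var errs []error
	for _, p := range paths {
		freeToCaller, total, _, err := query(p)
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", p, err))
			continue
		}
		// Use the caller-relative pair so quotas stay consistent:
		// with a per-user quota, total is the quota size, not the volume size.
		var used uint64
		if total > freeToCaller {
			used = total - freeToCaller
		}
		stats = append(stats, shareStats{Path: p, TotalBytes: total, FreeBytes: freeToCaller, UsedBytes: used})
	}
	return stats, errs
}

// fakeQuery stands in for the real syscall: one reachable share, everything else fails.
func fakeQuery(path string) (uint64, uint64, uint64, error) {
	if path == `\\nas\backup\` {
		return 40, 100, 50, nil
	}
	return 0, 0, 0, errors.New("the network path was not found")
}

func main() {
	stats, errs := collect([]string{`\\nas\backup\`, `\\nas\missing\`}, fakeQuery)
	for _, s := range stats {
		fmt.Printf("%s total=%d free=%d used=%d\n", s.Path, s.TotalBytes, s.FreeBytes, s.UsedBytes)
	}
	for _, e := range errs {
		fmt.Println("error:", e)
	}
}
```

Reporting failures per path rather than aborting the whole collection keeps one unreachable share from hiding the others.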

@jkroepke
Member

jkroepke commented Oct 8, 2024

Yes, potential memory leaks and performance issues are the reasons: we have 20+ collectors, and WMI cannot be queried in parallel.

@JDA88
Contributor

JDA88 commented Oct 8, 2024

Network drives on Windows are, most of the time, linked to a user (the one who did the mount, or the credential used for the mount).
If the same target is mounted by two users and ...

  • ... one gets an access denied and the other does not, you have different states.
  • ... the target has space quotas enabled, it can report different used / free space.

And I am definitely forgetting other edge cases :)

@dkrdj
Author

dkrdj commented Oct 8, 2024

A network drive is linked to a user, and the MSI file registers windows_exporter as a service, so in the end the SYSTEM user runs the exporter. Anyone who wants to see shared drive info needs to mount the drives as the SYSTEM user. This means the user can decide what to see.

I only want to see the free space and total space of shared drives. Some drives are on the server itself, so I don't need to see those, but other drives are connected to a NAS gateway, and I need to check that they don't fill up.

I have already added the WMI class to watch this, but I think others need it too.

@jkroepke
Member

So overall, it doesn't make sense to include that collector?

@JDA88
Copy link
Contributor

JDA88 commented Oct 12, 2024

IMO network drives are a user "perspective" in 99% of cases and should be dealt with by special scripts / custom metric files. But that's just my point of view.

@dkrdj
Author

dkrdj commented Oct 12, 2024

Yes, a script is required to use that feature. Alternatively, it can be registered as a startup program by the user. Of course, to use it this way, the existing MSI file won't suffice.

However, how about considering this? Most users will likely use this program for basic resource monitoring. By basic resources, I mean CPU, memory, and disk space. Looking at the issues I raised, there are users who want that feature.

There is a problem with network drives being scoped to each individual user, but I believe there are several solutions to address this:

  1. Allow the user to choose whether to register it as a service or as a startup program during installation.
  2. (This would be in the distant future) Utilize the advantages of Windows to operate it as a startup program rather than as a service through a GUI.

Please decide whether to implement this feature or to seek new solutions.

@JDA88
Contributor

JDA88 commented Oct 12, 2024

However, how about considering this? Most users will likely use this program for basic resource monitoring.

I'm honestly curious about this! We use it in an enterprise setting with dedicated service accounts, an automatic deployment platform and a thousand targets, but I have no idea whether the project has any way to know its “most used” scenarios.

But back on the subject: I am not at all against a network drive collector. The documentation just has to be clear about what it can and cannot do, to avoid issues opened later about « exporter cannot see the drive x: » just because it is mounted by a different user.

@jkroepke
Member

What about passing credentials to windows_exporter, so the collector can initiate connections on its own?

@JDA88
Copy link
Contributor

JDA88 commented Oct 12, 2024

It would work.
The collector will need to handle some logic like "if not connected, connect" on every pull, because if the account passed is also used by something else, that might interact with the drive too.
Connection timeouts will also have to be handled.

@jkroepke
Member

https://learn.microsoft.com/de-de/windows/win32/api/fileapi/nf-fileapi-getdiskfreespaceexw does not need a drive letter. It can query a UNC path directly.

Instead of monitoring local network drives, monitor remote shares directly.
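Microsoft's documentation for GetDiskFreeSpaceExW states that a UNC name passed as the directory must include a trailing backslash (for example `\\MyServer\MyShare\`). If shares were configured by UNC path, a small normalizer like the hypothetical `normalizeUNC` below could validate them before they reach the syscall; the accepted input forms are an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeUNC checks that p has the form \\server\share[\dir...] and returns it
// with a single trailing backslash, which GetDiskFreeSpaceExW requires for UNC names.
// Forward slashes are accepted and converted. Extended-length prefixes such as
// \\?\UNC\ are not recognised specially by this sketch.
func normalizeUNC(p string) (string, error) {
	p = strings.ReplaceAll(strings.TrimSpace(p), "/", `\`)
	if !strings.HasPrefix(p, `\\`) {
		return "", fmt.Errorf("%q is not a UNC path", p)
	}
	rest := strings.TrimRight(p[2:], `\`)
	parts := strings.Split(rest, `\`)
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return "", fmt.Errorf("%q must name both a server and a share", p)
	}
	return `\\` + rest + `\`, nil
}

func main() {
	for _, in := range []string{`\\nas\backup`, "//nas/backup/", `C:\data`, `\\nas`} {
		out, err := normalizeUNC(in)
		fmt.Printf("%q -> %q, err=%v\n", in, out, err)
	}
}
```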

@JDA88
Contributor

JDA88 commented Oct 13, 2024

Instead of monitoring local network drives, monitor remote shares directly.

True, but you still need to check whether the connection is up (with credentials, if provided) on every pull, plus timeout management.

@jkroepke
Member

Yeah, timeout management with synchronous blocking syscalls is not easy.

@dkrdj
Copy link
Author

dkrdj commented Oct 14, 2024

I think I need to think a bit more about having the Windows exporter connect directly to the network location to gather information, because it is inconvenient to change the Windows exporter settings again whenever a network drive path is changed, added, or deleted while the server is in operation.

In my case, I initially mounted the network drive as the SYSTEM user. However, I had to remount it as SYSTEM after every reboot, which I did with a script run by the Windows Task Scheduler. So I concluded that this was not an appropriate solution (also for the reasons mentioned above).

So I switched to registering the exporter as a startup program using Visual Basic instead of running it as a service. This way I could run the Windows exporter as the user I wanted, and I could also obtain the status of the network drives connected by that user. There was also no longer any need to register the script in the Task Scheduler.

If a program registered as a service could run as the target user instead of SYSTEM, that would be the most convenient solution, but so far I have not found a way to do that.

@jkroepke
Member

jkroepke commented Oct 14, 2024

If you already have to use scripts for that setup, I would recommend using a script to collect the metrics and the textfile collector in windows_exporter to expose them.

From a maintainer's perspective, I expect an increasing volume of issues because users don't understand that network drives are scoped to the user session.

I can look into the MSI installer to set up the exporter as a user instead of SYSTEM, but I will not accept such a collector.

@dkrdj
Author

dkrdj commented Oct 15, 2024

In the logical disk metrics, there's a message like this:

# HELP windows_logical_disk_size_bytes Total space in bytes, updates every 10-15 min (LogicalDisk.PercentFreeSpace_Base)

So wouldn't it be unnecessary to query on every request?

How about collecting the information of the network drives using UNC path every 10 minutes with a goroutine and storing it in heap memory?

Then the exporter can serve the network drive info from memory for each request.

This way, managing timeouts seems like it would be easier.

@JDA88
Contributor

JDA88 commented Oct 15, 2024

Regarding windows_logical_disk_size_bytes, the comment was added after I reported issue #830, and it's more a bug that should be fixed at some point than a "feature" :)

But I think in this case you are right about the idea below + a comment!

How about collecting the information of the network drives using UNC path every 10 minutes with a goroutine and storing it in heap memory?

@jkroepke
Copy link
Member

Is #830 still relevant?

@JDA88
Copy link
Contributor

JDA88 commented Oct 15, 2024

It was marked as stale and then closed, but IMO it should be fixed at some point.

Currently the value is probably retrieved with WMI and there is a 5m cache, which can delay alerting.
A native system call should give the true value in real time. There might be an impact on performance, and in that case I wouldn’t mind a 1min cache, but 5m is too long.

@jkroepke
Copy link
Member

Performance counters are used as the source. That's what other monitoring systems like Datadog use.

And the performance counters only monitor the logical disks of local hard/fixed drives.

@jkroepke
Copy link
Member

Even with a goroutine, WNetAddConnection2 takes (up to 10?) minutes to time out if the share is not available at the network layer. That's too long, and too odd a user experience.
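With a goroutine, the remaining risk is that every refresh or scrape starts a new probe that also blocks for minutes, so blocked goroutines pile up. One possible mitigation, sketched here as an assumption and not something windows_exporter implements, is a per-share in-flight flag: while a probe for a share is still running, further attempts for that share are skipped. `probeGuard` and `tryStart` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// probeGuard tracks which shares currently have a probe in flight.
type probeGuard struct {
	mu       sync.Mutex
	inFlight map[string]bool
}

func newProbeGuard() *probeGuard { return &probeGuard{inFlight: map[string]bool{}} }

// tryStart launches probe(share) in a goroutine unless one is already running
// for that share. It returns false when the attempt was skipped.
func (g *probeGuard) tryStart(share string, probe func(string)) bool {
	g.mu.Lock()
	if g.inFlight[share] {
		g.mu.Unlock()
		return false
	}
	g.inFlight[share] = true
	g.mu.Unlock()
	go func() {
		// Clear the flag once the (possibly very slow) probe finally returns.
		defer func() {
			g.mu.Lock()
			delete(g.inFlight, share)
			g.mu.Unlock()
		}()
		probe(share)
	}()
	return true
}

func main() {
	g := newProbeGuard()
	release := make(chan struct{})
	hung := func(string) { <-release } // simulates a connection attempt stuck for minutes
	fmt.Println("scrape 1 started probe:", g.tryStart(`\\nas\backup\`, hung))
	fmt.Println("scrape 2 started probe:", g.tryStart(`\\nas\backup\`, hung))
	close(release)
}
```

This bounds the stuck goroutines to one per share, but it does not shorten the wait itself, which remains the user-experience problem.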

@JDA88
Copy link
Contributor

JDA88 commented Oct 15, 2024

Performance counters are used as the source. That's what other monitoring systems like Datadog use.

If I’m the only one bothered by the 5min refresh rate of the logical disk space counters, I’m fine with the issue being closed.
It’s easy enough to generate metrics that give me real-time values with Get-PSDrive -Name C | Select-Object Used, Free.

@awohlmut71
Copy link

We use this: Get-WmiObject -Class Win32_MappedLogicalDisk
For us, it's necessary to have the metric tied to the VM CI for event routing, so that events are not sent to the storage team.
Thx, Andi

4 participants