Logical_disk collectors for Disk Read latency and Write latency #380

Closed
manohrn opened this issue Aug 9, 2019 · 10 comments

@manohrn

manohrn commented Aug 9, 2019

Is there a repo or branch that has the logical_disk collectors for disk read latency and write latency?
Can anyone help me get an updated MSI file that includes these logical_disk collectors? It is badly needed for our clients.

Please help me ASAP

@breed808
Contributor

This should be resolved as of 853d615

@robodair

robodair commented Oct 4, 2019

Looks like the metrics added in 853d615 weren't multiplied by ticksToSecondsScaleFactor but this should be fixed in #400.

For now I'm using rate(wmi_logical_disk_read_latency_seconds_total[1m]) / 1e7 to get disk latency in seconds.

@cameronkerrnz

I don't understand why wmi_logical_disk_read_latency_seconds_total is a counter and not a gauge.

# HELP wmi_logical_disk_read_latency_seconds_total Shows the average time, in seconds, of a read operation from the disk (LogicalDisk.AvgDiskSecPerRead)
# TYPE wmi_logical_disk_read_latency_seconds_total counter
wmi_logical_disk_read_latency_seconds_total{volume="C:"} 636.2798569
wmi_logical_disk_read_latency_seconds_total{volume="D:"} 184.26475879999998
wmi_logical_disk_read_latency_seconds_total{volume="HarddiskVolume1"} 0.004719
wmi_logical_disk_read_latency_seconds_total{volume="HarddiskVolume2"} 0.015182699999999999

Installed today, version is:

wmi_exporter_build_info{branch="master",goversion="go1.12.3",revision="012b938b5451e5d10e2bb364876aac66cd85c54e",version="0.9.0"} 1

If this is really meant to be a counter, how does it differ from wmi_logical_disk_read_seconds_total? A counter makes sense for that metric, because rate() essentially divides the increase by the elapsed time and so yields an average. For example, compare the following:

rate(wmi_logical_disk_read_seconds_total[1m])
rate(wmi_logical_disk_read_latency_seconds_total[1m])

Perhaps worthy of a note in the docs at least.

Thanks,
Cameron

@breed808
Contributor

Good spot Cameron!

My understanding is that the wmi_logical_disk_read_seconds_total metric is the total number of seconds the disk has spent reading, while the wmi_logical_disk_read_latency_seconds_total metric is the total number of seconds the disk has spent waiting to perform a read operation.

This does not appear to be the case as both metrics are returning the same value.

@breed808
Contributor

Looking at the values returned directly by Perflib via Get-Counter, it looks like the Avg. Disk sec/Read metric should be a gauge type, returning the average read request latency.
The % Disk Read Time should also be a gauge type returning a percentage. There doesn't appear to be a read_seconds_total value returned at all.

I think the underlying perflib values need to be fixed, and wmi_logical_disk_read_seconds_total needs to be renamed to wmi_logical_disk_read_percentage.

The above would also need to be applied to the wmi_logical_disk_write_seconds_total and wmi_logical_disk_write_latency_seconds_total metrics.

@carlpett
Collaborator

carlpett commented Jan 7, 2020

@breed808 When we pulled the data with WMI, I'm fairly sure the values were seconds. Could it be the perflib data has a _base value corresponding to it?
It would be preferable to keep the seconds-counter rather than a percentage gauge (if possible, of course!), and it seems strange WMI and perflib would differ here.

@breed808
Copy link
Contributor

breed808 commented Jan 8, 2020

@carlpett It's been a while, so I'll need to check that the counters were returning the same value prior to the perflib rewrite (i.e. v0.8.3 or older). My testing during the perflib rewrite was to ensure that the perflib metrics returned the same values as the WMI metrics.

I'll check the above and also see if there are any base perflib counters that could be used.

@breed808
Copy link
Contributor

breed808 commented Jan 10, 2020

Ok, I've checked both v0.8.3 and master and there is a discrepancy with wmi_logical_disk_read_latency_seconds_total. Note that 192.168.88.43:9182 is v0.8.3.
[Screenshot: logical_disk_metrics]

I see two issues:

  1. The WMI version (v0.8.3) reports an incredibly large value for the wmi_logical_disk_read_latency_seconds_total metric.

  2. The Perflib version reports the same value for both metrics.

wmi_logical_disk_read_seconds_total is sourced from the perflib PercentDiskReadTime counter, while wmi_logical_disk_read_latency_seconds_total is sourced from the perflib AvgDiskSecPerRead counter.

I will look to see if another perflib counter/base counter may be used or is applicable for either of these two metrics.

@breed808
Contributor

I've checked the base counter for perflib "Avg. Disk sec/Read" and perflib "Avg. Disk sec/Read" with no luck; the results are still the same.

Perhaps we should consider retiring one of the two identical metrics?


This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

@github-actions github-actions bot added the Stale label Nov 25, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 26, 2023

5 participants