Add IO metrics #26804
Here's an example of the new metrics in action:
LGTM
Consider using sysfs (/sys/block/*/stat) instead of procfs to avoid atomicity issues.
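A rough sketch of what reading sysfs could look like, in case it helps the discussion. The struct and function names here (DiskStat, parse_sysfs_stat, read_all_block_stats) are made up for illustration and are not from this PR; the field order follows my understanding of the kernel's block stat layout (read I/Os, read merges, read sectors, read ticks, write I/Os, write merges, write sectors, ...).

```rust
use std::fs;

/// Hypothetical subset of the counters found in one `/sys/block/<dev>/stat` file.
#[derive(Debug, Default, PartialEq)]
pub struct DiskStat {
    pub reads_completed: u64,
    pub sectors_read: u64,
    pub writes_completed: u64,
    pub sectors_written: u64,
}

/// Parses the single whitespace-separated line of a sysfs `stat` file.
/// Assumes at least the 11 classic fields are present.
pub fn parse_sysfs_stat(line: &str) -> Result<DiskStat, String> {
    let fields = line
        .split_ascii_whitespace()
        .map(|f| f.parse::<u64>().map_err(|e| e.to_string()))
        .collect::<Result<Vec<u64>, String>>()?;
    if fields.len() < 11 {
        return Err(format!("expected at least 11 fields, got {}", fields.len()));
    }
    Ok(DiskStat {
        reads_completed: fields[0],
        sectors_read: fields[2],
        writes_completed: fields[4],
        sectors_written: fields[6],
    })
}

/// Reads the `stat` file of every entry under /sys/block (Linux only).
pub fn read_all_block_stats() -> Result<Vec<(String, DiskStat)>, String> {
    let mut out = Vec::new();
    for entry in fs::read_dir("/sys/block").map_err(|e| e.to_string())? {
        let entry = entry.map_err(|e| e.to_string())?;
        let name = entry.file_name().to_string_lossy().into_owned();
        let line = fs::read_to_string(entry.path().join("stat")).map_err(|e| e.to_string())?;
        out.push((name, parse_sysfs_stat(&line)?));
    }
    Ok(out)
}
```

Each device is read from its own file, so a single device's line comes from one read instead of being one row in a larger multi-device dump.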
let mut num_disks = 0;
for line in reader_diskstats.lines() {
    let line = line.map_err(|e| e.to_string())?;
    let values: Vec<_> = line.split_ascii_whitespace().collect();
collect() isn't strictly necessary. Could next() our way to victory on the iterator instead.
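For illustration, a minimal sketch of the iterator-only approach. The helper names (next_u64, parse_prefix) are hypothetical, and the sketch assumes the /proc/diskstats layout of major, minor, device name, then counters.

```rust
/// Pulls the next token off the iterator and parses it as u64.
/// `what` is only used to build the error message.
fn next_u64<'a>(it: &mut impl Iterator<Item = &'a str>, what: &str) -> Result<u64, String> {
    it.next()
        .ok_or_else(|| format!("missing {}", what))?
        .parse::<u64>()
        .map_err(|e| format!("bad {}: {}", what, e))
}

/// Reads the device name and the first two counters without building a Vec.
/// Returns (name, reads_completed, reads_merged).
pub fn parse_prefix(line: &str) -> Result<(String, u64, u64), String> {
    let mut it = line.split_ascii_whitespace();
    let _major = it.next().ok_or("missing major number")?;
    let _minor = it.next().ok_or("missing minor number")?;
    let name = it.next().ok_or("missing device name")?.to_string();
    let reads_completed = next_u64(&mut it, "reads completed")?;
    let reads_merged = next_u64(&mut it, "reads merged")?;
    Ok((name, reads_completed, reads_merged))
}
```

The trade-off is that the length check can no longer be a single values.len() comparison; a missing field surfaces as an error at the point where it is read.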
let values: Vec<_> = line.split_ascii_whitespace().collect();

if values.len() != 20 {
    return Err("parse error, expected exactly 20 disk stat elements".to_string());
We're probably screwed here, but would it make sense to log and continue instead?
I'm thinking it would be better to not get any metrics rather than potentially report incorrect metrics. Added some tolerance for all 3 kernel variations that I'm aware of (11, 15, or 17 elements).
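A small sketch of that tolerance check, assuming the three counts refer to the counter fields alone (as in a sysfs stat file, without the major, minor, and name columns that /proc/diskstats prepends). KNOWN_STAT_COUNTS and check_stat_count are hypothetical names, not the ones used in the PR.

```rust
/// Field counts mentioned in the thread for the three known kernel variations.
pub const KNOWN_STAT_COUNTS: [usize; 3] = [11, 15, 17];

/// Returns the number of fields if it matches a known layout, otherwise an error.
/// Erroring out (rather than guessing) follows the "no metrics over wrong metrics" stance.
pub fn check_stat_count(line: &str) -> Result<usize, String> {
    let n = line.split_ascii_whitespace().count();
    if KNOWN_STAT_COUNTS.contains(&n) {
        Ok(n)
    } else {
        Err(format!(
            "parse error, expected 11, 15, or 17 disk stat elements, got {}",
            n
        ))
    }
}
```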
if values.len() != 20 {
    return Err("parse error, expected exactly 20 disk stat elements".to_string());
}
if values[2].starts_with("loop") || values[1].ne("0") {
This will double-count at least dm-crypt volumes.
$ cat /proc/diskstats | grep dm
253 0 dm-0 182486 0 7848082 48716 706874 0 22199432 10941388 0 610468 10990104 0 0 0 0 0 0
I'm thinking this is solved by using sysfs instead of procfs since we'll only look at block devices. Does that sound right to you?
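Whichever source ends up being read, an explicit name filter is one way to guard against counting a device-mapper volume on top of its backing disk. A minimal sketch, assuming a hypothetical helper should_skip_device and assuming that dropping every dm- device is acceptable for these metrics (neither is taken from the PR):

```rust
/// Returns true for devices whose I/O is assumed to be already accounted for
/// by an underlying physical device (dm-*), or that are not real disks (loop*).
/// The prefixes are an assumption for this sketch, not an exhaustive list.
pub fn should_skip_device(name: &str) -> bool {
    name.starts_with("loop") || name.starts_with("dm-")
}

/// Keeps only the device names that should contribute to the aggregate.
pub fn filter_devices<'a>(names: &[&'a str]) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|n| !should_skip_device(n))
        .collect()
}
```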
}

num_disks += 1;
stats.reads_completed += values[3].parse::<u64>().map_err(|e| e.to_string())?;
Similarly, should we be totally bailing or just continuing?
I'm thinking it would be better to not get any metrics rather than potentially report incorrect metrics. Hopefully parsing succeeds the next go-around and we just report the delta for a longer time period.
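To make the "longer time period" behavior concrete, here is a minimal sketch of a tracker that keeps the last successfully parsed totals and skips failed samples. Totals and DeltaTracker are hypothetical types for illustration; the PR's actual bookkeeping may differ.

```rust
/// Hypothetical aggregate counters summed across all disks.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Totals {
    pub reads_completed: u64,
    pub writes_completed: u64,
}

/// Remembers the last good sample so a failed parse only widens the next delta.
#[derive(Debug, Default)]
pub struct DeltaTracker {
    last_good: Option<Totals>,
}

impl DeltaTracker {
    /// Returns the delta since the last good sample, or None when the sample
    /// failed to parse or when there is no earlier sample to compare against.
    pub fn update(&mut self, sample: Result<Totals, String>) -> Option<Totals> {
        // On a parse failure, report nothing and leave last_good untouched.
        let current = sample.ok()?;
        let delta = self.last_good.map(|prev| Totals {
            // saturating_sub guards against counters that were reset.
            reads_completed: current.reads_completed.saturating_sub(prev.reads_completed),
            writes_completed: current.writes_completed.saturating_sub(prev.writes_completed),
        });
        self.last_good = Some(current);
        delta
    }
}
```

With this shape, a failed sample between two good ones produces no datapoint, and the next good sample reports the change accumulated over both intervals.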
Problem
We currently have very limited insight into storage device performance.
Summary of Changes
Start tracking aggregated metrics from /proc/diskstats to understand storage device performance and bottlenecks.