-
Notifications
You must be signed in to change notification settings - Fork 521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fio: Add flags to capture latency, bandwidth and IOPS logs #1205
fio: Add flags to capture latency, bandwidth and IOPS logs #1205
Conversation
Add the ability to optionally collect the latency, bandwidth and IOPS logs for a fio job. This is also include the flag fio_log_avg_msec that averages the log entries over the given interval.
FYI... This has not been tested on Windows. Can someone verify if it works on Windows? |
@noahl PTAL, or please recommend someone that might be able to review this. Thanks! |
@meteorfox great patch! Unfortunately I am moving on from spending most of my time on PKB. I think @tedsta is the right person to review this now. |
@@ -460,6 +489,9 @@ def Run(benchmark_spec): | |||
stdout, stderr = vm.RobustRemoteCommand(fio_command, should_log=True) | |||
samples = fio.ParseResults(job_file_string, json.loads(stdout)) | |||
|
|||
if collect_logs: | |||
vm.PullFile(vm_util.GetTempDir(), '%s_*.log' % PKB_FIO_LOG_FILE_NAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little weird to be getting the tmp dir from two different sources. Suggest using vm_util.VM_TMP_DIR
as above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. I'll make the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While going over your suggestion, I remembered now why I didn't use vm_util.VM_TMP_DIR
. I want to store the collected logs under /tmp/pkb/runs/<run_uri>/
so they are grouped with the results for that run, vm_util.VM_TMP_DIR
is the path to just /tmp/pkb/
directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything passed to fio telling to put the logs there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Currently, it defaults to the working directory of where fio is called from. Then vm.PullFile(...)
essentially does an scp
for files ending with _*.log
from fio's working directory to the local directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh I see.
'Whether to collect a bandwidth log of the fio jobs.') | ||
flags.DEFINE_boolean('fio_iops_log', False, | ||
'Whether to collect an IOPS log of the fio jobs.') | ||
flags.DEFINE_integer('fio_log_avg_msec', 1000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fio's --log_avg_msec
defaults to 0. Why do we default to 1000?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default, it will log data for every IO completion which can create a huge file. I thought 1 second, was a sensible default, so it aggregates the metrics per 1 second interval.
If you feel, we should use the same as fio as to not "surprise" the user, I can make the change.
(FLAGS.fio_iops_log, '--write_iops_log=%(filename)s',), | ||
(collect_logs, '--log_avg_msec=%(interval)d',)] | ||
fio_command_flags = ' '.join([flag for given, flag in fio_log_flags if given]) | ||
return fio_command_flags % {'filename': PKB_FIO_LOG_FILE_NAME, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want all these logs written to the same file? FIO can have concurrent jobs, so does FIO ensure that writes to any one of these logs is the only write to these logs occurring at any given time? (atomic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see below that these are supposed to be different file names. After staring at this code for several minutes it still looks like they all get the same file name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the argument we pass to the --write_(iops|lat|bw)_log
flags, it's used as a prefix for the log name.
For each of those, fio will create the log files using this format:
<name>_<log_type>.<job_number>.log
Some examples:
pkb_fio_bw.1.log
- Bandwidth log for job 1
pkb_fio_clat.1.log
- Completed IO latency log for job 1
pkb_fio_slat.1.log
- Submitted IO latency log for job 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok thanks for clarifying
Add the ability to optionally collect the latency, bandwidth and IOPS logs
for a fio job. This also includes the flag fio_log_avg_msec that averages
the log entries over the given interval.
These log files produced by fio consist of the current format:
time, rate/latency, data direction, block size
where:
Time: in milliseconds.
Rate/latency: For bandwidth, this is in KB/sec. For latency, it's microseconds.
Data direction: 0 is read, 1 is write.
Block size: block size in bytes.
Since PKB needs to pull the logs from the VMs, we could not simply use the
--fio_parameters
flag. This PR add the ability to fetch those logs.The generated logs are copied from the VMs into the local machine under the generated temporary directory
/tmp/pkb/runs/<run_uri>
.Using these logs more detail graphs such as histograms, or time series can be produced. Example
Note: The graph above was generated using R and ggplot2 as follows: