Implement changes based on comments
Signed-off-by: Ooi, Boon Sin <[email protected]>
boonsino committed Nov 13, 2024
1 parent 4a6ab42 commit a751a2f
Showing 8 changed files with 56 additions and 14 deletions.
@@ -245,6 +245,13 @@ There are several options for setting the number of inference iterations:
The more iterations a model runs, the better the statistics will be for determining
average latency and throughput.

Maximum inference rate
++++++++++++++++++++++

By default, the benchmarking app runs inference at the maximum rate the device capabilities allow.
The maximum inference rate can be configured with the ``-max_irate <MAXIMUM_INFERENCE_RATE>`` option.
Tweaking this value allows for more accurate power usage measurements by limiting the number of executions.
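
The throttling idea is simple: after each processed frame, the app compares the elapsed time with the time at which that frame should finish at the requested rate, and sleeps for the difference. Below is a minimal, self-contained sketch of this pacing (the ``throttle`` helper is illustrative only and not part of the app):

.. code-block:: python

   import time

   def throttle(processed_frames: int, elapsed_s: float, max_irate: float) -> None:
       """Sleep so that the effective rate never exceeds max_irate frames per second."""
       if max_irate <= 0:
           return  # 0 (the default) means run at the device's maximum rate
       target_finish_s = processed_frames / max_irate  # when this many frames should be done
       delay = target_finish_s - elapsed_s
       if delay > 0:
           time.sleep(delay)

   # Example: at -max_irate 10, the 5th frame should not finish before 0.5 s have
   # elapsed, so with 0.3 s already spent the helper sleeps roughly 0.2 s.
   throttle(processed_frames=5, elapsed_s=0.3, max_irate=10)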

Inputs
++++++++++++++++++++

@@ -337,7 +344,7 @@ following usage message:
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
usage: benchmark_app.py [-h [HELP]] [-i PATHS_TO_INPUT [PATHS_TO_INPUT ...]] -m PATH_TO_MODEL [-d TARGET_DEVICE]
[-hint {throughput,cumulative_throughput,latency,none}] [-niter NUMBER_ITERATIONS] [-t TIME] [-b BATCH_SIZE] [-shape SHAPE]
[-hint {throughput,cumulative_throughput,latency,none}] [-niter NUMBER_ITERATIONS] [-max_irate MAXIMUM_INFERENCE_RATE] [-t TIME] [-b BATCH_SIZE] [-shape SHAPE]
[-data_shape DATA_SHAPE] [-layout LAYOUT] [-extensions EXTENSIONS] [-c PATH_TO_CLDNN_CONFIG] [-cdir CACHE_DIR] [-lfile [LOAD_FROM_FILE]]
[-api {sync,async}] [-nireq NUMBER_INFER_REQUESTS] [-nstreams NUMBER_STREAMS] [-inference_only [INFERENCE_ONLY]]
[-infer_precision INFER_PRECISION] [-ip {bool,f16,f32,f64,i8,i16,i32,i64,u8,u16,u32,u64}]
@@ -536,6 +543,9 @@ following usage message:
'none': no device performance mode will be set.
Using explicit 'nstreams' or other device-specific options, please set hint to 'none'
-niter <integer> Optional. Number of iterations. If not specified, the number of iterations is calculated depending on a device.
-max_irate <float> Optional. Maximum inference rate, in frames per second.
If not specified, the default value is 0 and inference runs at the maximum rate the device capabilities allow.
Tweaking this value allows for more accurate power usage measurements by limiting the number of executions.
-t Optional. Time in seconds to execute topology.
Input shapes
13 changes: 7 additions & 6 deletions samples/cpp/benchmark_app/benchmark_app.hpp
@@ -65,10 +65,11 @@ static const char cache_dir_message[] = "Optional. Enables caching of loaded mod
static const char load_from_file_message[] = "Optional. Loads model from file directly without read_model."
" All CNNNetwork options (like re-shape) will be ignored";

/// @brief message for run frequency
static const char run_frequency_message[] =
"Execute at a fixed frequency. Note if the targeted rate per second cannot be reached, "
"the benchmark would start the next run immediately, trying its best to catch up.";
/// @brief message for maximum inference rate
static const char maximum_inference_rate_message[] =
"Optional. Maximum inference rate by frame per second"
"If not specified, default value is 0, the inference will run at maximium rate depending on a device capabilities. "
"Tweaking this value allow better accuracy in power usage measurement by limiting the execution.";

/// @brief message for execution time
static const char execution_time_message[] = "Optional. Time in seconds to execute topology.";
@@ -313,7 +314,7 @@ DEFINE_string(api, "async", api_message);
DEFINE_uint64(nireq, 0, infer_requests_count_message);

/// @brief Execute infer requests at a fixed frequency
DEFINE_double(rfreq, 0, run_frequency_message);
DEFINE_double(max_irate, 0, maximum_inference_rate_message);

/// @brief Number of streams to use for inference on the CPU (also affects Hetero cases)
DEFINE_string(nstreams, "", infer_num_streams_message);
@@ -396,7 +397,7 @@ static void show_usage() {
std::cout << " -hint <performance hint> (latency or throughput or cumulative_throughput or none) "
<< hint_message << std::endl;
std::cout << " -niter <integer> " << iterations_count_message << std::endl;
std::cout << " -rfreq \"<float>\" " << run_frequency_message << std::endl;
std::cout << " -max_irate \"<float>\" " << maximum_inference_rate_message << std::endl;
std::cout << " -t " << execution_time_message << std::endl;
std::cout << std::endl;
std::cout << "Input shapes" << std::endl;
4 changes: 2 additions & 2 deletions samples/cpp/benchmark_app/main.cpp
@@ -1155,8 +1155,8 @@ int main(int argc, char* argv[]) {
execTime = std::chrono::duration_cast<ns>(Time::now() - startTime).count();
processedFramesN += batchSize;

if (FLAGS_rfreq > 0) {
int64_t nextRunFinishTime = 1 / FLAGS_rfreq * processedFramesN * 1.0e9;
if (FLAGS_max_irate > 0) {
int64_t nextRunFinishTime = 1 / FLAGS_max_irate * processedFramesN * 1.0e9;
std::this_thread::sleep_for(std::chrono::nanoseconds(nextRunFinishTime - execTime));
}
}
12 changes: 11 additions & 1 deletion tests/samples_tests/smoke_tests/test_benchmark_app.py
100644 → 100755
@@ -38,13 +38,16 @@ def create_random_4bit_bin_file(tmp_path, shape, name):
f.write(raw_data)


def verify(sample_language, device, api=None, nireq=None, shape=None, data_shape=None, nstreams=None, layout=None, pin=None, cache=None, tmp_path=None, model='bvlcalexnet-12.onnx', inp='dog-224x224.bmp', batch='1', niter='10', tm=None):
def verify(sample_language, device, api=None, nireq=None, shape=None, data_shape=None, nstreams=None,
layout=None, pin=None, cache=None, tmp_path=None, model='bvlcalexnet-12.onnx',
inp='dog-224x224.bmp', batch='1', niter='10', max_irate=None, tm=None):
output = get_cmd_output(
get_executable(sample_language),
*prepend(cache, inp, model, tmp_path),
*('-nstreams', nstreams) if nstreams else '',
*('-layout', layout) if layout else '',
*('-nireq', nireq) if nireq else '',
*('-max_irate', max_irate) if max_irate else '',
*('-shape', shape) if shape else '',
*('-data_shape', data_shape) if data_shape else '',
*('-hint', 'none') if nstreams or pin else '',
@@ -84,6 +87,13 @@ def test_nireq(sample_language, api, nireq, device, cache, tmp_path):
verify(sample_language, device, api=api, nireq=nireq, cache=cache, tmp_path=tmp_path)


@pytest.mark.parametrize('sample_language', ['C++', 'Python'])
@pytest.mark.parametrize('max_irate', ['', '0', '10'])
@pytest.mark.parametrize('device', get_devices())
def test_max_irate(sample_language, device, max_irate, cache, tmp_path):
verify(sample_language, device, max_irate=max_irate, cache=cache, tmp_path=tmp_path)


@pytest.mark.skipif('CPU' not in get_devices(), reason='affinity is a CPU property')
@pytest.mark.parametrize('sample_language', ['C++', 'Python'])
@pytest.mark.parametrize('pin', ['YES', 'NO', 'NUMA', 'HYBRID_AWARE'])
23 changes: 20 additions & 3 deletions tools/benchmark_tool/openvino/tools/benchmark/benchmark.py
@@ -2,6 +2,7 @@
# SPDX-License-Identifier: Apache-2.0

import os
import time
from datetime import datetime
from math import ceil
from openvino.runtime import Core, get_version, AsyncInferQueue
@@ -15,7 +16,8 @@ def percentile(values, percent):

class Benchmark:
def __init__(self, device: str, number_infer_requests: int = 0, number_iterations: int = None,
duration_seconds: int = None, api_type: str = 'async', inference_only = None):
duration_seconds: int = None, api_type: str = 'async', inference_only = None,
maximum_inference_rate: float = 0):
self.device = device
self.core = Core()
self.nireq = number_infer_requests if api_type == 'async' else 1
@@ -24,6 +26,7 @@ def __init__(self, device: str, number_infer_requests: int = 0, number_iteration
self.api_type = api_type
self.inference_only = inference_only
self.latency_groups = []
self.max_irate = maximum_inference_rate

def __del__(self):
del self.core
@@ -83,24 +86,34 @@ def first_infer(self, requests):
requests.wait_all()
return requests[id].latency

def inference_rate_delay(self, processed_frames, exec_time):
if self.max_irate > 0:
nextRunFinishTime = 1 / self.max_irate * processed_frames
delay = nextRunFinishTime - exec_time
time.sleep(delay if delay > 0 else 0)

def sync_inference(self, request, data_queue):
processed_frames = 0
exec_time = 0
iteration = 0
times = []
start_time = datetime.utcnow()
while (self.niter and iteration < self.niter) or \
(self.duration_seconds and exec_time < self.duration_seconds):
processed_frames += data_queue.get_next_batch_size()
if self.inference_only == False:
request.set_input_tensors(data_queue.get_next_input())
request.infer()
times.append(request.latency)
iteration += 1

exec_time = (datetime.utcnow() - start_time).total_seconds()
self.inference_rate_delay(processed_frames, exec_time)
total_duration_sec = (datetime.utcnow() - start_time).total_seconds()
return sorted(times), total_duration_sec, iteration

def async_inference_only(self, infer_queue):
def async_inference_only(self, infer_queue, data_queue):
processed_frames = 0
exec_time = 0
iteration = 0
times = []
@@ -109,6 +122,7 @@ def async_inference_only(self, infer_queue):
while (self.niter and iteration < self.niter) or \
(self.duration_seconds and exec_time < self.duration_seconds) or \
(iteration % self.nireq):
processed_frames += data_queue.get_next_batch_size()
idle_id = infer_queue.get_idle_request_id()
if idle_id in in_fly:
times.append(infer_queue[idle_id].latency)
@@ -118,6 +132,8 @@
iteration += 1

exec_time = (datetime.utcnow() - start_time).total_seconds()
self.inference_rate_delay(processed_frames, exec_time)

infer_queue.wait_all()
total_duration_sec = (datetime.utcnow() - start_time).total_seconds()
for infer_request_id in in_fly:
@@ -149,6 +165,7 @@ def async_inference_full_mode(self, infer_queue, data_queue, pcseq):
iteration += 1

exec_time = (datetime.utcnow() - start_time).total_seconds()
self.inference_rate_delay(processed_frames, exec_time)
infer_queue.wait_all()
total_duration_sec = (datetime.utcnow() - start_time).total_seconds()

@@ -164,7 +181,7 @@ def main_loop(self, requests, data_queue, batch_size, latency_percentile, pcseq)
times, total_duration_sec, iteration = self.sync_inference(requests[0], data_queue)
fps = len(batch_size) * iteration / total_duration_sec
elif self.inference_only:
times, total_duration_sec, iteration = self.async_inference_only(requests)
times, total_duration_sec, iteration = self.async_inference_only(requests, data_queue)
fps = len(batch_size) * iteration / total_duration_sec
else:
times, total_duration_sec, processed_frames, iteration = self.async_inference_full_mode(requests, data_queue, pcseq)
3 changes: 2 additions & 1 deletion tools/benchmark_tool/openvino/tools/benchmark/main.py
100644 → 100755
@@ -85,7 +85,8 @@ def is_flag_set_in_command_line(flag):
next_step(step_id=2)

benchmark = Benchmark(args.target_device, args.number_infer_requests,
args.number_iterations, args.time, args.api_type, args.inference_only)
args.number_iterations, args.time, args.api_type,
args.inference_only, args.maximum_inference_rate)

if args.extensions:
benchmark.add_extension(path_to_extensions=args.extensions)
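
For orientation, a minimal sketch of the resulting constructor call written out with keyword arguments (the device, iteration, and request counts are placeholders; the parameter names follow the signature shown in the benchmark.py diff above):

    from openvino.tools.benchmark.benchmark import Benchmark

    # Cap execution at 10 inferences per second; the default of 0 means
    # "run as fast as the device allows".
    benchmark = Benchmark(device='CPU',
                          number_infer_requests=4,
                          number_iterations=100,
                          duration_seconds=None,
                          api_type='async',
                          inference_only=True,
                          maximum_inference_rate=10.0)
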
3 changes: 3 additions & 0 deletions tools/benchmark_tool/openvino/tools/benchmark/parameters.py
@@ -72,6 +72,9 @@ def parse_args():
args.add_argument('-niter', '--number_iterations', type=check_positive, required=False, default=None,
help='Optional. Number of iterations. '
'If not specified, the number of iterations is calculated depending on a device.')
args.add_argument('-max_irate', '--maximum_inference_rate', type=float, required=False, default=0,
help='Optional. Maximum inference rate, in frames per second. '
'If not specified, inference runs at the maximum rate the device capabilities allow.')
args.add_argument('-t', '--time', type=check_positive, required=False, default=None,
help='Optional. Time in seconds to execute topology.')

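
To see the new flag end to end, a quick sketch, assuming a development install where the benchmark tool package is importable and that ``parse_args`` returns the parsed namespace (as its use in main.py suggests); the command line and model path are placeholders:

    import sys
    from openvino.tools.benchmark.parameters import parse_args

    # Simulate `benchmark_app -m model.xml -max_irate 10`.
    sys.argv = ['benchmark_app', '-m', 'model.xml', '-max_irate', '10']
    args = parse_args()
    print(args.maximum_inference_rate)  # expected: 10.0 (0 when -max_irate is omitted)
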
Empty file modified tools/benchmark_tool/setup.py
100644 → 100755
Empty file.
