feat(oiiotool): oiiotool --parallel-frames (#3849)
Background info: oiiotool multithreads *within* any individual image
operation (such as `--over`), but a series of operations comprising the
oiiotool command are done serially -- each command finishes before the
next starts. And similarly, for multi-frame operations using frame
ranges, frame number wildcards, or view wildcards, the set of commands
for each frame runs to completion before the next begins. For complex
oiiotool commands, especially when running over frame ranges, this is
not a particularly efficient way of parallelizing, since any serialized
operation (which often includes the expensive writing of the result
image from each frame to disk) can substantially serialize the entire
run.

This PR adds a new oiiotool --parallel-frames option that assumes all
frames in the range can be computed independently and in any order, and
can therefore be run in parallel. As long as there are at least roughly as
many frames as cores, and you have enough RAM to handle that many image
pipelines at once, this is a more efficient way of parallelizing the work
than multithreading individual image processing operations.
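The difference between the two execution strategies can be sketched in plain C++. This is an illustrative sketch under stated assumptions, not oiiotool's code (the commit itself uses OIIO's `parallel_for` with `paropt().minitems(1)`). `run_frames_serial`, `run_frames_parallel`, and the `process_frame` callback are hypothetical names, with the callback standing in for a complete per-frame pipeline.

```cpp
// Illustrative sketch only (hypothetical names): serial vs. parallel frame loops.
#include <algorithm>
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Serial: each frame runs to completion before the next one starts. Any
// multithreading would have to happen inside process_frame itself.
void run_frames_serial(int nframes, const std::function<void(int)>& process_frame)
{
    for (int f = 0; f < nframes; ++f)
        process_frame(f);
}

// Parallel: each worker repeatedly claims the next unclaimed frame index from
// a shared atomic counter, so frames may finish in any order. Only safe when
// the frames do not depend on each other.
void run_frames_parallel(int nframes,
                         const std::function<void(int)>& process_frame,
                         unsigned nthreads = std::thread::hardware_concurrency())
{
    // Never start more workers than frames, and always at least one.
    nthreads = std::max(1u, std::min(nthreads, unsigned(std::max(nframes, 1))));
    std::atomic<int> next{0};  // index of the next frame nobody has claimed yet
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&]() {
            for (int f = next++; f < nframes; f = next++)
                process_frame(f);
        });
    }
    for (auto& w : workers)
        w.join();
}
```

Because workers claim frame indices from a shared counter, frames can complete in any order, which is why this mode only makes sense when the frames are independent.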

Benchmarking: On my 32-core workstation, I tried the following oiiotool
command, which is contrived but representative of the complexity of a
plausible production task:

    oiiotool --runstats -v --frames 1-100 tmp/noise.exr -cut 2880x1620+960+540 -resize 1920x1080 -colorconvert lin_srgb srgb -d uint8 -o "tmp/test.#.tif"

The noise.exr file was a previously created 4K, 3-channel 'half' OpenEXR
image. For each of 100 sequential frames, the command loads the file, cuts
out a 2880x1620 (roughly 3K) section of it, resizes that to HD resolution
(1920x1080), does a color space transformation, and then outputs an 8-bit
TIFF file.

    threads   real time    user  sys      peak memory
    1         4:44         4:23  0:21     122 MB
    2         2:58         4:24  0:29     122 MB
    4         1:40         4:36  0:29     122 MB
    8         1:10         5:00  0:30     127 MB
    16        0:56         5:28  0:37     131 MB
    all(32)   0:55         8:12  0:54     127 MB

    parallel  0:25         8:13  1:04     3.4 GB

Here we can see the diminishing returns of multithreading within each
frame: at low thread counts performance improves steadily, but it
essentially peaks at 16 threads, sees virtually no further gain at 32, and
tops out at about 5x the single-threaded performance. This is due to
serialization of the I/O and increasing lock contention at higher thread
counts.
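One rough way to quantify these diminishing returns is to fit Amdahl's law to the measurements. This is a back-of-the-envelope model added for illustration, not something the benchmark measured, and it ignores lock contention entirely; `mmss`, `amdahl_speedup`, and `serial_fraction` are hypothetical helper names.

```cpp
// Rough model only: Amdahl's law fitted to one row of the benchmark table.
#include <cassert>
#include <cmath>

// Convert a "minutes:seconds" wall-clock entry from the table into seconds.
constexpr double mmss(int minutes, int seconds) { return minutes * 60.0 + seconds; }

// Amdahl's law: predicted speedup on n threads when a fraction s of the
// work is inherently serial.
double amdahl_speedup(double s, double n) { return 1.0 / (s + (1.0 - s) / n); }

// Invert Amdahl's law for a measured speedup S on n threads:
//   1/S = s + (1 - s)/n   =>   s = (n/S - 1) / (n - 1)
double serial_fraction(double measured_speedup, double n)
{
    return (n / measured_speedup - 1.0) / (n - 1.0);
}
```

Plugging in the 16-thread row (4:44 vs. 0:56) gives a serial fraction of roughly 0.14, which caps the predicted speedup near 7x however many threads are added; the measured 32-thread result (about 5.2x) falls below even that curve, consistent with the extra locking described above.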

However, using `--parallel-frames` gives us another doubling of total
throughput (a total of 11.3x speedup versus single-threaded), though at the
expense of 28x the memory use! That memory explosion is acceptable in this
case (even 3.4 GB is really nothing on a modern machine), but other oiiotool
command lines might be more memory intensive, and having dozens of frame
pipelines computing simultaneously could then lead to problematic memory
use. So use this feature carefully and with purpose, always checking that
your own workflows see a sufficient speedup and don't use more memory than
you have available.
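The headline ratios can be recomputed directly from the table. This is a minimal sketch; `benchmark_ratios`, `Ratios`, and `mmss` are hypothetical names, and it assumes 1 GB = 1000 MB when converting the 3.4 GB peak (using 1024 MB gives about 28.5x instead, so the "28x" figure holds either way).

```cpp
// Recompute the speedup and memory ratios quoted in the commit message.
#include <cassert>
#include <cmath>

// Convert a "minutes:seconds" wall-clock entry from the table into seconds.
constexpr double mmss(int minutes, int seconds) { return minutes * 60.0 + seconds; }

struct Ratios {
    double best_threaded;     // 1 thread vs. all 32 threads (serial frames)
    double parallel;          // 1 thread vs. --parallel-frames
    double vs_best_threaded;  // all 32 threads vs. --parallel-frames
    double memory;            // peak memory, --parallel-frames vs. 1 thread
};

Ratios benchmark_ratios()
{
    const double single = mmss(4, 44);            // 1 thread, real time
    const double all32 = mmss(0, 55);             // all(32) threads, real time
    const double parallel = mmss(0, 25);          // --parallel-frames, real time
    const double mem_serial_mb = 122.0;           // peak memory, 1 thread
    const double mem_parallel_mb = 3.4 * 1000.0;  // 3.4 GB, assuming 1 GB = 1000 MB
    return { single / all32, single / parallel, all32 / parallel,
             mem_parallel_mb / mem_serial_mb };
}
```

This gives 5.16x for the best within-frame threading, 11.36x for `--parallel-frames`, 2.2x between the two, and roughly 28x the peak memory.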
lgritz authored Jun 6, 2023
1 parent ee3dc55 commit f40f980
Showing 4 changed files with 153 additions and 51 deletions.
24 changes: 24 additions & 0 deletions src/doc/oiiotool.rst
@@ -1068,6 +1068,30 @@ output each one to a different file, with names `sub0001.tif`,
frame (rather than the default behavior of exiting immediately and not
even attempting the other frames in the range).

.. option:: --parallel-frames

When iterating over a frame range or views, if this option is used, the
frames will run *concurrently* and not necessarily in any deterministic
order.

Running the range of frames in parallel is helpful in cases where (a)
there are enough frames in the range that it is better to parallelize
over the range rather than within each operation (rule of thumb: you
should probably have at least as many frames to process as cores
available); (b) it doesn't matter what order the frames are processed in
(e.g., no frames have a dependency on the computed results of earlier
frames); and (c) you have enough memory and I/O bandwidth to handle all
the parallel jobs (probably equal to the number of cores).

Without the `--parallel-frames` option, the frame range will be executed
in increasing numerical order and each frame in the range will run to
completion before the next one starts. Multithreading will be used for the
individual operations done to each frame. This mode is less efficient if
you have more frames than cores available, but it is guaranteed to be safe
even if there are order or data dependencies between your frames, and it
uses less memory, since only one frame is processed at a time.

This feature was added to OpenImageIO 2.5.1.

.. option:: --wildcardoff, --wildcardon

These *positional* options turn off (or on) numeric wildcard expansion
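The rule of thumb in the `--parallel-frames` documentation above can be encoded as a simple decision function. This helper is hypothetical: oiiotool does not make this choice for you (you opt in with `--parallel-frames`), the per-frame memory estimate is an input you would have to measure yourself, and the I/O bandwidth part of condition (c) is not modeled. It assumes roughly one frame pipeline in flight per core, as the documentation suggests.

```cpp
// Hypothetical helper, not part of oiiotool: encodes the documented rule of
// thumb for when --parallel-frames is likely to pay off.
#include <cassert>
#include <cstdint>

struct FrameJob {
    int nframes;                    // number of frames in the range
    bool frames_independent;        // (b) no frame depends on another frame's results
    std::uint64_t bytes_per_frame;  // measured peak memory of one frame's pipeline
};

bool should_use_parallel_frames(const FrameJob& job, unsigned ncores,
                                std::uint64_t available_bytes)
{
    if (!job.frames_independent)  // (b) order or data dependencies: stay serial
        return false;
    if (ncores == 0)
        ncores = 1;
    if (job.nframes < static_cast<int>(ncores))  // (a) fewer frames than cores
        return false;
    // (c) assume about one frame pipeline in flight per core at once
    return static_cast<std::uint64_t>(ncores) * job.bytes_per_frame <= available_bytes;
}
```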
153 changes: 112 additions & 41 deletions src/oiiotool/oiiotool.cpp
@@ -6519,6 +6519,8 @@ Oiiotool::getargs(int argc, char* argv[])
      .help("Views for %V/%v wildcards (comma-separated, defaults to \"left,right\")");
    ap.arg("--skip-bad-frames", &ot.skip_bad_frames)
      .help("Skip to next frame in range if there's an error, rather than exiting");
    ap.arg("--parallel-frames")
      .help("Parallelize evaluation of frame range");
    ap.arg("--wildcardoff")
      .help("Disable numeric wildcard expansion for subsequent command line arguments");
    ap.arg("--wildcardon")
@@ -7055,6 +7057,96 @@ Oiiotool::getargs(int argc, char* argv[])



void
Oiiotool::merge_stats(const Oiiotool& ot)
{
    std::lock_guard<std::mutex> lock(m_stat_mutex);
    total_readtime.add_ticks(ot.total_readtime.ticks());
    total_writetime.add_ticks(ot.total_writetime.ticks());
    total_imagecache_readtime += ot.total_imagecache_readtime;
    for (auto& t : ot.function_times) {
        function_times[t.first] += t.second;
    }
    peak_memory = std::max(peak_memory, ot.peak_memory);
    if (ot.return_value != EXIT_SUCCESS)
        return_value = ot.return_value;
    num_outputs += ot.num_outputs;
    printed_info |= ot.printed_info;
}



static void
one_sequence_iteration(Oiiotool& otmain, size_t i, int frame_number,
                       cspan<int> sequence_args,
                       cspan<std::vector<std::string>> filenames,
                       cspan<const char*> argv_main)
{
    // If another iteration being processed asked us all to abort, don't
    // launch this iteration.
    if (otmain.ap.aborted())
        return;

    if (otmain.debug)
        print("Begin sequence iteration {}\n", i);

    // Prepare the arguments for this iteration
    std::vector<const char*> seq_argv(argv_main.begin(), argv_main.end());
    for (size_t a : sequence_args) {
        seq_argv[a] = filenames[a][i].c_str();
        if (otmain.debug)
            print(" {} -> {}\n", argv_main[a], seq_argv[a]);
    }

    Oiiotool otit;  // Oiiotool for this iteration
    otit.imagecache = otmain.imagecache;
    otit.frame_number = frame_number;
    otit.getargs((int)seq_argv.size(), (char**)&seq_argv[0]);

    if (otit.ap.aborted()) {
        if (!otit.skip_bad_frames) {
            // If we are allowing bad frames to be a full error, and not just
            // skipping the bad frames only, propagate the abort signal to the
            // main otmain.
            otmain.ap.abort(true);
        }
    } else {
        // Finish this iteration's pending work and check its own (not the
        // main Oiiotool's) state for dangling commands.
        otit.process_pending();
        if (otit.pending_callback())
            otit.warning(otit.pending_callback_name(),
                         "pending command never executed");
        if (!otit.control_stack.empty())
            otit.warningfmt(otit.control_stack.top().command,
                            "unterminated {}",
                            otit.control_stack.top().command);
    }

    // Merge this iteration's stats into the main OT
    otmain.merge_stats(otit);

    // A few settings that may have been changed in the iteration's Oiiotool
    // must be propagated back up to the main one, or certain end-of-run
    // behaviors will be wrong.
    if (otit.verbose)
        otmain.verbose = true;
    if (otit.debug)
        otmain.debug = true;
    if (otit.noerrexit)
        otmain.noerrexit = true;
    if (otit.runstats) {
        std::lock_guard<std::mutex> lock(otmain.m_stat_mutex);
        otmain.runstats = true;
        print("End sequence iteration {}: {} (total {}) mem {}\n\n", i,
              Strutil::timeintervalformat(otit.total_runtime(), 2),
              Strutil::timeintervalformat(otmain.total_runtime(), 2),
              Strutil::memformat(Sysutil::memory_used()));
    } else if (otmain.debug) {
        print("\n");
    }
}



// Check if any of the command line arguments contains numeric ranges or
// wildcards. If not, just return 'false'. But if they do, the
// remainder of processing will happen here (and return 'true').
@@ -7116,6 +7208,9 @@ handle_sequence(Oiiotool& ot, int argc, const char** argv)
            Strutil::split(argv[++a], views, ",");
        } else if (strarg == "--wildcardoff" || strarg == "-wildcardoff") {
            wildcard_on = false;
        } else if (strarg == "--parallel-frames"
                   || strarg == "-parallel-frames") {
            ot.parallel_frames = true;
        } else if (strarg == "--wildcardon" || strarg == "-wildcardon") {
            wildcard_on = true;
        } else if (wildcard_on && !is_output_all
@@ -7209,50 +7304,26 @@ handle_sequence(Oiiotool& ot, int argc, const char** argv)
     // substituting the i-th sequence entry for its respective argument
     // every time.
     // Note: nfilenames really means, number of frame number iterations.
-    std::vector<const char*> seq_argv(argv, argv + argc + 1);
-    for (size_t i = 0; i < nfilenames; ++i) {
+    if (ot.parallel_frames) {
+        // If --parallel-frames was used, run the iterations in parallel.
-        if (ot.debug)
-            std::cout << "SEQUENCE " << i << "\n";
-        for (size_t a : sequence_args) {
-            seq_argv[a] = filenames[a][i].c_str();
-            if (ot.debug)
-                std::cout << " " << argv[a] << " -> " << seq_argv[a] << "\n";
-        }
-
-        ot.clear_options();  // Careful to reset all command line options!
-        ot.frame_number = frame_numbers[0][i];
-        ot.getargs(argc, (char**)&seq_argv[0]);
-
-        if (ot.ap.aborted()) {
-            if (!ot.skip_bad_frames)
-                break;
-            else
-                ot.ap.abort(false);
-        } else {
-            ot.process_pending();
-            if (ot.pending_callback())
-                ot.warning(ot.pending_callback_name(),
-                           "pending command never executed");
-            if (!ot.control_stack.empty())
-                ot.warningfmt(ot.control_stack.top().command, "unterminated {}",
-                              ot.control_stack.top().command);
+        print("Running {} frames in parallel\n", nfilenames);
+        parallel_for(
+            uint64_t(0), uint64_t(nfilenames),
+            [&](uint64_t i) {
+                one_sequence_iteration(ot, i, frame_numbers[0][i],
+                                       sequence_args, filenames,
+                                       { argv, argv + argc });
+            },
+            paropt().minitems(1));
+    } else {
+        // Fully serialized over the frame range, multithreaded for each frame
+        // individually.
+        for (size_t i = 0; i < nfilenames; ++i) {
+            one_sequence_iteration(ot, i, frame_numbers[0][i], sequence_args,
+                                   filenames, { argv, argv + argc });
         }
-
-        // Clear the stack at the end of each iteration
-        ot.curimg.reset();
-        ot.image_stack.clear();
-        while (ot.control_stack.size())
-            ot.control_stack.pop();
-
-        if (ot.runstats)
-            std::cout << "End iteration " << i << ": "
-                      << Strutil::timeintervalformat(ot.total_runtime(), 2)
-                      << " " << Strutil::memformat(Sysutil::memory_used())
-                      << "\n";
-        if (ot.debug)
-            std::cout << "\n";
     }
 
     return true;
 }
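The structure of `one_sequence_iteration` and `Oiiotool::merge_stats` in this diff can be reduced to a small self-contained pattern: private per-iteration state, a shared abort flag checked before an iteration starts, and a mutex-guarded merge of the results. This is a simplified sketch, not the actual oiiotool code; `IterState`, `MainState`, and `run_iteration` are hypothetical stand-ins, and the `fail` flag merely simulates a frame that hits an error.

```cpp
// Simplified sketch (hypothetical names): per-iteration private state, a
// shared abort flag, and a mutex-guarded merge into the main state.
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// State owned by a single frame iteration (stand-in for the per-iteration Oiiotool).
struct IterState {
    int num_outputs = 0;
    std::size_t peak_memory = 0;
    int return_value = EXIT_SUCCESS;
};

// Shared state for the whole run (stand-in for the main Oiiotool).
struct MainState {
    std::mutex stat_mutex;             // guards the merged statistics
    std::atomic<bool> aborted{false};  // set when a failing frame should stop the run
    int num_outputs = 0;
    std::size_t peak_memory = 0;
    int return_value = EXIT_SUCCESS;

    // Fold one iteration's results into the totals; safe to call concurrently.
    void merge_stats(const IterState& it)
    {
        std::lock_guard<std::mutex> lock(stat_mutex);
        num_outputs += it.num_outputs;
        peak_memory = std::max(peak_memory, it.peak_memory);
        if (it.return_value != EXIT_SUCCESS)
            return_value = it.return_value;
    }
};

// One frame: skip if another frame already aborted the run, do the work in
// private state, then merge. 'fail' simulates a frame that hits an error.
void run_iteration(MainState& shared, int frame, bool fail, bool skip_bad_frames)
{
    if (shared.aborted)
        return;
    IterState it;
    if (fail) {
        it.return_value = EXIT_FAILURE;
        if (!skip_bad_frames)
            shared.aborted = true;  // iterations that haven't started yet will skip
    } else {
        it.num_outputs = 1;
        it.peak_memory = std::size_t(frame) * 1024;  // pretend memory grows with frame
    }
    shared.merge_stats(it);
}
```

Giving each iteration its own state object is what makes running iterations concurrently reasonable: in this sketch, the only shared writes are the merge (under the mutex) and the atomic abort flag.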

7 changes: 7 additions & 0 deletions src/oiiotool/oiiotool.h
@@ -78,6 +78,7 @@ class Oiiotool {
    int autotile;
    int frame_padding;
    bool eval_enable;  // Enable evaluation of expressions
    bool parallel_frames = false;  // Parallelize over frame iteration
    bool skip_bad_frames = false;  // Just skip a bad frame, don't exit
    bool nostderr = false;  // If true, use stdout for errors
    bool noerrexit = false;  // Don't exit on error
@@ -160,6 +161,9 @@ class Oiiotool {
    int input_bitspersample = 0;
    std::map<std::string, std::string> input_channelformats;

    // m_stat_mutex guards merging another Oiiotool's stats into this one
    std::mutex m_stat_mutex;

    Oiiotool();

    void clear_options();
@@ -351,6 +355,9 @@ class Oiiotool {
        return opt;
    }

    // Merge stats from another Oiiotool
    void merge_stats(const Oiiotool& ot);

private:
    CallbackFunction m_pending_callback;
    std::vector<const char*> m_pending_argv;
20 changes: 10 additions & 10 deletions testsuite/oiiotool-control/ref/out.txt
@@ -118,70 +118,70 @@ Testing for i 5,10,2,8 (bad range):
oiiotool ERROR: --for : Invalid range "5,10,2,8"
Full command line was:
> oiiotool -echo "Testing for i 5,10,2,8 (bad range):" --for i 5,10,2,8 --echo " i = {i}" --endfor -echo " "
-SEQUENCE 0
+Begin sequence iteration 0
copyA.#.jpg -> ./copyA.0001.jpg
copyB.#.jpg -> copyB.0001.jpg
Reading ./copyA.0001.jpg
Output: copyB.0001.jpg
Writing copyB.0001.jpg

-SEQUENCE 1
+Begin sequence iteration 1
copyA.#.jpg -> ./copyA.0002.jpg
copyB.#.jpg -> copyB.0002.jpg
Reading ./copyA.0002.jpg
Output: copyB.0002.jpg
Writing copyB.0002.jpg

-SEQUENCE 2
+Begin sequence iteration 2
copyA.#.jpg -> ./copyA.0003.jpg
copyB.#.jpg -> copyB.0003.jpg
Reading ./copyA.0003.jpg
Output: copyB.0003.jpg
Writing copyB.0003.jpg

-SEQUENCE 3
+Begin sequence iteration 3
copyA.#.jpg -> ./copyA.0004.jpg
copyB.#.jpg -> copyB.0004.jpg
Reading ./copyA.0004.jpg
Output: copyB.0004.jpg
Writing copyB.0004.jpg

-SEQUENCE 4
+Begin sequence iteration 4
copyA.#.jpg -> ./copyA.0005.jpg
copyB.#.jpg -> copyB.0005.jpg
Reading ./copyA.0005.jpg
Output: copyB.0005.jpg
Writing copyB.0005.jpg

-SEQUENCE 5
+Begin sequence iteration 5
copyA.#.jpg -> ./copyA.0006.jpg
copyB.#.jpg -> copyB.0006.jpg
Reading ./copyA.0006.jpg
Output: copyB.0006.jpg
Writing copyB.0006.jpg

-SEQUENCE 6
+Begin sequence iteration 6
copyA.#.jpg -> ./copyA.0007.jpg
copyB.#.jpg -> copyB.0007.jpg
Reading ./copyA.0007.jpg
Output: copyB.0007.jpg
Writing copyB.0007.jpg

-SEQUENCE 7
+Begin sequence iteration 7
copyA.#.jpg -> ./copyA.0008.jpg
copyB.#.jpg -> copyB.0008.jpg
Reading ./copyA.0008.jpg
Output: copyB.0008.jpg
Writing copyB.0008.jpg

-SEQUENCE 8
+Begin sequence iteration 8
copyA.#.jpg -> ./copyA.0009.jpg
copyB.#.jpg -> copyB.0009.jpg
Reading ./copyA.0009.jpg
Output: copyB.0009.jpg
Writing copyB.0009.jpg

-SEQUENCE 9
+Begin sequence iteration 9
copyA.#.jpg -> ./copyA.0010.jpg
copyB.#.jpg -> copyB.0010.jpg
Reading ./copyA.0010.jpg