feat(oiiotool): oiiotool --parallel-frames #3849
Merged
Background info: oiiotool multithreads *within* any individual image operation (such as `--over`), but the series of operations that make up an oiiotool command runs serially: each operation finishes before the next starts. Similarly, for multi-frame operations using frame ranges, frame number wildcards, or view wildcards, the set of commands for each frame runs to completion before the next frame begins. For complex oiiotool commands, especially when running over frame ranges, this is not a particularly efficient way of parallelizing, since any serialized operation (which often includes the expensive writing of each frame's result image to disk) can substantially serialize the entire run.

This PR adds a new oiiotool `--parallel-frames` option that assumes all frames in the range can be computed independently and in any order, and can therefore be run in parallel. As long as there are at least roughly as many frames as cores, and you have enough RAM to handle that many image pipelines at once, this is a more efficient way of parallelizing the operations than multithreading individual image processing operations.

Benchmarking: On my 32-core workstation, I tried the following oiiotool command, contrived but representative of the complexity of a plausible production task:

`oiiotool --runstats -v --frames 1-100 tmp/noise.exr -cut 2880x1620+960+540 -resize 1920x1080 -colorconvert lin_srgb srgb -d uint8 -o "tmp/test.#.tif"`

The noise.exr file was a previously created 4k x 3 channel 'half' exr image. The command loads the file, cuts out a 3k section of it, resizes that to HD res, does a color space transformation, then outputs an 8 bit TIFF file, for each of 100 sequential frames.
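For anyone wanting to repeat this kind of comparison, here is a minimal sketch of how the benchmark variants could be assembled. It assumes the per-run thread count was set with oiiotool's `--threads` option and that `--parallel-frames` can simply be placed ahead of the other arguments; the PR text does not say how the runs were actually launched, and the helper names (`with_threads`, `with_parallel_frames`) are made up for illustration.

```python
import shlex

# Benchmark command from the description above, split into an argument list.
BASE_ARGS = shlex.split(
    'oiiotool --runstats -v --frames 1-100 tmp/noise.exr '
    '-cut 2880x1620+960+540 -resize 1920x1080 '
    '-colorconvert lin_srgb srgb -d uint8 -o "tmp/test.#.tif"'
)


def with_threads(n):
    """Variant capped at n threads (assumption: --threads was used for this)."""
    return [BASE_ARGS[0], "--threads", str(n)] + BASE_ARGS[1:]


def with_parallel_frames():
    """Variant that lets frames run concurrently (flag placement is an assumption)."""
    return [BASE_ARGS[0], "--parallel-frames"] + BASE_ARGS[1:]


# One entry per benchmark configuration.
variants = {f"{n} threads": with_threads(n) for n in (1, 2, 4, 8, 16)}
variants["all threads"] = list(BASE_ARGS)
variants["parallel frames"] = with_parallel_frames()

for label, args in variants.items():
    print(f"{label:16s} {shlex.join(args)}")
```

Each argument list can be handed to `subprocess.run(args, check=True)` under a timer such as `/usr/bin/time` to collect real, user, and sys times; building the lists separately keeps the comparison reproducible without running anything.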
| threads  | real time | user | sys  | peak memory |
|----------|-----------|------|------|-------------|
| 1        | 4:44      | 4:23 | 0:21 | 122 MB      |
| 2        | 2:58      | 4:24 | 0:29 | 122 MB      |
| 4        | 1:40      | 4:36 | 0:29 | 122 MB      |
| 8        | 1:10      | 5:00 | 0:30 | 127 MB      |
| 16       | 0:56      | 5:28 | 0:37 | 131 MB      |
| all(32)  | 0:55      | 8:12 | 0:54 | 127 MB      |
| parallel | 0:25      | 8:13 | 1:04 | 3.4 GB      |

Here we can see the diminishing returns of using multiple threads within each frame: for low thread counts we improve steadily, but we pretty much peak at 16 threads, see virtually no further gain at 32, and max out at about 5x the single-threaded performance. This is due to serialization of the I/O and increasingly bumping into various locks at higher thread counts.

However, using `--parallel-frames` gives us another doubling of total throughput (for a total of 11.3x speedup versus single-threaded), though at the expense of 28x the memory use! That memory explosion is acceptable in this case, since even 3.4 GB is really nothing on a modern machine. But we can imagine that other oiiotool command lines might be more memory intensive, and that having dozens of frame pipelines computing simultaneously would use a problematic amount of memory. So use this feature carefully and with purpose, always checking that your own workflows see sufficient speedup and don't use more memory than you have available.
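The speedup and memory ratios quoted above can be rechecked directly from the table. The sketch below copies the real time and peak memory columns, converts `m:ss` to seconds, and divides against the single-threaded row; treating 3.4 GB as 3400 MB is an assumption made only for this estimate.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' string from the table into whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (label, real time, peak memory in MB) copied from the benchmark table.
ROWS = [
    ("1", "4:44", 122),
    ("2", "2:58", 122),
    ("4", "1:40", 122),
    ("8", "1:10", 127),
    ("16", "0:56", 131),
    ("all(32)", "0:55", 127),
    ("parallel", "0:25", 3400),  # 3.4 GB taken as 3400 MB (assumption)
]

baseline_time = to_seconds(ROWS[0][1])
baseline_mem = ROWS[0][2]

# Ratios relative to the single-threaded run.
speedups = {label: baseline_time / to_seconds(real) for label, real, _ in ROWS}
mem_ratios = {label: mem / baseline_mem for label, _, mem in ROWS}

for label, _, _ in ROWS:
    print(f"{label:>8}: {speedups[label]:5.2f}x faster, "
          f"{mem_ratios[label]:5.1f}x memory")
```

With times rounded to whole seconds, this gives roughly 5.2x for all 32 threads and 11.4x for parallel frames, in line with the "about 5x" and "11.3x" figures in the text, and a memory ratio of about 28x.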
lgritz added a commit to lgritz/OpenImageIO that referenced this pull request on Jun 7, 2023
lgritz added a commit to imageworks/OpenImageIO that referenced this pull request on Jun 7, 2023