-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #3600 from patricksjackson/dd-reuse-buffer
dd: reuse buffer for the most common cases
- Loading branch information
Showing
2 changed files
with
82 additions
and
10 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
# Benchmarking dd | ||
|
||
`dd` is a utility used for copying and converting files. It is often used for | ||
writing directly to devices, such as when writing an `.iso` file directly to a | ||
drive. | ||
|
||
## Understanding dd | ||
|
||
At the core, `dd` has a simple loop of operation. It reads in `blocksize` bytes | ||
from an input, optionally performs a conversion on the bytes, and then writes | ||
`blocksize` bytes to an output file. | ||
|
||
In typical usage, the performance of `dd` is dominated by the speed at which it | ||
can read or write to the filesystem. For those scenarios it is best to optimize | ||
the blocksize for the performance of the devices being read/written to. Devices | ||
typically have an optimal block size that they work best at, so for maximum | ||
performance `dd` should be using a block size, or multiple of the block size, | ||
that the underlying devices prefer. | ||
|
||
For benchmarking `dd` itself we will use fast special files provided by the | ||
operating system that work out of RAM, `/dev/zero` and `/dev/null`. This reduces | ||
the time taken reading/writing files to a minimum and maximises the percentage | ||
time we spend in the `dd` tool itself, but care still needs to be taken to | ||
understand where we are benchmarking the `dd` tool and where we are just | ||
benchmarking memory performance. | ||
|
||
The main parameter to vary for a `dd` benchmark is the blocksize, but benchmarks | ||
testing the conversions that are supported by `dd` could also be interesting. | ||
|
||
`dd` has a convenient `count` argument, that will copy `count` blocks of data | ||
from the input to the output, which is useful for benchmarking. | ||
|
||
## Blocksize Benchmarks | ||
|
||
When measuring the impact of blocksize on the throughput, we want to avoid | ||
testing the startup time of `dd`. `dd` itself will give a report on the | ||
throughput speed once complete, but it's probably better to use an external | ||
tool, such as `hyperfine` to measure the performance. | ||
|
||
Benchmarks should be sized so that they run for a handful of seconds at a | ||
minimum to avoid measuring the startup time unnecessarily. The total time will | ||
be roughly equivalent to the total bytes copied (`blocksize` x `count`). | ||
|
||
Some useful invocations for testing would be the following: | ||
|
||
``` | ||
hyperfine "./target/release/dd bs=4k count=1000000 < /dev/zero > /dev/null" | ||
hyperfine "./target/release/dd bs=1M count=20000 < /dev/zero > /dev/null" | ||
hyperfine "./target/release/dd bs=1G count=10 < /dev/zero > /dev/null" | ||
``` | ||
|
||
Choosing what to benchmark depends greatly on what you want to measure. | ||
Typically you would choose a small blocksize for measuring the performance of | ||
`dd`, as that would maximize the overhead introduced by the `dd` tool. `dd` | ||
typically does some set amount of work per block which only depends on the size | ||
of the block if conversions are used. | ||
|
||
As an example, https://github.com/uutils/coreutils/pull/3600 made a change to | ||
reuse the same buffer between block copies, avoiding the need to reallocate a | ||
new block of memory for each copy. The impact of that change mostly had an | ||
impact on large block size copies because those are the circumstances where the | ||
memory performance dominated the total performance. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters