This file describes the benchmarking framework of the library.
The benchmarking framework is provided via the milo
command line application.
The application is not built by default, although the shell snippets in the integration installation guide explicitly pass the -DMILO_APP=ON
option, which enables application compilation.
After successful compilation, the application will be placed in the app
directory:
cd milo-build/app
To verify that everything works, list all possible benchmarks with the command:
./milo benchmark primitive list
Possible output:
codec-base-16-encode
codec-base-16-decode
codec-base-64-encode
codec-base-64-decode
hash-sha-1-160
hash-sha-2-224
hash-sha-2-256
hash-sha-2-384
hash-sha-2-512
hash-sha-2-512-224
hash-sha-2-512-256
...
How to invoke benchmarks is described later in this document.
Each algorithm accepts a specific set of arguments, and execution time is measured for that parameterization.
For example, hash algorithms accept the --message-size=<int> parameter, mac algorithms accept --key-size=<int> and --message-size=<int>, and so on.
Most benchmarks react to a common set of basic parameters that control behaviors such as:
- input sizes
- timed iterations
- warmup iterations
Parameterization is described later in the document.
Execution time is measured with std::chrono::steady_clock.
There are a few reasons for using it instead of rdtsc or rdpmc:
- It's portable.
- The logic being measured is complex; the mentioned clock sources are better suited for micro-benchmarking.
- Calculations are heavily data-dependent, and high-quality samples come from wider time frames.
- A precision gain in the range of several dozen cycles would not be noticeable; a single context switch already introduces more noise than that.
Benchmarking with other clock source backends may be implemented in the future.
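As a rough illustration of the scheme described above (unmeasured warmup iterations followed by timed iterations read from a monotonic clock), a minimal Python sketch could look like this. It is not the library's implementation: time.perf_counter_ns stands in for std::chrono::steady_clock, hashlib.sha256 stands in for a library primitive, and the function name measure and its arguments are hypothetical. The default iteration counts mirror the repeats-warm and repeats-time values that appear in the sample outputs of this document.

```python
import hashlib
import time


def measure(call, repeats_warm=128, repeats_time=1024):
    """Return the average duration of call() in nanoseconds (hypothetical helper)."""
    # Unmeasured warmup iterations: warm caches, train branch predictors.
    for _ in range(repeats_warm):
        call()

    # Time the whole batch with a monotonic clock and average afterwards.
    # A wide time frame amortizes the cost and jitter of reading the clock.
    begin = time.perf_counter_ns()
    for _ in range(repeats_time):
        call()
    end = time.perf_counter_ns()

    return (end - begin) / repeats_time


# Stand-in workload: SHA-256 over a 16384-byte message.
message = bytes(16384)
nanoseconds_per_call = measure(lambda: hashlib.sha256(message).digest())
print(f"nanoseconds_per_call: {nanoseconds_per_call:.3f}")
```

Whether the application times the whole batch or each call separately is not stated here; the batch form is simply the shorter one to sketch.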
Metrics collected by benchmarks:
duration
  nanoseconds_per_call - average floating point number
throughput
  gigabytes_per_second - average floating point number
  megabytes_per_second - average floating point number
cpu
  cycles_per_call - average floating point number
  cycles_per_byte - average floating point number
The metric names and their interpretation should be self-explanatory.
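The relationship between the metrics can be sketched in a few lines of Python. The formulas below are inferred from the sample outputs in this document (decimal units, so 1 GB = 10^9 bytes), not taken from the library source, and the function name derive_metrics is hypothetical.

```python
def derive_metrics(nanoseconds_per_call, bytes_per_call, cpu_clock_mhz):
    """Derive throughput and cpu metrics from the average call duration."""
    # One byte per nanosecond equals 10^9 bytes per second (a decimal gigabyte).
    gigabytes_per_second = bytes_per_call / nanoseconds_per_call
    # MHz to cycles per nanosecond: 3500 MHz is 3.5 cycles/ns.
    cycles_per_call = nanoseconds_per_call * cpu_clock_mhz / 1000.0
    return {
        "duration": {"nanoseconds_per_call": nanoseconds_per_call},
        "throughput": {
            "gigabytes_per_second": gigabytes_per_second,
            "megabytes_per_second": gigabytes_per_second * 1000.0,
        },
        "cpu": {
            "cycles_per_call": cycles_per_call,
            "cycles_per_byte": cycles_per_call / bytes_per_call,
        },
    }


# Duration, input size and clock speed from the hash-sha-1-160 example in this document.
metrics = derive_metrics(8916.458008, 16384, 3500)
print(metrics["cpu"]["cycles_per_byte"])
```

For the aead sample, the reported numbers are consistent with the byte count being the sum of --bytes-size and --aad-size; this is again an observation from the sample data rather than a documented rule.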
Before benchmarking, the execution environment should be tuned to reduce potential bias.
- Disable frequency scaling.
  sudo sh -c 'echo 0 > /sys/devices/system/cpu/cpufreq/boost'
- Disable Simultaneous Multithreading.
- Disable Turbo Boost.
  sudo sh -c 'echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo'
- Disable Hyper-Threading.
- Disable Virtualization.
- Disable P-States and C-States.
- Limit number of running services.
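Part of this checklist can be verified before a run. The following Python sketch reads the two sysfs files written by the commands above and reports their state. The function names are hypothetical, and the smt/control file is an addition that is not mentioned in this document: it is the standard Linux sysfs entry for the SMT state and may be absent on older kernels.

```python
from pathlib import Path


def read_tunable(root, relative_path):
    """Return the stripped content of a sysfs file, or None if it cannot be read."""
    try:
        return (Path(root) / relative_path).read_text().strip()
    except OSError:
        return None


def check_environment(root="/sys/devices/system/cpu"):
    """Report whether boost, turbo and SMT appear to be disabled."""
    boost = read_tunable(root, "cpufreq/boost")
    no_turbo = read_tunable(root, "intel_pstate/no_turbo")
    smt = read_tunable(root, "smt/control")  # assumption: not part of this document
    return {
        # None means the file is absent here (different driver, kernel or OS).
        "boost_disabled": None if boost is None else boost == "0",
        "turbo_disabled": None if no_turbo is None else no_turbo == "1",
        "smt_disabled": None if smt is None else smt in ("off", "forceoff", "notsupported"),
    }


for name, state in check_environment().items():
    print(f"{name}: {state}")
```

The remaining items (virtualization, P-States and C-States, running services) are usually controlled from firmware settings or the service manager and are not covered by this sketch.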
The application parameterization tree describes what parameters or options are accepted by the application and subcommands. The command line argument list must respect the tree structure. Parameters and options specified too early or too late will be ignored by the application. Unrecognized options and parameters are ignored. When parameterizing a subcommand, the order of parameters and options is irrelevant.
Parameterization tree:
milo - application, root of tree
  options
    --advanced - advanced features switch, e.g. algorithm backend benchmarks
    --verbose - verbose output
  subcommands
    benchmark
      parameters
        --repeats-time=<int> - measured repeats
        --repeats-warm=<int> - unmeasured repeats, can be used for cache warmup, branch predictor training
        --cpu-clock=<int> - clock speed in MHz units, must be set to estimate cycles per call/byte
      subcommands
        primitive
          subcommands
            list - list all possible benchmarks
            <expression> - pattern expression that matches one or more items returned by the list subcommand
              codec-*
                parameters
                  --bytes-size=<int> - bytes size
              hash-*
                parameters
                  --message-size=<int> - message size
              mac-*
                parameters
                  --key-size=<int> - key size
                  --message-size=<int> - message size
              kdf-hkdf-*
                parameters
                  --ikm-size=<int> - ikm size
                  --salt-size=<int> - salt size
                  --info-size=<int> - info size
                  --key-size=<int> - key size
              kdf-pbkdf-2-*
                parameters
                  --ikm-size=<int> - ikm size
                  --salt-size=<int> - salt size
                  --iterations=<int> - iterations
                  --key-size=<int> - key size
              cipher-*
                parameters
                  --bytes-size=<int> - bytes size
              aead-*
                parameters
                  --aad-size=<int> - aad size
                  --bytes-size=<int> - bytes size
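To illustrate how a pattern expression selects items from the list subcommand, here is a small Python sketch. It assumes glob-style wildcards, which is what the patterns in this document (hash-*, cipher-chacha-20-*) suggest; the exact pattern syntax accepted by the application is not specified here, and the function name select is hypothetical.

```python
from fnmatch import fnmatchcase

# Names copied from the "list" output shown earlier in this document.
BENCHMARKS = [
    "codec-base-16-encode",
    "codec-base-16-decode",
    "codec-base-64-encode",
    "codec-base-64-decode",
    "hash-sha-1-160",
    "hash-sha-2-224",
    "hash-sha-2-256",
    "hash-sha-2-384",
    "hash-sha-2-512",
    "hash-sha-2-512-224",
    "hash-sha-2-512-256",
]


def select(expression, names=BENCHMARKS):
    """Return the names matched by a glob-style expression (assumed syntax)."""
    return [name for name in names if fnmatchcase(name, expression)]


print(select("hash-sha-2-512*"))
print(select("codec-base-64-*"))
```

The examples in this document pass the expression in double quotes, which keeps the shell from expanding the * character itself before the application sees it.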
When any parameter that does not have a default fallback value is omitted, an error occurs.
Missing --message-size=<int>
parameter with no default value:
milo benchmark primitive hash-*
Output:
Error. Missing --message-size parameter of hash-* command.
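The ordering rule of the tree and the missing-parameter check can be sketched together in Python. This is an illustration only: the function name build_command is hypothetical, the per-family parameter table is transcribed from the parameterization tree, and treating every family parameter as required is an assumption, since this document only confirms it for --message-size of hash-*.

```python
from fnmatch import fnmatchcase

# Parameters per benchmark family, transcribed from the parameterization tree.
FAMILY_PARAMETERS = {
    "codec-*": ["bytes-size"],
    "hash-*": ["message-size"],
    "mac-*": ["key-size", "message-size"],
    "kdf-hkdf-*": ["ikm-size", "salt-size", "info-size", "key-size"],
    "kdf-pbkdf-2-*": ["ikm-size", "salt-size", "iterations", "key-size"],
    "cipher-*": ["bytes-size"],
    "aead-*": ["aad-size", "bytes-size"],
}


def build_command(name, parameters, cpu_clock=None):
    """Compose an argument list that follows the tree order."""
    family = next((f for f in FAMILY_PARAMETERS if fnmatchcase(name, f)), None)
    if family is None:
        raise ValueError(f"Unknown benchmark family: {name}")
    # Assumption: every family parameter is required (no default fallback).
    for required in FAMILY_PARAMETERS[family]:
        if required not in parameters:
            raise ValueError(f"Missing --{required} parameter of {family} command.")
    command = ["./milo", "benchmark"]
    if cpu_clock is not None:
        # --cpu-clock belongs to "benchmark", so it precedes "primitive".
        command.append(f"--cpu-clock={cpu_clock}")
    command += ["primitive", name]
    # Order among the parameters of one subcommand is irrelevant.
    command += [f"--{key}={value}" for key, value in parameters.items()]
    return command


print(" ".join(build_command("hash-sha-1-160", {"message-size": 16384}, cpu_clock=3500)))
```

Passing the result as a list to a process launcher avoids shell quoting issues with the * character in pattern expressions.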
The following examples show how to compose command line arguments to benchmark algorithms in the library.
Environment:
gcc version 12.2.1
nasm version 2.16.01
Linux 5.19.11-zen1-1-zen
Ryzen Threadripper PRO 3975WX
hash-sha-1-160
./milo benchmark --cpu-clock=3500 primitive "hash-sha-1-160" --message-size=16384
{
  "benchmark": {
    "primitive": {
      "hash-sha-1-160": {
        "metrics": {
          "cpu": {
            "cycles_per_byte": 1.904761,
            "cycles_per_call": 31207.603027
          },
          "throughput": {
            "megabytes_per_second": 1837.500943,
            "gigabytes_per_second": 1.837501
          },
          "duration": {
            "nanoseconds_per_call": 8916.458008
          }
        },
        "config": {
          "input": {
            "message-size": 16384
          },
          "benchmark": {
            "cpu-clock": 3500000000,
            "repeats-time": 1024,
            "repeats-warm": 128
          }
        }
      }
    }
  }
}
hash-sha-2-256
./milo benchmark --cpu-clock=3500 primitive "hash-sha-2-256" --message-size=16384
{
  "benchmark": {
    "primitive": {
      "hash-sha-2-256": {
        "metrics": {
          "cpu": {
            "cycles_per_byte": 2.033197,
            "cycles_per_call": 33311.892578
          },
          "throughput": {
            "megabytes_per_second": 1721.427261,
            "gigabytes_per_second": 1.721427
          },
          "duration": {
            "nanoseconds_per_call": 9517.683594
          }
        },
        "config": {
          "input": {
            "message-size": 16384
          },
          "benchmark": {
            "cpu-clock": 3500000000,
            "repeats-time": 1024,
            "repeats-warm": 128
          }
        }
      }
    }
  }
}
mac-poly-1305
./milo benchmark --cpu-clock=3500 primitive "mac-poly-1305" --key-size=32 --message-size=16384
{
  "benchmark": {
    "primitive": {
      "mac-poly-1305": {
        "metrics": {
          "cpu": {
            "cycles_per_byte": 2.422251,
            "cycles_per_call": 39686.158203
          },
          "throughput": {
            "megabytes_per_second": 1444.937041,
            "gigabytes_per_second": 1.444937
          },
          "duration": {
            "nanoseconds_per_call": 11338.902344
          }
        },
        "config": {
          "input": {
            "message-size": 16384,
            "key-size": 32
          },
          "benchmark": {
            "cpu-clock": 3500000000,
            "repeats-time": 1024,
            "repeats-warm": 128
          }
        }
      }
    }
  }
}
cipher-chacha-20
./milo benchmark --cpu-clock=3500 primitive "cipher-chacha-20-*" --bytes-size=16384
{
  "benchmark": {
    "primitive": {
      "cipher-chacha-20-decrypt": {
        "metrics": {
          "cpu": {
            "cycles_per_byte": 2.064460,
            "cycles_per_call": 33824.112793
          },
          "throughput": {
            "megabytes_per_second": 1695.358585,
            "gigabytes_per_second": 1.695359
          },
          "duration": {
            "nanoseconds_per_call": 9664.032227
          }
        },
        "config": {
          "input": {
            "bytes-size": 16384
          },
          "benchmark": {
            "cpu-clock": 3500000000,
            "repeats-time": 1024,
            "repeats-warm": 128
          }
        }
      },
      "cipher-chacha-20-encrypt": {
        "metrics": {
          "cpu": {
            "cycles_per_byte": 2.067819,
            "cycles_per_call": 33879.145508
          },
          "throughput": {
            "megabytes_per_second": 1692.604673,
            "gigabytes_per_second": 1.692605
          },
          "duration": {
            "nanoseconds_per_call": 9679.755859
          }
        },
        "config": {
          "input": {
            "bytes-size": 16384
          },
          "benchmark": {
            "cpu-clock": 3500000000,
            "repeats-time": 1024,
            "repeats-warm": 128
          }
        }
      }
    }
  }
}
aead-chacha-20-poly-1305
./milo benchmark --cpu-clock=3500 primitive "aead-chacha-20-poly-1305-*" --aad-size=1024 --bytes-size=8192
{
  "benchmark": {
    "primitive": {
      "aead-chacha-20-poly-1305-decrypt": {
        "metrics": {
          "cpu": {
            "cycles_per_byte": 4.356709,
            "cycles_per_call": 40151.429199
          },
          "throughput": {
            "megabytes_per_second": 803.358701,
            "gigabytes_per_second": 0.803359
          },
          "duration": {
            "nanoseconds_per_call": 11471.836914
          }
        },
        "config": {
          "input": {
            "bytes-size": 8192,
            "aad-size": 1024
          },
          "benchmark": {
            "cpu-clock": 3500000000,
            "repeats-time": 1024,
            "repeats-warm": 128
          }
        }
      },
      "aead-chacha-20-poly-1305-encrypt": {
        "metrics": {
          "cpu": {
            "cycles_per_byte": 4.353707,
            "cycles_per_call": 40123.760742
          },
          "throughput": {
            "megabytes_per_second": 803.912679,
            "gigabytes_per_second": 0.803913
          },
          "duration": {
            "nanoseconds_per_call": 11463.931641
          }
        },
        "config": {
          "input": {
            "bytes-size": 8192,
            "aad-size": 1024
          },
          "benchmark": {
            "cpu-clock": 3500000000,
            "repeats-time": 1024,
            "repeats-warm": 128
          }
        }
      }
    }
  }
}
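Reports in the JSON shape shown in the examples above are straightforward to post-process. The Python sketch below flattens one report into a per-benchmark summary. The function name summarize is hypothetical, the embedded report is abbreviated (values copied from the cipher-chacha-20 example, config sections left out), and how the text is captured from the application is left open.

```python
import json

# Abbreviated report; values copied from the cipher-chacha-20 example.
REPORT = """
{
  "benchmark": {
    "primitive": {
      "cipher-chacha-20-decrypt": {
        "metrics": {
          "cpu": {"cycles_per_byte": 2.064460, "cycles_per_call": 33824.112793},
          "throughput": {"megabytes_per_second": 1695.358585, "gigabytes_per_second": 1.695359},
          "duration": {"nanoseconds_per_call": 9664.032227}
        }
      },
      "cipher-chacha-20-encrypt": {
        "metrics": {
          "cpu": {"cycles_per_byte": 2.067819, "cycles_per_call": 33879.145508},
          "throughput": {"megabytes_per_second": 1692.604673, "gigabytes_per_second": 1.692605},
          "duration": {"nanoseconds_per_call": 9679.755859}
        }
      }
    }
  }
}
"""


def summarize(report_text):
    """Map each benchmark name to a few headline metrics."""
    report = json.loads(report_text)
    summary = {}
    for name, entry in report["benchmark"]["primitive"].items():
        metrics = entry["metrics"]
        summary[name] = {
            "nanoseconds_per_call": metrics["duration"]["nanoseconds_per_call"],
            "megabytes_per_second": metrics["throughput"]["megabytes_per_second"],
            "cycles_per_byte": metrics["cpu"]["cycles_per_byte"],
        }
    return summary


for name, row in sorted(summarize(REPORT).items()):
    print(f"{name:28} {row['cycles_per_byte']:.3f} c/B  {row['megabytes_per_second']:.1f} MB/s")
```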