From b41c7dd1534ff0bb90e34ac133fc0f970e9081d3 Mon Sep 17 00:00:00 2001
From: Mark Robert Miller
Date: Fri, 13 Dec 2024 09:06:34 -0600
Subject: [PATCH] SOLR-15625 Improve documentation for the benchmark module. (#406)
---
 solr/benchmark/README.md                   | 578 ++++++++++++---------
 solr/benchmark/docs/jmh-profilers-setup.md | 405 +++++++++++++++
 solr/benchmark/docs/jmh-profilers.md       | 189 +++++++
 3 files changed, 914 insertions(+), 258 deletions(-)
 create mode 100644 solr/benchmark/docs/jmh-profilers-setup.md
 create mode 100644 solr/benchmark/docs/jmh-profilers.md

diff --git a/solr/benchmark/README.md b/solr/benchmark/README.md
index 7075ef111a7..9b1b8cdf623 100644
--- a/solr/benchmark/README.md
+++ b/solr/benchmark/README.md
@@ -1,356 +1,418 @@
-JMH-Benchmarks module
-=====================
+# Solr JMH Benchmark Module

-This module contains benchmarks written using [JMH](https://openjdk.java.net/projects/code-tools/jmh/) from OpenJDK.
-Writing correct micro-benchmarks in Java (or another JVM language) is difficult and there are many non-obvious
-pitfalls (many due to compiler optimizations). JMH is a framework for running and analyzing benchmarks (micro or macro)
-written in Java (or another JVM language).
+![](https://user-images.githubusercontent.com/448788/140059718-de183e23-414e-4499-883a-34ec3cfbd2b6.png)

-* [JMH-Benchmarks module](#jmh-benchmarks-module)
-  * [Running benchmarks](#running-benchmarks)
-  * [Using JMH with async profiler](#using-jmh-with-async-profiler)
-  * [Using JMH GC profiler](#using-jmh-gc-profiler)
-  * [Using JMH Java Flight Recorder profiler](#using-jmh-java-flight-recorder-profiler)
-  * [JMH Options](#jmh-options)
-  * [Writing benchmarks](#writing-benchmarks)
-  * [SolrCloud MiniCluster Benchmark Setup](#solrcloud-minicluster-benchmark-setup)
-  * [MiniCluster Metrics](#minicluster-metrics)
-  * [Benchmark Repeatability](#benchmark-repeatability)
+**_`profile, compare and introspect`_**

-## Running benchmarks
+**A flexible, developer-friendly microbenchmark framework**

-If you want to set specific JMH flags or only run certain benchmarks, passing arguments via gradle tasks is cumbersome.
-The process has been simplified by the provided `jmh.sh` script.

+![](https://img.shields.io/badge/developer-tool-blue)

-The default behavior is to run all benchmarks:
+## Table of Contents

-`./jmh.sh`
+- [Table of Contents](#table-of-contents)
+- [Overview](#overview)
+- [Code Organization Breakdown](#code-organization-breakdown)
+- [Getting Started](#getting-started)
+  - [Running `jmh.sh` with no Arguments](#running-jmhsh-with-no-arguments)
+  - [Pass a regex pattern or name after the command to select the benchmark(s) to run](#pass-a-regex-pattern-or-name-after-the-command-to-select-the-benchmarks-to-run)
+  - [The argument `-l` will list all the available benchmarks](#the-argument--l-will-list-all-the-available-benchmarks)
+  - [Check which benchmarks will run by entering a pattern after the `-l` argument](#check-which-benchmarks-will-run-by-entering-a-pattern-after-the--l-argument)
+  - [Further Pattern Examples](#further-pattern-examples)
+  - [The JMH Script Accepts _ALL_ of the Standard JMH Arguments](#the-jmh-script-accepts-all-of-the-standard-jmh-arguments)
+  - [Overriding Benchmark Parameters](#overriding-benchmark-parameters)
+  - [Format and Write Results to Files](#format-and-write-results-to-files)
+- [JMH Command-Line Arguments](#jmh-command-line-arguments)
+  - [The JMH Command-Line Syntax](#the-jmh-command-line-syntax)
+  - [The Full List of JMH Arguments](#the-full-list-of-jmh-arguments)
+- [Writing JMH benchmarks](#writing-jmh-benchmarks)
+- [Continued Documentation](#continued-documentation)

-Pass a pattern or name after the command to select the benchmarks:
+---

-`./jmh.sh CloudIndexing`
+## Overview

-Check which benchmarks match the provided pattern:
+JMH is a Java **microbenchmark** framework from some of the developers who work on
+OpenJDK. Not surprisingly, OpenJDK is where you will find JMH's home today, alongside some
+other useful little Java libraries such as JOL (Java Object Layout).

-`./jmh.sh -l CloudIndexing`
+The significant value in JMH is that you get to stand on the shoulders of some brilliant
+engineers who have done the tricky groundwork that many an ambitious Java benchmark writer
+has merrily wandered past.

-Run a specific test and overrides the number of forks, iterations and sets warm-up iterations to `2`:
+Rather than simply providing a boilerplate framework for driving iterations and measuring
+elapsed times, which JMH happily does, the focus is on the many forces that
+deceive and disorient the earnest benchmark enthusiast.

-`./jmh.sh -f 2 -i 2 -wi 2 CloudIndexing`
+From spinning your benchmark into all-new generated source code
+in an attempt to avoid falling victim to undesirable optimizations, to offering
+**Blackholes** and a solid collection of conventions and cleverly thought-out yet
+simple boilerplate, the goal of JMH is to lift the developer off the
+microbenchmark floor and at least to their knees.

-Run a specific test with async and GC profilers on Linux and flame graph output:
+JMH reaches out a hand to both the best and the most regular among us in a solid, cautious
+effort to promote the willing into the real, often-obscured world of the microbenchmark.
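
+To make the **Blackhole** mention above concrete, here is a minimal sketch of the two
+safe patterns for keeping benchmark results alive; the class is illustrative, not an
+existing benchmark in this module:
+
+```java
+import java.util.Random;
+
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.infra.Blackhole;
+
+@State(Scope.Benchmark)
+public class BlackholeSketch {
+
+  private final Random random = new Random(42);
+
+  // Hand the value to a Blackhole. If the result were simply discarded, the JIT
+  // could eliminate the work as dead code and the score would be meaningless.
+  @Benchmark
+  public void consumed(Blackhole blackhole) {
+    blackhole.consume(random.nextLong());
+  }
+
+  // Returning the value works too: JMH consumes return values on your behalf.
+  @Benchmark
+  public long returned() {
+    return random.nextLong();
+  }
+}
+```
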
-`./jmh.sh -prof gc -prof async:libPath=/path/to/libasyncProfiler.so\;output=flamegraph\;dir=profile-results CloudIndexing`
+## Code Organization Breakdown

-### Using JMH with async profiler
+![](https://img.shields.io/badge/data-...move-blue)

-It's good practice to check profiler output for micro-benchmarks in order to verify that they represent the expected
-application behavior and measure what you expect to measure. Some example pitfalls include the use of expensive mocks or
-accidental inclusion of test setup code in the benchmarked code. JMH includes
-[async-profiler](https://github.com/jvm-profiling-tools/async-profiler) integration that makes this easy:
+- **JMH:** microbenchmark classes and some common base code to support them.

-`./jmh.sh -prof async:libPath=/path/to/libasyncProfiler.so\;dir=profile-results`
+- **Random Data:** a framework for easily generating specific and repeatable random data (the idea is sketched below).
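
+The random-data framework's own API lives in this module; purely to illustrate the
+repeatability idea it is built on (a fixed seed means every run draws identical data),
+here is a self-contained sketch that does not use the module's actual classes:
+
+```java
+import java.util.SplittableRandom;
+
+public class RepeatableDataSketch {
+
+  public static void main(String[] args) {
+    // Same seed, same sequence: every benchmark run sees identical "random" data,
+    // so scores from different runs remain comparable.
+    SplittableRandom random = new SplittableRandom(271828);
+    for (int i = 0; i < 3; i++) {
+      int docSize = random.nextInt(100, 1000); // doc sizes in [100, 1000)
+      System.out.println("doc-" + i + " size=" + docSize);
+    }
+  }
+}
+```
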
+## Getting Started

+Running **JMH** is handled via the `jmh.sh` shell script. This script uses Gradle to
+extract the correct classpath and configures a handful of helpful Java
+command-line arguments and system properties. For the most part, the `jmh.sh` script
+passes any arguments it receives directly to JMH. You run the script
+from the root benchmark module directory (i.e., `solr/benchmark`).

+### Running `jmh.sh` with no Arguments

-With flame graph output:
-
-`./jmh.sh -prof async:libPath=/path/to/libasyncProfiler.so\;output=flamegraph\;dir=profile-results`
-
-Simultaneous cpu, allocation and lock profiling with async profiler 2.0 and jfr output:
-
-`./jmh.sh -prof async:libPath=/path/to/libasyncProfiler.so\;output=jfr\;alloc\;lock\;dir=profile-results CloudIndexing`
-
-A number of arguments can be passed to configure async profiler, run the following for a description:
-
-`./jmh.sh -prof async:help`
-
-You can also skip specifying libPath if you place the async profiler lib in a predefined location, such as one of the
-locations in the env variable `LD_LIBRARY_PATH` if it has been set (many Linux distributions set this env variable, Arch
-by default does not), or `/usr/lib` should work.
-
-#### OS Permissions for Async Profiler
-
-Async Profiler uses perf to profile native code in addition to Java code. It will need the following for the necessary
-access.
-
-```bash
-echo 0 > /proc/sys/kernel/kptr_restrict
-echo 1 > /proc/sys/kernel/perf_event_paranoid
-```
-
-or
-
-```bash
-sudo sysctl -w kernel.kptr_restrict=0
-sudo sysctl -w kernel.perf_event_paranoid=1
-```
-
-### Using JMH GC profiler
-
-You can run a benchmark with `-prof gc` to measure its allocation rate:
-
-`./jmh.sh -prof gc:dir=profile-results`
-
-Of particular importance is the `norm` alloc rates, which measure the allocations per operation rather than allocations
-per second.
-
-### Using JMH Java Flight Recorder profiler
-
-JMH comes with a variety of built-in profilers. Here is an example of using JFR:
-
-`./jmh.sh -prof jfr:dir=profile-results\;configName=jfr-profile.jfc`
-
-In this example we point to the included configuration file with configName, but you could also do something like
-settings=default or settings=profile.
-
-### Benchmark Outputs
-
-By default, output that benchmarks generate is created in the build/work directory. You can change this location by setting the workBaseDir system property like this:
-
-    -jvmArgsAppend -DworkBaseDir=/data3/bench_work
-
-If a profiler generates output, it will generally be written to the current working directory - that is the benchmark module directory itself.
-You can usually change this via the dir option, for example:
-
-    ./jmh.sh -prof jfr:dir=build/work/profile-results JsonFaceting
-
-### Using a Separate MiniCluster Base Directory
-
-If you have a special case MiniCluster you have generated, such as one you have prepared with very large indexes for a search benchmark run, you can change the base directory used by the profiler
-for the MiniCluster with the miniClusterBaseDir system property. This is for search based benchmarks in general and the MiniCluster wil not be removed automatically by the benchmark.
-
-### JMH Options
-
-Some common JMH options are:
-
-```text
+>
+> ```zsh
+> # run all benchmarks found in subdirectories
+> ./jmh.sh
+> ```
+
+### Pass a regex pattern or name after the command to select the benchmark(s) to run
+
+>
+> ```zsh
+> ./jmh.sh BenchmarkClass
+> ```
+
+### The argument `-l` will list all the available benchmarks
+
+>
+> ```zsh
+> ./jmh.sh -l
+> ```
+
+### Check which benchmarks will run by entering a pattern after the `-l` argument
+
+Use the full benchmark class name, the simple class name, the benchmark
+method name, or a substring.
+
+>
+> ```zsh
+> ./jmh.sh -l Ben
+> ```
+
+### Further Pattern Examples
+
+>
+> ```zsh
+> ./jmh.sh -l org.apache.solr.benchmark.search.BenchmarkClass
+> ./jmh.sh -l BenchmarkClass
+> ./jmh.sh -l BenchmarkClass.benchmethod
+> ./jmh.sh -l Bench
+> ./jmh.sh -l benchme
+> ```
+
+### The JMH Script Accepts _ALL_ of the Standard JMH Arguments
+
+Here we tell JMH to fork twice, running the benchmark in a fresh JVM for each
+trial, and we explicitly set both the warmup iterations and the measurement
+iterations to 2.
+
+>
+> ```zsh
+> ./jmh.sh -f 2 -wi 2 -i 2 BenchmarkClass
+> ```
+
+### Overriding Benchmark Parameters
+
+> ![](https://img.shields.io/badge/overridable-params-blue)
+>
+> ```java
+> @Param("1000")
+> public int numDocs;
+> ```
+
+The state objects specified in benchmark classes will often have a
+number of input parameters that the benchmark method calls access. The notation
+above defaults `numDocs` to 1000 and also allows you to override that value
+using the `-p` argument. A benchmark might also use a `@Param` annotation such as:
+
+> ![](https://img.shields.io/badge/sequenced-params-blue)
+>
+> ```java
+> @Param({"1000", "5000", "10000"})
+> public int numDocs;
+> ```
+
+By default, that would cause the benchmark
+to be run enough times to use each of the specified values. If multiple input
+parameters are specified this way, the number of runs needed will quickly
+expand. You can pass multiple `-p` arguments, and each one completely replaces
+the default values from the matching `@Param` annotation. A complete
+parameterized benchmark is sketched after the examples below.
+
+>
+> ```zsh
+> # use 2000 docs instead of 1000
+> ./jmh.sh BenchmarkClass -p numDocs=2000
+>
+> # use 5 docs, then 50, then 500
+> ./jmh.sh BenchmarkClass -p numDocs=5,50,500
+>
+> # run the benchmark enough times to satisfy every combination of two
+> # multi-valued input parameters
+> ./jmh.sh BenchmarkClass -p numDocs=10,20,30 -p docSize=250,500
+> ```
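
+Putting the pieces together, here is a minimal, hypothetical parameterized benchmark;
+the class and field names are illustrative, not an existing benchmark in this module:
+
+```java
+import java.util.concurrent.TimeUnit;
+
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+
+@BenchmarkMode(Mode.Throughput)
+@OutputTimeUnit(TimeUnit.SECONDS)
+@State(Scope.Benchmark)
+public class DocCountSketch {
+
+  // Defaults to 1000; override on the command line with -p numDocs=2000
+  @Param("1000")
+  public int numDocs;
+
+  private String[] docs;
+
+  @Setup
+  public void setup() {
+    // Runs once per trial, outside the measured code path.
+    docs = new String[numDocs];
+    for (int i = 0; i < numDocs; i++) {
+      docs[i] = "doc-" + i;
+    }
+  }
+
+  @Benchmark
+  public int totalChars() {
+    // Returning the computed value lets JMH consume it, preventing dead-code elimination.
+    int total = 0;
+    for (String doc : docs) {
+      total += doc.length();
+    }
+    return total;
+  }
+}
+```
+
+>
+> ```zsh
+> # run the sketch with its default, then with an overridden doc count
+> ./jmh.sh DocCountSketch
+> ./jmh.sh DocCountSketch -p numDocs=2000
+> ```
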
+### Format and Write Results to Files
+
+Rather than just dumping benchmark results to the console, you can specify the
+`-rf` argument to control the output format; for example, you can choose CSV or
+JSON. The `-rff` argument will dictate the filename and output location.
+
+>
+> ```zsh
+> # format output to JSON and write the file to the `work` directory relative to
+> # the JMH working directory
+> ./jmh.sh BenchmarkClass -rf json -rff work/jmh-results.json
+> ```
+>
+> 💡 If you pass only the `-rf` argument, JMH will write out a file to the
+> current working directory with the appropriate extension, e.g., `jmh-results.csv`.
+
+## JMH Command-Line Arguments
+
+### The JMH Command-Line Syntax
+
+> ![](https://img.shields.io/badge/Help-output-blue)
+>
+> ```zsh
+> Usage: ./jmh.sh [regexp*] [options]
+>  [opt] means optional argument.
+>  <opt> means required argument.
+>  "+" means comma-separated list of values.
+>  "time" arguments accept time suffixes, like "100ms".
+>
+> Command-line options usually take precedence over annotations.
+> ```
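
+Because command-line options take precedence over annotations, a benchmark can ship
+sensible annotated defaults and still be re-run with different settings. A small,
+hypothetical sketch of such defaults:
+
+```java
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Warmup;
+
+public class AnnotatedDefaultsSketch {
+
+  // Annotated defaults: one fork, three warmup and three measurement iterations.
+  // `./jmh.sh -f 2 -wi 5 -i 5 AnnotatedDefaultsSketch` overrides all three.
+  @Benchmark
+  @Fork(1)
+  @Warmup(iterations = 3)
+  @Measurement(iterations = 3)
+  public long sum() {
+    long total = 0;
+    for (int i = 0; i < 1_000; i++) {
+      total += i;
+    }
+    return total;
+  }
+}
+```
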
+### The Full List of JMH Arguments
+
+```zsh
 Usage: ./jmh.sh [regexp*] [options]
  [opt] means optional argument.
  <opt> means required argument.
- "+" means comma-separated list of values.
+ "+" means a comma-separated list of values.
  "time" arguments accept time suffixes, like "100ms".

-Command line options usually take precedence over annotations.
+Command-line options usually take precedence over annotations.

 [arguments]              Benchmarks to run (regexp+). (default: .*)

- -bm <mode>              Benchmark mode. Available modes are: [Throughput/thrpt,
-                         AverageTime/avgt, SampleTime/sample, SingleShotTime/ss,
+ -bm <mode>              Benchmark mode. Available modes are:
+                         [Throughput/thrpt, AverageTime/avgt,
+                         SampleTime/sample, SingleShotTime/ss,
                          All/all]. (default: Throughput)

 -bs <int>                Batch size: number of benchmark method calls per
                          operation. Some benchmark modes may ignore this
-                         setting, please check this separately. (default:
-                         1)
+                         setting; please check this separately.
+                         (default: 1)

 -e <regexp+>             Benchmarks to exclude from the run.

- -f <int>                How many times to fork a single benchmark. Use 0 to
-                         disable forking altogether. Warning: disabling
-                         forking may have detrimental impact on benchmark
-                         and infrastructure reliability, you might want
-                         to use different warmup mode instead. (default:
-                         5)
+ -f <int>                How many times to fork a single benchmark. Use 0
+                         to disable forking altogether. Warning:
+                         disabling forking may have a detrimental impact on
+                         benchmark and infrastructure reliability. You might
+                         want to use a different warmup mode instead.
+                         (default: 1)

- -foe <bool>             Should JMH fail immediately if any benchmark had
-                         experienced an unrecoverable error? This helps
-                         to make quick sanity tests for benchmark suites,
-                         as well as make the automated runs with checking error
-                         codes. (default: false)
+ -foe <bool>             Should JMH fail immediately if any benchmark has
+                         experienced an unrecoverable error? Failing fast
+                         helps to make quick sanity tests for benchmark
+                         suites and allows automated runs to do error
+                         checking. (default: false)

 -gc <bool>               Should JMH force GC between iterations? Forcing
-                         the GC may help to lower the noise in GC-heavy benchmarks,
-                         at the expense of jeopardizing GC ergonomics decisions.
+                         GC may help lower the noise in GC-heavy benchmarks
+                         at the expense of jeopardizing GC ergonomics
+                         decisions.
                          Use with care. (default: false)

- -h                      Display help, and exit.
+ -h                      Displays this help output and exits.

- -i <int>                Number of measurement iterations to do. Measurement
-                         iterations are counted towards the benchmark score.
-                         (default: 1 for SingleShotTime, and 5 for all other
-                         modes)
+ -i <int>                Number of measurement iterations to do.
+                         Measurement iterations are counted towards the
+                         benchmark score.
+                         (default: 1 for SingleShotTime, and 5 for all
+                         other modes)

- -jvm <string>           Use given JVM for runs. This option only affects forked
-                         runs.
+ -jvm <string>           Use given JVM for runs. This option only affects
+                         forked runs.

- -jvmArgs <string>       Use given JVM arguments. Most options are inherited
-                         from the host VM options, but in some cases you want
-                         to pass the options only to a forked VM. Either single
-                         space-separated option line, or multiple options
-                         are accepted. This option only affects forked runs.
+ -jvmArgs <string>       Use given JVM arguments. Most options are
+                         inherited from the host VM options, but in some
+                         cases, you want to pass the options only to a forked
+                         VM. Either a single space-separated option line or
+                         multiple options are accepted. This option only
+                         affects forked runs.

- -jvmArgsAppend <string> Same as jvmArgs, but append these options after the
-                         already given JVM args.
+ -jvmArgsAppend <string> Same as jvmArgs, but append these options after
+                         the already given JVM args.

 -jvmArgsPrepend <string> Same as jvmArgs, but prepend these options before
                          the already given JVM arg.

- -l                      List the benchmarks that match a filter, and exit.
+ -l                      List the benchmarks that match a filter and exit.

- -lp                     List the benchmarks that match a filter, along with
-                         parameters, and exit.
+ -lp                     List the benchmarks that match a filter, along
+                         with parameters, and exit.

- -lprof                  List profilers, and exit.
+ -lprof                  List profilers and exit.

- -lrf                    List machine-readable result formats, and exit.
+ -lrf                    List machine-readable result formats and exit.

 -o <filename>            Redirect human-readable output to a given file.

- -opi <int>              Override operations per invocation, see @OperationsPerInvocation
-                         Javadoc for details. (default: 1)
+ -opi <int>              Override operations per invocation, see
+                         @OperationsPerInvocation Javadoc for details.
+                         (default: 1)

- -p <param={v,}*>        Benchmark parameters. This option is expected to
-                         be used once per parameter. Parameter name and parameter
-                         values should be separated with equals sign. Parameter
-                         values should be separated with commas.
+ -p <param={v,}*>        Benchmark parameters. This option is expected to
+                         be used once per parameter. The parameter name and
+                         parameter values should be separated with an
+                         equal sign. Parameter values should be separated
+                         with commas.

- -prof <profiler>        Use profilers to collect additional benchmark data.
-                         Some profilers are not available on all JVMs and/or
-                         all OSes. Please see the list of available profilers
-                         with -lprof.
+ -prof <profiler>        Use profilers to collect additional benchmark
+                         data. Some profilers are not available on all JVMs
+                         or all OSes. '-lprof' will list the profilers that
+                         are available and can run with the current OS
+                         configuration and installed dependencies.

 -r