diff --git a/README.md b/README.md index b219802c3..4576eea8c 100644 --- a/README.md +++ b/README.md @@ -8,6 +8,7 @@ SPDX-License-Identifier: Apache-2.0 --> # BenchExec + ## A Framework for Reliable Benchmarking and Resource Measurement [![Build Status](https://gitlab.com/sosy-lab/software/benchexec/badges/main/pipeline.svg)](https://gitlab.com/sosy-lab/software/benchexec/pipelines) @@ -15,58 +16,71 @@ SPDX-License-Identifier: Apache-2.0 [![PyPI version](https://img.shields.io/pypi/v/BenchExec.svg)](https://pypi.python.org/pypi/BenchExec) [![DOI](https://zenodo.org/badge/30758422.svg)](https://zenodo.org/badge/latestdoi/30758422) - **News and Updates**: -- BenchExec is part of [Google Summer of Code](https://summerofcode.withgoogle.com/) again! If you are interested in being paid by Google for contributing to BenchExec, check our [project ideas and instructions](https://www.sosy-lab.org/gsoc/) and [contact us](https://github.com/sosy-lab/benchexec/discussions)! + +- BenchExec is part of [Google Summer of Code](https://summerofcode.withgoogle.com/) again! + If you are interested in being paid by Google for contributing to BenchExec, check our [project ideas and instructions](https://www.sosy-lab.org/gsoc/) and [contact us](https://github.com/sosy-lab/benchexec/discussions)! - BenchExec 3.18 brings support for systems with cgroups v2! -- Linux kernel 5.11 finally [makes it possible](https://github.com/sosy-lab/benchexec/blob/main/doc/INSTALL.md#kernel-requirements) to use all BenchExec features on distributions other than Ubuntu! -- We now provide an [Ubuntu PPA](https://launchpad.net/~sosy-lab/+archive/ubuntu/benchmarking) that makes installing and upgrading BenchExec easier ([docs](https://github.com/sosy-lab/benchexec/blob/main/doc/INSTALL.md#debianubuntu)). +- Linux kernel 5.11 finally [makes it possible](doc/INSTALL.md#kernel-requirements) to use all BenchExec features on distributions other than Ubuntu! +- We provide an [Ubuntu PPA](https://launchpad.net/~sosy-lab/+archive/ubuntu/benchmarking) that makes installing and upgrading BenchExec easier ([docs](doc/INSTALL.md#debianubuntu)). - An extended version of our paper on BenchExec and its background was published as open access in the journal STTT, you can read [Reliable Benchmarking: Requirements and Solutions](https://doi.org/10.1007/s10009-017-0469-y) online. We also provide a set of [overview slides](https://www.sosy-lab.org/research/prs/Latest_ReliableBenchmarking.pdf). +## Features + BenchExec provides three major features: -- execution of arbitrary commands with precise and reliable measurement - and limitation of resource usage (e.g., CPU time and memory), - and isolation against other running processes -- an easy way to define benchmarks with specific tool configurations - and resource limits, - and automatically executing them on large sets of input files +- execution of arbitrary commands with precise and reliable measurement and limitation of resource usage (e.g., CPU time and memory), and isolation against other running processes +- an easy way to define benchmarks with specific tool configurations and resource limits, and automatically executing them on large sets of input files - generation of interactive tables and plots for the results - -Unlike other benchmarking frameworks, -BenchExec is able to reliably measure and limit resource usage -of the benchmarked tool even if the latter spawns subprocesses. -In order to achieve this, -it uses the [cgroups feature](https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt) -of the Linux kernel to correctly handle groups of processes. -For proper isolation of the benchmarks, it uses (if available) -Linux [user namespaces](http://man7.org/linux/man-pages/man7/namespaces.7.html) -and an [overlay filesystem](https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt) -to create a [container](https://github.com/sosy-lab/benchexec/blob/main/doc/container.md) -that restricts interference of the executed tool with the benchmarking host. +Unlike other benchmarking frameworks, BenchExec is able to reliably measure and limit resource usage of the benchmarked tool even if the latter spawns subprocesses. +In order to achieve this, it uses the [cgroups feature](https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt) of the Linux kernel to correctly handle groups of processes. +For proper isolation of the benchmarks, it uses (if available) Linux [user namespaces](http://man7.org/linux/man-pages/man7/namespaces.7.html) and an [overlay filesystem](https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt) to create a [container](doc/container.md) that restricts interference of the executed tool with the benchmarking host. BenchExec is intended for benchmarking non-interactive tools on Linux systems. -It measures CPU time, wall time, and memory usage of a tool, -and allows to specify limits for these resources. -It also allows to limit the CPU cores and (on NUMA systems) memory regions, -and the container mode allows to restrict filesystem and network access. -In addition to measuring resource usage, -BenchExec can verify that the result of the tool was as expected, -and extract further statistical data from the output. -Results from multiple runs can be combined into CSV and interactive HTML tables, -of which the latter provide scatter and quantile plots -(have a look at our [demo table](https://sosy-lab.github.io/benchexec/example-table/svcomp-simple-cbmc-cpachecker.table.html)). - -BenchExec works only on Linux and needs a one-time setup of cgroups by the machine's administrator. +It measures CPU time, wall time, and memory usage of a tool, and allows to specify limits for these resources. +It also allows to limit the CPU cores and (on NUMA systems) memory regions, and the container mode allows to restrict filesystem and network access. +In addition to measuring resource usage, BenchExec can verify that the result of the tool was as expected, and extract further statistical data from the output. +Results from multiple runs can be combined into CSV and interactive HTML tables, of which the latter provide scatter and quantile plots (have a look at our [demo table](https://sosy-lab.github.io/benchexec/example-table/svcomp-simple-cbmc-cpachecker.table.html)). + +## Usage + +BenchExec is more than just the tool `benchexec`: +For benchmarking, BenchExec offers two command line utilities to execute benchmarks, `benchexec` and `runexec`, as well as a Python API, which are described in the following.. + +Before benchmarking, please consider the [general guidelines](doc/benchmarking.md) for benchmarking. + +### runexec: Benchmark a Single Command + +The tool `runexec` allows to run a single command with the isolation and resource limitation features provided by benchexec. +It can be viewed as a (much more powerful) replacement to a combination of utilities like `taskset`, `time`, and `timeout`. +For example, +``` + runexec --cores 0 --timelimit 5 --memlimit 1GB -- echo Test +``` +executes the command `echo Test`, restricted to a single core, 1 GB of memory, and 5 seconds of CPU-time, and displays precise measurements related to time, memory, I/O, etc. +Use `runexec` to reliably measure a single to a handful of executions. +See [`runexec`'s documentation](doc/runexec.md) for further usage instructions. + +### benchexec: Run a Benchmark Suite + +The tool `benchexec` is an eleborate wrapper around the base functionality of `runexec`. +This tool provides management for defining and executing a large number individual invocations as well as checking and aggregating their results. +We recommend using `benchexec` when executing a concrete benchmark suite, especially when multiple different tools are involved. +See [`benchexec`'s documentation](doc/benchexec.md) for further details, e.g., how to define benchmark sets, run `benchexec`, view results, or integrate tools. + +### API + +Benchexec also provides a [python API](doc/python-api.md) from which all features of `runexec` can be accessed directly. + +### Setup & Requirements + +BenchExec only works on Linux and needs a one-time setup of cgroups by the machine's administrator. The actual benchmarking can be done by any user and does not need root access. +See the [installation guide](doc/INSTALL.md) for detailed setup steps and troubleshooting. -BenchExec was originally developed for use with the software verification framework -[CPAchecker](https://cpachecker.sosy-lab.org) -and is now developed as an independent project -at the [Software Systems Lab](https://www.sosy-lab.org) -of the [Ludwig-Maximilians-Universität München (LMU Munich)](https://www.uni-muenchen.de). +## Useful Resources ### Links @@ -87,17 +101,19 @@ of the [Ludwig-Maximilians-Universität München (LMU Munich)](https://www.uni-m ### License and Copyright -BenchExec is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0), -copyright [Dirk Beyer](https://www.sosy-lab.org/people/beyer/). -Exceptions are some tool-info modules -and third-party code that is bundled in the HTML tables, -which are available under several other free licenses -(cf. [folder `LICENSES`](https://github.com/sosy-lab/benchexec/tree/main/LICENSES)). +BenchExec is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0), copyright [Dirk Beyer](https://www.sosy-lab.org/people/beyer/). +Exceptions are some tool-info modules and third-party code that is bundled in the HTML tables, which are available under several other free licenses (cf. [folder `LICENSES`](https://github.com/sosy-lab/benchexec/tree/main/LICENSES)). + +## Developers + +BenchExec was originally developed for use with the software verification framework [CPAchecker](https://cpachecker.sosy-lab.org) and is now developed as an independent project at the [Software Systems Lab](https://www.sosy-lab.org) of the [Ludwig-Maximilians-Universität München (LMU Munich)](https://www.uni-muenchen.de). + +### Maintainer + +[Philipp Wendler](https://www.philippwendler.de) -### Authors -Maintainer: [Philipp Wendler](https://www.philippwendler.de) +### Contributors -Contributors: - [Aditya Arora](https://github.com/alohamora) - [Levente Bajczi](https://github.com/leventeBajczi) - [Dirk Beyer](https://www.sosy-lab.org/people/beyer/) @@ -126,14 +142,10 @@ Contributors: - [Ilja Zakharov](https://github.com/IljaZakharov) - and [lots of more people who integrated tools into BenchExec](https://github.com/sosy-lab/benchexec/graphs/contributors) -### Users of BenchExec +## Users of BenchExec -BenchExec was successfully used for benchmarking in all instances -of the international competitions on [Software Verification](https://sv-comp.sosy-lab.org) -and [Software Testing](https://test-comp.sosy-lab.org) -with a wide variety of benchmarked tools and hundreds of thousands benchmark runs. -It is integrated into the cluster-based logic-solving service -[StarExec](https://www.starexec.org/starexec/public/about.jsp) ([GitHub](https://github.com/StarExec/StarExec)). +BenchExec was successfully used for benchmarking in all instances of the international competitions on [Software Verification](https://sv-comp.sosy-lab.org) and [Software Testing](https://test-comp.sosy-lab.org) with a wide variety of benchmarked tools and hundreds of thousands benchmark runs. +It is integrated into the cluster-based logic-solving service [StarExec](https://www.starexec.org/starexec/public/about.jsp) ([GitHub](https://github.com/StarExec/StarExec)). The developers of the following tools use BenchExec: diff --git a/doc/INDEX.md b/doc/INDEX.md index 175a5486e..e7b490da2 100644 --- a/doc/INDEX.md +++ b/doc/INDEX.md @@ -72,11 +72,9 @@ BenchExec always uses the SI standard units: which consists of the options for a tool configuration and will be combined with a task to define a run. -- **task**: A combination of a set of input files, a property file, and an expected verdict - that defines a problem for a tool to solve. +- **task**: A complete definition of a problem for a tool to solve, e.g. through one or more input files, with an optional expected result. A task corresponds to exactly one row in the result tables. Depending on what the tool supports, the set of input files can be empty. - Properties and expected verdicts are optional. - **task definition**: A file in [this format](https://gitlab.com/sosy-lab/benchmarking/task-definition-format) that describes a set of tasks diff --git a/doc/benchexec.md b/doc/benchexec.md index 7e8939cf9..0f2ae4184 100644 --- a/doc/benchexec.md +++ b/doc/benchexec.md @@ -8,47 +8,141 @@ SPDX-License-Identifier: Apache-2.0 --> # BenchExec: benchexec + ## Benchmarking a Collection of Runs -The program `benchexec` provides the possibility to easily benchmark -multiple executions of a tool in one go. +The program `benchexec` provides the possibility to easily benchmark thousands of executions of a tool in one go. +This document lays out the three major components of this process + + 1. Defining a set of runs that should be benchmarked + 2. Running `benchexec` to measure all these runs + 3. Gathering and displaying the results ### Input for benchexec -`benchexec` uses as input an XML file that defines the command(s) to execute, -the resource limits, and the tasks for which the command should be run. -A complete definition of the input format can be found in the file -[doc/benchmark.xml](benchmark.xml), -and examples in [doc/benchmark-example-rand.xml](benchmark-example-rand.xml), -[doc/benchmark-example-calculatepi.xml](benchmark-example-calculatepi.xml), -and [doc/benchmark-example-cbmc.xml](benchmark-example-cbmc.xml). -The document type of these files should be +Effectively, a benchmark comprises a set of runs, where a run means to "execute tool `X` with configuration `Y` on task `Z`". +In order to concisely specify a benchmark comprising a large set of runs, BenchExec provides a custom specification language in the form of XML files, which are described below. + +In a nutshell these files reference a particular tool and define one or more different configurations of the tool together with a set of inputs for which the tool (in each of the defined configurations) should be executed. +Every combination of tool configuration and task is then mapped to an actual invocation by a tool-specific module, called *tool-info module* (see [Tool Integration](tool-integration.md)). + +In most of the following specifications, BenchExec allows the use of variables, which are replaced by their corresponding values. +All possible variables and their values are explained further below. + +#### Benchmark Files + +The top level definition of a benchmark is an XML file that defines the command(s) to execute, the resource limits, and the inputs for which the command should be run. +This benchmark-definition file consist of a root tag ``, that defines the tool to use and the resource limits of the benchmark. +For example +``` + +... + +``` +declares that this benchmark should execute the tool `cpachecker` with a timelimit of 4 seconds. +(To reference your own tool, you need to add a [tool-info module](tool-integration.md)), which is then associated with such a tool name.) + +Within the benchmark tag, different configurations of the tool as well as different sets of inputs can be defined, described in the following. +A complete definition of the input format can be found in the file [doc/benchmark.xml](benchmark.xml), and examples in + [doc/benchmark-example-rand.xml](benchmark-example-rand.xml), + [doc/benchmark-example-calculatepi.xml](benchmark-example-calculatepi.xml), and + [doc/benchmark-example-cbmc.xml](benchmark-example-cbmc.xml). +The document type of these files should be ```XML ``` +A document-type definition with a formal specification of input files can be found in [doc/benchmark.dtd](benchmark.dtd). + +#### Tool Configuration + +Fixed command-line options for the tool can be specified with ` + precise + + ... + +``` +would define two variants of the tool `my_tool`, `default` and `precise`. +The former would execute the tool with `--strict --mode default`, the latter with `--strict --mode precise`. +Both of these will be executed on each of the defined inputs. +Note that you need to use a separate `