
Once you've figured out how to upload data to the fusion table of your choice, there's a big remaining question: how do you use all those fields in the table?

Before thinking about what any of these fields mean, you must know two things:

  • HSBencher runs multiple trials per row of data it uploads; thus each row represents a "complete benchmark result," not an individual program execution.
  • HSBencher has too many built-in fields. Early versions of HSBencher could not extend the schema with custom fields on a per-project basis, so the built-in set became very bloated. By and large these extra columns are harmless: you can hide them in the fusion table view and skip them when you fetch data. This bloated set of built-ins remains as of version 1.20, but the plan is to deprecate them and then cut that set way down in a 2.0 release.

A note on multiple trials per benchmark

Because of the multiple-trials setup, one can reasonably ask of any field, "which trial does this correspond to?" The answer is one of:

  1. constant: many fields like PROGNAME are constant across trials.
  2. all of them: some fields accumulate data from all trials; we tend to use ALLTIMES for a space-separated list of all trial times.
  3. the "median run": unless otherwise specified, all scalar fields (other than MINTIME and MAXTIME) correspond to the run whose time was MEDIANTIME. Time is implicitly the main measurement, and this median run is the representative trial for the row.

If you have any problems with this (admittedly inflexible) scheme, the quickest fix is to run with TRIALS=1 and run multiple complete rounds of the benchmark suite. Then do the aggregation of trials yourself later on.
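
As a concrete illustration of the scheme above, here is a minimal sketch (in Haskell, and not HSBencher's actual internals) of how several trials collapse into the scalar fields of one uploaded row: the trial whose time is the median supplies the scalar fields, while an ALLTIMES-style field accumulates every trial.

    import Data.List (sortOn)

    -- One hypothetical trial of a benchmark (these names are illustrative, not HSBencher's API).
    data Trial = Trial { time :: Double, prod :: Double } deriving Show

    -- Collapse several trials into one row's scalar fields: MINTIME, MEDIANTIME,
    -- and MAXTIME come from the sorted times, any other scalar (here, prod) is
    -- taken from the median-time trial, and ALLTIMES keeps every trial's time.
    summarize :: [Trial] -> (Double, Double, Double, Double, String)
    summarize trials = (minT, medT, maxT, prod medianRun, allTimes)
      where
        sorted    = sortOn time trials
        minT      = time (head sorted)
        maxT      = time (last sorted)
        medianRun = sorted !! (length sorted `div` 2)   -- the "median run"
        medT      = time medianRun
        allTimes  = unwords (map (show . time) trials)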

Aside: why did you end up with this take-only-the-median scheme?

You could ask why we don't just always accumulate all data from all trials. Why not either use the TRIALS=1 approach or make all fields accumulate, like ALLTIMES?

The short answer is that data storage backends (Fusion Tables, or regular spreadsheets) make it very easy to explore data that has simple scalar fields. For example, Fusion Tables can give you easy scatter plots of the data in your table, and you can pass links to these plots around on IM, email, or websites. But as soon as you need any smart parsing, aggregation, or processing of the data, you exceed the capabilities of these simple visualization systems.

Intended use of core fields

Here are some of the core fields and what they are used for.

First, the independent variables:

  • PROGNAME, ARGS - generally benchmarks with the same name and arguments are comparable in the sense that they compute the same thing / do the same amount of work, albeit possibly in different ways.

  • VARIANT, THREADS, RUNTIME_FLAGS, COMPILER, COMPILE_FLAGS, ENV_VARS - these describe exactly which implementation strategy was used to run the benchmark. Generally looking at different settings for these is what you want to plot when analyzing a given benchmark.

  • TRIALS - how many times each benchmark is run (for statistical rigor).
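
For example, the independent variables of a single uploaded row might look like the following (all values are purely illustrative, reusing the naming examples given later in this page):

    PROGNAME      = map-insert
    VARIANT       = skiplist
    ARGS          = 1000000
    THREADS       = 8
    RUNTIME_FLAGS = -qa
    COMPILER      = ghc-7.8.4
    COMPILE_FLAGS = -O2 -threaded
    ENV_VARS      =
    TRIALS        = 5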

Second, dependent variables:

These represent what was measured from the benchmark run, across one or more trials:

  • MINTIME, MEDIANTIME, MAXTIME - These are the main outcome of most benchmarking runs: how long did it take? Whether this number represents a real-time measurement, a CPU-time measurement, or a linear-regression estimate (e.g. from Criterion) is up to you.

  • MEDIANTIME_PRODUCTIVITY - this is the fraction of time (0.0-1.0) spent in the mutator as opposed to in garbage collection. It is gathered automatically by an HSBencher harness for Cabal or GHC-based benchmark runs. This particular value corresponds to the MEDIANTIME run. The name is thus redundant and could just be "PRODUCTIVITY", because the default is for scalar values to correspond to the median run. (A sketch of this calculation appears after this list.)

  • MEDIANTIME_ALLOCRATE, MEDIANTIME_MEMFOOTPRINT - these are already deprecated; they store the allocation rate and the high-watermark memory footprint, in bytes/sec and bytes respectively.

  • ALLJITTIMES - deprecated: this stores JIT compilation times, if any, across all trials.

  • RETRIES - this measures how many times the benchmark had to be rerun because it crashed or otherwise failed to run successfully.
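
For instance, the productivity figure above is simply the mutator's share of the total run time for the median trial. A minimal sketch, assuming the mutator and GC times are already available (e.g. from the statistics a GHC program prints with +RTS -s):

    -- Fraction of time (0.0-1.0) spent in the mutator rather than in garbage collection.
    -- The mutator and GC times are assumed to come from the runtime's own statistics.
    productivityOf :: Double -> Double -> Double
    productivityOf mutatorSec gcSec = mutatorSec / (mutatorSec + gcSec)

    -- e.g. productivityOf 9.0 1.0 == 0.9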

Third, fields carrying metadata about the benchmark environment:

  • HOSTNAME - what machine it ran on. Note: usually it's best to use the same name irrespective of which node in a supercomputer something runs on. Also, override the default choice so you get something reasonable rather than node names like "n001", which may actually be non-unique between different supercomputers.

  • RUNID - a unique ID generated by the HSBencher harness using a combination of the hostname and the time since the epoch (a sketch of one way to build such an ID appears after this list).

  • CI_BUILD_ID - a place to put the build ID used by the continuous integration system (e.g. Jenkins) which launched the job in the first place. This is useful for tracking down the build logs that correspond to a given benchmark result.

  • GIT_DEPTH, GIT_HASH, GIT_BRANCH - HSBencher will try to call the "git" command on the system to ascertain these qualities of the repository it's being run from. (This has not been properly generalized across multiple version control systems, but it's recommended that you simply reuse these fields if you are including analogous information about your alternative version control system.) See the notes on version control tips below.

  • BENCH_VERSION - often unused: a place to track which version of the benchmark suite itself was used. This would then provide a notion of which benchmark results are comparable. GIT_DEPTH should provide a conservative approximation to this, which says that every commit to the benchmark suite makes results incomparable with all previous results. Alternatively, projects with completely static code for the benchmarks can declare all results mutually comparable and ignore this.

  • BENCH_FILE - often unused: which file contained the benchmarks? When we had standalone text files rather than executable HSBencher harnesses, we would typically keep files like "benchlist_server.txt" and "benchlist_desktop.txt".
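
As an illustration of the RUNID convention mentioned above, here is a rough sketch of building such an identifier from the hostname and the seconds since the epoch; the exact format HSBencher produces may differ:

    import Network.HostName (getHostName)         -- from the "hostname" package
    import Data.Time.Clock.POSIX (getPOSIXTime)

    -- Build a RUNID-style identifier: hostname plus seconds since the epoch.
    -- The separator and layout here are assumptions, not HSBencher's exact format.
    makeRunID :: IO String
    makeRunID = do
      host <- getHostName
      secs <- getPOSIXTime
      return (host ++ "_" ++ show (round secs :: Integer))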

Finally, some of these benchmark-environment fields require the HSBencher harness to run several additional system commands, and they can add a lot of data to every row of uploaded benchmark data. Thus most of them are only populated if you explicitly ask HSBencher to do so. And some of them are only spottily supported, or supported only on some platforms (mainly Linux):

  • UNAME - the HSBencher harness stores the output of uname -a to record a little something about the benchmark machine the data came from.
  • PROCESSOR - the exact (CPU or GPU) chip model that the benchmark result was taken from.
  • TOPOLOGY - often unused: this is used to record the NUMA topology of the machine, e.g. as produced by lstopo.
  • ETC_ISSUE - the contents of the /etc/issue file, which usually indicates which operating system the machine was running (on Linux).
  • LSPCI - output of the lspci command to see what peripherals (including GPUs) were attached to the system at the time the data set was recorded.
  • WHO - the output of the who command. This is a sanity check that no one else was logged into the benchmark machine, which should have been doing nothing but running benchmarks at the time data was gathered.
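
For reference, here is a minimal sketch of how a harness could capture these fields by shelling out to the commands named above; HSBencher's own implementation may differ, and the commands are Linux-oriented:

    import System.Process (readProcess)
    import Control.Exception (try, SomeException)

    -- Run a command and capture its stdout, or return "" if the command is
    -- unavailable on this platform (e.g. lspci on non-Linux systems).
    captureOrEmpty :: String -> [String] -> IO String
    captureOrEmpty cmd args = do
      result <- try (readProcess cmd args "") :: IO (Either SomeException String)
      return (either (const "") id result)

    -- Illustrative gathering step for the UNAME, LSPCI, and WHO fields.
    gatherEnvFields :: IO [(String, String)]
    gatherEnvFields = do
      uname <- captureOrEmpty "uname" ["-a"]
      lspci <- captureOrEmpty "lspci" []
      who   <- captureOrEmpty "who"   []
      return [("UNAME", uname), ("LSPCI", lspci), ("WHO", who)]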

Fourth, hsbencher-fusion-upload-criterion extras:

These fields are not part of the core, but they are automatically added when uploading a Criterion report. All of them (and MEDIANTIME) are calculated by Criterion's normal linear-regression methodology. That is, they represent the expected marginal cost of adding one more iteration of the (possibly very short) benchmark (a toy illustration of such a slope fit appears after this list):

  • BYTES_ALLOC - bytes allocated per iteration of the benchmark
  • BYTES_COPIED - expected bytes copied by the garbage collector for each additional invocation of the benchmark.
  • CPUTIME - in uploaded Criterion data, MEDIANTIME is real (wall-clock) time in seconds, whereas this field represents CPU time in seconds. In parallel benchmarks it may be greater than real time.
  • CYCLES - this compensates for clock frequency differences between machines and lists the actual number of cycles per iteration.
  • NUMGC - expected number of garbage collections for each iteration of the benchmark. Typically much smaller than 1.0.
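
To make the "marginal cost per iteration" idea concrete, here is a toy ordinary-least-squares slope calculation of the kind such a regression performs; Criterion's actual estimator is more sophisticated:

    -- Given (iteration count, total measurement) samples, fit a line and return
    -- its slope: the expected cost of one additional iteration of the benchmark.
    marginalCost :: [(Double, Double)] -> Double
    marginalCost samples = covXY / varX
      where
        n     = fromIntegral (length samples)
        xs    = map fst samples
        ys    = map snd samples
        meanX = sum xs / n
        meanY = sum ys / n
        covXY = sum [ (x - meanX) * (y - meanY) | (x, y) <- samples ]
        varX  = sum [ (x - meanX) ^ (2 :: Int) | x <- xs ]

    -- e.g. marginalCost [(1, 2.1), (2, 4.0), (3, 6.2)] is roughly 2.05 seconds per iteration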

Naming tips

Inevitably you will be mixing many different variations and parameters for your benchmarks. Here are the guidelines that we (in the parfunc group at IU) follow when naming benchmarks:

  • Use a lowercase, hyphen separated name for PROGNAME and VARIANT.
  • E.g., "map-insert" might be the benchmark for inserting into a map, and "skiplist" or "binary-tree" might be the VARIANT.
  • Both the name and the variant might need to be compound symbols combining several attributes. Keeping them simple and short helps when they end up as filenames on disk for different plots.
  • ARGS can contain a big list of command line arguments, but it's easier for plotting if it contains just a single number. Often this is the input size. Keep THREADS separate.

  • RUNTIME_FLAGS is seemingly redundant with ARGS. In most benchmarks both are passed as command-line arguments to an executable. However, there's a difference: ARGS is usually part of the "key" -- the identity of which benchmark this is and how much work it did -- while, by convention, RUNTIME_FLAGS is more about tuning the implementation strategy. Changing RUNTIME_FLAGS should not make the results incomparable.
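
Following these conventions, a handful of hypothetical rows might be keyed as follows (values are illustrative only). PROGNAME and ARGS pin down the work being done, while VARIANT and THREADS distinguish the implementation strategies being compared:

    PROGNAME     VARIANT       ARGS      THREADS
    map-insert   skiplist      1000000   1
    map-insert   skiplist      1000000   16
    map-insert   binary-tree   1000000   16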

Version control tips

Always run your benchmarks from the same git repository. Everything should be included in that repository or pulled in as a submodule.

This is helpful because:

  • You can set up Jenkins to run benchmarks every time you push to the benchmark repository.
  • GIT_DEPTH becomes consistent and meaningful - this is a measurement of how many commits deep a given revision is. It is thus similar to Mercurial's notion of revision number, and it is a quick and handy way to communicate to your teammates the vintage of a result or the cutoff point for a certain breaking change. Use it.
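
For reference, GIT_DEPTH, GIT_HASH, and GIT_BRANCH can all be recovered with ordinary git commands. Here is a rough Haskell sketch of doing so; the exact commands HSBencher runs may differ:

    import System.Process (readProcess)

    -- Query the current repository for the three fields discussed above.
    -- GIT_DEPTH is the number of commits reachable from HEAD ("how many commits deep").
    gitMetadata :: IO (Int, String, String)
    gitMetadata = do
      depth  <- readProcess "git" ["rev-list", "--count", "HEAD"] ""
      hash   <- readProcess "git" ["rev-parse", "HEAD"] ""
      branch <- readProcess "git" ["rev-parse", "--abbrev-ref", "HEAD"] ""
      return (read (trim depth), trim hash, trim branch)
      where
        trim = takeWhile (/= '\n')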