# SchemaStyleGuide
Once you've figured out how to upload data to the fusion table of your choice, there's a big remaining question: how do you use all those fields in the table?
However, before thinking about what any of these fields mean, you must know two things:
- HSBencher runs multiple trials per row of data it uploads -- thus each row represents a "complete benchmark result", not an individual program execution.
- HSBencher has way too many built-in fields -- early versions of HSBencher did not have the ability to extend the schema with custom fields on a per-project basis, so the built-in set became very bloated. By and large these extra columns are harmless: you can hide them in the Fusion Table view and skip them when you fetch data. This bloated set of built-ins remains as of version 1.20, but the plan is to deprecate them and then cut the set way down in a 2.0 release.
Because of the multiple-trials setup, one can reasonably ask of any field "which trial did this correspond to?", for which the answer is:
- constant: many fields like `PROGNAME` are constant across trials.
- all of them: some fields accumulate data from all trials; we tend to use `ALLTIMES` for a space-separated list of all times.
- the "median run": unless otherwise specified, all scalar fields (other than `MINTIME`, `MAXTIME`) correspond to the run whose time was `MEDIANTIME`. Time is implicitly the main measurement, and this median run is the representative trial that the other scalar fields describe (sketched below).
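To make the median-run convention concrete, here is a minimal Haskell sketch (not HSBencher's actual code) of how a list of per-trial times collapses into the scalar fields; the values and helper names are made up for illustration:

```haskell
import Data.List (sortOn)

-- Per-trial wall-clock times, as they would appear space-separated in ALLTIMES.
allTimes :: [Double]
allTimes = [1.02, 0.97, 1.10, 0.99, 1.05]

-- MINTIME and MAXTIME summarize all trials...
minTime, maxTime :: Double
minTime = minimum allTimes
maxTime = maximum allTimes

-- ...while MEDIANTIME picks out one particular trial; the other scalar
-- fields (productivity, allocation rate, ...) describe that same trial.
medianTrialIndex :: Int
medianTrialIndex =
  let ranked = sortOn snd (zip [0 ..] allTimes)  -- trial indices sorted by time
  in fst (ranked !! (length ranked `div` 2))     -- index of the median trial

medianTime :: Double
medianTime = allTimes !! medianTrialIndex

main :: IO ()
main = print (minTime, medianTime, maxTime, medianTrialIndex)
```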
If you have any problems with this (admittedly inflexible) scheme, the quickest fix is to run with TRIALS=1 and run multiple complete rounds of the benchmark suite. Then do the aggregation of trials yourself later on.
You could ask why we don't just always accumulate all data from all trials. Why not either use the TRIALS=1 approach everywhere, or make every field accumulate across trials like `ALLTIMES`?
The short answer is that data storage backends (Fusion Tables, or regular spreadsheets) make it very easy to explore data that has simple scalar fields. For example, Fusion Tables can give you easy scatter plots of the data in your table, and you can pass links to these plots around on IM, email, or websites. But as soon as you need to do any smart parsing, aggregation, or processing of the data, that exceeds the capabilities of these simple visualization systems.
Here are some of the core fields and what they are used for.
- `PROGNAME`, `ARGS` - generally benchmarks with the same name and arguments are comparable in the sense that they compute the same thing / do the same amount of work, albeit possibly in different ways.
- `VARIANT`, `THREADS`, `RUNTIME_FLAGS`, `COMPILER`, `COMPILE_FLAGS`, `ENV_VARS` - these describe exactly which implementation strategy was used to run the benchmark. Generally, looking at different settings for these is what you want to plot when analyzing a given benchmark.
- `TRIALS` - how many times each benchmark is run (for statistical rigor). (These key fields are sketched as a record below.)
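As a mental model, the "key" portion of a row can be pictured as a small record. This is only an illustration; the field names mirror the column names above, but this is not HSBencher's actual data type:

```haskell
-- A hypothetical grouping of the key columns; names mirror the schema,
-- but this record is not part of HSBencher's real API.
data BenchKey = BenchKey
  { progName     :: String   -- PROGNAME: what was computed
  , args         :: String   -- ARGS: how much work (ideally a single number)
  , variant      :: String   -- VARIANT: which implementation strategy
  , threads      :: Int      -- THREADS
  , runtimeFlags :: String   -- RUNTIME_FLAGS
  , compiler     :: String   -- COMPILER
  , compileFlags :: String   -- COMPILE_FLAGS
  , envVars      :: String   -- ENV_VARS
  , trials       :: Int      -- TRIALS: how many times each benchmark runs
  } deriving (Show)

-- Two rows compute the same thing and do the same amount of work (and are
-- therefore comparable) when PROGNAME and ARGS agree.
comparable :: BenchKey -> BenchKey -> Bool
comparable a b = progName a == progName b && args a == args b
```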
These represent what was measured from the benchmark run, across one or more trials:
- `MINTIME`, `MEDIANTIME`, `MAXTIME` - these are the main outcome of most benchmarking runs: how long did it take? Whether this number represents a real-time measurement, a CPU-time measurement, or a linear regression (e.g. with Criterion) is up to you.
- `MEDIANTIME_PRODUCTIVITY` - the fraction of time (0.0-1.0) spent in the mutator as opposed to in garbage collection. It is gathered automatically by an HSBencher harness for Cabal or GHC-based benchmark runs. This particular value corresponds to the `MEDIANTIME` run. The name is thus redundant and could just be "PRODUCTIVITY", because the default is for scalar values to correspond to the median run. (A small formula sketch follows the list.)
- `MEDIANTIME_ALLOCRATE`, `MEDIANTIME_MEMFOOTPRINT` - these are already deprecated; they store the allocation rate and the high-watermark memory footprint, in bytes/sec and bytes respectively.
- `ALLJITTIMES` - deprecated: this stores JIT compilation times, if any, across all trials.
- `RETRIES` - this measures how many times the benchmark had to be rerun because it crashed or failed to run successfully.
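To make the productivity figure concrete, here is a small sketch, assuming you already have the mutator and GC times for the median trial; the helper name is hypothetical:

```haskell
-- Productivity is the fraction of the median trial spent doing real work
-- (mutator) rather than garbage collection, so it always lies in [0.0, 1.0].
productivity :: Double -> Double -> Double
productivity mutatorSec gcSec
  | total <= 0 = 1.0               -- degenerate case: nothing was measured
  | otherwise  = mutatorSec / total
  where total = mutatorSec + gcSec

-- For example, productivity 0.95 0.05 == 0.95, i.e. 95% of the median run
-- was spent in the mutator; this is what MEDIANTIME_PRODUCTIVITY records.
```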
The next group of fields records the provenance of a result: where, when, and from which code it was produced.
- `HOSTNAME` - what machine it ran on. Note: usually it's best to make this the same name irrespective of which node of a supercomputer something runs on. Also, override the default choice so you get something reasonable rather than node names like "n001", which may actually be non-unique between different supercomputers.
- `RUNID` - a unique ID generated by the HSBencher harness using a combination of hostname and time since the epoch (sketched below).
- `CI_BUILD_ID` - a place to put the build ID used by the continuous integration system (e.g. Jenkins) that launched the job in the first place. This is useful for tracking down the build logs that correspond to a given benchmark result.
- `GIT_DEPTH`, `GIT_HASH`, `GIT_BRANCH` - HSBencher will try to call the "git" command on the system to ascertain these qualities of the repository it's being run from. (This has not been properly generalized across multiple version control systems, but it's recommended that you simply reuse these fields if you are including analogous information from your alternative version control system.) See the notes on version control tips below.
- `BENCH_VERSION` - often unused: a place to track which version of the benchmark suite itself was used. This would then provide a notion of which benchmark results are comparable. `GIT_DEPTH` should provide a conservative approximation to this, which says that every commit to the benchmark suite makes results incomparable with all previous results. Alternatively, projects with completely static benchmark code can declare all results mutually comparable and ignore this.
- `BENCH_FILE` - often unused: which file contained the benchmarks? When we had standalone text files rather than executable HSBencher harnesses, we would typically keep files like "benchlist_server.txt" and "benchlist_desktop.txt".
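The `RUNID` convention can be sketched as follows. This illustrates the "hostname plus seconds since the epoch" idea only; HSBencher's exact formatting may differ:

```haskell
import Data.Time.Clock.POSIX (getPOSIXTime)
import System.Process (readProcess)

-- Build a RUNID-style tag from the machine's hostname and the POSIX time.
-- Shells out to the `hostname` command and strips its trailing newline.
makeRunID :: IO String
makeRunID = do
  host <- readProcess "hostname" [] ""
  secs <- (round <$> getPOSIXTime) :: IO Integer
  return (takeWhile (/= '\n') host ++ "_" ++ show secs)

main :: IO ()
main = makeRunID >>= putStrLn   -- e.g. "mymachine_1405357349" (made-up value)
```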
Finally, some of these benchmark-environment fields require the HSBencher harness to run several additional system commands, and they can add a lot of data to every row of uploaded benchmark results. Thus most of them are only populated if you explicitly ask HSBencher to do so, and some are only spottily supported, or supported only on certain platforms (mainly Linux). (A capture sketch follows the list.)
- `UNAME` - the HSBencher harness stores the output of `uname -a` to record a little something about the benchmark machine the data came from.
- `PROCESSOR` - the exact (CPU or GPU) chip model that the benchmark result was taken from.
- `TOPOLOGY` - often unused: this is used to record the NUMA topology of the machine, e.g. as produced by `lstopo`.
- `ETC_ISSUE` - the contents of the `/etc/issue` file, which usually indicates which operating system the machine was running (on Linux).
- `LSPCI` - output of the `lspci` command, to see what peripherals (including GPUs) were attached to the system at the time the data set was recorded.
- `WHO` - the output of the `who` command. This is a sanity check that no one else was logged into the benchmark machine, which should have been doing nothing but running benchmarks at the time the data was gathered.
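Here is a sketch of how such environment columns can be captured, assuming the usual Linux tools are on the PATH. It mirrors what a harness might do, but it is not HSBencher's own code:

```haskell
import Control.Exception (SomeException, try)
import System.Process (readProcess)

-- Run a command and return its output, or an empty string if the command
-- is unavailable on this platform (these fields are only spottily supported).
capture :: String -> [String] -> IO String
capture cmd cmdArgs = do
  result <- try (readProcess cmd cmdArgs "") :: IO (Either SomeException String)
  return (either (const "") id result)

-- Gather UNAME, LSPCI, WHO, and ETC_ISSUE style columns.
environmentColumns :: IO [(String, String)]
environmentColumns = do
  uname    <- capture "uname" ["-a"]
  lspci    <- capture "lspci" []
  who      <- capture "who"   []
  etcIssue <- capture "cat"   ["/etc/issue"]
  return [ ("UNAME", uname), ("LSPCI", lspci)
         , ("WHO", who), ("ETC_ISSUE", etcIssue) ]

main :: IO ()
main = environmentColumns >>= mapM_ print
```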
These fields are not part of the core, but they are automatically added when uploading a Criterion report. All of them (and `MEDIANTIME`) are calculated by Criterion's normal linear-regression methodology. That is, they represent the expected marginal cost of adding one more iteration of the (possibly very short) benchmark (a worked example with made-up numbers follows the list):
- `BYTES_ALLOC` - bytes allocated per iteration of the benchmark.
- `BYTES_COPIED` - expected bytes copied by the garbage collector for each additional iteration of the benchmark.
- `CPUTIME` - in uploaded Criterion data, `MEDIANTIME` is real time in seconds, whereas this represents CPU time in seconds. In parallel benchmarks it may be greater than real time.
- `CYCLES` - this compensates for clock-frequency differences between machines and lists the actual number of cycles per iteration.
- `NUMGC` - expected number of garbage collections for each iteration of the benchmark. Typically much smaller than 1.0.
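As a worked example of the "marginal cost per iteration" reading, take some made-up per-iteration values and scale them up to a long run; none of these numbers come from a real benchmark:

```haskell
-- Hypothetical per-iteration values in the style of the Criterion columns.
medianTimePerIter, cpuTimePerIter, numGCPerIter :: Double
medianTimePerIter = 2.5e-6   -- MEDIANTIME: 2.5 microseconds of real time per iteration
cpuTimePerIter    = 9.0e-6   -- CPUTIME: larger than real time when several cores are busy
numGCPerIter      = 0.02     -- NUMGC: roughly one GC every 50 iterations

-- Marginal costs scale linearly with the number of iterations.
expectedGCs :: Double -> Double
expectedGCs iterations = iterations * numGCPerIter

main :: IO ()
main = print (expectedGCs 1000)   -- about 20 garbage collections expected
```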
Inevitably you will be mixing many different variations and parameters for your benchmarks. Here are the guidelines that we (in the parfunc group at IU) follow when naming benchmarks (an example key appears after the list):
- Use a lowercase, hyphen-separated name for `PROGNAME` and `VARIANT`.
  - e.g. "map-insert" might be the benchmark for inserting into a map, and "skiplist" or "binary-tree" might be the `VARIANT`.
  - Both name and variant might need to be compound symbols built from several of these fields. Keeping them simple and short helps when they end up as file names on disk for different plots.
- `ARGS` can contain a big list of command-line arguments, but it's easier for plotting if it contains just a single number. Often this is the input size. Keep `THREADS` separate.
- `RUNTIME_FLAGS` is seemingly redundant with `ARGS`. In most benchmarks both are passed as command-line arguments to an executable. However, there's a difference: `ARGS` is usually part of the "key" -- the identity of which benchmark this is and how much work it did. By convention, `RUNTIME_FLAGS` is more about tuning the implementation strategy, and changing `RUNTIME_FLAGS` should not make the results incomparable.
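Putting the naming guidelines together, the key fields of a row might look like this. The values are illustrative only, reusing the example names above:

```haskell
-- An illustrative key following the guidelines: lowercase, hyphen-separated
-- PROGNAME/VARIANT, a single-number ARGS, and THREADS kept separate.
exampleKey :: [(String, String)]
exampleKey =
  [ ("PROGNAME",      "map-insert")   -- what is being benchmarked
  , ("VARIANT",       "skiplist")     -- which implementation strategy
  , ("ARGS",          "1000000")      -- a single number: the input size
  , ("THREADS",       "16")           -- kept out of ARGS
  , ("RUNTIME_FLAGS", "-A32M")        -- tuning only; does not affect comparability
  ]
```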
Always run your benchmarks from the same git repository. Everything should be included in there or be a submodule. This is helpful because:
- You can set up Jenkins to run benchmarks every time you push to the benchmark repository.
- `GIT_DEPTH` becomes consistent and meaningful. This is a measurement of how many commits deep a given revision is. It is thus similar to Mercurial's notion of revision number and is a quick and handy way to communicate the vintage of a result, or a cutoff point for a certain breaking change, to your teammates. Use it. (A sketch of computing this number by hand follows.)
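If you want to reproduce a GIT_DEPTH-style number by hand, counting the commits reachable from HEAD gives the same kind of figure. A minimal sketch, calling the ordinary git command line (not HSBencher's own implementation):

```haskell
import System.Process (readProcess)

-- Count how many commits deep the current revision is, in the spirit of
-- GIT_DEPTH: `git rev-list --count HEAD` prints the number of commits
-- reachable from HEAD.
gitDepth :: IO Int
gitDepth = do
  out <- readProcess "git" ["rev-list", "--count", "HEAD"] ""
  return (read (takeWhile (/= '\n') out))

main :: IO ()
main = gitDepth >>= print
```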