Scott Pakin edited this page Feb 2, 2017 · 7 revisions

Name

bf-clang - Inject Byfl instrumentation while compiling a program

Synopsis

bf-clang [-bf-by-func] [-bf-call-stack] [-bf-data-structs] [-bf-types] [-bf-inst-mix] [-bf-inst-deps] [-bf-vectors] [-bf-unique-bytes] [-bf-mem-footprint] [-bf-strides] [-bf-every-bb] [-bf-merge-bb=count] [-bf-reuse-dist[=loads|stores]] [-bf-include=function[,function]...] [-bf-exclude=function[,function]...] [-bf-thread-safe] [-bf-verbose] [-bf-libdir=path/to/byfl/lib/] [-bf-plugin=path/to/bytesflops.so] [-bf-disable=feature] [clang_options...] [file...]

Description

bf-clang is the Byfl project's C compiler. It compiles C code, instrumenting it to report various software performance counters at execution time. Software performance counters are analogous to the hardware performance counters provided by modern processors but measure program execution in a hardware-independent fashion. That is, users can expect to observe the same measurements across different processor architectures.

Options

  • -bf-by-func

    Report performance counters for each function individually.

  • -bf-call-stack

    Report performance counters for each unique call stack.

  • -bf-data-structs

    Report loads and stores on a per-data-structure basis.

  • -bf-types

    Tally the number of times each data type is loaded or stored.

  • -bf-inst-mix

    Tally the number of times each instruction type was executed.

  • -bf-inst-deps

    Tally what instructions feed into what other instructions.

  • -bf-vectors

    Report information about the number and type of vector operations performed.

  • -bf-unique-bytes

    Report the number of unique memory addresses referenced.

  • -bf-mem-footprint

    Report the memory capacity required to hold various percentages of the dynamic memory accesses.

  • -bf-strides

    Bin the stride sizes observed by each load and store.

  • -bf-every-bb

    Report performance counters at the basic-block level.

  • -bf-merge-bb=count

    Aggregate basic blocks into groups of count to reduce the output volume.

  • -bf-reuse-dist[=loads|stores]

    Track data reuse distance. With an argument of loads, only loads are tracked. With an argument of stores, only stores are tracked. With no argument (or with an argument of loads,stores), both loads and stores are tracked.

  • -bf-include=function[,function]...

    Instrument only the specified functions.

  • -bf-exclude=function[,function]...

    Do not instrument the specified functions.

  • -bf-thread-safe

    Prevent corruption caused by simultaneous accesses to the same set of performance counters.

  • -bf-verbose

    Make bf-clang print every helper command it runs.

  • -bf-libdir=path/to/byfl/lib/

    Point bf-clang to the directory containing the Byfl library (libbyfl.a or libbyfl.so).

  • -bf-plugin=path/to/bytesflops.so

    Point bf-clang to the Byfl plugin (bytesflops.so).

  • -bf-disable=feature

    Disable certain aspects of bf-clang's operation.

In addition, bf-clang accepts all of the common clang options.

Examples

The simplest usage of bf-clang is to compile just like with clang:

bf-clang -O2 -g -o myprog myprog.c

The resulting myprog executable will output a basic set of performance information at the end of the run.

More performance information can be requested—at the cost of slower execution and a larger memory footprint:

bf-clang -bf-by-func -bf-types -bf-inst-mix -bf-vectors \
  -bf-mem-footprint -O2 -g -o myprog myprog.c

Environment

  • BF_OPTS

    Provide a space-separated list of bf-clang command-line arguments.

  • BF_PREFIX

    Prefix each line of output with the contents of the BF_PREFIX environment variable.

  • BF_BINOUT

    Specify the name of a .byfl file to which to write detailed Byfl output in binary format.

  • BF_CLANG

    Wrap the specified compiler instead of clang.

BF_OPTS is used at compile time. Command-line arguments take precedence over those read from the BF_OPTS environment variable. The advantage of using the environment variable, however, is that a user can rebuild a project with different sets of performance counters without having to modify the project's Makefiles (or analogue in another build system) beyond an initial modification to use bf-clang as the C compiler.

BF_PREFIX is used at run time. An important characteristic of the BF_PREFIX environment variable is that it honors POSIX shell-style variable expansions. For example, if BF_PREFIX is set to the string Rank ${OMPI_COMM_WORLD_RANK}, then a line that would otherwise begin with BYFL_SUMMARY: will instead begin with Rank 3 BYFL_SUMMARY:, assuming that the OMPI_COMM_WORLD_RANK environment variable has the value 3.

Although the characters |, &, ;, <, >, (, ), {, and } are not normally allowed within BF_PREFIX, BF_PREFIX does support backquoted-command evaluation, and the child command can contain those characters, as in

BF_PREFIX='`if true; then (echo YES; echo MAYBE); else echo NO; fi`'

(which prefixes each line with YES MAYBE).

As a special case, if BF_PREFIX expands to a string that begins with / or ./, it is treated not as a prefix but as a filename. The Byfl-instrumented executable will redirect all of its Byfl output to that file instead of to the standard output device.
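
The expansion rules described above resemble those of the POSIX wordexp() function, which also performs shell-style variable and command substitution and rejects unquoted |, &, ;, <, and > outside of command substitution. Whether Byfl actually uses wordexp() is an assumption made here for illustration; this page does not say. The sketch below, with the hypothetical helper names expand_prefix and is_filename, shows how such an expansion and the filename special case might behave:

```c
#include <stdio.h>
#include <string.h>
#include <wordexp.h>

/* Assumption: this mimics the documented BF_PREFIX behavior with POSIX
 * wordexp(); it is not taken from the Byfl sources.
 *
 * Expand shell-style constructs in raw and join the resulting words with
 * single spaces.  Returns 0 on success, -1 on failure or truncation. */
int expand_prefix(const char *raw, char *out, size_t outlen)
{
    wordexp_t we;
    size_t used = 0;

    if (outlen == 0 || wordexp(raw, &we, 0) != 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < we.we_wordc; i++) {
        int n = snprintf(out + used, outlen - used, "%s%s",
                         i > 0 ? " " : "", we.we_wordv[i]);
        if (n < 0 || (size_t)n >= outlen - used) {
            wordfree(&we);
            return -1;   /* output buffer too small */
        }
        used += (size_t)n;
    }
    wordfree(&we);
    return 0;
}

/* The special case: a leading "/" or "./" means "treat as a filename". */
int is_filename(const char *expanded)
{
    return expanded[0] == '/' || strncmp(expanded, "./", 2) == 0;
}
```

Because the expanded words are rejoined with single spaces, a command that prints YES and MAYBE on separate lines would yield the prefix YES MAYBE under this scheme, which matches the backquoted example above.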

BF_BINOUT is also used at run time. Like BF_PREFIX, it honors POSIX shell-style variable expansions. If BF_BINOUT is set to the empty string, no binary output file will be produced.

Notes

Explanation of command-line options

When -bf-call-stack is specified, a function F is reported separately when called from function A and when called from function B. -bf-call-stack overrides -bf-by-func.
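
As a hypothetical illustration (the functions f, a, and b are invented here and are not part of Byfl), the following C sketch reaches one function through two distinct call paths. Under -bf-call-stack, f would be expected to appear once for the path through a and once for the path through b. The noinline attribute is an assumption: it is there only to keep the optimizer from merging the call paths.

```c
#include <stddef.h>

/* Hypothetical kernel: sums the first n elements of v. */
__attribute__((noinline))
double f(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* First call path: a calls f on the whole array. */
__attribute__((noinline))
double a(const double *v, size_t n)
{
    return f(v, n);
}

/* Second call path: b calls f on the first half of the array only. */
__attribute__((noinline))
double b(const double *v, size_t n)
{
    return f(v, n / 2);
}
```

With -bf-by-func alone, the work done in f on behalf of a and b would instead be combined into a single set of counters for f.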

For the purposes of -bf-data-structs, Byfl defines a data structure as either a statically allocated block of memory (which has a name in the executable's symbol table) or a collection of data blocks dynamically allocated from the same program call point (i.e., instruction address). Byfl assigns the latter a name based on a textual description of the call point.
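
A minimal sketch of the two kinds of data structure just described, using invented names (lookup_table, new_vector, touch_structures). It assumes that the malloc call inside new_vector is not inlined, so that both heap blocks originate from the same call point:

```c
#include <stdlib.h>

#define TABLE_SIZE 256

/* Statically allocated block: named lookup_table in the symbol table. */
double lookup_table[TABLE_SIZE];

/* Hypothetical helper: every block returned here shares one malloc call point. */
__attribute__((noinline))
double *new_vector(size_t n)
{
    return malloc(n * sizeof(double));
}

/* Touches the static block and two heap blocks from the same call point. */
double touch_structures(size_t n)
{
    double *x = new_vector(n);
    double *y = new_vector(n);
    double sum = 0.0;

    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        lookup_table[i % TABLE_SIZE] = (double)i;  /* store to the static block */
        x[i] = 1.0;                                /* store to the first heap block */
        y[i] = 2.0;                                /* store to the second heap block */
        sum += x[i] + y[i];                        /* loads from both heap blocks */
    }
    free(x);
    free(y);
    return sum;
}
```

Under the definition above, x and y would be grouped into one data structure because they share an allocation call point, while lookup_table would be reported under its own name.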

The -bf-types option tallies, for example, the number of loads of 64-bit floating-point values as distinct from loads of 64-bit unsigned integer values.

See the LLVM Language Reference Manual for descriptions of the instructions that are tallied by -bf-inst-mix.

-bf-inst-deps tallies each instruction with the instructions that produced its first two operands. (Ellipses are used to indicate that the instruction takes more than two operands.) For example, Xor(Add, Mul) represents an exclusive OR with one operand being the result of a previous integer addition and one being the result of a previous integer multiplication (i.e., A = (B + C) XOR (D * E)).
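
The expression in that example can be written as a small C function. The name xor_add_mul is hypothetical, and whether the optimizer keeps exactly one add, one mul, and one xor depends on the surrounding code and optimization level:

```c
#include <stdint.h>

/* A = (B + C) XOR (D * E): the xor's two operands come from an add and a mul. */
uint64_t xor_add_mul(uint64_t b, uint64_t c, uint64_t d, uint64_t e)
{
    uint64_t sum  = b + c;   /* integer addition       */
    uint64_t prod = d * e;   /* integer multiplication */
    return sum ^ prod;       /* exclusive OR of the two results */
}
```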

Use of -bf-unique-bytes consumes one bit of memory per unique address referenced by the program.

Use of -bf-mem-footprint consumes 8 bytes of memory per unique address referenced by the program.

A basic block is a unit of code that can be entered only at the first instruction and that branches only at the last instruction. Because basic blocks tend to be small, -bf-every-bb produces a substantial amount of output for typical programs. It is recommended that -bf-every-bb always be used in conjunction with -bf-merge-bb to reduce the amount of information output.
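
To make the notion of a basic block concrete, the hypothetical function below is annotated with the blocks a compiler might plausibly form. The exact partitioning is an assumption: it depends on the compiler version and optimization level, and an optimizer may, for instance, replace the inner branch with a select instruction.

```c
/* Hypothetical function: sums v[0..n-1], clamping the running sum at limit. */
int clamp_sum(const int *v, int n, int limit)
{
    int sum = 0;                    /* block 1: entry */
    for (int i = 0; i < n; i++) {   /* block 2: loop test, branches to body or exit */
        sum += v[i];                /* block 3: loop body, ends with a conditional branch */
        if (sum > limit) {
            sum = limit;            /* block 4: executed only when the sum exceeds limit */
        }
    }                               /* block 5: increment and branch back to the test */
    return sum;                     /* block 6: exit */
}
```

Even this short function yields several blocks, each executed many times, which suggests why per-block output grows quickly.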

The -bf-disable option is useful for troubleshooting. Its argument can be one of the following:

  • none

    Don't disable any features (the default).

  • byfl

    Disable the Byfl plugin (i.e., inject no instrumentation into the code).

That is, if bf-clang fails to compile or link an application, try -bf-disable=byfl to see whether the problem truly lies with Byfl.

The Byfl plugin proper (bytesflops.so) honors all of the command-line options listed above except -bf-verbose and -bf-disable. Those options are specific to the bf-clang script.

Selective instrumentation

The simplest way to instrument only part of a program is at the module level. That is, compile the "interesting" modules with bf-clang and the rest with clang (and link with bf-clang to pull in the Byfl run-time library). However, bf-clang also supports inserting programmer-defined "calipers" into the code. These can not only selectively enable and disable performance counters but can also distinguish blocks of code with a program-defined tag. To enable this feature, an application must define a function with the following C-language prototype:

const char* bf_categorize_counters (void);

That is, bf_categorize_counters() takes no arguments and returns a short tag describing the current phase of the application. A return value of NULL disables logging of performance counters.

Application developers should be aware of the following caveats regarding bf_categorize_counters():

  • bf_categorize_counters() should be written to execute quickly because it will be invoked extremely frequently (once per basic block). Consequently, a typical definition is for bf_categorize_counters() simply to return a global variable and for the application to assign to that global variable at various points in the code.
  • bf_categorize_counters() works only when -bf-every-bb is specified. (bf-clang issues a warning message if the function is defined but the option is not specified.) If the user is not interested in seeing per-basic-block counters, these can be effectively disabled by specifying a large argument to -bf-merge-bb (e.g., -bf-merge-bb=18446744073709551615).
  • Because bf-clang instruments code at compile time while bf_categorize_counters() works at run time, returning NULL still incurs a performance penalty relative to uninstrumented code.
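
The typical definition mentioned in the first caveat might look like the following sketch. Only the bf_categorize_counters() prototype comes from this page; the global current_phase, the phase tags, and the functions initialize, reduce, and run_phases are hypothetical.

```c
#include <stddef.h>

/* Current phase tag; NULL disables logging of performance counters. */
const char *current_phase = NULL;

/* Invoked once per basic block by instrumented code, so it stays trivial. */
const char *bf_categorize_counters(void)
{
    return current_phase;
}

/* Hypothetical first phase: fill the array. */
static void initialize(double *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        v[i] = (double)i;
}

/* Hypothetical second phase: sum the array. */
static double reduce(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* The application assigns to the global at phase boundaries. */
double run_phases(double *v, size_t n)
{
    double sum;

    current_phase = "init";     /* tag counters gathered during initialization */
    initialize(v, n);

    current_phase = "reduce";   /* switch the tag for the next phase */
    sum = reduce(v, n);

    current_phase = NULL;       /* stop logging counters */
    return sum;
}
```

As the caveats note, the tags take effect only when the program is compiled with -bf-every-bb, optionally combined with a large -bf-merge-bb value to suppress per-basic-block output.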

Bugs

Thread-safety support is still immature. Even with -bf-thread-safe, instrumented code is likely to crash.

At -O0, the underlying clang compiler aborts with the following message and a dump of internal compiler state:

Pass 'Bytes:flops instrumentation' is not initialized.
Verify if there is a pass dependency cycle.
Required Passes:
        Data Layout

Until this is resolved, please always specify at least -O1 when compiling programs with bf-clang.

Author

Scott Pakin, [email protected]

See also

clang(1), the Byfl home page
