bf flang
bf-flang - Inject Byfl instrumentation while compiling a program
bf-flang [-bf-by-func] [-bf-call-stack] [-bf-data-structs] [-bf-types] [-bf-inst-mix] [-bf-inst-deps] [-bf-vectors] [-bf-unique-bytes] [-bf-mem-footprint] [-bf-strides] [-bf-every-bb] [-bf-merge-bb=count] [-bf-reuse-dist[=loads|stores]] [-bf-include=function[,function]...] [-bf-exclude=function[,function]...] [-bf-thread-safe] [-bf-verbose] [-bf-libdir=path/to/byfl/lib/] [-bf-plugin=path/to/bytesflops.so] [-bf-disable=feature] [flang_options...] [file...]
bf-flang is the Byfl project's Fortran compiler. It compiles Fortran code, instrumenting it to report various software performance counters at execution time. Software performance counters are analogous to the hardware performance counters provided by modern processors but measure program execution in a hardware-independent fashion. That is, users can expect to observe the same measurements across different processor architectures.
-
-bf-by-func
Report performance counters for each function individually.
-
-bf-call-stack
Report performance counters for each unique call stack.
-
-bf-data-structs
Report loads and stores on a per-data-structure basis.
-
-bf-types
Tally the number of times each data type is loaded or stored.
-
-bf-inst-mix
Tally the number of times each instruction type was executed.
-
-bf-inst-deps
Tally what instructions feed into what other instructions.
-
-bf-vectors
Report information about the number and type of vector operations performed.
-
-bf-unique-bytes
Report the number of unique memory addresses referenced.
-
-bf-mem-footprint
Report the memory capacity required to hold various percentages of the dynamic memory accesses.
-
-bf-strides
Bin the stride sizes observed by each load and store.
-
-bf-every-bb
Report performance counters at the basic-block level.
-
-bf-merge-bb=count
Aggregate basic blocks into groups of count to reduce the output volume.
-
-bf-reuse-dist[=loads|stores]
Track data reuse distance. With an argument of loads, only loads are tracked. With an argument of stores, only stores are tracked. With no argument (or with an argument of loads,stores), both loads and stores are tracked.
-
-bf-include=function[,function]...
Instrument only the specified functions.
-
-bf-exclude=function[,function]...
Do not instrument the specified functions.
-
-bf-thread-safe
Prevent corruption caused by simultaneous accesses to the same set of performance counters.
-
-bf-verbose
Make bf-flang output all of the helper programs it calls.
-
-bf-libdir=path/to/byfl/lib/
Point bf-flang to the directory containing the Byfl library (libbyfl.a or libbyfl.so).
-
-bf-plugin=path/to/bytesflops.so
Point bf-flang to the Byfl plugin (bytesflops.so).
-
-bf-disable=feature
Disable certain aspects of bf-flang's operation.
In addition, bf-flang accepts all of the common flang options.
The simplest usage of bf-flang is to compile just like with flang:
bf-flang -O2 -g -o myprog myprog.f90
The resulting myprog executable will output a basic set of performance information at the end of the run.
More performance information can be requested—at the cost of slower execution and a larger memory footprint:
bf-flang -bf-by-func -bf-types -bf-inst-mix -bf-vectors \
-bf-mem-footprint -O2 -g -o myprog myprog.f90
-
BF_OPTS
Provide a space-separated list of bf-flang command-line arguments.
-
BF_PREFIX
Prefix each line of output with the contents of the BF_PREFIX environment variable.
-
BF_BINOUT
Specify the name of a .byfl file to which to write detailed Byfl output in binary format.
-
BF_FLANG
Wrap the specified compiler instead of flang.
BF_OPTS is used at compile time. Command-line arguments take precedence over those read from the BF_OPTS environment variable. The advantage of using the environment variable, however, is that a user can rebuild a project with different sets of performance counters without having to modify the project's Makefiles (or the analogue in another build system) beyond an initial modification to use bf-flang as the Fortran compiler.
BF_PREFIX is used at run time. An important characteristic of the BF_PREFIX environment variable is that it honors POSIX shell-style variable expansions. For example, if BF_PREFIX is set to the string Rank ${OMPI_COMM_WORLD_RANK}, then a line that would otherwise begin with BYFL_SUMMARY: will instead begin with Rank 3 BYFL_SUMMARY:, assuming that the OMPI_COMM_WORLD_RANK environment variable has the value 3.
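The expansion rules described here resemble those of the POSIX wordexp() function. As a minimal sketch, assuming (this page does not say so) that a prefix is expanded in a wordexp()-like manner and that the resulting words are joined with single spaces, the following shows how Rank ${OMPI_COMM_WORLD_RANK} could become Rank 3. The helper name expand_prefix is hypothetical and is not part of Byfl:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper (not Byfl's code): expand `raw` with POSIX wordexp()
 * and join the resulting words with single spaces into `out`.
 * Returns 0 on success and -1 on expansion failure or truncation. */
int expand_prefix(const char *raw, char *out, size_t out_size)
{
    wordexp_t words;
    size_t used = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    /* wordexp() performs shell-style expansion such as ${VAR}. */
    if (wordexp(raw, &words, 0) != 0)
        return -1;

    for (size_t i = 0; i < words.we_wordc; i++) {
        int n = snprintf(out + used, out_size - used, "%s%s",
                         i > 0 ? " " : "", words.we_wordv[i]);
        if (n < 0 || (size_t)n >= out_size - used) {
            wordfree(&words);
            return -1;          /* output buffer too small */
        }
        used += (size_t)n;
    }
    wordfree(&words);
    return 0;
}
```

With OMPI_COMM_WORLD_RANK set to 3 in the environment, calling expand_prefix("Rank ${OMPI_COMM_WORLD_RANK}", buf, sizeof buf) fills buf with Rank 3.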
Although the characters |, &, ;, <, >, (, ), {, and } are not normally allowed within BF_PREFIX, BF_PREFIX does support backquoted-command evaluation, and the child command can contain those characters, as in
BF_PREFIX='`if true; then (echo YES; echo MAYBE); else echo NO; fi`'
(which prefixes each line with YES MAYBE).
As a special case, if BF_PREFIX expands to a string that begins with / or ./, it is treated not as a prefix but as a filename. The Byfl-instrumented executable will redirect all of its Byfl output to that file instead of to the standard output device.
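The special case can be stated as a simple predicate on the expanded value. The following is a sketch of the rule as worded here, not Byfl's actual implementation; the function name prefix_is_filename is hypothetical:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if an already-expanded BF_PREFIX value
 * begins with "/" or "./" and would therefore be treated as a filename
 * rather than as a per-line prefix. */
bool prefix_is_filename(const char *expanded)
{
    return strncmp(expanded, "/", 1) == 0 ||
           strncmp(expanded, "./", 2) == 0;
}
```

Under this reading, /tmp/byfl.log and ./byfl.log name output files, whereas Rank 3 remains an ordinary prefix.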
BF_BINOUT is also used at run time. Like BF_PREFIX, it honors POSIX shell-style variable expansions. If BF_BINOUT is set to the empty string, no binary output file will be produced.
When -bf-call-stack is specified, a function F is reported separately when called from function A and when called from function B. -bf-call-stack overrides -bf-by-func.
For the purposes of -bf-data-structs, Byfl defines a data structure as either a statically allocated block of memory (which has a name in the executable's symbol table) or a collection of data blocks dynamically allocated from the same program call point (i.e., instruction address). Byfl assigns the latter a name based on a textual description of the call point.
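To make the two kinds of data structure concrete, here is an illustrative C sketch (the same distinction applies to Fortran code). All names are hypothetical, and how Byfl would label each structure in its output is not shown; an optimizing compiler may also restructure the call point:

```c
#include <stddef.h>
#include <stdlib.h>

/* Statically allocated block: it has a name ("lookup") in the symbol table. */
double lookup[256];

/* Every block returned here is allocated from the same call point (the
 * calloc() call below), so all such blocks would be grouped into a single
 * data structure under the definition above. */
static double *new_row(size_t n)
{
    return calloc(n, sizeof(double));
}

/* Touch both kinds of data structure. Returns the sum of all row elements,
 * or -1.0 if an allocation fails. */
double fill_and_sum(size_t rows, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < 256; i++)
        lookup[i] = (double)i;              /* stores to lookup */

    for (size_t r = 0; r < rows; r++) {
        double *row = new_row(n);
        if (row == NULL)
            return -1.0;
        for (size_t j = 0; j < n; j++) {
            row[j] = lookup[j % 256];       /* load from lookup, store to row */
            total += row[j];                /* load from row */
        }
        free(row);
    }
    return total;
}
```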
The -bf-types option tallies, for example, the number of loads of 64-bit floating-point values as distinct from loads of 64-bit unsigned integer values.
See the LLVM Language Reference Manual for descriptions of the instructions that are tallied by -bf-inst-mix.
-bf-inst-deps tallies each instruction with the instructions that produced its first two operands. (Ellipses are used to indicate that the instruction takes more than two operands.) For example, Xor(Add, Mul) represents an exclusive OR with one operand being the result of a previous integer addition and one being the result of a previous integer multiplication (i.e., A = (B + C) XOR (D * E)).
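As an illustration, the expression A = (B + C) XOR (D * E) can be written as the C function below. This is only a sketch: whether the compiled code actually contains an xor fed by an add and a mul depends on the compiler and optimization level, and the function name is hypothetical:

```c
#include <stdint.h>

/* A = (B + C) XOR (D * E). If the compiler emits the expression literally,
 * the xor's two operands are produced by an integer add and an integer
 * multiply, which is the pattern written as Xor(Add, Mul). */
uint32_t xor_of_add_and_mul(uint32_t b, uint32_t c, uint32_t d, uint32_t e)
{
    uint32_t sum = b + c;       /* Add */
    uint32_t product = d * e;   /* Mul */
    return sum ^ product;       /* Xor(Add, Mul) */
}
```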
Use of -bf-unique-bytes consumes one bit of memory per unique address referenced by the program.
Use of -bf-mem-footprint consumes 8 bytes of memory per unique address referenced by the program.
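The two per-address costs above differ by a factor of 64, which matters for programs that touch a lot of memory. The following back-of-the-envelope sketch uses only the figures quoted above and ignores any additional bookkeeping overhead, which this page does not quantify; the function names are hypothetical:

```c
#include <stdint.h>

/* Estimated bytes consumed by -bf-unique-bytes: one bit per unique address,
 * rounded up to a whole byte. */
uint64_t unique_bytes_overhead(uint64_t unique_addresses)
{
    return unique_addresses / 8 + (unique_addresses % 8 != 0);
}

/* Estimated bytes consumed by -bf-mem-footprint: 8 bytes per unique address. */
uint64_t mem_footprint_overhead(uint64_t unique_addresses)
{
    return unique_addresses * 8;
}
```

For a program that references 2^30 unique addresses, these estimates come to 128 MiB for -bf-unique-bytes and 8 GiB for -bf-mem-footprint.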
A basic block is a unit of code that can be entered only at the first instruction and that branches only at the last instruction. Because basic blocks tend to be small, -bf-every-bb produces a substantial amount of output for typical programs. It is recommended that -bf-every-bb always be used in conjunction with -bf-merge-bb to reduce the amount of information output.
The -bf-disable option is quite useful for troubleshooting. Its argument can be one of the following:
-
none
Don't disable any features (the default).
-
byfl
Disable the Byfl plugin (i.e., inject no instrumentation into the code).
That is, if bf-flang fails to compile or link an application, try disabling byfl to see if the problem is truly with Byfl.
The Byfl plugin proper (bytesflops.so) honors all of the command-line options listed above except -bf-verbose and -bf-disable. Those options are specific to the bf-flang script.
The simplest way to instrument only part of a program is at the module level. That is, compile the "interesting" modules with bf-flang and the rest with flang (and link with bf-flang to pull in the Byfl run-time library). However, bf-flang also supports inserting programmer-defined "calipers" into the code. These can not only selectively enable and disable performance counters but can also distinguish blocks of code with a program-defined tag. To enable this feature, an application must define a function with the following C-language prototype:
const char* bf_categorize_counters (void);
That is, bf_categorize_counters() takes no arguments and returns a short tag describing the current phase of the application. A return value of NULL disables logging of performance counters.
Application developers should be aware of the following caveats regarding bf_categorize_counters():
-
bf_categorize_counters() should be written to execute quickly because it will be invoked extremely frequently (once per basic block). Consequently, a typical definition is for bf_categorize_counters() simply to return a global variable and for the application to assign to that global variable at various points in the code.
-
bf_categorize_counters() works only when -bf-every-bb is specified. (bf-flang issues a warning message if the function is defined but the option is not specified.) If the user is not interested in seeing per-basic-block counters, these can be effectively disabled by specifying a large argument to -bf-merge-bb (e.g., -bf-merge-bb=18446744073709551615).
-
Because bf-flang instruments code at compile time while bf_categorize_counters() works at run time, returning NULL still pays a performance penalty relative to uninstrumented code.
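The "return a global variable" pattern mentioned in the first caveat can be sketched as follows. Only the bf_categorize_counters() prototype comes from this page; the variable current_phase and the helper set_phase() are hypothetical names chosen for illustration:

```c
#include <stddef.h>

/* Hypothetical global holding the tag for the current phase of the
 * application. NULL means "do not log performance counters". */
static const char *current_phase = NULL;

/* Hypothetical helper that the application calls when it enters a new phase
 * (or passes NULL to stop logging). */
void set_phase(const char *tag)
{
    current_phase = tag;
}

/* The hook with the prototype given above. It does nothing but return the
 * global because it is invoked once per basic block. */
const char *bf_categorize_counters(void)
{
    return current_phase;
}
```

An application would call, for example, set_phase("solver") before the code of interest and set_phase(NULL) afterward, so that only counters gathered in between are logged, under the tag solver.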
Thread-safety support is still immature. Even with -bf-thread-safe, instrumented code is likely to crash.
At -O0, the underlying flang compiler aborts with the following message and a dump of internal compiler state:
Pass 'Bytes:flops instrumentation' is not initialized.
Verify if there is a pass dependency cycle.
Required Passes:
Data Layout
Until this is resolved, please always specify at least -O1 when compiling programs with bf-flang.
Scott Pakin
flang(1), the Byfl home page