likwid perfscope

Thomas Roehl edited this page Jun 17, 2016 · 2 revisions

likwid-perfscope: Tool to perform live plotting of performance data

Introduction

likwid-perfscope is a command line application written in Lua that uses the timeline mode of likwid-perfctr to plot the current measurements on the fly. It uses the feedGnuplot Perl script to send the current data to gnuplot. For convenience, preconfigured plots of interesting metrics are embedded into likwid-perfscope. Because the plot windows would normally close as soon as the monitored application exits, likwid-perfscope keeps them open until Ctrl+C is pressed.

Options

-h, --help		 Help message
-v, --version		 Version information
-V, --verbose <level>	 Verbose output, 0 (only errors), 1 (info), 2 (details), 3 (developer)
-a			 Print all preconfigured plot configurations for the current system.
-c <list>		 Processor ids to measure, e.g. 1,2-4,8
-C <list>		 Processor ids to pin threads and measure, e.g. 1,2-4,8
-g, --group <string>	 Preconfigured plot group or custom event set string with plot config.
-t, --time <time>	 Measurement interval in s, ms or us, e.g. 300ms, for the timeline mode of likwid-perfctr
-d, --dump		 Print output as it is sent to feedGnuplot.
-p, --plotdump		 Use dump functionality of feedGnuplot. Prints the plot configuration plus data, which can be submitted directly to gnuplot
--host <host>		 Run likwid-perfctr on the selected host using SSH. Evaluation and plotting are done locally.
			         This can be used for machines that have no gnuplot installed. All paths must be the same as on the local machine.
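
The plain list syntax shown for -c and -C (e.g. 1,2-4,8) can be illustrated with a short Python sketch. The function name expand_cpu_list is hypothetical and not part of LIKWID; the sketch only handles comma-separated IDs and ranges, not expressions such as S0:0 used later on this page.

```python
def expand_cpu_list(spec):
    """Expand a plain CPU list such as '1,2-4,8' into a list of integers.

    Illustration only: handles comma-separated IDs and inclusive ranges,
    not the S0:0 style expressions that the tool also accepts.
    """
    cpus = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # 'first-last' denotes an inclusive range of processor IDs.
            first, last = (int(x) for x in part.split("-", 1))
            cpus.extend(range(first, last + 1))
        else:
            cpus.append(int(part))
    return cpus


print(expand_cpu_list("1,2-4,8"))  # [1, 2, 3, 4, 8]
```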

Usage

The basic usage of likwid-perfscope is to use one of the predefined plot configurations that are embedded into the Lua script. All of them show time-resolved metrics, e.g. MByte/s or FLOP/s. A list of all plots available for the current architecture can be retrieved with

$ likwid-perfscope -a

which prints on an Intel IvyBridge EP system:

Group NUMA
	Perfctr group: NUMA
	Match for metric: Local DRAM bandwidth [MByte/s]
	Title of plot: NUMA separated memory bandwidth
	Title of x-axis: Time
	Title of y-axis: Bandwidth [MBytes/s]
	Match for second metric: Remote DRAM bandwidth [MByte/s]
	Title of y2-axis: Bandwidth [MBytes/s]

Group MEM_BAND
	Perfctr group: MEM
	Match for metric: Memory bandwidth [MBytes/s]
	Title of plot: Memory bandwidth
	Title of x-axis: Time
	Title of y-axis: Bandwidth [MBytes/s]

Group FLOPS_DP
	Perfctr group: FLOPS_DP
	Match for metric: MFlops/s
	Title of plot: Double Precision Flop Rate
	Title of x-axis: Time
	Title of y-axis: MFlops/s

Group L2_BAND
	Perfctr group: L2
	Match for metric: L2 bandwidth [MBytes/s]
	Title of plot: L2 cache bandwidth
	Title of x-axis: Time
	Title of y-axis: Bandwidth [MBytes/s]

Group L3_BAND
	Perfctr group: L3
	Match for metric: L3 bandwidth [MBytes/s]
	Title of plot: L3 cache bandwidth
	Title of x-axis: Time
	Title of y-axis: Bandwidth [MBytes/s]

Group FLOPS_SP
	Perfctr group: FLOPS_SP
	Match for metric: MFlops/s
	Title of plot: Single Precision Flop Rate
	Title of x-axis: Time
	Title of y-axis: MFlops/s

Group TEMP
	Perfctr group: ENERGY
	Match for metric: Temperature [C]
	Title of plot: Temperature
	Title of x-axis: Time
	Title of y-axis: Temperature [C]

Group POWER
	Perfctr group: ENERGY
	Match for metric: Power [W]
	Title of plot: Consumed power
	Title of x-axis: Time
	Title of y-axis: Power [W]
	Match for second metric: Power DRAM [W]
	Title of y2-axis: Power DRAM [W]

Group QPI_BAND
	Perfctr group: QPI
	Match for metric: QPI data bandwidth [MByte/s]
	Title of plot: QPI bandwidth
	Title of x-axis: Time
	Title of y-axis: Bandwidth [MBytes/s]
	Match for second metric: QPI link bandwidth [MByte/s]
	Title of y2-axis: Bandwidth [MBytes/s]

You can run these groups in the same way as with likwid-perfctr, for example:

$ likwid-perfscope -C S0:0 -g L3_BAND ./a.out

which measures the L3 cache bandwidth on the first CPU of socket 0 and plots it with the title "L3 cache bandwidth", the x-axis label "Time" and the y-axis label "Bandwidth [MBytes/s]". If you measure on multiple CPUs, each CPU gets its own line in the plot. Some plot configurations, such as POWER, plot two lines per CPU: one for the CPU package power consumption and one for the DRAM power consumption. The DRAM power consumption uses the right y-axis with its own axis label "Power DRAM [W]".

You can change the sampling interval with -t <time> on the command line; a shorter interval yields more samples. The default is one sample per second.

$ likwid-perfscope -C S0:0 -g L3_BAND -t 500ms ./a.out
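
To make the relation between the -t value and the sampling rate concrete, here is a small Python sketch. The helper names are hypothetical, and the sketch assumes the value is an integer followed by one of the suffixes s, ms or us, as in the examples on this page; the tool itself may accept other forms.

```python
import re

# Divisors that convert each suffix named in the option description to seconds.
_UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000}


def interval_to_seconds(text):
    """Convert a string such as '500ms' into seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)(s|ms|us)", text.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    value, unit = match.groups()
    return int(value) / _UNITS_PER_SECOND[unit]


def samples_per_second(text):
    """Number of samples taken per second for a given interval string."""
    return 1.0 / interval_to_seconds(text)


print(samples_per_second("1s"))     # default interval: 1.0 sample per second
print(samples_per_second("500ms"))  # 2.0 samples per second
```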

Moreover, you can use the group switching functionality of the timeline mode to measure multiple metrics at once:

$ likwid-perfscope -C S0:0 -g L3_BAND -g L2_BAND -g MEM_BAND -t 500ms ./a.out

Each group opens its own plotting window, and the windows are updated in a round-robin fashion. Each group is measured for 500 ms before the next one takes over.
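
The consequence of round-robin switching for the update rate of each window can be sketched in Python as follows. This is a simplified model, not the tool's implementation: it assumes that groups are cycled in the order given on the command line and ignores any switching overhead.

```python
def round_robin_schedule(groups, interval_s, steps):
    """Return (start_time, group) pairs for the first `steps` measurement slots."""
    return [(i * interval_s, groups[i % len(groups)]) for i in range(steps)]


def refresh_period(groups, interval_s):
    """Time between two consecutive updates of the same group's plot."""
    return len(groups) * interval_s


groups = ["L3_BAND", "L2_BAND", "MEM_BAND"]
for start, group in round_robin_schedule(groups, 0.5, 6):
    print(f"{start:4.1f}s  {group}")

# With three groups and 500 ms slots, each window gets a new point every 1.5 s.
print(refresh_period(groups, 0.5))
```

Under these assumptions, adding more groups lowers the time resolution of every individual plot, so a shorter -t value may be needed to compensate.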

If you want to record the measurements, you can use either -d or -p. With -d, likwid-perfscope prints the strings that are sent to feedGnuplot; the plot environment (title, labels) is not included. With -p, the dump is produced by feedGnuplot, which prints the plot environment first and then, for each update step, all data collected so far.

Output format of -d: <groupID> <runtime> <value_1_CPU1> (<value_2_CPU1>) (<value_1_CPU2>) (<value_2_CPU2>) ...
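
A recorded -d line in the format above could be split into its parts with a Python sketch like the following. The function name parse_dump_line is hypothetical; the sketch assumes whitespace-separated fields, an integer group ID, and that the caller knows how many values each CPU contributes (one, or two for configurations with a second metric such as POWER).

```python
def parse_dump_line(line, metrics_per_cpu=1):
    """Split one -d line into group ID, runtime and per-CPU value tuples.

    Assumptions (not confirmed by the tool's documentation): fields are
    whitespace separated and the group ID is an integer.
    """
    fields = line.split()
    group_id = int(fields[0])
    runtime = float(fields[1])
    values = [float(v) for v in fields[2:]]
    if len(values) % metrics_per_cpu != 0:
        raise ValueError("value count is not a multiple of metrics_per_cpu")
    # Consecutive values belong to the same CPU, as in the format description.
    per_cpu = [
        tuple(values[i:i + metrics_per_cpu])
        for i in range(0, len(values), metrics_per_cpu)
    ]
    return group_id, runtime, per_cpu


# Illustrative line: the group ID 1 is assumed, the numbers are taken from
# the -p example on this page.
print(parse_dump_line("1 1.000161322585 48.433210629261"))
```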

Example output of -p:

set grid
set xlabel  "Time"
set ylabel  "Bandwidth [MBytes/s]"
set title   "L3 cache bandwidth"
set boxwidth 1
histbin(x) = 1 * floor(0.5 + x/1)
set xtics
set xrange ["0":]
plot '-'   title "L3 bandwidth [MBytes/s]"   with linespoints
0 0
1.000161322585 48.433210629261
2.000241249986 21.798359943835
3.0003206090227 21.337482595053
4.0004001520114 14.873424079086
5.0004813269837 7.8612681493985
e
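
Because -p repeats the complete data set at every update step, the last data block in a recorded dump contains the full time series. The Python sketch below extracts the data blocks from such a dump. The function name is hypothetical, and the sketch assumes one curve per plot command, as in the example above, with each block terminated by a line containing only e.

```python
def extract_inline_data(dump_text):
    """Return one list of (x, y) float pairs per inline data block in a dump.

    Assumes a single curve per 'plot' command; only the first two columns
    of each data line are used.
    """
    blocks = []
    current = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("plot "):
            current = []            # inline data for this plot follows
        elif line == "e" and current is not None:
            blocks.append(current)  # 'e' terminates the inline data
            current = None
        elif current is not None and line:
            x, y = line.split()[:2]
            current.append((float(x), float(y)))
    return blocks


sample = """set title   "L3 cache bandwidth"
plot '-'   title "L3 bandwidth [MBytes/s]"   with linespoints
0 0
1.000161322585 48.433210629261
e
"""
# The last block is the most complete one in a multi-step dump.
print(extract_inline_data(sample)[-1])
```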

You can also perform the measurements on another host using the --host option:

$ likwid-perfscope -C S0:0@S1:0 -g POWER --host host1 ./a.out

For this to work, all paths must be the same as on the local system, the group must be available on the remote host, and the CPU list must be valid there. This feature is currently experimental.
