On Apple Mac OS X, to use OpenCL, please configure TAU with:
./configure -opencl=/System/Library/Frameworks/OpenCL.framework
make install
tau_exec -T serial -opencl ./a.out
pprof
On other systems, configure TAU with:
./configure -cuda=<path to cuda toolkit>
or
./configure -opencl=<opencl headers/libraries>
Then:
make install
Add <arch>/bin to your path and add <arch>/lib to your LD_LIBRARY_PATH.
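As a minimal sketch of the path setup above for a Bourne-style shell: TAU_ROOT and TAU_ARCH are hypothetical variable names introduced for this example, and their default values ($HOME/tau2 and x86_64) are placeholders, not values TAU prescribes. Substitute your own install directory and the <arch> directory your build created.

```shell
# TAU_ROOT and TAU_ARCH are hypothetical helper variables for this sketch.
# Replace the defaults with your TAU install directory and <arch> name.
TAU_ROOT="${TAU_ROOT:-$HOME/tau2}"
TAU_ARCH="${TAU_ARCH:-x86_64}"

# Put the TAU tools (tau_exec, pprof, ...) on the command search path.
export PATH="$TAU_ROOT/$TAU_ARCH/bin:$PATH"

# Prepend the TAU libraries; the ${VAR:+...} form avoids a trailing colon
# when LD_LIBRARY_PATH was previously unset or empty.
export LD_LIBRARY_PATH="$TAU_ROOT/$TAU_ARCH/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

These settings last only for the current shell session; add them to your shell startup file to make them permanent.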
To collect performance data, run your application with tau_exec:
tau_exec -T serial <-cuda|-opencl> ./a.out
For traces type:
export TAU_TRACE=1
before the tau_exec command.
Post-process the trace file with these commands:
%> tau_multimerge
%> tau2slog ...
or
%> tau2otf ...
NOTE: TAU requires a call to cudaThreadExit()/cudaDeviceReset() or
clReleaseContext() at the end of execution. If the code you want to
instrument does not make such a call, you will have to add it.
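A quick way to see whether a code base already makes one of these calls is a text search over its sources. The function below is a sketch under that assumption: has_gpu_teardown is a hypothetical helper name, not a TAU tool, and a match only shows that the call appears in the source, not that it is reached at exit.

```shell
# has_gpu_teardown FILE...
# Succeeds (exit status 0) if any given file mentions one of the teardown
# calls named in the note above. Hypothetical helper, not part of TAU.
has_gpu_teardown() {
    grep -E -q -e 'cudaThreadExit|cudaDeviceReset|clReleaseContext' -- "$@"
}
```

For example, running has_gpu_teardown on your own source files and printing a reminder when it fails tells you that a teardown call still needs to be added before instrumenting.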
== CUPTI callbacks ==
CUPTI API calls can be tracked using the -cupti option to tau_exec:
%> tau_exec -T serial -cupti ./a.out
By default, only Runtime API calls are recorded. To track Driver API calls, set:
TAU_CUPTI_API=driver, or TAU_CUPTI_API=both to record both APIs.
In any case the performance data is written out as profile.* files.
Type 'pprof' to get a text display of the data or 'paraprof' for a graphic display.
Setting TAU_TRACE=1 will produce trace files; post-process them with
tau_multimerge followed by tau2slog or tau2otf, as shown above.
== GPU COUNTERS ==
The CUPTI counters available on a given machine can be listed by typing:
%> tau_cupti_avail
Set the counters you wish to collect by exporting them as a colon-separated list
in the TAU_METRICS variable. For example:
export TAU_METRICS=CUDA.GeForce_GT_240.domain_b.instructions
then run with tau_exec:
tau_exec <-cuda|-cupti> ./a.out
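When several counters are collected, the colon-separated list can be built from one name per line, which is easier to edit than a single long string. The sketch below assumes a POSIX shell; the second counter name is a made-up placeholder, so take real names from the tau_cupti_avail output on your machine.

```shell
# One counter per line. The first name is the example from this README;
# the second is a hypothetical placeholder, not a real CUPTI counter.
counters='CUDA.GeForce_GT_240.domain_b.instructions
CUDA.GeForce_GT_240.domain_x.placeholder_counter'

# Join the lines with colons, the separator TAU_METRICS uses.
TAU_METRICS=$(printf '%s\n' "$counters" | paste -s -d ':' -)
export TAU_METRICS
```

Export the variable in the same shell session from which tau_exec is started.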
== PGI OpenACC ==
With PGI compilers, please use:
./configure -c++=pgCC -cc=pgcc -fortran=pgi ...
and OpenACC support will be configured automatically if the PGI compiler has an
accelerator license and supports OpenACC. You may then use:
%> tau_exec -T openacc ./a.out
to instrument the application using the OpenACC profiling API.