<!DOCTYPE html>
<html class="writer-html5" lang="en" data-content_root="./">
<head>
<meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>geopm_agent_gpu_activity(7) – agent for selecting GPU frequency based on GPU compute activity — GEOPM documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=80d5e7a1" />
<link rel="stylesheet" type="text/css" href="_static/css/theme.css?v=e59714d7" />
<script src="_static/jquery.js?v=5d32c60e"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
<script src="_static/documentation_options.js?v=5929fcd5"></script>
<script src="_static/doctools.js?v=9bcbadda"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="_static/js/theme.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="geopm_agent_monitor(7) – agent implementation for aggregating statistics" href="geopm_agent_monitor.7.html" />
<link rel="prev" title="geopm_agent_frequency_map(7) – agent for running regions at user selected frequencies" href="geopm_agent_frequency_map.7.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home">
GEOPM
<img src="https://geopm.github.io/images/geopm-logo-clear.png" class="logo" alt="Logo"/>
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="overview.html">Getting Started</a></li>
<li class="toctree-l1"><a class="reference internal" href="user_guides.html">User Guides</a></li>
<li class="toctree-l1"><a class="reference internal" href="contrib.html">Contributor Guide</a></li>
<li class="toctree-l1"><a class="reference internal" href="devel.html">Developer Guide</a></li>
<li class="toctree-l1"><a class="reference internal" href="publications.html">Publications</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="reference.html">Reference Manual</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="reference.html#geopm-manual-pages">GEOPM Manual Pages</a><ul class="current">
<li class="toctree-l3"><a class="reference internal" href="geopm.7.html">geopm(7) – Global Extensible Open Power Manager</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_pio.7.html">geopm_pio(7) – GEOPM PlatformIO interface</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_pio_cnl.7.html">geopm_pio_cnl(7) – Signals and controls for Compute Node Linux Board-Level Metrics</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_pio_const_config.7.html">geopm_pio_const_config(7) – Signals for ConstConfigIOGroup</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_pio_cpuinfo.7.html">geopm_pio_cpuinfo(7) – Signals and controls for the CPUInfoIOGroup</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_pio_dcgm.7.html">geopm_pio_dcgm(7) – IOGroup providing signals and controls for NVIDIA GPUs</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_pio_levelzero.7.html">geopm_pio_levelzero(7) – IOGroup providing signals and controls for Intel GPUs</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_pio_msr.7.html">geopm_pio_msr(7) – Signals and controls for Model Specific Registers (MSRs)</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_pio_nvml.7.html">geopm_pio_nvml(7) – IOGroup providing signals and controls for NVIDIA GPUs</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_pio_profile.7.html">geopm_pio_profile(7) – Signals and controls for the ProfileIOGroup</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_pio_service.7.html">geopm_pio_service(7) – Signals and controls for the ServiceIOGroup</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_pio_sst.7.html">geopm_pio_sst(7) – Signals and controls for Intel Speed Select Technology</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_pio_sysfs.7.html">geopm_pio_sysfs(7) – Signals and controls for sysfs attributes</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_pio_time.7.html">geopm_pio_time(7) – Signals and controls for Time IO Group</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopmaccess.1.html">geopmaccess(1) – Access management for the GEOPM Service</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopmadmin.1.html">geopmadmin(1) – tool for GEOPM system administrators</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopmagent.1.html">geopmagent(1) – query agent information and create static policies</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopmbench.1.html">geopmbench(1) – synthetic benchmark application</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopmctl.1.html">geopmctl(1) – GEOPM runtime control application</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopmendpoint.1.html">geopmendpoint(1) – command line tool for dynamic policy control</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopmexporter.1.html">geopmexporter(1) – Prometheus exporter for GEOPM metrics</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopmlaunch.1.html">geopmlaunch(1) – application launch wrapper</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopmread.1.html">geopmread(1) – query platform information</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopmsession.1.html">geopmsession(1) – Command line interface for the GEOPM service batch read features</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopmwrite.1.html">geopmwrite(1) – modify platform state</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopmdpy.7.html">geopmdpy(7) – global extensible open power manager python daemon package</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopmpy.7.html">geopmpy(7) – global extensible open power manager python package</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_agent.3.html">geopm_agent(3) – query information about available agents</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_ctl.3.html">geopm_ctl(3) – GEOPM runtime control thread</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_daemon.3.html">geopm_daemon(3) – helpers for GEOPM daemons</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_endpoint.3.html">geopm_endpoint(3) – dynamic policy control for resource management</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_error.3.html">geopm_error(3) – error code descriptions</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_field.3.html">geopm_agent(3) – query information about available agents</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_fortran.3.html">geopm_fortran(3) – GEOPM fortran interface</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_hash.3.html">geopm_hash(3) – helper methods for encoding</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_imbalancer.3.html">geopm_imbalancer(3) – set artificial runtime imbalance</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_pio.3.html">geopm_pio(3) – interfaces to query and modify platform</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_policystore.3.html">geopm_policystore(3) – GEOPM resource policy store interface</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_prof.3.html">geopm_prof(3) – application profiling interfaces</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_sched.3.html">geopm_sched(3) – interface with Linux scheduler</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_time.3.html">geopm_time(3) – helper methods for time</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_topo.3.html">geopm_topo(3) – query platform component topology</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_version.3.html">geopm_version(3) – GEOPM library version</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3AAgent.3.html">geopm::Agent(3) – GEOPM agent plugin interface</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3AAgg.3.html">geopm::Agg(3) – data aggregation functions</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3ACNLIOGroup.3.html">geopm::CNLIOGroup(3) – IOGroup for interaction with Compute Node Linux</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3ACPUActivityAgent.3.html">geopm::CPUActivityAgent(3) – agent for selecting CPU frequency based on CPU compute activity</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3ACircularBuffer.3.html">geopm::CircularBuffer(3) – generic circular buffer</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3AComm.3.html">geopm::Comm(3) – communication abstractions</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3ACpuinfoIOGroup.3.html">geopm::CpuinfoIOGroup(3) – IOGroup for CPU frequency limits</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3ADaemon.3.html">geopm::Daemon(3) – GEOPM daemon helper methods</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3AEndpoint.3.html">geopm::Endpoint(3) – GEOPM endpoint interface</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3AException.3.html">geopm::Exception(3) – custom GEOPM exceptions</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3AGPUActivityAgent.3.html">geopm::GPUActivityAgent(3) – agent for selecting GPU frequency based on GPU compute activity</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3AHelper.3.html">geopm::Helper – common helper methods</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3AIOGroup.3.html">geopm::IOGroup(3) – provides system values and settings</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3AMPIComm.3.html">geopm::MPIComm(3) – implementation of Comm using MPI</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3AMSRIO.3.html">geopm::MSRIO(3) – methods for reading and writing MSRs</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3AMSRIOGroup.3.html">geopm::MSRIOGroup – IOGroup providing MSR-based signals and controls</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3AMonitorAgent.3.html">geopm::MonitorAgent – agent that enforces no policies</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3APlatformIO.3.html">geopm::PlatformIO(3) – GEOPM platform interface</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3APlatformTopo.3.html">geopm::PlatformTopo(3) – platform topology information</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3APluginFactory.3.html">geopm::PluginFactory(3) – abstract factory for plugins</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3APowerBalancer.3.html">geopm::PowerBalancer(3) – balances power according to epoch runtime</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3APowerBalancerAgent.3.html">geopm::PowerBalancerAgent(3) – agent optimizing performance under a power cap</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3APowerGovernor.3.html">geopm::PowerGovernor(3) – enforces a power limit</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3APowerGovernorAgent.3.html">geopm::PowerGovernorAgent(3) – agent that enforces a power cap</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3AProfileIOGroup.3.html">geopm::ProfileIOGroup(3) – IOGroup providing application signals</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3ASampleAggregator.3.html">geopm::SampleAggregator(3) – per-region aggregated signal data</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3ASharedMemory.3.html">geopm::SharedMemory(3) – abstractions for shared memory</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm%3A%3ATimeIOGroup.3.html">geopm::TimeIOGroup(3) – IOGroup providing time signals</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_report.7.html">geopm_report(7) – GEOPM summary report file</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_agent_cpu_activity.7.html">geopm_agent_cpu_activity(7) – agent for selecting CPU frequency based on CPU compute activity</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_agent_ffnet.7.html">geopm_agent_ffnet(7) – agent for adjusting frequencies based on application behavior</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_agent_frequency_map.7.html">geopm_agent_frequency_map(7) – agent for running regions at user selected frequencies</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">geopm_agent_gpu_activity(7) – agent for selecting GPU frequency based on GPU compute activity</a><ul>
<li class="toctree-l4"><a class="reference internal" href="#description">Description</a></li>
<li class="toctree-l4"><a class="reference internal" href="#agent-name">Agent Name</a></li>
<li class="toctree-l4"><a class="reference internal" href="#policy-parameters">Policy Parameters</a></li>
<li class="toctree-l4"><a class="reference internal" href="#constconfigiogroup-configuration-file-generation">ConstConfigIOGroup Configuration File Generation</a></li>
<li class="toctree-l4"><a class="reference internal" href="#example-policy">Example Policy</a></li>
<li class="toctree-l4"><a class="reference internal" href="#report-extensions">Report Extensions</a></li>
<li class="toctree-l4"><a class="reference internal" href="#control-loop-rate">Control Loop Rate</a></li>
<li class="toctree-l4"><a class="reference internal" href="#see-also">SEE ALSO</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="geopm_agent_monitor.7.html">geopm_agent_monitor(7) – agent implementation for aggregating statistics</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_agent_power_balancer.7.html">geopm_agent_power_balancer(7) – agent optimizes performance under a power cap</a></li>
<li class="toctree-l3"><a class="reference internal" href="geopm_agent_power_governor.7.html">geopm_agent_power_governor(7) – agent enforces a power cap</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="reference.html#doxygen-pages">Doxygen Pages</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="releases.html">Releases</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">GEOPM</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html" class="icon icon-home" aria-label="Home"></a></li>
<li class="breadcrumb-item"><a href="reference.html">Reference Manual</a></li>
<li class="breadcrumb-item active">geopm_agent_gpu_activity(7) – agent for selecting GPU frequency based on GPU compute activity</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/geopm_agent_gpu_activity.7.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="geopm-agent-gpu-activity-7-agent-for-selecting-gpu-frequency-based-on-gpu-compute-activity">
<h1>geopm_agent_gpu_activity(7) – agent for selecting GPU frequency based on GPU compute activity<a class="headerlink" href="#geopm-agent-gpu-activity-7-agent-for-selecting-gpu-frequency-based-on-gpu-compute-activity" title="Link to this heading"></a></h1>
<section id="description">
<h2>Description<a class="headerlink" href="#description" title="Link to this heading"></a></h2>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>This is currently an experimental agent and is only available when
building GEOPM with the <code class="docutils literal notranslate"><span class="pre">--enable-beta</span></code> flag. Some areas or aspects that
are subject to change include its interface (e.g. the policy) and
algorithm. It is also possible that this agent may be refactored and
combined with other agents.</p>
</div>
<p>The goal of <strong>GPUActivityAgent</strong> is to save GPU energy by scaling GPU frequency
based upon the compute activity of each GPU as provided by the
<code class="docutils literal notranslate"><span class="pre">GPU_CORE_ACTIVITY</span></code> signal and modified by the <code class="docutils literal notranslate"><span class="pre">GPU_UTILIZATION</span></code> signal.</p>
<p>The agent scales frequency in the range of <code class="docutils literal notranslate"><span class="pre">Fe</span></code> to <code class="docutils literal notranslate"><span class="pre">Fmax</span></code>, where <code class="docutils literal notranslate"><span class="pre">Fmax</span></code>
is provided by the NVMLIOGroup or LevelZeroIOGroup and <code class="docutils literal notranslate"><span class="pre">Fe</span></code> is provided by the
ConstConfigIOGroup or LevelZeroIOGroup.</p>
<p>Low-activity regions (compute activity of 0.0) run at the <code class="docutils literal notranslate"><span class="pre">Fe</span></code> frequency,
high-activity regions (compute activity of 1.0) run at the <code class="docutils literal notranslate"><span class="pre">Fmax</span></code> frequency,
and regions between the two extremes run at a frequency (<code class="docutils literal notranslate"><span class="pre">F</span></code>)
selected using the equation:</p>
<p><code class="docutils literal notranslate"><span class="pre">F</span> <span class="pre">=</span> <span class="pre">Fe</span> <span class="pre">+</span> <span class="pre">(Fmax</span> <span class="pre">-</span> <span class="pre">Fe)</span> <span class="pre">*</span> <span class="pre">GPU_CORE_ACTIVITY/GPU_UTILIZATION</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GPU_UTILIZATION</span></code> is used to scale the <code class="docutils literal notranslate"><span class="pre">GPU_CORE_ACTIVITY</span></code> in order
to scale frequency selection with the percentage of time a kernel is running on
the GPU. This tends to help with workloads that contain short but highly
scalable GPU phases.</p>
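<p>The frequency selection equation above can be sketched in a few lines of Python. This is a minimal
illustration and not the agent's implementation: the function name, the handling of a zero
<code class="docutils literal notranslate"><span class="pre">GPU_UTILIZATION</span></code> reading, and the clamping of the ratio are assumptions made for the
example, and the <code class="docutils literal notranslate"><span class="pre">Fmax</span></code> value of 1530 MHz is hypothetical.</p>

```python
def select_gpu_frequency(f_e, f_max, core_activity, utilization):
    """Return a requested frequency in hertz for a single GPU."""
    # Assumption: when no kernel is running (utilization of 0) there is
    # nothing to scale, so the ratio is treated as 0 and Fe is requested.
    if utilization == 0.0:
        ratio = 0.0
    else:
        ratio = core_activity / utilization
    # Assumption: clamp the ratio so the request stays within [Fe, Fmax].
    ratio = min(1.0, max(0.0, ratio))
    # F = Fe + (Fmax - Fe) * GPU_CORE_ACTIVITY / GPU_UTILIZATION
    return f_e + (f_max - f_e) * ratio


if __name__ == "__main__":
    f_e = 982e6     # Fe from the example configuration file on this page
    f_max = 1530e6  # hypothetical Fmax, for illustration only
    # A kernel is resident 50% of the time and the cores are 25% active.
    print(select_gpu_frequency(f_e, f_max, core_activity=0.25, utilization=0.5))
```

<p>In this example the cores are busy for half of the time a kernel is resident, so the request lands
halfway between <code class="docutils literal notranslate"><span class="pre">Fe</span></code> and <code class="docutils literal notranslate"><span class="pre">Fmax</span></code>, even though the raw
<code class="docutils literal notranslate"><span class="pre">GPU_CORE_ACTIVITY</span></code> is only 0.25.</p>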
<p><code class="docutils literal notranslate"><span class="pre">Fe</span></code> is intended to be an energy efficient frequency that is selected via system
characterization. The recommended approach to selecting <code class="docutils literal notranslate"><span class="pre">Fe</span></code> is to perform a
frequency sweep on the GPUs of interest using a workload that scales strongly with
frequency. With this approach, <code class="docutils literal notranslate"><span class="pre">Fe</span></code> will be the frequency that provides the lowest
GPU energy consumption for the workload.</p>
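<p>As a rough illustration of this selection step, the sketch below picks the lowest-energy frequency
from a sweep. The function name and the sweep data are hypothetical, and a real characterization
would typically aggregate several runs per frequency.</p>

```python
def select_efficient_frequency(sweep):
    """Return the frequency (Hz) with the lowest measured GPU energy (J).

    `sweep` maps each tested GPU frequency in hertz to the GPU energy in
    joules consumed by the characterization workload at that frequency.
    """
    return min(sweep, key=sweep.get)


if __name__ == "__main__":
    # Hypothetical sweep results, for illustration only.
    sweep = {
        1530e6: 5400.0,
        1260e6: 5050.0,
        982e6: 4900.0,
        700e6: 5300.0,
    }
    print(select_efficient_frequency(sweep))
```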
<p><code class="docutils literal notranslate"><span class="pre">Fmax</span></code> is intended to be the maximum allowable frequency, and may be set as the
default GPU maximum frequency, or limited based upon user/admin preference.</p>
<p>The agent provides an optional input of <code class="docutils literal notranslate"><span class="pre">phi</span></code> that allows for biasing the
frequency range used by the agent. The default <code class="docutils literal notranslate"><span class="pre">phi</span></code> value of 0.5 provides frequency
selection in the full range from <code class="docutils literal notranslate"><span class="pre">Fe</span></code> to <code class="docutils literal notranslate"><span class="pre">Fmax</span></code>. A <code class="docutils literal notranslate"><span class="pre">phi</span></code> value less than 0.5 biases the
agent towards higher frequencies by increasing the <code class="docutils literal notranslate"><span class="pre">Fe</span></code> value.
In the extreme case (<code class="docutils literal notranslate"><span class="pre">phi</span></code> of 0) <code class="docutils literal notranslate"><span class="pre">Fe</span></code> will be raised to <code class="docutils literal notranslate"><span class="pre">Fmax</span></code>. A <code class="docutils literal notranslate"><span class="pre">phi</span></code> value greater than
0.5 biases the agent towards lower frequencies by reducing the <code class="docutils literal notranslate"><span class="pre">Fmax</span></code> value.
In the extreme case (<code class="docutils literal notranslate"><span class="pre">phi</span></code> of 1.0) <code class="docutils literal notranslate"><span class="pre">Fmax</span></code> will be lowered to <code class="docutils literal notranslate"><span class="pre">Fe</span></code>.</p>
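<p>One way to picture the effect of <code class="docutils literal notranslate"><span class="pre">phi</span></code> is the sketch below. It assumes a linear
interpolation between the documented end points (<code class="docutils literal notranslate"><span class="pre">phi</span></code> of 0, 0.5, and 1.0); the text above
only fixes those end points, so the behavior in between, like the function name, is an assumption of
this example.</p>

```python
import math


def resolve_frequency_range(f_e, f_max, phi=0.5):
    """Return (resolved Fe, resolved Fmax) in hertz for a given phi."""
    if math.isnan(phi):
        phi = 0.5  # documented default when NAN is passed
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must be between 0.0 and 1.0")
    span = f_max - f_e
    if phi > 0.5:
        # Bias toward lower frequencies by lowering Fmax (assumed linear).
        f_max = f_max - span * (phi - 0.5) / 0.5
    elif phi < 0.5:
        # Bias toward higher frequencies by raising Fe (assumed linear).
        f_e = f_e + span * (0.5 - phi) / 0.5
    return f_e, f_max


if __name__ == "__main__":
    for phi in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(phi, resolve_frequency_range(1.0e9, 2.0e9, phi))
```

<p>With these assumptions, a <code class="docutils literal notranslate"><span class="pre">phi</span></code> of 0.75 halves the selectable range from the top, while a
<code class="docutils literal notranslate"><span class="pre">phi</span></code> of 0.25 halves it from the bottom.</p>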
<p>For NVIDIA-based systems, the agent should be used with DCGM settings of
<code class="docutils literal notranslate"><span class="pre">DCGM::FIELD_UPDATE_RATE</span></code> = 100 ms, <code class="docutils literal notranslate"><span class="pre">DCGM::MAX_STORAGE_TIME</span></code> = 1 s, and <code class="docutils literal notranslate"><span class="pre">DCGM::MAX_SAMPLES</span></code>
= 100. While the DCGM documentation indicates that users should generally query
no faster than 100 ms, the interface allows for setting the polling rate in the
microsecond range. If the agent is intended to be used with workloads that exhibit
extremely short phase behavior, a 1 ms polling rate can be used.
Because the 1 ms polling rate is not officially recommended by the DCGM API, the 100 ms
setting should be used by default.</p>
</section>
<section id="agent-name">
<h2>Agent Name<a class="headerlink" href="#agent-name" title="Link to this heading"></a></h2>
<p>The agent described in this manual is selected in many GEOPM
interfaces with the <code class="docutils literal notranslate"><span class="pre">"gpu_activity"</span></code> agent name. This name can be
passed to <a class="reference internal" href="geopmlaunch.1.html"><span class="doc">geopmlaunch(1)</span></a> as the argument to the <code class="docutils literal notranslate"><span class="pre">--geopm-agent</span></code>
option, or the <code class="docutils literal notranslate"><span class="pre">GEOPM_AGENT</span></code> environment variable can be set to this
name (see <a class="reference internal" href="geopm.7.html"><span class="doc">geopm(7)</span></a>). This name can also be passed to
<a class="reference internal" href="geopmagent.1.html"><span class="doc">geopmagent(1)</span></a> as the argument to the <code class="docutils literal notranslate"><span class="pre">'-a'</span></code> option.</p>
</section>
<section id="policy-parameters">
<h2>Policy Parameters<a class="headerlink" href="#policy-parameters" title="Link to this heading"></a></h2>
<p>The <code class="docutils literal notranslate"><span class="pre">phi</span></code> input, passed as <code class="docutils literal notranslate"><span class="pre">GPU_PHI</span></code>, is the only policy value.</p>
<blockquote>
<div><dl class="simple">
<dt><code class="docutils literal notranslate"><span class="pre">GPU_PHI</span></code>:</dt><dd><p>The performance bias knob. The value must be between
0.0 and 1.0. If NAN is passed, the agent uses the default value of 0.5.</p>
</dd>
</dl>
</div></blockquote>
</section>
<section id="constconfigiogroup-configuration-file-generation">
<h2>ConstConfigIOGroup Configuration File Generation<a class="headerlink" href="#constconfigiogroup-configuration-file-generation" title="Link to this heading"></a></h2>
<p>This version of the agent uses the ConstConfigIOGroup to provide per-node <code class="docutils literal notranslate"><span class="pre">Fe</span></code> values.</p>
<p>The GPU compute activity ConstConfigIOGroup configuration file can be generated by running:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">integration</span><span class="o">/</span><span class="n">experiment</span><span class="o">/</span><span class="n">gpu_frequency_sweep</span><span class="o">/</span><span class="n">gen_gpu_activity_constconfig_recommendation</span><span class="o">.</span><span class="n">py</span> <span class="o">--</span><span class="n">path</span> <span class="o"><</span><span class="n">GPU_SWEEP_DIR</span><span class="o">></span>
</pre></div>
</div>
<p>Depending on the number of runs, system noise, and other factors, there may be more than one reasonable
value for <code class="docutils literal notranslate"><span class="pre">Fe</span></code>. In these cases, a warning similar to the following will be provided:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="s1">'Warning: Found N possible alternate Fe value(s) within 5</span><span class="si">% e</span><span class="s1">nergy consumption of Fe for <frequency>.</span>
<span class="n">Consider</span> <span class="n">using</span> <span class="n">the</span> <span class="n">energy</span><span class="o">-</span><span class="n">margin</span> <span class="n">options</span><span class="o">.</span>\<span class="n">n</span><span class="s1">'</span>
</pre></div>
</div>
<p>If this occurs, the user may either use the provided configuration file or rerun the recommendation script with
the energy-margin option <code class="docutils literal notranslate"><span class="pre">--gpu-energy-margin</span></code> and a value such as 0.05 (5%).
This option attempts to identify a lower <code class="docutils literal notranslate"><span class="pre">Fe</span></code> for the GPU domain whose energy cost is less than the energy consumed at <code class="docutils literal notranslate"><span class="pre">Fe</span></code>
plus the provided energy-margin percentage.</p>
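<p>The idea behind the energy margin can be sketched as follows. This is not the recommendation
script's code: the function name and the sweep data are hypothetical, and the choice of the lowest
qualifying frequency is an assumption made for the example.</p>

```python
def select_fe_with_margin(sweep, margin=0.0):
    """Return Fe in hertz, optionally trading energy for a lower frequency.

    `sweep` maps frequency (Hz) to measured GPU energy (J). `margin` is a
    fraction such as 0.05 for 5%.
    """
    # Start from the frequency with the lowest measured energy.
    fe = min(sweep, key=sweep.get)
    # Energy allowed for an alternate Fe: energy at Fe plus the margin.
    budget = sweep[fe] * (1.0 + margin)
    # Only lower frequencies that stay under the budget are considered.
    candidates = [f for f, e in sweep.items() if f < fe and e < budget]
    # Assumption: prefer the lowest qualifying frequency.
    return min(candidates) if candidates else fe


if __name__ == "__main__":
    # Hypothetical sweep results, for illustration only.
    sweep = {1.0e9: 103.0, 1.1e9: 101.0, 1.2e9: 100.0, 1.3e9: 108.0}
    print(select_fe_with_margin(sweep))               # no margin
    print(select_fe_with_margin(sweep, margin=0.05))  # 5% margin
```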
<p>An example ConstConfigIOGroup configuration file is provided below:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">{</span>
<span class="s2">"GPU_FREQUENCY_EFFICIENT_HIGH_INTENSITY"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"domain"</span><span class="p">:</span> <span class="s2">"board"</span><span class="p">,</span>
<span class="s2">"description"</span><span class="p">:</span> <span class="s2">"Defines the efficient compute frequency to use for GPUs. This value is based on a workload that scales strongly with the frequency domain."</span><span class="p">,</span>
<span class="s2">"units"</span><span class="p">:</span> <span class="s2">"hertz"</span><span class="p">,</span>
<span class="s2">"aggregation"</span><span class="p">:</span> <span class="s2">"average"</span><span class="p">,</span>
<span class="s2">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">982000000.0</span><span class="p">]</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
</div>
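<p>If a site prefers to write this file by hand for a known <code class="docutils literal notranslate"><span class="pre">Fe</span></code>, a small helper along the
following lines could produce the same structure. The function names and the output file name are
hypothetical; see <a class="reference internal" href="geopm_pio_const_config.7.html"><span class="doc">geopm_pio_const_config(7)</span></a> for how the ConstConfigIOGroup
locates its configuration file.</p>

```python
import json


def build_const_config(fe_hz):
    """Return a ConstConfigIOGroup-style dictionary for a single Fe value."""
    return {
        "GPU_FREQUENCY_EFFICIENT_HIGH_INTENSITY": {
            "domain": "board",
            "description": "Defines the efficient compute frequency to use "
                           "for GPUs. This value is based on a workload that "
                           "scales strongly with the frequency domain.",
            "units": "hertz",
            "aggregation": "average",
            "values": [float(fe_hz)],
        }
    }


def write_const_config(path, fe_hz):
    """Write the configuration for `fe_hz` to `path` as JSON."""
    with open(path, "w") as config_file:
        json.dump(build_const_config(fe_hz), config_file, indent=4)


if __name__ == "__main__":
    # Hypothetical output path, for illustration only.
    write_const_config("const_config_io-gpu_activity.json", 982e6)
```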
</section>
<section id="example-policy">
<h2>Example Policy<a class="headerlink" href="#example-policy" title="Link to this heading"></a></h2>
<p>An example policy is provided below:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">{</span><span class="s2">"GPU_PHI"</span><span class="p">:</span> <span class="mf">0.5</span><span class="p">}</span>
</pre></div>
</div>
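<p>A policy like the one above can also be generated programmatically. The sketch below validates
<code class="docutils literal notranslate"><span class="pre">GPU_PHI</span></code> against the documented range before serializing it; the function name and the
output file name are hypothetical. <a class="reference internal" href="geopmagent.1.html"><span class="doc">geopmagent(1)</span></a> can also be used to create static policies.</p>

```python
import json
import math


def make_gpu_activity_policy(phi=float("nan")):
    """Return the policy as a JSON string, validating GPU_PHI first."""
    if math.isnan(phi):
        # The agent uses 0.5 when NAN is passed; this sketch writes the
        # default explicitly so the file contains plain JSON numbers.
        phi = 0.5
    if not 0.0 <= phi <= 1.0:
        raise ValueError("GPU_PHI must be between 0.0 and 1.0")
    return json.dumps({"GPU_PHI": phi})


if __name__ == "__main__":
    # Hypothetical output path, for illustration only.
    with open("gpu_activity_policy.json", "w") as policy_file:
        policy_file.write(make_gpu_activity_policy(0.5))
```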
</section>
<section id="report-extensions">
<h2>Report Extensions<a class="headerlink" href="#report-extensions" title="Link to this heading"></a></h2>
<blockquote>
<div><dl class="simple">
<dt><code class="docutils literal notranslate"><span class="pre">GPU</span> <span class="pre">Frequency</span> <span class="pre">Requests</span></code>:</dt><dd><p>The number of frequency requests made by the agent</p>
</dd>
<dt><code class="docutils literal notranslate"><span class="pre">Resolved</span> <span class="pre">Max</span> <span class="pre">Frequency</span></code>:</dt><dd><p><code class="docutils literal notranslate"><span class="pre">Fmax</span></code> after <code class="docutils literal notranslate"><span class="pre">phi</span></code> has been taken into account</p>
</dd>
<dt><code class="docutils literal notranslate"><span class="pre">Resolved</span> <span class="pre">Efficient</span> <span class="pre">Frequency</span></code>:</dt><dd><p><code class="docutils literal notranslate"><span class="pre">Fe</span></code> after <code class="docutils literal notranslate"><span class="pre">phi</span></code> has been taken into account</p>
</dd>
<dt><code class="docutils literal notranslate"><span class="pre">Resolved</span> <span class="pre">Frequency</span> <span class="pre">Range</span></code>:</dt><dd><p>The frequency selection range of the agent after <code class="docutils literal notranslate"><span class="pre">phi</span></code> has
been taken into account</p>
</dd>
<dt><code class="docutils literal notranslate"><span class="pre">GPU</span> <span class="pre">#</span> <span class="pre">Active</span> <span class="pre">Region</span> <span class="pre">Energy</span></code>:</dt><dd><p>Per GPU energy reading during the Region
of Interest (ROI) where ROI is determined as the
first sample of GPU activity to the last sample of GPU
activity.</p>
</dd>
<dt><code class="docutils literal notranslate"><span class="pre">GPU</span> <span class="pre">#</span> <span class="pre">Active</span> <span class="pre">Region</span> <span class="pre">Time</span></code>:</dt><dd><p>Per GPU time during the Region
of Interest (ROI) where ROI is determined as the
first sample of GPU activity to the last sample of GPU
activity.</p>
</dd>
<dt><code class="docutils literal notranslate"><span class="pre">GPU</span> <span class="pre">#</span> <span class="pre">Active</span> <span class="pre">Region</span> <span class="pre">Start</span> <span class="pre">Time</span></code>:</dt><dd><p>Per GPU start time for the Region
of Interest (ROI) where ROI is determined as the
first sample of GPU activity to the last sample of GPU
activity.</p>
</dd>
<dt><code class="docutils literal notranslate"><span class="pre">GPU</span> <span class="pre">#</span> <span class="pre">Active</span> <span class="pre">Region</span> <span class="pre">Stop</span> <span class="pre">Time</span></code>:</dt><dd><p>Per GPU stop time for the Region
of Interest (ROI) where ROI is determined as the
first sample of GPU activity to the last sample of GPU
activity.</p>
</dd>
</dl>
</div></blockquote>
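<p>The Region of Interest definition used by the four <code class="docutils literal notranslate"><span class="pre">Active</span> <span class="pre">Region</span></code> fields can be illustrated
with the sketch below. It assumes a time-ordered list of samples for one GPU, a cumulative energy
reading, and that any sample with non-zero activity counts as active; these details, like the
function name, are assumptions of the example rather than a description of the agent's code.</p>

```python
def active_region(samples):
    """Summarize the ROI for one GPU.

    `samples` is a time-ordered list of (time_s, energy_j, activity)
    tuples, where energy_j is a cumulative energy reading.
    Returns None if the GPU was never active.
    """
    # Assumption: a sample is "active" when its activity is above zero.
    active = [sample for sample in samples if sample[2] > 0.0]
    if not active:
        return None
    start_time, start_energy, _ = active[0]   # first sample of GPU activity
    stop_time, stop_energy, _ = active[-1]    # last sample of GPU activity
    return {
        "start_time": start_time,
        "stop_time": stop_time,
        "time": stop_time - start_time,
        "energy": stop_energy - start_energy,
    }


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    samples = [
        (0.0, 100.0, 0.0),
        (1.0, 101.0, 0.4),
        (2.0, 103.0, 0.9),
        (3.0, 104.5, 0.2),
        (4.0, 105.0, 0.0),
    ]
    print(active_region(samples))
```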
</section>
<section id="control-loop-rate">
<h2>Control Loop Rate<a class="headerlink" href="#control-loop-rate" title="Link to this heading"></a></h2>
<p>The agent gates the control loop to a cadence of 20 ms.</p>
</section>
<section id="see-also">
<h2>SEE ALSO<a class="headerlink" href="#see-also" title="Link to this heading"></a></h2>
<p><a class="reference internal" href="geopm.7.html"><span class="doc">geopm(7)</span></a>,
<a class="reference internal" href="geopm_agent_monitor.7.html"><span class="doc">geopm_agent_monitor(7)</span></a>,
<a class="reference internal" href="geopm%3A%3AAgent.3.html"><span class="doc">geopm::Agent(3)</span></a>,
<a class="reference internal" href="geopm_agent.3.html"><span class="doc">geopm_agent(3)</span></a>,
<a class="reference internal" href="geopm_prof.3.html"><span class="doc">geopm_prof(3)</span></a>,
<a class="reference internal" href="geopmagent.1.html"><span class="doc">geopmagent(1)</span></a>,
<a class="reference internal" href="geopmlaunch.1.html"><span class="doc">geopmlaunch(1)</span></a></p>
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="geopm_agent_frequency_map.7.html" class="btn btn-neutral float-left" title="geopm_agent_frequency_map(7) – agent for running regions at user selected frequencies" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="geopm_agent_monitor.7.html" class="btn btn-neutral float-right" title="geopm_agent_monitor(7) – agent implementation for aggregating statistics" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>© Copyright 2015 - 2024 Intel Corporation. All rights reserved.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>