diff --git a/docs/conceptual/rocprof-sys-feature-set.rst b/docs/conceptual/rocprof-sys-feature-set.rst
index b26e8f13..5f630777 100644
--- a/docs/conceptual/rocprof-sys-feature-set.rst
+++ b/docs/conceptual/rocprof-sys-feature-set.rst
@@ -2,9 +2,9 @@
   :description: ROCm Systems Profiler feature set documentation and reference
   :keywords: rocprof-sys, rocprofiler-systems, Omnitrace, ROCm, profiler, feature set, use cases, tracking, visualization, tool, Instinct, accelerator, AMD
 
-***************************************
-The ROCm Systems Profiler feature set and use cases
-***************************************
+********************************************
+ROCm Systems Profiler features and use cases
+********************************************
 
 `ROCm Systems Profiler `_ is designed to be highly extensible.
 Internally, it leverages the `Timemory performance analysis toolkit `_
@@ -129,4 +129,4 @@ broad picture.
 
 In terms of CPU analysis, ROCm Systems Profiler does not target any specific vendor. It works just as well on AMD and non-AMD CPUs.
 With regard to the GPU, ROCm Systems Profiler is currently restricted to HIP and HSA APIs
-and kernels running on AMD GPUs.
\ No newline at end of file
+and kernels running on AMD GPUs.
diff --git a/docs/how-to/configuring-runtime-options.rst b/docs/how-to/configuring-runtime-options.rst
index bc816883..f624318a 100644
--- a/docs/how-to/configuring-runtime-options.rst
+++ b/docs/how-to/configuring-runtime-options.rst
@@ -173,7 +173,7 @@ PAPI components from different namespaces:
   about the PAPI library used by ROCm Systems Profiler
   (because ROCm Systems Profiler statically links to ``libpapi``).
   However, all of these tools are installed with the prefix ``rocprof-sys-`` with
-  underscores replaced with hypens, for example ``papi_avail`` becomes ``rocprof-sys-papi-avail``.
+  underscores replaced with hyphens, for example ``papi_avail`` becomes ``rocprof-sys-papi-avail``.
 
 ROCPROFSYS_ROCM_EVENTS
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/docs/how-to/general-tips-using-rocprof-sys.rst b/docs/how-to/general-tips-using-rocprof-sys.rst
index b74b4b54..bf0aee9d 100644
--- a/docs/how-to/general-tips-using-rocprof-sys.rst
+++ b/docs/how-to/general-tips-using-rocprof-sys.rst
@@ -2,9 +2,9 @@
   :description: ROCm Systems Profiler general tips and usage documentation and reference
   :keywords: rocprof-sys, rocprofiler-systems, Omnitrace, ROCm, tips, how to, profiler, tracking, visualization, tool, Instinct, accelerator, AMD
 
-**********************************
+********************************************
 General tips for using ROCm Systems Profiler
-**********************************
+********************************************
 
 Follow these general guidelines when using ROCm Systems Profiler. For an explanation of the terms used in this topic, see
 the :doc:`ROCm Systems Profiler glossary <../reference/rocprof-sys-glossary>`.
diff --git a/docs/how-to/performing-causal-profiling.rst b/docs/how-to/performing-causal-profiling.rst
index a2629b15..c95a6d12 100644
--- a/docs/how-to/performing-causal-profiling.rst
+++ b/docs/how-to/performing-causal-profiling.rst
@@ -97,32 +97,32 @@ This can happen in three different ways:
 Key concepts
 -----------------------------------
 
-+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
-| Concept          | Setting                             | Options                          | Description                                |
-+==================+=====================================+==================================+============================================+
++------------------+--------------------------------------+----------------------------------+--------------------------------------------+
+| Concept          | Setting                              | Options                          | Description                                |
++==================+======================================+==================================+============================================+
 | Backend          | ``ROCPROFSYS_CAUSAL_BACKEND``        | ``perf``, ``timer``              | Backend for recording samples required     |
-|                  |                                     |                                  | to calculate the virtual speed-up          |
-+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
+|                  |                                      |                                  | to calculate the virtual speed-up          |
++------------------+--------------------------------------+----------------------------------+--------------------------------------------+
 | Mode             | ``ROCPROFSYS_CAUSAL_MODE``           | ``function``, ``line``           | Select an entire function or individual    |
-|                  |                                     |                                  | line of code for causal experiments        |
-+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
+|                  |                                      |                                  | line of code for causal experiments        |
++------------------+--------------------------------------+----------------------------------+--------------------------------------------+
 | End-to-end       | ``ROCPROFSYS_CAUSAL_END_TO_END``     | Boolean                          | Perform a single experiment during the     |
-|                  |                                     |                                  | entire run (does not require               |
-|                  |                                     |                                  | progress points)                           |
-+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
+|                  |                                      |                                  | entire run (does not require               |
+|                  |                                      |                                  | progress points)                           |
++------------------+--------------------------------------+----------------------------------+--------------------------------------------+
 | Fixed speed-up   | ``ROCPROFSYS_CAUSAL_FIXED_SPEEDUP``  | one or more values from [0, 100] | Virtual speed-up or pool of virtual        |
-|                  |                                     |                                  | speed-ups to randomly select               |
-+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
+|                  |                                      |                                  | speed-ups to randomly select               |
++------------------+--------------------------------------+----------------------------------+--------------------------------------------+
 | Binary scope     | ``ROCPROFSYS_CAUSAL_BINARY_SCOPE``   | regular expression(s)            | Dynamic binaries containing code for       |
-|                  |                                     |                                  | experiments                                |
-+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
+|                  |                                      |                                  | experiments                                |
++------------------+--------------------------------------+----------------------------------+--------------------------------------------+
 | Source scope     | ``ROCPROFSYS_CAUSAL_SOURCE_SCOPE``   | regular expression(s)            | ```` and/or ``:``                          |
-|                  |                                     |                                  | containing code to include in experiments  |
-+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
+|                  |                                      |                                  | containing code to include in experiments  |
++------------------+--------------------------------------+----------------------------------+--------------------------------------------+
 | Function scope   | ``ROCPROFSYS_CAUSAL_FUNCTION_SCOPE`` | regular expression(s)            | Restricts experiments to matching          |
-|                  |                                     |                                  | functions (function mode) or lines of      |
-|                  |                                     |                                  | code within matching functions (line mode) |
-+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
+|                  |                                      |                                  | functions (function mode) or lines of      |
+|                  |                                      |                                  | code within matching functions (line mode) |
++------------------+--------------------------------------+----------------------------------+--------------------------------------------+
 
 .. note::
 
diff --git a/docs/how-to/profiling-python-scripts.rst b/docs/how-to/profiling-python-scripts.rst
index 9b0b3efa..695e070f 100644
--- a/docs/how-to/profiling-python-scripts.rst
+++ b/docs/how-to/profiling-python-scripts.rst
@@ -28,7 +28,7 @@ be the same size.
 ``OS`` is the operating system, and ``ABI`` is the application binary interface, for example,
 ``libpyrocprofsys.cpython-38-x86_64-linux-gnu.so``.
 
-Getting Started
+Getting started
 ========================================
 
 The ROCm Systems Profiler Python package is installed in ``lib/pythonX.Y/site-packages/rocprofsys``.
@@ -44,7 +44,7 @@ Both the ``share/rocprofiler-systems/setup-env.sh`` script and the module file i
 environment variable.
 
 Running ROCm Systems Profiler on a Python script
-========================================
+================================================
 
 ROCm Systems Profiler provides an ``rocprof-sys-python`` helper bash script which
 ensures ``PYTHONPATH`` is properly set and the correct Python interpreter is used.
@@ -200,7 +200,7 @@ And then run using the command ``rocprof-sys-python -b -- ./example.py``, ROCm S
    |-----------------------------------------------------------|
 
 ROCm Systems Profiler Python source instrumentation
-========================================
+===================================================
 
 Starting with the unmodified ``example.py`` script above, import the ``rocprofsys`` module:
 
@@ -268,7 +268,7 @@ original ``rocprofsys-python ./example.py`` results:
 numerous functions called when more complex modules are imported, such as ``import numpy``.
 
 ROCm Systems Profiler Python source instrumentation configuration
--------------------------------------------------------------
+-----------------------------------------------------------------
 
 Within the Python source code, the profiler can be configured by directly modifying the
 ``rocprof-sys.profiler.config`` data fields.
diff --git a/docs/how-to/sampling-call-stack.rst b/docs/how-to/sampling-call-stack.rst
index 0a4417d6..f8702373 100644
--- a/docs/how-to/sampling-call-stack.rst
+++ b/docs/how-to/sampling-call-stack.rst
@@ -343,7 +343,7 @@ An rocprof-sys-sample example
 Here is the full output from the previous
 ``rocprof-sys-sample -PTDH -E all -o rocprof-sys-output %tag% -- ./parallel-overhead-locks 30 4 100`` command:
 
-.. code-block:: shell
+.. code-block:: shell-session
 
    $ rocprof-sys-sample -PTDH -E all -o rocprof-sys-output %tag% -c -- ./parallel-overhead-locks 30 4 100
 
@@ -403,3 +403,4 @@ Here is the full output from the previous
    [rocprof-sys][1785877][metadata]> Outputting 'rocprof-sys-output/2024-07-15_16.21/parallel-overhead-locksmetadata-1785877.json' and 'rocprof-sys-output/2024-07-15_16.21/parallel-overhead-locksfunctions-1785877.json'
    [rocprof-sys][1785877][0][rocprofsys_finalize] Finalized: 0.054582 sec wall_clock, 0.000 MB peak_rss, -1.798 MB page_rss, 0.040000 sec cpu_clock, 73.3 % cpu_util
    [989.312] perfetto.cc:60128 Tracing session 1 ended, total sessions:0
+
diff --git a/docs/how-to/understanding-rocprof-sys-output.rst b/docs/how-to/understanding-rocprof-sys-output.rst
index 22549e24..66cb9312 100644
--- a/docs/how-to/understanding-rocprof-sys-output.rst
+++ b/docs/how-to/understanding-rocprof-sys-output.rst
@@ -238,7 +238,7 @@ Metadata JSON Sample
    }
 
 Configuring the ROCm Systems Profiler output
-========================================
+============================================
 
 ROCm Systems Profiler includes a core set of options for controlling the format
 and contents of the output files. For additional information, see the guide on
diff --git a/docs/how-to/using-rocprof-sys-api.rst b/docs/how-to/using-rocprof-sys-api.rst
index 78b4c808..7de4c11c 100644
--- a/docs/how-to/using-rocprof-sys-api.rst
+++ b/docs/how-to/using-rocprof-sys-api.rst
@@ -10,7 +10,7 @@ The following example shows how a program can use the ROCm Systems Profiler API
 for run-time analysis.
 
 ROCm Systems Profiler user API example program
-========================================
+==============================================
 
 You can use the ROCm Systems Profiler API to define custom regions to profile and trace.
 The following C++ program demonstrates this technique by calling several functions from the
@@ -157,7 +157,7 @@ ROCm Systems Profiler API, such as ``rocprofsys_user_push_region`` and
    }
 
 Linking the ROCm Systems Profiler libraries to another program
-=======================================================
+==============================================================
 
 To link the ``rocprofiler-systems-user-library`` to another program,
 use the following CMake and ``g++`` directives.
@@ -186,7 +186,7 @@ Output from the API example program
 
 First, instrument and run the program.
 
-.. code-block:: shell
+.. code-block:: shell-session
 
    $ rocprof-sys-instrument -l --min-instructions=8 -E custom_push_region -o -- ./user-api
    ...
diff --git a/docs/index.rst b/docs/index.rst
index d11bd339..c498487b 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -2,17 +2,17 @@
   :description: ROCm Systems Profiler documentation and reference
   :keywords: rocprof-sys, rocprofiler-systems, Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD
 
-***********************
+***********************************
 ROCm Systems Profiler documentation
-***********************
+***********************************
 
-ROCm Systems Profiler, formerly known as "Omnitrace", is designed for the high-level profiling and comprehensive tracing
+ROCm Systems Profiler is designed for the high-level profiling and comprehensive tracing
 of applications running on the CPU or the CPU and GPU. It supports dynamic binary
 instrumentation, call-stack sampling, and various other features for determining
 which function and line number are currently executing. To learn more, see :doc:`what-is-rocprof-sys`
 
-The code is open and hosted at ``_.
-
+ROCm Systems Profiler is open source and hosted at ``__.
+It is the successor to ``__.
 
 .. grid:: 2
   :gutter: 3
@@ -22,17 +22,12 @@ The code is open and hosted at ``_.
     * :doc:`Quick start <./install/quick-start>`
     * :doc:`ROCm Systems Profiler installation <./install/install>`
 
-
-The documentation is structured as follows:
+Use the following topics to learn more about the advantages of ROCm Systems Profiler in application
+profiling, how it supports performance analysis, and how to leverage its capabilities in practice:
 
 .. grid:: 2
   :gutter: 3
 
-  .. grid-item-card:: Tutorials
-
-    * `GitHub examples `_
-    * :doc:`Video tutorials <./tutorials/video-tutorials>`
-
   .. grid-item-card:: How to
 
     * :doc:`Configuring and validating the ROCm Systems Profiler environment <./how-to/configuring-validating-environment>`
@@ -48,19 +43,24 @@ The documentation is structured as follows:
   .. grid-item-card:: Conceptual
 
     * :doc:`Data collection modes <./conceptual/data-collection-modes>`
-    * :doc:`The ROCm Systems Profiler feature set <./conceptual/rocprof-sys-feature-set>`
+    * :doc:`Features and use cases <./conceptual/rocprof-sys-feature-set>`
 
   .. grid-item-card:: Reference
 
     * :doc:`Development guide <./reference/development-guide>`
-    * :doc:`ROCm Systems Profiler glossary <./reference/rocprof-sys-glossary>`
+    * :doc:`Glossary <./reference/rocprof-sys-glossary>`
     * :doc:`API library <./doxygen/html/files>`
     * :doc:`Class member functions <./doxygen/html/functions>`
     * :doc:`Globals <./doxygen/html/globals>`
     * :doc:`Classes, structures, and interfaces <./doxygen/html/annotated>`
 
+  .. grid-item-card:: Tutorials
+
+    * `GitHub examples `_
+    * :doc:`Video tutorials <./tutorials/video-tutorials>`
+
 To contribute to the documentation, refer to
 `Contributing to ROCm `_.
 
 You can find licensing information on the
-`Licensing `_ page.
\ No newline at end of file
+`Licensing `_ page.
diff --git a/docs/license.md b/docs/license.md
deleted file mode 100644
index 1f8761f2..00000000
--- a/docs/license.md
+++ /dev/null
@@ -1,4 +0,0 @@
-# License
-
-```{include} ../LICENSE
-```
diff --git a/docs/license.rst b/docs/license.rst
new file mode 100644
index 00000000..a65784da
--- /dev/null
+++ b/docs/license.rst
@@ -0,0 +1,8 @@
+.. meta::
+  :description: ROCm Systems Profiler license
+
+*******
+License
+*******
+
+.. include:: ../LICENSE
diff --git a/docs/reference/development-guide.rst b/docs/reference/development-guide.rst
index 1f47fbfa..2a6c881d 100644
--- a/docs/reference/development-guide.rst
+++ b/docs/reference/development-guide.rst
@@ -16,7 +16,7 @@ Executables
 This section lists the ROCm Systems Profiler executables.
 
 rocprof-sys-avail: `source/bin/rocprof-sys-avail `_
--------------------------------------------------------------------------------------------------------------------------------
+-----------------------------------------------------------------------------------------------------------------------------------------------
 
 The ``main`` routine of ``rocprof-sys-avail`` has three important sections:
 
@@ -25,7 +25,7 @@ The ``main`` routine of ``rocprof-sys-avail`` has three important sections:
 * Printing hardware counters
 
 rocprof-sys-sample: `source/bin/rocprof-sys-sample `_
----------------------------------------------------------------------------------------------------------------------------------
+--------------------------------------------------------------------------------------------------------------------------------------------------
 
 * Requires a command-line format of ``rocprof-sys-sample -- ``
 * Translates command-line options into environment variables
@@ -33,7 +33,7 @@ rocprof-sys-sample: `source/bin/rocprof-sys-sample `` and a modified environment
 
 rocprof-sys-casual: `source/bin/rocprof-sys-causal `_
----------------------------------------------------------------------------------------------------------------------------------
+---------------------------------------------------------------------------------------------------------------------------------------------------
 
 When there is exactly one causal profiling configuration variant (which enables debugging),
 ``rocprof-sys-casual`` has a nearly identical design to ``rocprof-sys-sample``
@@ -46,7 +46,7 @@ the following actions take place for each variant:
 * the parent process waits for the child process to finish
 
 rocprof-sys-instrument: `source/bin/rocprof-sys-instrument `_
----------------------------------------------------------------------------------------------------------------------------------------------
+--------------------------------------------------------------------------------------------------------------------------------------------------------------
 
 * Requires a command-line format of ``rocprof-sys-instrument -- ``
 * Allows the user to provide options specifying whether to perform runtime instrumentation, use binary rewrite, or
@@ -95,7 +95,7 @@ librocprof-sys: `source/lib/rocprof-sys `_
--------------------------------------------------------------------------------------------------------------------------------
+-----------------------------------------------------------------------------------------------------------------------------------------
 
 This is a lightweight, front-end library for ``librocprof-sys`` which serves three primary purposes:
 
@@ -106,7 +106,7 @@ This is a lightweight, front-end library for ``librocprof-sys`` which serves thr
 * Coordinates communication between ``librocprof-sys-user`` and ``librocprof-sys``
 
 librocprof-sys-user: `source/lib/rocprof-sys-user `_
--------------------------------------------------------------------------------------------------------------------------------
+-----------------------------------------------------------------------------------------------------------------------------------------------
 
 * Provides a set of functions and types for the users to add to their code, for example,
   disabling data collection globally or on a specific thread or
diff --git a/docs/reference/rocprof-sys-glossary.rst b/docs/reference/rocprof-sys-glossary.rst
index f14919fb..f259bb85 100644
--- a/docs/reference/rocprof-sys-glossary.rst
+++ b/docs/reference/rocprof-sys-glossary.rst
@@ -2,9 +2,9 @@
   :description: ROCm Systems Profiler glossary and reference
   :keywords: rocprof-sys, rocprofiler-systems, Omnitrace, ROCm, glossary, terminology, profiler, tracking, visualization, tool, Instinct, accelerator, AMD
 
-*******************
-ROCm Systems Profiler Glossary
-*******************
+********
+Glossary
+********
 
 This topic explains the terminology necessary to use ROCm Systems Profiler.
 The list below provides a basic glossary for those who
@@ -13,59 +13,59 @@ when certain terms have different contextual meanings, for example,
 the ROCm Systems Profiler meaning of the term "module"
 when instrumenting Python.
 
-**Binary**
+Binary
   A file written in the Executable and Linkable Format (ELF).
   This is the standard file format for executable files, shared libraries, etc.
 
-**Binary instrumentation**
+Binary instrumentation
   Inserting callbacks to instrumentation into an existing binary.
   This can be performed statically or dynamically.
 
-**Static binary instrumentation**
+Static binary instrumentation
   Loads an existing binary, determines instrumentation points, and generates a new binary with instrumentation directly embedded.
   It is applicable to executables and libraries but limited to only the functions defined in the binary.
   This is also known as **Binary rewrite**.
 
-**Dynamic binary instrumentation**
+Dynamic binary instrumentation
   Loads an existing binary into memory, inserts instrumentation, and runs the binary.
   It is limited to executables but is capable of instrumenting linked libraries.
   This is also known as **Runtime instrumentation**.
 
-**Statistical sampling**
+Statistical sampling
   At periodic intervals, the application is paused and the current call-stack of the CPU is recorded along with various other metrics.
   It uses timers that measure either (A) real clock time or (B) the CPU time used by the current thread
   and the CPU time expended on behalf of the thread by the system.
   This is also known as simply **sampling**.
 
-  **Sampling rate**
+  Sampling rate
     * The period at which (A) or (B) are triggered (in units of ``# interrupts / second``)
     * Higher values increase the number of samples
 
-  **Sampling delay**
+  Sampling delay
     * How long to wait before (A) and (B) begin triggering at their designated rate
 
-  **Sampling duration**
+  Sampling duration
     * The amount of time (in real-time) after the start of the application to record samples.
     * After this time limit has been reached, no more samples are recorded.
 
-**Process sampling**
+Process sampling
   At periodic (real-time) intervals, a background thread records global metrics
   without interrupting the current process.
   These metrics include, but are not limited to: CPU frequency, CPU memory high-water mark
   (i.e. peak memory usage), GPU temperature, and GPU power usage.
 
-  **Sampling rate**
+  Sampling rate
     * The real-time period for recording metrics (in units of ``# measurements / second``)
     * Higher values increase the number of samples
 
-  **Sampling delay**
+  Sampling delay
     * How long to wait (in real-time) before recording samples
 
-  **Sampling duration**
+  Sampling duration
     * The amount of time (in real-time) after the start of the application to record samples.
     * After this time limit has been reached, no more samples are recorded.
 
-**Module**
+Module
   With respect to binary instrumentation, a module is defined as either the filename (such as ``foo.c``)
   or library name (``libfoo.so``) which contains the definition of one or more functions.
 
@@ -74,18 +74,18 @@ when instrumenting Python.
   the definition of one or more functions. The full path to this file typically contains
   the name of the "Python module".
 
-**Basic block**
+Basic block
   A straight-line code sequence with no branches in (except for the entry)
   and no branches out (except for the exit).
 
-**Address range**
+Address range
   The instructions for a function in a binary start at certain address with the ELF file
   and end at a certain address.
   The range is ``end - start``.
   The address range is a decent approximation for the "cost" of a function.
   For example, a larger address range approximately equates to more instructions.
 
-**Instrumentation traps**
+Instrumentation traps
   On the x86 architecture, because instructions are of variable size, an instruction
   might be too small for Dyninst to replace it with the normal code sequence
   used to call instrumentation. When instrumentation is placed at points other
@@ -93,10 +93,10 @@ when instrumenting Python.
   the instrumentation fits. (By default, ``rocprof-sys-instrument`` avoids
   instrumentation which requires a trap.)
 
-**Overlapping functions**
+Overlapping functions
   Due to language constructs or compiler optimizations, it might be possible for
   multiple functions to overlap (that is, share part of the same function body)
   or for a single function to have multiple entry points. In practice, it's
   impossible to determine the difference between multiple overlapping functions
   and a single function with multiple entry points. (By default, ``rocprof-sys-instrument``
-  avoids instrumenting overlapping functions.)
\ No newline at end of file
+  avoids instrumenting overlapping functions.)
diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in
index 486613a7..e6fb884d 100644
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -15,13 +15,6 @@ subtrees:
       - file: install/install.rst
         title: ROCm Systems Profiler installation guide
 
-  - caption: Tutorials
-    entries:
-      - url: https://github.com/ROCm/rocprofiler-systems/tree/amd-mainline/examples
-        title: GitHub examples
-      - file: tutorials/video-tutorials.rst
-        title: Video tutorials
-
   - caption: How to
     entries:
       - file: how-to/configuring-validating-environment.rst
@@ -45,17 +38,17 @@ subtrees:
 
   - caption: Conceptual
     entries:
+      - file: conceptual/rocprof-sys-feature-set.rst
+        title: Features and use cases
       - file: conceptual/data-collection-modes.rst
         title: Data collection modes
-      - file: conceptual/rocprof-sys-feature-set.rst
-        title: The ROCm Systems Profiler feature set and use cases
 
   - caption: Reference
     entries:
       - file: reference/development-guide.rst
         title: Development guide
       - file: reference/rocprof-sys-glossary.rst
-        title: ROCm Systems Profiler glossary
+        title: Glossary
       - file: doxygen/html/files
         title: API library
       - file: doxygen/html/functions
@@ -65,6 +58,13 @@ subtrees:
       - file: doxygen/html/annotated
         title: Classes, structures, and interfaces
 
+  - caption: Tutorials
+    entries:
+      - url: https://github.com/ROCm/rocprofiler-systems/tree/amd-mainline/examples
+        title: GitHub examples
+      - file: tutorials/video-tutorials.rst
+        title: Video tutorials
+
   - caption: About
     entries:
-      - file: license.md
+      - file: license.rst
diff --git a/docs/tutorials/video-tutorials.rst b/docs/tutorials/video-tutorials.rst
index 71ef2b0c..37fcecb9 100644
--- a/docs/tutorials/video-tutorials.rst
+++ b/docs/tutorials/video-tutorials.rst
@@ -23,8 +23,8 @@ Instrumenting a binary

-Writing an ROCm Systems Profiler configuration file
-========================================
+Writing a ROCm Systems Profiler configuration file
+==================================================
 
 .. raw:: html
 
@@ -35,4 +35,4 @@ Visualization and features of Perfetto traces
 
 .. raw:: html
 
-

\ No newline at end of file
+

diff --git a/docs/what-is-rocprof-sys.rst b/docs/what-is-rocprof-sys.rst
index 09ec88a6..fb2a4ed1 100644
--- a/docs/what-is-rocprof-sys.rst
+++ b/docs/what-is-rocprof-sys.rst
@@ -2,9 +2,9 @@
   :description: ROCm Systems Profiler introduction, explanation, and reference
   :keywords: rocprof-sys, rocprofiler-systems, Omnitrace, ROCm, profiler, explanation, introduction, what is, tracking, visualization, tool, Instinct, accelerator, AMD
 
-******************
+******************************
 What is ROCm Systems Profiler?
-******************
+******************************
 
 ROCm Systems Profiler is designed for the high-level profiling and comprehensive tracing
 of applications running on the CPU or the CPU and GPU. It supports dynamic binary