From 17ff1dc31d13ec5e735485dd2c775da4547c994b Mon Sep 17 00:00:00 2001 From: David Galiffi Date: Wed, 25 Sep 2024 22:34:46 -0400 Subject: [PATCH] Reference "Known Issue" next to references of Perfetto. (#393) * And provide workaround link to Perfetto v46.0 Co-authored-by: Peter Park --- .../how-to/understanding-omnitrace-output.rst | 62 +++++++------ docs/install/install.rst | 93 ++++++++++--------- docs/reference/development-guide.rst | 8 +- docs/what-is-omnitrace.rst | 13 ++- 4 files changed, 96 insertions(+), 80 deletions(-) diff --git a/docs/how-to/understanding-omnitrace-output.rst b/docs/how-to/understanding-omnitrace-output.rst index b7301f41..08d56099 100644 --- a/docs/how-to/understanding-omnitrace-output.rst +++ b/docs/how-to/understanding-omnitrace-output.rst @@ -28,7 +28,7 @@ For example, starting with the following base configuration: [omnitrace] Outputting 'omnitrace-example-output/wall-clock.txt'... [omnitrace] Outputting 'omnitrace-example-output/wall-clock.json'... -If the ``OMNITRACE_USE_PID`` option is enabled, then running a non-MPI executable +If the ``OMNITRACE_USE_PID`` option is enabled, then running a non-MPI executable with a PID of ``63453`` results in the following output: .. code-block:: shell @@ -58,7 +58,7 @@ Metadata ======================================== Omnitrace outputs a ``metadata.json`` file. This metadata file contains -information about the settings, environment variables, output files, and info +information about the settings, environment variables, output files, and info about the system and the run, as follows: * Hardware cache sizes @@ -240,14 +240,14 @@ Metadata JSON Sample Configuring the Omnitrace output ======================================== -Omnitrace includes a core set of options for controlling the format +Omnitrace includes a core set of options for controlling the format and contents of the output files. 
For additional information, see the guide on :doc:`configuring runtime options <./configuring-runtime-options>`. Core configuration settings ----------------------------------- -.. csv-table:: +.. csv-table:: :header: "Setting", "Value", "Description" :widths: 30, 30, 100 @@ -261,20 +261,20 @@ Core configuration settings Output prefix keys ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Output prefix keys have many uses but are most helpful when dealing with multiple +Output prefix keys have many uses but are most helpful when dealing with multiple profiling runs or large MPI jobs. -They are included in Omnitrace because they were introduced into Timemory +They are included in Omnitrace because they were introduced into Timemory for `compile-time-perf `_. -They are needed to create different output files for a generic wrapper around +They are needed to create different output files for a generic wrapper around compilation commands while still overwriting the output from the last time a file was compiled. -When doing scaling studies and specifying options via the command line, +When doing scaling studies and specifying options via the command line, the recommended process is to use a common ``OMNITRACE_OUTPUT_PATH``, disable ``OMNITRACE_TIME_OUTPUT``, set ``OMNITRACE_OUTPUT_PREFIX="%argt%-"``, and let Omnitrace cleanly organize the output. -.. csv-table:: +.. csv-table:: :header: "String", "Encoding" :widths: 20, 120 @@ -311,16 +311,22 @@ set ``OMNITRACE_OUTPUT_PREFIX="%argt%-"``, and let Omnitrace cleanly organize th .. note:: In any output prefix key which contains a ``/`` character, the ``/`` characters - are replaced with ``_`` and any leading underscores are stripped. For example, - an ``%arg0%`` of ``/usr/bin/foo`` translates to ``usr_bin_foo``. Additionally, any ``%arg%`` keys which + are replaced with ``_`` and any leading underscores are stripped. For example, + an ``%arg0%`` of ``/usr/bin/foo`` translates to ``usr_bin_foo``. 
Additionally, any ``%arg%`` keys which do not have a command line argument at position ```` are ignored. Perfetto output ======================================== -Use the ``OMNITRACE_OUTPUT_FILE`` to specify a specific location. If this is an +Use the ``OMNITRACE_OUTPUT_FILE`` to specify a specific location. If this is an absolute path, then all ``OMNITRACE_OUTPUT_PATH`` and similar -settings are ignored. Visit `ui.perfetto.dev `_ and open this file. +settings are ignored. Visit `ui.perfetto.dev `_ and open +this file. + +.. important:: + Perfetto validation is done with trace_processor v46.0 as there is a known issue with v47.0. + If you are experiencing problems viewing your trace in the latest version of `Perfetto `_, + then try using `Perfetto UI v46.0 `_. .. image:: ../data/omnitrace-perfetto.png :alt: Visualization of a performance graph in Perfetto @@ -349,20 +355,20 @@ Use ``omnitrace-avail --components --filename`` to view the base filename for ea | sampling_wall_clock | true | sampling_wall_clock | |---------------------------------|---------------|------------------------| -The ``OMNITRACE_COLLAPSE_THREADS`` and ``OMNITRACE_COLLAPSE_PROCESSES`` settings are -only valid when full `MPI support is enabled <../install/install.html#mpi-support-within-omnitrace>`_. -When they are set, Timemory combines the per-thread and per-rank data (respectively) of +The ``OMNITRACE_COLLAPSE_THREADS`` and ``OMNITRACE_COLLAPSE_PROCESSES`` settings are +only valid when full `MPI support is enabled <../install/install.html#mpi-support-within-omnitrace>`_. +When they are set, Timemory combines the per-thread and per-rank data (respectively) of identical call stacks. -The ``OMNITRACE_FLAT_PROFILE`` setting removes all call stack hierarchy. +The ``OMNITRACE_FLAT_PROFILE`` setting removes all call stack hierarchy. 
Using ``OMNITRACE_FLAT_PROFILE=ON`` in combination -with ``OMNITRACE_COLLAPSE_THREADS=ON`` is a useful configuration for identifying +with ``OMNITRACE_COLLAPSE_THREADS=ON`` is a useful configuration for identifying min/max measurements regardless of the calling context. -The ``OMNITRACE_TIMELINE_PROFILE`` setting (with ``OMNITRACE_FLAT_PROFILE=OFF``) effectively +The ``OMNITRACE_TIMELINE_PROFILE`` setting (with ``OMNITRACE_FLAT_PROFILE=OFF``) effectively generates similar data to that found -in Perfetto. Enabling timeline and flat profiling effectively generates +in Perfetto. Enabling timeline and flat profiling effectively generates similar data to ``strace``. However, while Timemory generally -requires significantly less memory than Perfetto, this is not the case in timeline +requires significantly less memory than Perfetto, this is not the case in timeline mode, so use this setting with caution. Timemory text output @@ -381,11 +387,11 @@ The truncation settings can be changed through the ``OMNITRACE_MAX_WIDTH`` setting. Timemory text output example ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -In the following example, the ``NN`` field in ``|NN>>>`` is the thread ID. If MPI support is enabled, +In the following example, the ``NN`` field in ``|NN>>>`` is the thread ID. If MPI support is enabled, this becomes ``|MM|NN>>>`` where ``MM`` is the rank. -If ``OMNITRACE_COLLAPSE_THREADS=ON`` and ``OMNITRACE_COLLAPSE_PROCESSES=ON`` are configured, +If ``OMNITRACE_COLLAPSE_THREADS=ON`` and ``OMNITRACE_COLLAPSE_PROCESSES=ON`` are configured, neither the ``MM`` nor the ``NN`` are present unless the -component explicitly sets type traits. Type traits specify that the data is only +component explicitly sets type traits. Type traits specify that the data is only relevant per-thread or per-process, such as the ``thread_cpu_clock`` clock component. ..
code-block:: shell @@ -573,15 +579,15 @@ relevant per-thread or per-process, such as the ``thread_cpu_clock`` clock compo Timemory JSON output ------------------------------------------------------------------------- -Timemory represents the data within the JSON output in two forms: +Timemory represents the data within the JSON output in two forms: a flat structure and a hierarchical structure. The flat JSON data represents the data similar to the text files, where the hierarchical information is represented by the indentation of the ``prefix`` field and the ``depth`` field. -The hierarchical JSON contains additional information with respect +The hierarchical JSON contains additional information with respect to inclusive and exclusive values. However, its structure must be processed using recursion. This section of the JSON output supports analysis by `hatchet `_. -All the data entries for the flat structure are in a single JSON array. It is easier to +All the data entries for the flat structure are in a single JSON array. It is easier to write a simple Python script for post-processing using this format than with the hierarchical structure. .. note:: @@ -929,7 +935,7 @@ Timemory JSON output Python post-processing example ) ) -The result of applying this script to the corresponding JSON output from the :ref:`text-output-example-label` +The result of applying this script to the corresponding JSON output from the :ref:`text-output-example-label` section is as follows: .. 
code-block:: shell diff --git a/docs/install/install.rst b/docs/install/install.rst index f0ee1662..8973e5ed 100644 --- a/docs/install/install.rst +++ b/docs/install/install.rst @@ -18,8 +18,8 @@ Release links To review and install either the current Omnitrace release or earlier releases, use these links: -* Latest Omnitrace Release: ``_ -* All Omnitrace Releases: ``_ +* Latest Omnitrace Release: ``_ +* All Omnitrace Releases: ``_ Operating system support ======================================== @@ -39,7 +39,7 @@ Other OS distributions might function but are not supported or tested. Identifying the operating system ----------------------------------- -If you are unsure of the operating system and version, the ``/etc/os-release`` and +If you are unsure of the operating system and version, the ``/etc/os-release`` and ``/usr/lib/os-release`` files contain operating system identification data for Linux systems. .. code-block:: shell @@ -84,8 +84,8 @@ For example, ... omnitrace-1.0.0-ubuntu-20.04-ROCm-50000-OMPT-PAPI-Python3.sh -Any of the ``EXTRA`` fields with a CMake build option -(for example, PAPI, as referenced in a following section) or +Any of the ``EXTRA`` fields with a CMake build option +(for example, PAPI, as referenced in a following section) or with no link requirements (such as OMPT) have self-contained support for these packages. @@ -113,17 +113,17 @@ Installing Omnitrace from source ======================================== Omnitrace needs a GCC compiler with full support for C++17 and CMake v3.16 or higher. -The Clang compiler may be used in lieu of the GCC compiler if `Dyninst `_ +The Clang compiler may be used in lieu of the GCC compiler if `Dyninst `_ is already installed. Build requirements ----------------------------------- * GCC compiler v7+ - + * Older GCC compilers may be supported but are not tested * Clang compilers are generally supported for Omnitrace but not Dyninst - + * `CMake `_ v3.16+ .. 
note:: @@ -139,7 +139,7 @@ Build requirements Required third-party packages ----------------------------------- -* `Dyninst `_ for dynamic or static instrumentation. +* `Dyninst `_ for dynamic or static instrumentation. Dyninst uses the following required and optional components. * `TBB `_ (required) @@ -155,7 +155,7 @@ during the Omnitrace build. The following list indicates the package, the versio the application that requires the package (for example, Omnitrace requires Dyninst while Dyninst requires TBB), and the CMake option to build the package alongside Omnitrace: -.. csv-table:: +.. csv-table:: :header: "Third-Party Library", "Minimum Version", "Required By", "CMake Option" :widths: 15, 10, 12, 40 @@ -182,13 +182,13 @@ Optional third-party packages * ``OMNITRACE_USE_MPI`` enables full MPI support * ``OMNITRACE_USE_MPI_HEADERS`` enables wrapping of the dynamically-linked MPI C function calls. - (By default, if Omnitrace cannot find an OpenMPI MPI distribution, it uses a local copy + (By default, if Omnitrace cannot find an OpenMPI MPI distribution, it uses a local copy of the OpenMPI ``mpi.h``.) -* Several optional third-party profiling tools supported by Timemory +* Several optional third-party profiling tools supported by Timemory (for example, `Caliper `_, `TAU `_, CrayPAT, and others) -.. csv-table:: +.. csv-table:: :header: "Third-Party Library", "CMake Enable Option", "CMake Build Option" :widths: 15, 45, 40 @@ -204,10 +204,10 @@ The easiest way to install Dyninst is alongside Omnitrace, but it can also be in Building Dyninst alongside Omnitrace ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -To install Dyninst alongside Omnitrace, configure Omnitrace with ``OMNITRACE_BUILD_DYNINST=ON``. +To install Dyninst alongside Omnitrace, configure Omnitrace with ``OMNITRACE_BUILD_DYNINST=ON``. 
Depending on the version of Ubuntu, the ``apt`` package manager might have current enough -versions of the Dyninst Boost, TBB, and LibIberty dependencies -(use ``apt-get install libtbb-dev libiberty-dev libboost-dev``). +versions of the Dyninst Boost, TBB, and LibIberty dependencies +(use ``apt-get install libtbb-dev libiberty-dev libboost-dev``). However, it is possible to request Dyninst to install its dependencies via ``DYNINST_BUILD_=ON``, as follows: @@ -216,7 +216,7 @@ its dependencies via ``DYNINST_BUILD_=ON``, as follows: git clone https://github.com/ROCm/omnitrace.git omnitrace-source cmake -B omnitrace-build -DOMNITRACE_BUILD_DYNINST=ON -DDYNINST_BUILD_{TBB,ELFUTILS,BOOST,LIBIBERTY}=ON omnitrace-source -where ``-DDYNINST_BUILD_{TBB,BOOST,ELFUTILS,LIBIBERTY}=ON`` is expanded by +where ``-DDYNINST_BUILD_{TBB,BOOST,ELFUTILS,LIBIBERTY}=ON`` is expanded by the shell to ``-DDYNINST_BUILD_TBB=ON -DDYNINST_BUILD_BOOST=ON ...`` Installing Dyninst via Spack @@ -237,19 +237,24 @@ Installing Dyninst via Spack Installing Omnitrace ----------------------------------- -Omnitrace has CMake configuration options for MPI support (``OMNITRACE_USE_MPI`` or -``OMNITRACE_USE_MPI_HEADERS``), HIP kernel tracing (``OMNITRACE_USE_ROCTRACER``), -ROCm device sampling (``OMNITRACE_USE_ROCM_SMI``), OpenMP-Tools (``OMNITRACE_USE_OMPT``), +Omnitrace has CMake configuration options for MPI support (``OMNITRACE_USE_MPI`` or +``OMNITRACE_USE_MPI_HEADERS``), HIP kernel tracing (``OMNITRACE_USE_ROCTRACER``), +ROCm device sampling (``OMNITRACE_USE_ROCM_SMI``), OpenMP-Tools (``OMNITRACE_USE_OMPT``), hardware counters via PAPI (``OMNITRACE_USE_PAPI``), among other features. -Various additional features can be enabled via the +Various additional features can be enabled via the ``TIMEMORY_USE_*`` `CMake options `_. 
-Any ``OMNITRACE_USE_`` option which has a corresponding ``TIMEMORY_USE_`` +Any ``OMNITRACE_USE_`` option which has a corresponding ``TIMEMORY_USE_`` option means that the Timemory support for this feature has been integrated -into Perfetto support for Omnitrace, for example, ``OMNITRACE_USE_PAPI=`` also configures +into Perfetto support for Omnitrace, for example, ``OMNITRACE_USE_PAPI=`` also configures ``TIMEMORY_USE_PAPI=``. This means the data that Timemory is able to collect via this package -is passed along to Perfetto and is displayed when the ``.proto`` file is visualized +is passed along to Perfetto and is displayed when the ``.proto`` file is visualized in `the Perfetto UI `_. +.. important:: + Perfetto validation is done with trace_processor v46.0 as there is a known issue with v47.0. + If you are experiencing problems viewing your trace in the latest version of `Perfetto `_, + then try using `Perfetto UI v46.0 `_. + .. code-block:: shell git clone https://github.com/ROCm/omnitrace.git omnitrace-source @@ -280,26 +285,26 @@ MPI support within Omnitrace ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Omnitrace can have full (``OMNITRACE_USE_MPI=ON``) or partial (``OMNITRACE_USE_MPI_HEADERS=ON``) MPI support. -The only difference between these two modes is whether or not the results collected +The only difference between these two modes is whether or not the results collected via Timemory and/or Perfetto can be aggregated into a single -output file during finalization. When full MPI support is enabled, combining the +output file during finalization. When full MPI support is enabled, combining the Timemory results always occurs, whereas combining the Perfetto results is configurable via the ``OMNITRACE_PERFETTO_COMBINE_TRACES`` setting. 
-The primary benefits of partial or full MPI support are the automatic wrapping +The primary benefits of partial or full MPI support are the automatic wrapping of MPI functions and the ability -to label output with suffixes which correspond to the ``MPI_COMM_WORLD`` rank ID +to label output with suffixes which correspond to the ``MPI_COMM_WORLD`` rank ID instead of having to use the system process identifier (i.e. ``PID``). -In general, it's recommended to use partial MPI support with the OpenMPI +In general, it's recommended to use partial MPI support with the OpenMPI headers as this is the most portable configuration. -If full MPI support is selected, make sure your target application is built +If full MPI support is selected, make sure your target application is built against the same MPI distribution as Omnitrace. For example, do not build Omnitrace with MPICH and use it on a target application built against OpenMPI. If partial support is selected, the reason the OpenMPI headers are recommended instead of the MPICH headers is -because the ``MPI_COMM_WORLD`` in OpenMPI is a pointer to ``ompi_communicator_t`` (8 bytes), -whereas ``MPI_COMM_WORLD`` in MPICH is an ``int`` (4 bytes). Building Omnitrace with partial MPI support +because the ``MPI_COMM_WORLD`` in OpenMPI is a pointer to ``ompi_communicator_t`` (8 bytes), +whereas ``MPI_COMM_WORLD`` in MPICH is an ``int`` (4 bytes). Building Omnitrace with partial MPI support and the MPICH headers and then using -Omnitrace on an application built against OpenMPI causes a segmentation fault. +Omnitrace on an application built against OpenMPI causes a segmentation fault. This happens because the value of the ``MPI_COMM_WORLD`` is truncated during the function wrapping before being passed along to the underlying MPI function. 
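The truncation described above can be illustrated with a minimal sketch. It assumes a made-up 64-bit address standing in for the OpenMPI ``MPI_COMM_WORLD`` pointer; the value and the helper name ``truncate_to_int32`` are hypothetical and are not taken from Omnitrace or any MPI implementation. It only shows why squeezing an 8-byte handle through a 4-byte ``int`` loses information.

```python
# Illustrative only: this address is invented for the example and is not
# read from a real OpenMPI process.
FAKE_COMM_WORLD_PTR = 0x00007F3A1C2B4D60  # hypothetical ompi_communicator_t* value (8 bytes)


def truncate_to_int32(value: int) -> int:
    """Keep only the low 32 bits, mimicking an 8-byte handle stored in a 4-byte int."""
    return value & 0xFFFFFFFF


truncated = truncate_to_int32(FAKE_COMM_WORLD_PTR)

# The upper 32 bits of the pointer are gone, so the value handed to the
# underlying MPI function no longer refers to the original communicator.
print(f"original : {FAKE_COMM_WORLD_PTR:#018x}")
print(f"truncated: {truncated:#018x}")
print("handle preserved:", truncated == FAKE_COMM_WORLD_PTR)
```

A 4-byte MPICH-style handle would pass through the same mask unchanged, which is consistent with the recommendation to prefer the OpenMPI headers for partial MPI support.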
@@ -330,7 +335,7 @@ Alternatively, you can directly source the ``setup-env.sh`` script: Test the executables ----------------------------------- -Successful execution of these commands confirms that the installation does not have any +Successful execution of these commands confirms that the installation does not have any issues locating the installed libraries: .. code-block:: shell @@ -353,7 +358,7 @@ This section explains how to resolve certain issues that might happen when you f Issues with RHEL and SELinux ---------------------------------------------------- -RHEL (Red Hat Enterprise Linux) and related distributions of Linux automatically enable a security feature +RHEL (Red Hat Enterprise Linux) and related distributions of Linux automatically enable a security feature named SELinux (Security-Enhanced Linux) that prevents Omnitrace from running. This issue applies to any Linux distribution with SELinux installed, including RHEL, CentOS, Fedora, and Rocky Linux. The problem can happen with any GPU, or even without a GPU. @@ -367,7 +372,7 @@ run ``omnitrace-run`` with the instrumented program. omnitrace-instrument -M sampling -o hello.instr -- ./hello omnitrace-run -- ./hello.instr -Instead of successfully running the binary with call-stack sampling, +Instead of successfully running the binary with call-stack sampling, Omnitrace crashes with a segmentation fault. .. note:: @@ -375,10 +380,10 @@ Omnitrace crashes with a segmentation fault. If you are physically logged in on the system (not using SSH or a remote connection), the operating system might display an SELinux pop-up warning in the notifications. -To work around this problem, either disable SELinux or configure it to use a more +To work around this problem, either disable SELinux or configure it to use a more permissive setting. -To avoid this problem for the duration of the current session, run this command +To avoid this problem for the duration of the current session, run this command from the shell: ..
code-block:: shell @@ -386,25 +391,25 @@ from the shell: sudo setenforce 0 For a permanent workaround, edit the SELinux configuration file using the command -``sudo vim /etc/sysconfig/selinux`` and change the ``SELINUX`` setting to +``sudo vim /etc/sysconfig/selinux`` and change the ``SELINUX`` setting to either ``Permissive`` or ``Disabled``. .. note:: - Permanently changing the SELinux settings can have security implications. + Permanently changing the SELinux settings can have security implications. Ensure you review your system security settings before making any changes. Modifying RPATH details ---------------------------------------------------- -If you're experiencing problems loading your application with an instrumented library, -then you might have to check and modify the RPATH specified in your application. +If you're experiencing problems loading your application with an instrumented library, +then you might have to check and modify the RPATH specified in your application. See the section on `troubleshooting RPATHs <../how-to/instrumenting-rewriting-binary-application.html#rpath-troubleshooting>`_ for further details. Configuring PAPI to collect hardware counters ---------------------------------------------------- -To use PAPI to collect the majority of hardware counters, ensure -the ``/proc/sys/kernel/perf_event_paranoid`` setting has a value less than or equal to ``2``. +To use PAPI to collect the majority of hardware counters, ensure +the ``/proc/sys/kernel/perf_event_paranoid`` setting has a value less than or equal to ``2``. For more information, see the :ref:`omnitrace_papi_events` section. 
\ No newline at end of file diff --git a/docs/reference/development-guide.rst b/docs/reference/development-guide.rst index d04338ed..d2140199 100644 --- a/docs/reference/development-guide.rst +++ b/docs/reference/development-guide.rst @@ -25,7 +25,7 @@ The ``main`` routine of ``omnitrace-avail`` has three important sections: * Printing hardware counters omnitrace-sample: `source/bin/omnitrace-sample `_ -------------------------------------------------------------------------------------------------------------------------------- +---------------------------------------------------------------------------------------------------------------------------------- * Requires a command-line format of ``omnitrace-sample -- `` * Translates command-line options into environment variables @@ -33,7 +33,7 @@ omnitrace-sample: `source/bin/omnitrace-sample `` and a modified environment omnitrace-causal: `source/bin/omnitrace-causal `_ -------------------------------------------------------------------------------------------------------------------------------- +---------------------------------------------------------------------------------------------------------------------------------- When there is exactly one causal profiling configuration variant (which enables debugging), ``omnitrace-causal`` has a nearly identical design to ``omnitrace-sample`` @@ -46,7 +46,7 @@ the following actions take place for each variant: * the parent process waits for the child process to finish omnitrace-instrument: `source/bin/omnitrace-instrument `_ -------------------------------------------------------------------------------------------------------------------------------------------- +---------------------------------------------------------------------------------------------------------------------------------------------- * Requires a command-line format of ``omnitrace-instrument -- `` * Allows the user to provide options specifying whether to perform runtime instrumentation, use
binary rewrite, or @@ -409,4 +409,4 @@ to this sequence: Eventually, the goal is to migrate all subsets of data collection which currently support more rudimentary models of time window constraints, such as process sampling and causal profiling, -to this model. \ No newline at end of file +to this model. diff --git a/docs/what-is-omnitrace.rst b/docs/what-is-omnitrace.rst index e2112688..4ad340b8 100644 --- a/docs/what-is-omnitrace.rst +++ b/docs/what-is-omnitrace.rst @@ -12,13 +12,18 @@ instrumentation, call-stack sampling, and various other features for determining which function and line number are currently executing. A visualization of the comprehensive Omnitrace results can be observed in any modern -web browser. Upload the Perfetto (``.proto``) output files produced by Omnitrace at +web browser. Upload the Perfetto (``.proto``) output files produced by Omnitrace at `ui.perfetto.dev `_ to see the details. -Aggregated high-level results are available as human-readable text files and -JSON files for programmatic analysis. The JSON output files are compatible with the +.. important:: + Perfetto validation is done with trace_processor v46.0 as there is a known issue with v47.0. + If you are experiencing problems viewing your trace in the latest version of `Perfetto `_, + then try using `Perfetto UI v46.0 `_. + +Aggregated high-level results are available as human-readable text files and +JSON files for programmatic analysis. The JSON output files are compatible with the `hatchet `_ Python package. Hatchet converts -the performance data into pandas data frames and facilitates multi-run comparisons, filtering, +the performance data into pandas data frames and facilitates multi-run comparisons, filtering, and visualization in Jupyter notebooks. To use Omnitrace for instrumentation, follow these two configuration steps: