Update release/rocm-rel-6.2 for RC3 (#968) (#972)
* Small doc update to remove restrictions no longer present (#917)

* Small doc update to remove restrictions no longer present

* Add calls to stop and wait for a debugger (#916)



* Small change to sample for clarity (#913)



* Added error log for query counter info (#903)

* Added error log for query counter info

* Add dimension query to counter collection sample (#918)



* Disable PC sampling service if counter collection service is configured (#899)

* Define the NULL value of an internal correlation ID (#901)

* Remove duplicate table code from tests (#922)

* Remove duplicate table code from tests

Remove duplicate HSA table code from tests. Cleanup
includes (and remove unnecessary ones).

* SWDEV-465322: Adding support for Perfcounter SIMD Mask in ATT (#910)

* SWDEV-465322: Adding support for Perfcounter SIMD Mask in ATT

* Apply suggestions from code review




* Adding unit tests

* Adding counters check for gfx9 and SQ block only

* Addressing review comments

* changing the struct size

* fixing header includes

---------




* Fix for SLES/RHEL compilers (#925)

* Fix for SLES/RHEL compilers

---------



* Fix agent profiling for SQ counters (#919)

* Fix agent profiling for SQ counters


---------



* Disable counter collection if PC sampling is enabled (#924)

* docs and tests format (#927)

* ATT API changes - add user_data field and separation of dispatch vs agent profiling (#893)

* DRM Issue Fix for SLES 15 (#897)

* DRM Issue Fix

* Formatting Fix

* PC sampling: CID manager unit test (#898)

* Adding per-dispatch userdata field to ATT

* Clang tidy

* Formatting

* Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp



* Adding dispatch_id, fixing user_data and update aql_profile_v2

* Formatting

* Tidy fixes

* Second fix for userdata

* removing assert for union

* Adding serialization. Created agent profiling-like thread trace

* Implemented agent thread trace

* Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp



* Restructured thread trace packets

* Added agent API tests

* Fixing multigpu for agent test

* Formatting

* Formatting

* Improving header locations

* Fixing merge conflicts

* Tidy

* Tidy

* Tidy

---------




* Allow multiple agents in a single context for agent profiling (#908)

Allow multiple profiles for agent profiling



* Remove unnecessary AgentCache argument from profile construction (#931)

This argument is not necessary. Removed.



* Update controller.cpp (#932)

* Update controller.cpp

* Update controller.cpp

* Formatting

* Bumping down the ioctl version for CI only (#928)

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Replicate global counters across all derived counters (#936)

Fix derived counters to have globals replicated across all architectures (that support them).
---------



* Incremental Counter Profile Creation (#933)

* Incremental Counter Profile Creation

Adds support for incremental counter creation by changing the
behavior of rocprofiler_create_profile_config.

rocprofiler_create_profile_config(rocprofiler_agent_id_t           agent_id,
                                  rocprofiler_counter_id_t*        counters_list,
                                  size_t                           counters_count,
                                  rocprofiler_profile_config_id_t* config_id)

The behavior of this function now allows an existing config_id to be
supplied via config_id. The counters contained in this config will be
copied over and used as a base for a new config along with any counters
supplied in counters_list. The new config id is returned via config_id
and can be used in future dispatch/agent counting sessions.

A new config is created rather than modifying the existing one because
there is no guarantee that the existing config isn't already in use. While
we could add locks (or other mutual exclusion mechanisms) to check whether
it's in use and reject an update, the benefit of doing so is minor in
comparison to just creating a new config. This also supports a common
pattern in which a tool adds counters at some later point during
execution: it can now do that without destroying the existing
config.
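
The copy-and-extend behavior described above can be illustrated with a small stand-alone model. This is only a sketch: `ConfigRegistry`, `counter_id`, `config_id`, and `no_base` are hypothetical names invented for illustration and are not SDK types, the new id is returned directly instead of through an in/out pointer, and duplicate counters are assumed to be merged.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the SDK types.
using counter_id = std::uint64_t;
using config_id  = std::uint64_t;

constexpr config_id no_base = 0;  // assumed sentinel meaning "start from scratch"

class ConfigRegistry
{
public:
    // Creates a new config. If `base` names an existing config, its counters
    // are copied first and `extra` is added on top. The existing config is
    // never modified, so anything already using it is unaffected.
    config_id create(config_id base, const std::vector<counter_id>& extra)
    {
        assert(base == no_base || configs_.count(base) == 1);
        std::set<counter_id> counters;
        auto                 it = configs_.find(base);
        if(it != configs_.end()) counters = it->second;
        counters.insert(extra.begin(), extra.end());
        const config_id id = next_id_++;
        configs_.emplace(id, std::move(counters));
        return id;
    }

    const std::set<counter_id>& counters(config_id id) const { return configs_.at(id); }

private:
    std::map<config_id, std::set<counter_id>> configs_;
    config_id                                 next_id_ = 1;
};
```

In this model, creating a config from `no_base` with two counters and then creating a second config from the first with one more counter yields two distinct ids: the first still holds two counters and the second holds three.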

---------




* PC Sampling IOCTL version check introduced (#944)

* doc update for 6.2 release (#938)

* doc update for 6.2 release

* Adding warning for gerrit->github nightly sync

* PC sampling IOCTL versioning refactored (#945)

The following changes are introduced:
- Use functions instead of macros.
- Verify the error code when querying KFD IOCTL version.
- Skip tests and samples if KFD IOCTL < 1.16 or PC Sampling IOCTL < 0.1.
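
A minimal sketch of a function-based version check of the kind listed above, using the two thresholds from this message. The type and function names (`ioctl_version`, `at_least`, `pc_sampling_supported`) are hypothetical and are not taken from the SDK sources.

```cpp
#include <cstdint>

// Hypothetical version pair; the names here are illustrative, not SDK symbols.
struct ioctl_version
{
    std::uint32_t major_version;
    std::uint32_t minor_version;
};

// True when `v` is at least `required`, comparing major first, then minor.
constexpr bool at_least(ioctl_version v, ioctl_version required)
{
    return v.major_version > required.major_version ||
           (v.major_version == required.major_version &&
            v.minor_version >= required.minor_version);
}

// Thresholds taken from the commit message above.
constexpr ioctl_version min_kfd_ioctl{1, 16};
constexpr ioctl_version min_pcs_ioctl{0, 1};

// Tests and samples would be skipped when this returns false.
constexpr bool pc_sampling_supported(ioctl_version kfd, ioctl_version pcs)
{
    return at_least(kfd, min_kfd_ioctl) && at_least(pcs, min_pcs_ioctl);
}

static_assert(pc_sampling_supported({1, 16}, {0, 1}), "minimum versions are accepted");
static_assert(!pc_sampling_supported({1, 15}, {0, 1}), "older KFD IOCTL is rejected");
```

Compared with macros, `constexpr` functions keep the comparison type-checked and still allow compile-time evaluation.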

* Add HSA tracing support for `hsa_amd_vmem_address_reserve_align` (#946)

* Add support for hsa_amd_vmem_address_reserve_align

* Update lib/rocprofiler-sdk/hsa/types.hpp

- support HSA_AMD_EXT_API_TABLE_STEP_VERSION == 0x2 for HSA v1.14.0

---------



* readthedocs updates (#877)

* readthedocs updates

* Adding License

* correcting table of contents path

* Move doc requirements to sphinx dir

* Compile requirements.txt

* Update path to reqs

* Adding missing python module

* changing sphinx version

* changing docutils version

* enabling sphinx extensions

* trying sphinx-rtd-theme

* Remove unused doc configs

* Remove unused html theme options

* Add files to toc

* temp commit to test

* updating environment.yml for CI build

* Update doc requirements

To include rocprofiler-sdk in projects.yaml

* Set external_projects_current_project as rocprofiler-sdk

* Exclude external projects

* Fix warning for missing static path

* updating conf.py

* Removing reST syntax

* Use rocm-docs-core doxygen integration

* Remove RST syntax from Markdown files

* Generate doxyfile post checkout on RTD

* Use custom RTD env

* Specify mambaforge

* Put conda before post checkout cmd

* Add doxyfile for RTD

* Run cmake from conf.py

* Update environment.yml

* Use mambaforge

* Fix path to environment.yml

* Call build doxyfile

* Add Developer API title to Doxyfile

* Config version header

* Fix typo in conf.py

* Format fix for conf.py

* Increasing timeout for build-docs-from-source

* Remove README as mainpage for doxyfile

* Fix formatting in conf.py

---------



* Fixing OpenSuse build (#947)

* Fix documentation (#949)

* Sync queue and async copy on client finalizer (#950)

* Add `logical_node_type_id` field to `rocprofiler_agent_t` (#948)

* Add logical_node_type_id field to rocprofiler_agent_t

* Patch queue_controller

* Remove fatal error when callback and buffer tracing API in one context (#952)

- one context for callback and buffer tracing of same API produces erroneous fatal error -- this is a valid use case

* Adding wrappers on HSA for executable load/unload and allowing multiple agents per context on ATT (#951)

* Codeobj wrappers around HSA calls for ATT

* Formatting

* Bookkeeping

* Tidy

* Tidy

* Update source/lib/rocprofiler-sdk/thread_trace/code_object.hpp



* Update source/lib/rocprofiler-sdk/thread_trace/att_core.hpp



* Variable naming

---------



* Removing cache of decoded lines and returning shared_ptr (#953)

* Update continuous_integration.yml (#926)

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

---------



* Accumulation metrics support and update counter collection API to aqlprofile_v2 (#915)

* Updating to v3 API

* General fixes

* Extending dimension bits to 54

* Disabling agent profiling tests

* Fixed unit test

* Adding accumulate metric support for parsing counters (#609)

* Adding accumulate metric support for parsing counters

* Adding metric flag

* Updating tests

* source formatting (clang-format v11) (#610)



* source formatting (clang-format v11) (#614)



* Adding evaluate ast test

* source formatting (clang-format v11) (#633)



* Update scanner generated file

* Adding flags to events for aqlprofile

* Fix Mi200 failing test

---------





* Revert "Extending dimension bits to 54"

This reverts commit 3cd6628452484044a93e129f27974f996a0e4c08.

* Removing CU dimension

* Fixing merge conflicts

* Revert "Disabling agent profiling tests"

This reverts commit 7e01518ed8c51fbb0c3b2575e1e0b8f9ddfa8237.

* Fixing merge conflicts

* Fix parser tests

* Adding accumulate metric documentation

* Update counter_collection_services.md

* Update index.md

* fix nested expression use

* Update source/lib/rocprofiler-sdk/counters/evaluate_ast.cpp



* Doc update

---------








* Fix kernel trace gaps (#961)

- source/lib/rocprofiler-sdk/hsa/queue.cpp
  - Optimize WriteInterceptor to eliminate extra barrier packets causing gaps between kernels in kernel tracing
  - increase timeout_hint in hsa_signal_wait in set_profiler_active_on_queue
  - misc logging improvements
- source/lib/rocprofiler-sdk/counters/agent_profiling.cpp
  - increase timeout_hint in hsa_signal_wait in set_profiler_active_on_queue
- tests/rocprofv3/hsa-queue-dependency/CMakeLists.txt
  - add TIMEOUT for rocprofv3-test-hsa-multiqueue-execute

* PC sampling: integration test with instruction decoding (#929)

* PC sampling: integration test with instruction decoding

* PC sampling: verifying internal and external CIDs

The PC sampling integration test has been extended
to verify internal and external correlation IDs.

* tmp solution of using Instructions as keys

* wrapper for HIP call

* PCS integration test: ld_addr as instruction id

For the sake of the integration test, use ld_addr as the
instruction identifier. To support code object unloading
and relocations, use  as the identifier
(the change in the decoder is required).

* PCS integration test: removing shared_ptr

Completely removing usage of shared pointers.

* PCS integration test: removing decoder

When a code object has been unloaded, ensure all PC samples
corresponding to that object are decoded, prior to removing
the decoder.

* PCS integration test: fixing build flags and imports

* PCS integration test: fixing labels

* PCS integration test: cmake flags fix

* PC sampling cmake labels renamed

* PCS integration test refactoring

* PCS integration test: minimize usage of raw pointers

* PCS integration test: at least one sample should be delivered.

* PC sampling labels: pc-sampling

* General fixes to ATT, packets and event ID retrieval (#960)

* General fixes to ATT, packets and event ID retrieval

* Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp



---------



* Returning code object id information in code_printing.cpp:Instruction (#965)

* Returning code object id information in code_printing.cpp:Instruction

* Adding assertions

* Simplifying decoder library

* Miscellaneous updates (#959)

- missing-new-line CI job: ensures all source files end with new line
- logging updates
- add new line to the end of many files
- fix header include ordering in misc places
- transition to use hsa::get_core_table() and hsa::get_amd_ext_table() in various places instead of making copies

* Update HIP API tracing (#958)

- support HipDispatchTable additions for HIP_RUNTIME_API_TABLE_STEP_VERSION 1 thru 4

* Fix agent shutdown destructor errors (#969)

* Update lib/rocprofiler-sdk/agent.cpp

- use static_object wrapper for vector of agent_pair (rocp agent <-> hsa agent)

* Fix get_aql_handles() shutdown error

- use `static_object` wrapper for vector of `aqlprofile_agent_handle_t`
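
For context, one general way to avoid destructor-order errors at shutdown is to build the object in static storage and never run its destructor. The sketch below shows that general technique only. `never_destroyed`, `agent_pairs_tag`, and `agent_pairs` are hypothetical names, `int` stands in for the real element type, and the SDK's `static_object` wrapper may work differently (for example, by destroying the object at a controlled point).

```cpp
#include <cassert>
#include <new>
#include <vector>

// Hypothetical helper (not the SDK's static_object): builds a T inside static
// storage on first use and never runs its destructor, so code that executes
// during process shutdown can still touch the object safely.
// `Tag` distinguishes independent instances of the same T.
template <typename T, typename Tag>
T& never_destroyed()
{
    alignas(T) static unsigned char storage[sizeof(T)];
    // Function-local static: initialized exactly once, thread-safe since C++11.
    static T* instance = ::new(static_cast<void*>(storage)) T{};
    return *instance;
}

struct agent_pairs_tag
{};

// Stand-in for a global list; `int` replaces the real element type.
std::vector<int>& agent_pairs()
{
    return never_destroyed<std::vector<int>, agent_pairs_tag>();
}

// Usage example: every call returns the same instance.
void usage_example()
{
    agent_pairs().push_back(42);
    assert(&agent_pairs() == &agent_pairs());
    assert(!agent_pairs().empty());
}
```

The trade-off is that memory owned by the object is released only when the process exits, which is usually acceptable for process-lifetime tables.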

---------

Co-authored-by: Jonathan R. Madsen <[email protected]>
Co-authored-by: Benjamin Welton <[email protected]>
Co-authored-by: Benjamin Welton <[email protected]>
Co-authored-by: Manjunath P Jakaraddi <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Gopesh Bhardwaj <[email protected]>
Co-authored-by: Giovanni Lenzi Baraldi <[email protected]>
Co-authored-by: Ammar ELWazir <[email protected]>
Co-authored-by: Sam Wu <[email protected]>
Co-authored-by: Manjunath-Jakaraddi <[email protected]>
Co-authored-by: jrmadsen <[email protected]>
Co-authored-by: Manjunath-Jakaraddi <[email protected]>
13 people authored Jul 12, 2024
1 parent 2f3a8b0 commit b2b99d4
Showing 192 changed files with 8,867 additions and 2,911 deletions.
36 changes: 34 additions & 2 deletions .github/workflows/continuous_integration.yml
@@ -21,15 +21,15 @@ env:
ROCM_PATH: "/opt/rocm"
GPU_TARGETS: "gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942 gfx1030 gfx1100 gfx1101 gfx1102"
PATH: "/usr/bin:$PATH"
PC_SAMPLING_TESTS_REGEX: ".*pc_sampling.*"
PC_SAMPLING_TESTS_REGEX: ".*pc-sampling.*"

jobs:
core:
# See: https://docs.github.com/en/free-pro-team@latest/actions/learn-github-actions/managing-complex-workflows#using-a-build-matrix
strategy:
fail-fast: false
matrix:
runner: ['navi3', 'vega20', 'mi200', 'mi300']
runner: ['navi3', 'vega20', 'mi200', 'mi300', 'rhel', 'sles']
os: ['ubuntu-22.04']
build-type: ['RelWithDebInfo']
ci-flags: ['--linter clang-tidy']
@@ -45,6 +45,7 @@ jobs:
- uses: actions/checkout@v4

- name: Install requirements
if: ${{ !contains(matrix.runner, 'rhel') && !contains(matrix.runner, 'sles') }}
timeout-minutes: 10
shell: bash
run: |
@@ -55,6 +56,13 @@ jobs:
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 20 --slave /usr/bin/g++ g++ /usr/bin/g++-12 --slave /usr/bin/gcov gcov /usr/bin/gcov-12
python3 -m pip install -r requirements.txt
- name: Install requirements For RHEL & SLES
if: ${{ contains(matrix.runner, 'rhel') || contains(matrix.runner, 'sles') }}
timeout-minutes: 10
shell: bash
run: |
python3 -m pip install -r requirements.txt
- name: List Files
shell: bash
run: |
@@ -77,6 +85,7 @@ jobs:
echo 'ROCPROFILER_PC_SAMPLING_BETA_ENABLED=1' >> $GITHUB_ENV
- name: Configure, Build, and Test
if: ${{ !contains(matrix.runner, 'rhel') && !contains(matrix.runner, 'sles') }}
timeout-minutes: 30
shell: bash
run:
@@ -98,6 +107,29 @@ jobs:
--
-LE "${EXCLUDED_TESTS}"

- name: Configure, Build, and Test
if: ${{ contains(matrix.runner, 'rhel') || contains(matrix.runner, 'sles') }}
timeout-minutes: 30
shell: bash
run:
sudo LD_LIBRARY_PATH=./build/lib:$LD_LIBRARY_PATH python3 ./source/scripts/run-ci.py -B build
--name ${{ github.repository }}-${{ github.ref_name }}-${{ matrix.runner }}-mi300-core
--build-jobs 16
--site $(echo $RUNNER_HOSTNAME)-$(/opt/rocm/bin/rocm_agent_enumerator | sed -n '2 p')
--gpu-targets ${{ env.GPU_TARGETS }}
--run-attempt ${{ github.run_attempt }}
--
-DROCPROFILER_DEP_ROCMCORE=ON
-DROCPROFILER_BUILD_DOCS=OFF
-DROCPROFILER_BUILD_CI=OFF
-DCMAKE_BUILD_TYPE=${{ matrix.build-type }}
-DCMAKE_INSTALL_PREFIX=/opt/rocprofiler-sdk
-DCPACK_GENERATOR='DEB;RPM;TGZ'
-DCPACK_PACKAGING_INSTALL_PREFIX="$(realpath /opt/rocm)"
-DPython3_EXECUTABLE=$(which python3)
--
-LE "${EXCLUDED_TESTS}"

- name: Install
if: ${{ contains(matrix.runner, env.CORE_EXT_RUNNER) }}
timeout-minutes: 10
2 changes: 1 addition & 1 deletion .github/workflows/docs.yml
@@ -101,7 +101,7 @@ jobs:
python3 -m pip install -r requirements.txt
- name: Configure, Build, Install, and Package
timeout-minutes: 30
timeout-minutes: 60
shell: bash
run:
export CMAKE_PREFIX_PATH=/opt/rocm:${CMAKE_PREFIX_PATH};
18 changes: 18 additions & 0 deletions .github/workflows/formatting.yml
Original file line number Diff line number Diff line change
Expand Up @@ -143,3 +143,21 @@ jobs:
command: review
pull_number: ${{ github.event.pull_request.number }}
git_dir: '.'

missing-new-line:
runs-on: ubuntu-22.04

steps:
- uses: actions/checkout@v4

- name: Find missing new line
shell: bash
run: |
OUTFILE=missing_newline.txt
for i in $(find source/lib source/include tests samples cmake -type f | egrep -v '\.bin$'); do VAL=$(tail -c 1 ${i}); if [ -n "${VAL}" ]; then echo "- ${i}" >> ${OUTFILE}; fi; done
if [[ -f ${OUTFILE} && $(cat ${OUTFILE} | wc -l) -gt 0 ]]; then
echo -e "\nError! Source code missing new line at end of file...\n"
echo -e "\nFiles:\n"
cat ${OUTFILE}
exit 1
fi
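
The core test in the job above (is the last byte of each file a newline?) can be sketched outside the shell as follows. This is an illustrative equivalent only, not part of the commit, and `ends_with_newline` is a hypothetical name.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Returns true when the stream is empty or its final byte is '\n'.
// An empty stream passes, matching the shell check, where `tail -c 1`
// prints nothing for an empty file.
bool ends_with_newline(std::istream& in)
{
    char last = '\n';
    char c    = '\0';
    while(in.get(c))
        last = c;
    return last == '\n';
}

// Usage example with in-memory "files".
void usage_example()
{
    std::istringstream good("int x;\n");
    std::istringstream bad("int x;");
    std::istringstream empty("");
    assert(ends_with_newline(good));
    assert(!ends_with_newline(bad));
    assert(ends_with_newline(empty));
}
```

The shell version relies on command substitution stripping a trailing newline, so `VAL` is non-empty only when the final byte is something else.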
21 changes: 21 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,21 @@
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

version: 2

sphinx:
  configuration: source/docs/conf.py

formats: [htmlzip, pdf, epub]

python:
  install:
    - requirements: source/docs/sphinx/requirements.txt

build:
  os: ubuntu-22.04
  tools:
    python: "mambaforge-22.9"

conda:
  environment: source/docs/environment.yml
19 changes: 9 additions & 10 deletions CHANGELOG.md
@@ -4,7 +4,7 @@ Full documentation for ROCprofiler-SDK is available at [Click Here](source/docs/

## ROCprofiler-SDK for AFAR I

## Added
## Additions

- HSA API Tracing
- Kernel Dispatch Tracing
@@ -14,7 +14,7 @@ Full documentation for ROCprofiler-SDK is available at [Click Here](source/docs/

## ROCprofiler-SDK for AFAR II

## Added
## Additions

- HIP API Tracing
- ROCTx Tracing
@@ -23,10 +23,9 @@ Full documentation for ROCprofiler-SDK is available at [Click Here](source/docs/
- ROCTx start/stop
- Memory Copy Tracing


## ROCprofiler-SDK for AFAR III

## Added
## Additions

- Kernel Dispatch Counter Collection – (includes serialization and multidimensional instances)
- Kernel serialization
@@ -44,7 +43,7 @@ Full documentation for ROCprofiler-SDK is available at [Click Here](source/docs/

## ROCprofiler-SDK for AFAR IV

## Added
## Additions

- Page Migration Reporting (API)
- Scratch Memory Reporting (API)
@@ -56,22 +55,22 @@ Full documentation for ROCprofiler-SDK is available at [Click Here](source/docs/

## ROCprofiler-SDK for AFAR V

## Added
## Additions

- Agent/Device Counter Collection (API)
- JSON output format support (tool)
- Single JSON output format support (tool)
- Perfetto output format support(.pftrace) (tool)
- Input YAML support for counter collection (tool)
- Input JSON support for counter collection (tool)
- Application Replay (Counter collection)
- PC Sampling (Beta)(API)
- ROCProf V3 Multi-GPU Support:
- Merged files
- Multi-process (multiple files)

## Fixed
## Fixes

- SQ_ACCUM_PREV and SQ_ACCUM_PREV_HIRE overwriting issue

## Changed
## Changes

- rocprofv3 tool now needs `--` in front of application. For detailed uses, please [Click Here](source/docs/rocprofv3.md)
13 changes: 7 additions & 6 deletions README.md
@@ -1,13 +1,11 @@
# ROCprofiler-SDK: Application Profiling, Tracing, and Performance Analysis

***
Note: rocprofiler-sdk is currently `not` supported as part of the public ROCm software stack and is only distributed as a beta
release to customers.
***
> [!NOTE]
Note: rocprofiler-sdk is currently considered a beta version and is subject to change in future releases

## Overview

ROCProfiler-SDK is AMD’s new and improved tooling infrastructure, providing a hardware-specific low-level performance analysis interface for profiling and tracing GPU compute applications. To see what's changed [Click Here](source/docs/about.md)
ROCProfiler-SDK is AMD’s new and improved tooling infrastructure, providing a hardware-specific low-level performance analysis interface for profiling and tracing GPU compute applications. To see what's changed [Click Here](source/docs/index.md)

## GPU Metrics

@@ -57,7 +55,7 @@ To install ROCprofiler, run:
cmake --build rocprofiler-sdk-build --target install
```

Please see the detailed section on build and installation here: [Click Here](/source/docs/installation.md)
Please see the detailed section on build and installation here: [Click Here](source/docs/installation.md)

## Support

@@ -80,3 +78,6 @@ Please report in the Github Issues.
- Timestamps in PC sampling records might not be 100% accurate.

- Using PC sampling on multi-threaded applications might fail with `HSA_STATUS_ERROR_EXCEPTION`.Furthermore, if three or more threads launch operations to the same agent, and if PC sampling is enabled, the `HSA_STATUS_ERROR_EXCEPTION` might appear.

> [!WARNING]
> The latest mainline version of AQLprofile can be found at [https://repo.radeon.com/rocm/misc/aqlprofile/](https://repo.radeon.com/rocm/misc/aqlprofile/). However, it's important to note that updates to the public AQLProfile may not occur as frequently as updates to the rocprofiler-sdk. This discrepancy could lead to a potential mismatch between the AQLprofile binary and the rocprofiler-sdk source.