Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NVTX support via DrHook #23

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

pmarguinaud
Copy link
Contributor

@pmarguinaud pmarguinaud commented May 31, 2024

This PR enables NVTX instrumentation using DrHook (PGI only). This is enabled with the two following environment variables :

export DR_HOOK=1
export DR_NVTX=1

This code has been provided courtesy of Louis Stuber (NVIDIA) and Lukas Mosimann (NVIDIA).

Example of an ARPEGE nsys profile with NVTX enabled :

nvtx-color

@wdeconinck
Copy link
Collaborator

Hi Philippe thanks for pushing this, looks like a great feature!!!

A comment perhaps is to rename the module names and subroutines (C,Fortran) with prefix "my..." to something more unambiguous like "fiat_", or better "drhook_" ?

Another suggestion is to keep this feature optional even when using the NVIDIA compiler:

ecbuild_add_option( DR_HOOK_NVTX  ... )
# or instead : ecbuild_add_option( NVTX  ... )

The DEFAULT could be ON for this option when NVIDIA compiler is detected if you wish. The nvtx files should then be compiled only when this feature is ON (HAVE_DR_HOOK_NVTX or HAVE_NVTX).
Then add the definition:

if( HAVE_DR_HOOK_NVTX OR HAVE_NVTX )
   list( APPEND FIAT_DEFINITIONS DR_HOOK_NVTX )
endif()

And guard the specific code that is currently guarded with __PGI with DR_HOOK_NVTX

A slightly larger refactoring, but useful, however is that drhook is also to be used from C code.
This means not to call from dr_hook_util.F90, but from within drhook.c
Then we should be enabling this via the environment variable DR_HOOK_OPT=NVTX,...,... which is currently used to enable/disable DR_HOOK options.
The Fortran drivers for nvtx are then perhaps not even be required, unless you want to expose this outside of DR_HOOK as well.

@ioanhadade , @marsdeno could you also review please?

@pmarguinaud
Copy link
Contributor Author

Hello Willem,

I followed your advice and renamed my* to dr_nvtx, then added the DR_NVTX option and I now use the HAVE_DR_NVTX macro everywhere instead of __PGI.

All of this does not work with nvhpc/21.9 (I tried it on our cluster), but it works with nvhpc/24.3; I tried changing that in ./.github/tools/install-nvhpc.sh, but github tests fail with "No space left on device".

What should I do ?

@reuterbal
Copy link

I have seen a similar issue with nvhpc in Github actions. For CLOUDSC we use this "hack":
https://github.com/ecmwf-ifs/dwarf-p-cloudsc/blob/95125c267e5baed113ef7671c9f346979bd84029/.github/workflows/build.yml#L110-L127

@pmarguinaud
Copy link
Contributor Author

I followed Balthasar advice, but it did not work. So CI is failing for nvhpc builds and I do not know what to do about it.

@wdeconinck
Copy link
Collaborator

wdeconinck commented Jun 11, 2024

I followed Balthasar advice, but it did not work. So CI is failing for nvhpc builds and I do not know what to do about it.

@pmarguinaud I will have a look at making this work. In the mean time, please, could you look into getting nvtx to work via drhook.c ? We could then e.g. use DR_HOOK_OPT=NVTX also for C/C++ programs using DR_HOOK.
And also a unit-test would be great, at least to catch catastrophic errors.

@wdeconinck wdeconinck changed the base branch from main to develop June 11, 2024 07:51
@wdeconinck
Copy link
Collaborator

Hi @pmarguinaud , in the mean time I have updated the develop branch to use nvhpc-24.3 in github actions. It also sets "NVHPC_ROOT" environment variable which you need for the find_package(NVHPC) to work.

You can now rebase this branch, minus the added GitHub actions changes, on develop.

@pmarguinaud
Copy link
Contributor Author

pmarguinaud commented Jun 12, 2024

I have merged develop and added a test case for NVTX.

Instrumenting C code is not a priority for me, but you can do it yourself if it is important for you.

@wdeconinck
Copy link
Collaborator

Hi @Andrew-Beggs-ECMWF discussing offline with @ioanhadade, @ioanhadade suggested for you to have a look at the possibility of integrating this within drhook itself.

ilast++;
stack[ilast] = elem;
elem->calls ++;
if (elem->calls >= 11 && elem->elapsed < 0.0001)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pmarguinaud, I've been doing some work on this patch to integrate it into the core of drhook so that it's faster, works with all support drhook languages, and is easier for us to maintain.

This condition is giving me a bit of confusion though. My understanding is that it's meant to eliminate high call but low cumulative runtime drhook regions from the nvtx trace, so as to reduce noise in the trace. However, the way it's implemented (particularly lines 72-75) means that once you go over 10 calls, it will always be skipped. This also adds a fun situation where the following recursive code won't close any nvtx regions it opens:

void foo(void) {
    static int calls = 0;
    calls++;
    
    int skipstart = dr_hook_nvtx_map_start("foo");
    std::cout<<"In  call "<<calls<<" Skipped? "<<!skipstart<<std::endl;
    
    
    if (calls < 11) {
        foo();
    }
    
    int skipend = dr_hook_nvtx_map_stop();
    std::cout<<"Out call "<<calls<<" Skipped? "<<!skipend<<std::endl;
    
    calls--;
    return;
}

This gave the following output:

In  call 1 Skipped? 0
In  call 2 Skipped? 0
In  call 3 Skipped? 0
In  call 4 Skipped? 0
In  call 5 Skipped? 0
In  call 6 Skipped? 0
In  call 7 Skipped? 0
In  call 8 Skipped? 0
In  call 9 Skipped? 0
In  call 10 Skipped? 0
In  call 11 Skipped? 1
Out call 11 Skipped? 1
Out call 10 Skipped? 1
Out call 9 Skipped? 1
Out call 8 Skipped? 1
Out call 7 Skipped? 1
Out call 6 Skipped? 1
Out call 5 Skipped? 1
Out call 4 Skipped? 1
Out call 3 Skipped? 1
Out call 2 Skipped? 1
Out call 1 Skipped? 1

Am I correct in thinking this is not the intended behaviour and I should attempt to fix it in my patch?

@Andrew-Beggs-ECMWF
Copy link

I've now finished my patch and have it as a branch on my own fork of fiat. Do we want to have it merged into pmarguinaud's branch (which should then make it part of this pr once it's merged) or start a new pr directly to ecmwf-ifs:develop?

The outcome is the same, just a matter of how it's merged : )

@wdeconinck
Copy link
Collaborator

Let's make a new PR for it please.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants