
VTuneFocusedConnector

Vivek Kale edited this page Jan 15, 2024 · 8 revisions

Tool Description

VTuneFocusedConnector works very similarly to the VTuneConnector tool. The only difference is that it turns VTune profiling off outside of frames. Note that profiling cannot be switched on and off at a very high frequency; it should usually happen only every few milliseconds. This tool is therefore often used in conjunction with the KernelFilter.

The tool is located at: https://github.com/kokkos/kokkos-tools/tree/develop/profiling/vtune-focused-connector

Compilation

The Makefile needs to know where VTune's home directory is. Other than that, simply type "make" inside the source directory. When compiling for a specific platform, modify the Makefile to use the correct compiler and link flags.

Usage

This is a standard tool which does not yet support tool chaining. Modify your VTune run environment to include:

KOKKOS_TOOLS_LIBS={PATH_TO_TOOL_DIRECTORY}/kp_vtune_focused_connector.so

When using it in conjunction with KernelFilter, you must provide a filter file:

export KOKKOSP_KERNEL_FILTER=kernels.lst
export KOKKOS_TOOLS_LIBS="{PATH_TO_TOOL_DIRECTORY}/kernel-filter/kp_kernel_filter.so;{PATH_TO_TOOL_DIRECTORY}/vtune-focused-connector/kp_vtune_focused_connector.so"
./application COMMANDS

You must put double quotes around the right-hand side of the export KOKKOS_TOOLS_LIBS=... expression. Otherwise, bash will interpret the semicolon as the end of the expression, and attempt to run the second .so file as an executable (likely resulting in a segmentation fault, as it's not a proper executable).
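The effect of the quotes can be checked without running any profiler. The following is a minimal sketch: the two library paths are hypothetical placeholders, and the splitting step only imitates how a semicolon-separated list breaks into entries; it is not the Kokkos runtime's own parsing code.

```shell
# Hypothetical library paths, used only for illustration.
first=/tmp/tools/first_tool.so
second=/tmp/tools/second_tool.so

# Quoted: the semicolon stays inside the value instead of ending the command.
export KOKKOS_TOOLS_LIBS="${first};${second}"

# Split the value on ';' to list the entries it contains.
old_ifs=$IFS
IFS=';'
set -- $KOKKOS_TOOLS_LIBS
IFS=$old_ifs
tool_count=$#

printf 'entries: %s\n' "$tool_count"
printf '  %s\n' "$@"
```

If the quotes are dropped, the shell ends the export at the semicolon, so the variable holds only the first path and the second path is treated as a separate command.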

This tool's additional memory footprint is dwarfed by the memory usage of VTune during profiling.

Output

Switch to the domain/frame-based view inside VTune to analyze your application kernel by kernel.

Example Output

Consider the following code:

#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {  // Scope so that all Views are destroyed before Kokkos::finalize().
    int N = 100000000;

    Kokkos::View<double*> a("A", N);
    Kokkos::View<double*> b("B", N);
    Kokkos::View<double*> c("C", N);

    // Unlabeled initialization kernel.
    Kokkos::parallel_for(N, KOKKOS_LAMBDA (const int& i) {
      a(i) = 1.0*i;
      b(i) = 1.5*i;
      c(i) = 0.0;
    });

    double result = 0.0;
    for (int k = 0; k < 50; k++) {

      // Labeled kernel "AXPB": c = k*a + b.
      Kokkos::parallel_for("AXPB", N, KOKKOS_LAMBDA (const int& i) {
        c(i) = 1.0*k*a(i) + b(i);
      });

      // Labeled kernel "Dot": sum of squares of c.
      double dot = 0.0;
      Kokkos::parallel_reduce("Dot", N, KOKKOS_LAMBDA (const int& i, double& lsum) {
        lsum += c(i)*c(i);
      }, dot);
      result += dot;

    }
    printf("Result: %lf\n", result);
  }
  Kokkos::finalize();
  return 0;
}

We now run VTune with this tool and the KernelFilter, using the following filter list:

AX(.*)
Dot
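To preview which kernel labels such a list selects, the patterns can be tried against candidate names with grep. This is only a sketch under two assumptions: that each line of the filter file is a regular expression that must match the whole kernel name, and that grep's extended syntax behaves like the KernelFilter's regex engine for these simple patterns. The name "Init" is a hypothetical label added to show a non-match.

```shell
# Write the filter list to a temporary file (a stand-in for kernels.lst).
filter_file=$(mktemp)
printf '%s\n' 'AX(.*)' 'Dot' > "$filter_file"

# Candidate names: the two labels from the example plus the hypothetical "Init".
# -E: extended regular expressions, -x: the whole name must match,
# -f: read one pattern per line from the filter file.
matched=$(printf '%s\n' 'Init' 'AXPB' 'Dot' | grep -E -x -f "$filter_file")

printf '%s\n' "$matched"
rm -f "$filter_file"
```

Under these assumptions, only AXPB and Dot are listed, which corresponds to the two labeled kernels in the example program.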

Here is a screenshot of the Bottom-up Frame/Domain view in VTune. The kernel names are used for the domains, and individual calls with the same name are frames in that domain. Note how profiling halts after an initial phase: at the beginning of the pause, the profiling tool is initialized and VTune profiling is turned off. Profiling is turned back on whenever the application enters either of the two matching kernels.

VTuneDomainFrame
