Jaccard weight calculation with an ML-based work distribution model.

Jaccard-ML

Degree-aware kernels for calculating Jaccard weights, and a script demonstrating the effectiveness of using machine learning to tune the GPU work distribution for better calculation efficiency.

In addition to the degree-aware kernels, the repository contains a collection of Jaccard weight calculation kernels for both the CPU and the GPU.
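For reference, the Jaccard weight of an edge (u, v) is |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, where N(x) is the neighbor set of vertex x. The following is a minimal sketch of this computation on an undirected edge list; the helper function is hypothetical and not part of the repository's kernels.

```python
def jaccard_weights(edges):
    """Return {(u, v): weight} for an undirected edge list.

    Hypothetical reference implementation: for each edge (u, v), the
    weight is |N(u) & N(v)| / |N(u) | N(v)| over neighbor sets.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    weights = {}
    for u, v in edges:
        inter = len(neighbors[u] & neighbors[v])
        union = len(neighbors[u] | neighbors[v])
        weights[(u, v)] = inter / union if union else 0.0
    return weights

# Example: a triangle (0, 1, 2) plus a pendant vertex 3 attached to 2.
w = jaccard_weights([(0, 1), (1, 2), (0, 2), (2, 3)])
```

Here the edge (0, 1) gets weight 1/3 (one common neighbor, three vertices in the neighborhood union), while the pendant edge (2, 3) gets weight 0.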

Jaccard Weights Calculation Executable

Compilation

The CUDA code is compiled using CMake:

mkdir build
cd build
cmake ..
make

By default, this compiles only the binning-based (degree-aware) kernels and assumes that graphs are undirected. You can modify this behavior using CMake options. For instance, to also compile the CPU kernel and tell the executable to assume that graphs are directed:

mkdir build
cd build
cmake .. -D_CPU=ON -D_DIRECTED=ON
make

The following are the available options with their default values:

// Run the binning experiment
_BINNING=ON

// Use CPU kernels
_CPU=ON

// Use cuGraph
_CUGRAPH=OFF

// Treat graphs as directed
_DIRECTED=OFF

// Use the Dongarra kernels
_DONGARRA=OFF

// Use GPU kernels
_GPU=ON

// Use in-house cuGraph
_INHOUSE_CUGRAPH=OFF

// Simple GPU kernel
_SIMPLE_GPU=OFF

// Edge-based GPU kernel
_SIMPLE_GPU_EDGE=OFF

Usage

After compiling the code, the executable will be located in the build directory with the name jaccard. The executable runs many experiments and reports their timings.

The executable takes two required parameters and two optional parameters.

-i: input graph edge list file (required)

Path to the input graph's edge list file. Each line in the file represents an edge in the graph and must contain two vertex IDs separated by whitespace. Example:

0 1
2 1
0 3
4 1

-e: JSON file containing binning experiment parameters (required)

Path to the JSON file containing the experimentation parameters used with the binning kernels. A sample JSON file is given in parameters/experiment.json. Each file must contain a list called ranges holding the edge boundaries at which bins are separated. In addition, it must contain an object for each binning kernel to be executed. Each of these objects must provide two lists, g and a: g defines the number of threads calculating a single Jaccard weight value (the search group size), and a defines the number of thread teams calculating Jaccard weights for a single vertex (the search assembly size, i.e., the number of search groups per vertex).
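Based on the description above, a parameters file can be assembled as sketched below. The key names ranges, g, and a come from the text; the kernel object's name and all numeric values are invented for illustration, so consult parameters/experiment.json for the real layout.

```python
import json

# Hypothetical experiment-parameters file, assembled from the README's
# description. "ranges" holds the edge boundaries separating bins; the
# kernel object (name invented here) lists, per bin, the search group
# size "g" and the search assembly size "a".
params = {
    "ranges": [2, 16, 256, 32768],
    "example_binning_kernel": {   # kernel name is illustrative only
        "g": [1, 2, 8, 32],       # threads per Jaccard weight value
        "a": [1, 2, 4, 8],        # search groups per vertex
    },
}

with open("experiment.json", "w") as f:
    json.dump(params, f, indent=2)
```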

-a: number of averages (optional)

The number of times each experiment is run. The output timing of each experiment is the average of all its runs.

--o: output JSON file containing the experiment timing results (optional)

A JSON file containing the timing results of all the experiments executed.

Important note: the actual Jaccard weights are stored in the file {INPUT_GRAPH}.corr.bin as a binary array of values. The i-th element of the array is the Jaccard weight of the i-th edge in the CSR representation of the graph.
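A file in this layout can be read back as sketched below. The README does not state the element type of the array, so the element format is an assumption (32-bit little-endian floats by default); adjust it to match what the executable actually writes.

```python
import struct

def read_jaccard_weights(path, element_fmt="f"):
    """Read a flat binary array of Jaccard weights.

    ASSUMPTION: elements are 32-bit little-endian floats ("f"); pass
    "d" instead if the executable writes doubles. The i-th returned
    value corresponds to the i-th edge in the graph's CSR layout.
    """
    size = struct.calcsize(element_fmt)
    with open(path, "rb") as fh:
        data = fh.read()
    count = len(data) // size
    return list(struct.unpack(f"<{count}{element_fmt}", data))
```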

Machine Learning model

We supply a Python script and an IPython notebook that train and evaluate a random forest machine learning model to predict execution parameters. These scripts are only used to evaluate the quality of the ML model; they do not interface with the CUDA executable described in the previous section.

Usage

Both the Python script and the IPython notebook are used in the same way:

  1. Place a set of JSON outputs generated by the CUDA executable in a folder; these experiments will act as the training set for the model.
  2. Place a second set of JSON outputs generated by the CUDA executable in a different folder; these experiments will be the test set.
  3. Change the values of the variables train_folder and test_folder in the script/notebook to reflect the locations of these folders.
  4. Execute the script/notebook and observe the results. It will train the model on the training set and predict execution parameters for each graph in the test set. It reports the best achievable time for each graph, the time that same graph takes using the predicted parameters, and the loss in speed as a percentage.
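The metrics reported in step 4 can be sketched as follows. The function and the shape of the timing data are hypothetical; the real script's JSON layout and internals may differ.

```python
def evaluate_prediction(timings, predicted_params):
    """Compute the step-4 metrics for one test graph.

    timings: {param_config: average_time} for this graph (hypothetical
    shape). Returns the best achievable time, the time under the
    predicted parameters, and the loss in speed as a percentage.
    """
    best_time = min(timings.values())
    predicted_time = timings[predicted_params]
    loss_pct = (predicted_time - best_time) / best_time * 100.0
    return best_time, predicted_time, loss_pct

# Example with made-up timings (milliseconds) keyed by (g, a) pairs:
timings = {(1, 1): 12.0, (2, 4): 9.0, (8, 2): 10.5}
best, pred, loss = evaluate_prediction(timings, (8, 2))
```

A perfect prediction picks the configuration with the minimum time, giving a 0% loss; in the example above, predicting (8, 2) instead of the optimal (2, 4) loses roughly 16.7% in speed.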

Acknowledgements

We use argument parsing code written by Jesse Laning, used under the Apache-2.0 license.
