gMSR: A multi-GPU algorithm to accelerate a massive validation of biclusters

https://www.mdpi.com/2079-9292/9/11/1782

Abstract

Biclustering is one of the most widely used machine learning techniques for discovering local patterns in datasets from areas such as energy consumption, marketing, social networks and bioinformatics, among others. In bioinformatics in particular, the continuous growth of databases over the last few years has made Biclustering techniques extremely time-consuming and has greatly increased the number of results they generate. Validation techniques must therefore be adapted to this new environment to help researchers focus on a specific subset of results in an efficient, fast and reliable way. This situation can reasonably be regarded as a Big Data context. Many machine learning techniques have been implemented with GPU technology and the CUDA architecture to accelerate the processing of large databases. However, as far as we know, this technology has not yet been applied to any bicluster validation technique. This work presents a multi-GPU version of the Mean Squared Residue (MSR), one of the most widely used bicluster validation measures. It takes advantage of all the hardware and memory resources offered by GPU devices. As a result, gMSR is able to validate a massive number of biclusters in any Biclustering-based study within a Big Data context.
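
For readers unfamiliar with the measure, the following is a minimal CPU reference sketch of MSR as defined by Cheng and Church: the mean of the squared residues a_ij - a_iJ - a_Ij + a_IJ over the rows I and columns J of a bicluster. It is not the gMSR GPU implementation; the function name and the list-of-lists matrix layout are assumptions made for illustration only.

```python
def mean_squared_residue(matrix, rows, cols):
    """Return the MSR of the bicluster given by `rows` and `cols`.

    `matrix` is a list of lists of numbers (hypothetical layout);
    `rows` and `cols` are lists of row and column indices.
    """
    # Extract the submatrix that the bicluster selects.
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)

    # Row means, column means and the overall mean of the submatrix.
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    # Average the squared residue of every cell.
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)
```

A perfectly additive bicluster (each row equals another row plus a constant) has an MSR of zero, so lower values indicate more coherent biclusters.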

Requirements

It is recommended to have installed at least:

CUDA 11.0 Toolkit (11.0.171).
NVIDIA Driver version: 450.36.06
Host compiler: gcc and g++ on Linux, clang and clang++ on Mac OS X, or cl.exe on Windows.

Compilation

First, look up the compute capability (CC) of your NVIDIA graphics card: https://developer.nvidia.com/cuda-gpus
The source file is in the src/CUDA folder. The commands below use paths relative to the repository root.
In the following commands, replace compute_61 and sm_61 with your compute capability version, and select the host compiler according to your operating system.

Example of a compilation on GNU/Linux with a CC version of 6.1:

nvcc -G -g -O0 -std=c++11 -gencode arch=compute_61,code=sm_61  -odir "src" -M -o "src/CUDA/gMsr.d" "src/CUDA/gMsr.cu"
nvcc -G -g -O0 -std=c++11 --compile --relocatable-device-code=false -gencode arch=compute_61,code=compute_61 -gencode arch=compute_61,code=sm_61  -x cu -o  "src/CUDA/gMsr.o" "src/CUDA/gMsr.cu"
nvcc --cudart static --relocatable-device-code=false -gencode arch=compute_61,code=compute_61 -link -o "gMsr" ./src/CUDA/gMsr.o

Running the above commands generates an executable called gMsr.
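
To adapt the commands to another card, the compute capability has to be rewritten without the dot (6.1 becomes 61). The helper below is a small sketch of that substitution; the function name is hypothetical and it only builds the flag text, it does not call nvcc.

```python
def gencode_flags(compute_capability):
    """Build the -gencode flag for a compute capability such as "6.1"."""
    # Split "6.1" into major and minor parts and join them as "61".
    major, minor = compute_capability.split(".")
    cc = f"{int(major)}{int(minor)}"
    return f"-gencode arch=compute_{cc},code=sm_{cc}"


if __name__ == "__main__":
    # Print the flag for the compute capability used in the example above.
    print(gencode_flags("6.1"))
```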

Execution

1. Input parameters

biclustersFile (Character string): Absolute path of the input biclusters dataset file.
matrixFile (Character string): Absolute path of the input gene-expression matrix dataset file.
delta (integer number): Maximum MSR value allowed.
biclustersOutput (integer number): Number of biclusters to be included in the ordered list returned as result.
deviceCount (integer number): Number of GPU devices you want to use.
outputFile (Character string): Absolute path of the output file.

2. Execute

./gMsr [biclustersFile] [matrixFile] [delta] [biclustersOutput] [deviceCount] [outputFile]
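
Because all six arguments are positional, their order matters. If gMsr is launched from a script, a thin wrapper such as the sketch below can keep the order in one place. The function name and the default executable path are assumptions; only the argument order comes from the usage line above.

```python
def build_gmsr_command(biclusters_file, matrix_file, delta,
                       biclusters_output, device_count, output_file,
                       executable="./gMsr"):
    """Return the gMsr argument list in the documented positional order."""
    return [
        executable,
        biclusters_file,         # absolute path of the biclusters dataset
        matrix_file,             # absolute path of the gene-expression matrix
        str(delta),              # maximum MSR value allowed
        str(biclusters_output),  # number of biclusters in the result list
        str(device_count),       # number of GPU devices to use
        output_file,             # absolute path of the output file
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the Python standard library.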

The command below is an example run with these properties:

The biclusters dataset is named bicDataset.csv.
The gene expression matrix is named geneMatrix.matrix.
Only biclusters whose MSR value is less than 2000 are taken into account.
The result is an ordered list of the 100 best biclusters according to their MSR.
The run uses two GPU devices in parallel.
The results are stored in the file output.csv.

./gMsr /home/MyUser/Tests/bicDataset.csv /home/MyUser/Tests/geneMatrix.matrix 2000 100 2 /home/MyUser/Tests/output.csv
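
The roles of delta and biclustersOutput in this run can be sketched as a filter followed by a ranking. The code below is an illustration on plain Python lists, not the gMSR implementation. The function name is hypothetical, and the strict comparison (MSR less than delta) and the ascending order (lower MSR is better) are assumptions based on the description above.

```python
def select_best(msr_values, delta, biclusters_output):
    """Return up to `biclusters_output` (index, msr) pairs with msr < delta,
    ordered from the lowest MSR to the highest."""
    # Keep only the biclusters whose MSR is below the threshold.
    kept = [(idx, msr) for idx, msr in enumerate(msr_values) if msr < delta]
    # Lower MSR means a more coherent bicluster, so sort ascending.
    kept.sort(key=lambda pair: pair[1])
    return kept[:biclusters_output]


if __name__ == "__main__":
    # Four hypothetical MSR values, the threshold from the example, top 2.
    print(select_best([2500.0, 10.0, 1999.0, 300.0], 2000, 2))
```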

Output

The gMSR output for this example is at: https://github.com/aureliolfdez/gmsr/blob/master/resources/output.csv

Authors

Aurelio Lopez-Fernandez - DATAi Research Group (Pablo de Olavide University)
Domingo S. Rodriguez-Baena - DATAi Research Group (Pablo de Olavide University)
Francisco Gómez-Vela - DATAi Research Group (Pablo de Olavide University)

Contact

If you have comments or questions, or if you would like to contribute to the further development of gMSR, please send us an email at [email protected]

License

This project is licensed under the terms of the GNU General Public License v3.0.
