Parallel Kernel Execution on GPUs

Final Project for CS259 Spring 2019

Matt and I evaluated the performance of executing parallel kernels on modern GPUs against the gold standard of batching jobs together, and wrote a paper on the results. The report is in the report directory, and its abstract is as follows:

While Moore’s law may be slowing down, GPUs continue to get considerably better with each generation. In 2010, Nvidia began to support executing multiple kernels at once (concurrently), initially allowing only 4-16 kernels to run concurrently but increasing the limit to 128 today. Kernels and machine learning problems that used to take up all the resources of a GPU several years ago now take only a fraction of its compute power. In this report, we focus on comparing the concurrent execution of kernels to their sequential counterparts, to investigate whether the overhead of launching kernels in parallel cancels out the performance gained from the concurrency we are exploiting. We also compare these kernels to their batched versions, the industry norm for maximizing performance. Our results suggest that concurrent kernel execution can improve performance over sequential execution by anywhere from 1.25x to 15x, with smaller problems seeing larger gains. For smaller problems, batched kernels greatly outperform concurrent kernels, but for large problems the two perform approximately the same, with concurrent kernels actually being about 5% to 10% faster. Our findings suggest that if there is a level of parallelism that can be added, such as in the classifier where not all dimensions of threads are used, concurrent kernels can provide performance on par with or even better than batched kernels. Combined with the paper of Jiao et al., where concurrent kernels provided almost 34.5% better energy efficiency, concurrent kernels may be the future of running many kernels [4]. Furthermore, concurrent kernel execution may be preferable to batching when different types of problems are mixed together: batching requires all the batches to be of the same size and type, whereas heterogeneous kernels can be executed concurrently.
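The three launch strategies compared in the abstract can be illustrated with a small sketch. This is not the code from the report: the `scale` kernel, the `run_mode` helper, the `Mode` enum, and the problem sizes are hypothetical stand-ins for the classifier and convolution kernels. It assumes a CUDA-capable device and only checks correctness; whether the per-stream launches actually overlap depends on the device and on how many resources each kernel uses.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles (or otherwise scales) one problem's data.
// blockIdx.y selects the problem, so the same kernel serves the batched
// launch (gridDim.y == num_problems) and per-problem launches (gridDim.y == 1).
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[(size_t)blockIdx.y * n + i] *= factor;
}

enum Mode { SEQUENTIAL, CONCURRENT, BATCHED };

// Runs num_problems independent problems of n floats each using the given
// strategy. Returns true if every element was scaled exactly once.
bool run_mode(Mode mode, int num_problems, int n) {
    size_t total = (size_t)num_problems * n;
    std::vector<float> host(total, 1.0f);
    float *dev = nullptr;
    if (cudaMalloc(&dev, total * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(dev, host.data(), total * sizeof(float), cudaMemcpyHostToDevice);

    std::vector<cudaStream_t> streams(num_problems);
    for (auto &s : streams) cudaStreamCreate(&s);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (mode == BATCHED) {
        // One launch covers every problem.
        dim3 grid(blocks, num_problems);
        scale<<<grid, threads>>>(dev, n, 2.0f);
    } else {
        for (int p = 0; p < num_problems; ++p) {
            // CONCURRENT: one stream per problem, so kernels may overlap.
            // SEQUENTIAL: everything on the default stream, so kernels serialize.
            cudaStream_t s = (mode == CONCURRENT) ? streams[p] : nullptr;
            scale<<<blocks, threads, 0, s>>>(dev + (size_t)p * n, n, 2.0f);
        }
    }

    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(host.data(), dev, total * sizeof(float), cudaMemcpyDeviceToHost);
    for (float v : host) ok = ok && (v == 2.0f);

    for (auto &s : streams) cudaStreamDestroy(s);
    cudaFree(dev);
    return ok;
}

int main() {
    const char *names[] = {"sequential", "concurrent", "batched"};
    for (int m = SEQUENTIAL; m <= BATCHED; ++m) {
        bool ok = run_mode((Mode)m, 8, 1 << 12);
        printf("%s: %s\n", names[m], ok ? "ok" : "FAILED");
    }
    return 0;
}
```

To compare timings rather than correctness, each strategy could be wrapped in `cudaEventRecord` / `cudaEventElapsedTime` calls. The sketch also shows why batching needs uniform problems: the batched launch relies on every problem having the same size `n` and running the same kernel, while the per-stream loop could launch a different kernel or size on each iteration.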


sahilmgandhi/gpu-parallel-kernel-execution

Folders and files

Latest commit

History

Repository files navigation

Parallel Kernel Execution on GPUs

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages