This repo shows how to optimize SGEMM (single-precision general matrix multiplication) on single-threaded ARM CPUs, multi-threaded ARM CPUs, and NVIDIA GPUs.
In each subdirectory, run make to compile the program. The build produces a benchmark executable for testing the GEMM implementation. See the Makefile in each subdirectory for details.
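
For orientation, the operation being optimized can be sketched as a naive triple loop. This is only an illustrative baseline, not code taken from this repo: the name sgemm_naive, the row-major layout, and the argument order are assumptions, and the implementations in the subdirectories may use different conventions.

```c
#include <assert.h>
#include <stddef.h>

/* Naive row-major SGEMM baseline: C = alpha * A * B + beta * C.
 * A is m x k, B is k x n, C is m x n; lda, ldb, ldc are row strides.
 * sgemm_naive is a hypothetical name, not a function from this repo. */
void sgemm_naive(int m, int n, int k, float alpha,
                 const float *A, int lda,
                 const float *B, int ldb,
                 float beta, float *C, int ldc)
{
    /* Basic sanity checks on sizes and row strides. */
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(lda >= k && ldb >= n && ldc >= n);

    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[(size_t)i * lda + p] * B[(size_t)p * ldb + j];
            C[(size_t)i * ldc + j] = alpha * acc + beta * C[(size_t)i * ldc + j];
        }
    }
}
```

A baseline like this is mainly useful as a correctness reference: the output of an optimized kernel can be compared against it element by element before looking at benchmark numbers.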