Remove Intel MKL dependency #16
Comments
Thanks for your work on this! I have not had a chance to look at this in detail yet, but I can say this is not redundant with current efforts. I'll check back when I've had a closer look, but I look forward to seeing this as a pull request once it's polished. |
Intel MKL cannot be used on some kinds of Linux. |
In src/caffe/util/math_functions.cpp line 289
No, the boost:: and std::uniform_real interval is [a, b), while Intel MKL's is [a, b]. Besides, boost::uniform_real is deprecated in favor of uniform_real_distribution. How about this workaround:
using boost::variate_generator;
using boost::mt19937;
using boost::random::uniform_real_distribution;
Caffe::random_generator_t &generator = Caffe::vsl_stream();
Dtype epsilon = 1e-5;  // or 1e-4, 1e-6; different values may cause some tests to fail or pass
variate_generator<mt19937, uniform_real_distribution<Dtype> > rng(
    generator, uniform_real_distribution<Dtype>(a, b + epsilon));
do {
  r[i] = rng();
} while (r[i] > b); |
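For reference, a self-contained sketch of that workaround, assuming a boost::mt19937 engine (the rng_uniform_closed name and free-standing signature are illustrative, not Caffe's actual API):

// Fill r[0..n) with samples from the closed interval [a, b] using boost::random,
// mimicking MKL's inclusive upper bound.
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_real_distribution.hpp>
#include <boost/random/variate_generator.hpp>

template <typename Dtype>
void rng_uniform_closed(const int n, const Dtype a, const Dtype b,
                        Dtype* r, boost::mt19937& gen) {
  // Sample from [a, b + epsilon) and reject anything above b, which leaves
  // samples drawn from the closed interval [a, b].
  const Dtype epsilon = static_cast<Dtype>(1e-5);
  boost::variate_generator<boost::mt19937&,
      boost::random::uniform_real_distribution<Dtype> > rng(
          gen, boost::random::uniform_real_distribution<Dtype>(a, b + epsilon));
  for (int i = 0; i < n; ++i) {
    do {
      r[i] = rng();
    } while (r[i] > b);
  }
}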
Great to see this moving, and glad that you found/understood the source of http://stackoverflow.com/questions/16224446/stduniform-real-distribution-inclusive-range. I am not a git guru (I am more of a hg guy); in which branch is this being developed?
|
This is good progress. Thanks for the commit @rodrigob and the debugging @kloudkl! Let's develop this port in the boost-eigen branch I have just pushed. I have included the initial commit by @rodrigob. To continue development, please make commits in your fork, then pull request to this branch. I will review and merge the requests. Please rebase any work on the latest bvlc/caffe boost-eigen before requesting a pull; I'd rather keep the history clean from merge noise. |
Is the plan to completely get rid of MKL? |
You can change the makefile include and library paths to make it work.
|
Please note that on Debian systems, selecting the BLAS implementation is done via the alternatives system (update-alternatives). Such a decision is certainly not meant to be made during the runtime of an application. |
The ideal case for integration is that the performance of the MKL and boost-eigen implementations is comparable and boost-eigen is made the default. If the MKL vs. boost/eigen differences can be insulated cleanly enough, it would be nice to offer both by a build switch. We need benchmarking to move forward, and comparisons by anyone with both MKL and boost/eigen would be welcome. @Yangqing @jeffdonahue should comparing train/test of the imagenet model do it, or are there more comparisons to be done? |
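If the backends do get insulated behind a build switch, a hypothetical sketch of what that could look like for a single wrapper (the USE_MKL macro and the caffe_vadd name are assumptions for illustration, not Caffe's actual symbols):

// Compile-time backend selection: MKL's VML call on one side, an Eigen
// expression on the other. Callers only ever see caffe_vadd.
#ifdef USE_MKL
#include <mkl.h>

inline void caffe_vadd(const int n, const float* a, const float* b, float* y) {
  vsAdd(n, a, b, y);  // MKL vector math: y[i] = a[i] + b[i]
}
#else
#include <Eigen/Core>

inline void caffe_vadd(const int n, const float* a, const float* b, float* y) {
  // Eigen::Map wraps the raw pointers without copying.
  Eigen::Map<const Eigen::VectorXf> va(a, n), vb(b, n);
  Eigen::Map<Eigen::VectorXf> vy(y, n);
  vy = va + vb;
}
#endif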
CPU is too slow to train on such a large dataset as ImageNet. The most likely use case is to first train on a GPU and then deploy the model on devices without a GPU. Besides benchmarking the runtime of a complete pipeline, microbenchmarking the math functions and profiling to find the hotspots in the code are also helpful. |
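On microbenchmarking, a minimal std::chrono-based sketch of how a single routine could be timed in isolation (the hand-written axpy loop below is just a stand-in for whichever BLAS-backed function is being measured):

// Time many repetitions of one math kernel and report the mean per call.
#include <chrono>
#include <iostream>
#include <vector>

int main() {
  const int n = 1 << 20;
  const int repeats = 100;
  std::vector<float> x(n, 1.0f), y(n, 2.0f);
  const float alpha = 0.5f;

  const auto start = std::chrono::steady_clock::now();
  for (int rep = 0; rep < repeats; ++rep) {
    for (int i = 0; i < n; ++i) {
      y[i] += alpha * x[i];  // axpy-style kernel under test
    }
  }
  const auto end = std::chrono::steady_clock::now();
  const double ms = std::chrono::duration<double, std::milli>(end - start).count();
  std::cout << "mean time per call: " << ms / repeats << " ms" << std::endl;
  return 0;
}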
Agreed, real training of ImageNet or any contemporary architecture and dataset is infeasible on CPU; sorry my suggestion was not more precise. I think benchmarking training minibatches or epochs is still indicative of performance. I second microbenchmarking too, as a further detail. If the speed of the full pipeline is close enough, that suffices. |
I have just benchmarked on the MNIST dataset using the heads of both the boost-eigen branch and master. The three experiments used CPU mode with boost-eigen, CPU mode with MKL, and GPU mode respectively. The CPU is an Intel® Core™ i7-3770 @ 3.40GHz × 8 and the GPU is an NVIDIA GTX 560 Ti. Note that the CPU code under-utilized the available cores, using only a single thread.
After training for 10000 iterations, the final learning rate, training loss, testing accuracy (Test score 0), and testing loss (Test score 1) of boost-eigen and MKL were all exactly the same. The training time with boost-eigen was 26m25.259s and with MKL 26m43.919s; considering the fluctuations in data IO costs, there was no significant performance difference. The results were a little surprising, so you may want to double check them on your own machine. The GTX 560 Ti took 85.5% less time than the faster CPU mode (boost-eigen) to train a slightly better model in terms of training loss, testing accuracy, and testing loss, with the training runs also including the testing iterations.
This benchmark demonstrates that there is no need to keep depending on a proprietary library that brings no benefit but excess code and a redundant maintenance burden. It is time to merge this branch directly into master.
[Benchmark logs for CPU boost-eigen, CPU MKL, and GPU were attached here; the data is not preserved in this copy.]
|
It would be good to have a benchmark with larger networks such as ImageNet. Yangqing
|
Would it help to replace some of the code with parallel for loops? Eigen does not exploit the several cores present in most workstations except for matrix-matrix multiplication. For example, the ReLU layer (or any simple activation function) does an independent operation for every neuron; it can be made fast using #pragma omp parallel for. |
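A minimal sketch of that suggestion for an element-wise ReLU forward pass (illustrative only, not Caffe's actual layer code; it assumes the build adds -fopenmp):

#include <algorithm>

template <typename Dtype>
void relu_forward_cpu(const int count, const Dtype* bottom, Dtype* top) {
  // Each element is independent, so the loop parallelizes trivially.
  #pragma omp parallel for
  for (int i = 0; i < count; ++i) {
    top[i] = std::max(bottom[i], Dtype(0));
  }
}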
@aravindhm, I had the same idea just after observing that the training on CPU is single-threaded, and experimented with parallelizing via OpenMP. But the test accuracy turned out to stay at the random-guess level. Then I realized that there was a conflict between OpenMP and BLAS, and the correct solution is to take advantage of a multi-threaded BLAS library such as OpenBLAS. See my reference from #79 above. |
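For anyone reproducing this, OpenBLAS's thread count can be controlled with the OPENBLAS_NUM_THREADS environment variable or at runtime; a minimal sketch using the OpenBLAS-specific call:

// openblas_set_num_threads is provided by OpenBLAS; declared directly here to
// avoid depending on a particular cblas.h variant.
extern "C" void openblas_set_num_threads(int num_threads);

int main() {
  openblas_set_num_threads(8);  // e.g. use all 8 hardware threads of the i7-3770
  // ... run the CPU training / benchmark here ...
  return 0;
}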
The updated benchmark exploiting multi-threaded OpenBLAS showed a great speed-up: training on a multi-core CPU can be as fast as, or even faster than, training on a GPU. Now it becomes more realistic to benchmark with a larger-scale dataset.
[Updated benchmark logs for CPU boost-eigen and CPU MKL were attached here; the data is not preserved in this copy.]
|
A proposal has been made at #97 - please kindly discuss there. Closing this to reduce duplicates. |
update from upstream
It is mentioned in the install instructions that this is a work in progress.
While at ICCV I quickly implemented a branch where I replaced the matrix operations with Eigen3 calls and the random generators with Boost::random generators.
I hope this is not redundant with ongoing work on private branches.
The branch can be found at
https://github.com/rodrigob/caffe
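As a rough illustration of what replacing a BLAS/MKL matrix multiplication with Eigen3 can look like (the eigen_sgemm name, the row-major layout, and the no-transpose restriction are assumptions made for this sketch, not the actual code in the branch):

#include <Eigen/Core>

// Computes C = alpha * A * B + beta * C for row-major, non-transposed inputs,
// analogous to cblas_sgemm with CblasNoTrans for both operands.
inline void eigen_sgemm(const int M, const int N, const int K,
                        const float alpha, const float* A, const float* B,
                        const float beta, float* C) {
  typedef Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>
      MatXfRow;
  Eigen::Map<const MatXfRow> mA(A, M, K), mB(B, K, N);
  Eigen::Map<MatXfRow> mC(C, M, N);
  mC = alpha * (mA * mB) + beta * mC;  // the product is evaluated into a temporary
}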
I got things to compile; however, I noticed that some tests fail (thanks for creating a non-trivial set of unit tests!).
I have not been able to compile a version with MKL to compare against, but I can only assume that the tests should not fail.
The current failures are:
[ FAILED ] FlattenLayerTest/1.TestCPUGradient, where TypeParam = double
[ FAILED ] StochasticPoolingLayerTest/0.TestGradientGPU, where TypeParam = float
[ FAILED ] StochasticPoolingLayerTest/1.TestGradientGPU, where TypeParam = double
[ FAILED ] MultinomialLogisticLossLayerTest/1.TestGradientCPU, where TypeParam = double
which all sound nasty (gradient computation errors in neural networks are a big no-no).
I will spend some time inspecting to see what goes wrong there, but any suggestion/comment/idea is welcome.