The speed of Deep_Residual_Learning_CIFAR-10.py #57
Comments
I don't know for this particular case, but in general, if you're in doubt, you can run the script with |
For me on Titan X with cuDNN v4, epoch time is about 126s. |
Hi,
|
Interesting -- could you please check what I posted before (#57 (comment))? |
Hi, thanks for looking into this. I think my system is correctly set up (python 2.7.11, 64bit, lasagne version 0.2.dev1). I verified that:
I ran the profiling (1 epoch, using 1,000 instead of all 100,000 training examples), and it looks like it only uses GPU operations:
|
As I said above (#57 (comment)),
There are no CPU Ops, right. But the runtime is dominated by GpuAllocEmpty -- this is not expected.
|
Wow, that was it: after adding the cnmem=0.3 flag, the time per epoch went down to 95 seconds for CIFAR, and 0.4 seconds for MNIST. Thanks a lot! |
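For readers who want to try the same setting from inside a script rather than on the command line, here is a minimal sketch. It assumes the old Theano CUDA back-end, where the option discussed in this thread is spelled `lib.cnmem` inside `THEANO_FLAGS`; the helper name `merge_theano_flags` is made up for illustration and is not part of Theano or Lasagne.

```python
import os


def merge_theano_flags(existing, overrides):
    """Merge overrides into a comma-separated THEANO_FLAGS string.

    Hypothetical helper: flags already present are kept, and any key
    in `overrides` replaces or extends them.
    """
    merged = {}
    for item in existing.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            merged[key.strip()] = value.strip()
    merged.update({key: str(value) for key, value in overrides.items()})
    return ",".join("{}={}".format(k, v) for k, v in merged.items())


# Assumption: lib.cnmem=0.3 asks CNMeM to reserve 30% of GPU memory up front.
# This must run before `import theano`, because Theano reads the
# environment variable only once, at import time.
os.environ["THEANO_FLAGS"] = merge_theano_flags(
    os.environ.get("THEANO_FLAGS", ""), {"lib.cnmem": 0.3}
)
```

The same setting can usually be given on the command line or in `.theanorc` instead; the environment-variable route is shown only because it keeps the example self-contained.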
I'd still like to see the profile for the MNIST example without cnmem, if possible: For me, this increases per-epoch time from 1.1s to 1.6s, but not to 9s as for your machine. There's still something fishy! |
OMG, I didn't even notice the cnmem flag before... when cnmem is disabled, the time per epoch for @PatrickBue, you get 95 seconds per epoch, faster than me... It's better to enable cnmem when training a CNN, am I right? @f0k |
@kli-nlpr: Try these flags here. Brings the 95 seconds per epoch down to 82 on my system:
|
It should be set by default in the future, but they haven't done it yet.
Yes, as we recommend here: https://github.com/Lasagne/Lasagne/wiki/From-Zero-to-Lasagne-on-Ubuntu-14.04#cudnn If you can spare the memory, you can also set |
@f0k, this is the profile output for mnist.py using the flags you suggested:
|
Hey guys, I aggressively changed cnmem to 0.8,
The time for one epoch went down to 76 seconds! |
Interesting. The first function profile is the one for the training function. It seems that not only
This should only make a difference for the very first epoch. If you set it to a smaller fraction of memory than ultimately needed, it will have to do extra allocations when it hits the boundary of reserved memory. After the first epoch, the memory pool should be large enough for all subsequent epochs so there won't be any more allocations (from the perspective of the driver). You can try to add |
The new back-end has another allocator that is enabled and preallocates 0. So the transition should fix that. There is a flag to make it reserve more. On Thu, May 19, 2016 at 12:44 PM, Jan Schlüter [email protected]
|
add
Now the time for one epoch went down to 74 seconds! |
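The point made earlier in the thread, that a too-small reserved fraction should only cost extra allocations in the very first epoch, can be illustrated with a toy model. This is only a sketch under simplifying assumptions: it is not how CNMeM is actually implemented, the function name and the buffer sizes are invented, and it assumes all buffers of an epoch are live at once and returned to the pool at the end of the epoch.

```python
def driver_allocations_per_epoch(request_sizes, reserved, epochs):
    """Count how often a growing memory pool has to ask the driver for more.

    Toy model: the pool starts with `reserved` units, grows whenever the
    memory in use exceeds its capacity, and never shrinks again.
    """
    capacity = reserved
    counts = []
    for _ in range(epochs):
        in_use = 0
        grown = 0
        for size in request_sizes:
            in_use += size
            if in_use > capacity:
                # The pool is exhausted: one extra (slow) driver allocation.
                capacity = in_use
                grown += 1
        counts.append(grown)
        # At the end of the epoch all buffers go back to the pool,
        # but the pool keeps its enlarged capacity.
    return counts


# Reserving too little: extra allocations happen in the first epoch only.
print(driver_allocations_per_epoch([400, 300, 500], reserved=600, epochs=3))
# Reserving enough from the start: no extra allocations at all.
print(driver_allocations_per_epoch([400, 300, 500], reserved=1200, epochs=3))
```

In this model the per-epoch cost after the first epoch is the same for both settings, which matches the expectation stated above; any remaining speed difference on a real system would have to come from something the model leaves out.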
I am using Ubuntu 14.04 with a Titan X GPU and cuDNN v4.
When I run the Deep_Residual_Learning_CIFAR-10.py script
The results are as follows
Is this running speed normal?