CUDA 10 w/ cuDNN 7.5 Support #14652
Comments
Hey, this is the MXNet Label Bot.
My suggestion would be to merge the PR with cuDNN v7.3.1.20 (which would at least ensure that MXNet works with cuDNN up to that version); then whoever tackles the v7.5 issue can just update the CI image to use the latest version of cuDNN.
Just so you know, the latest versions are CUDA 10.1.105_418.39 and cuDNN 10.1-linux-x64-v7.5.0.56. Why not use these versions? Greetings
Hey, this requires a bit more work on the AMI side. I'm also not convinced that it will solve the problem. Cheers
Hey @perdasilva
@mxnet-label-bot add [CUDA, CI]
@stu1130 thank you. It seems that the NVIDIA drivers on the Linux nodes have been bumped to 418 because of the TensorRT issues. This means we should be able to use CUDA 10.1 =) (let me know if it doesn't work)
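A rough sketch of the reasoning behind the driver bump: a CUDA toolkit release only runs if the installed driver is at least the minimum listed for that release. The minimum Linux driver versions in the table below (410.48 for CUDA 10.0.130, 418.39 for CUDA 10.1.105) are assumed from NVIDIA's CUDA Toolkit release notes and are worth double-checking; the helper names are hypothetical.

```python
# Assumed minimum Linux driver per CUDA toolkit release (from NVIDIA's
# CUDA Toolkit release notes; verify against the notes for your release).
MIN_LINUX_DRIVER = {
    "10.0.130": (410, 48),
    "10.1.105": (418, 39),
}


def parse_driver(version: str) -> tuple:
    """Turn a driver string such as '418.40.04' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(driver: str, cuda: str) -> bool:
    """Return True if the driver meets the assumed minimum for the CUDA release."""
    return parse_driver(driver) >= MIN_LINUX_DRIVER[cuda]


print(driver_supports("410.73", "10.0.130"))  # True: the g3 setup from the issue
print(driver_supports("410.73", "10.1.105"))  # False: 410 is too old for 10.1
print(driver_supports("418.39", "10.1.105"))  # True: after the bump to 418
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "418.9" sorting after "418.39".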
@perdasilva any updates? Thanks
@stu1130 I'm currently on leave until Thursday. I totally missed that you wanted me to merge the other PR first. I will do that as soon as I'm back. Sorry I missed that. I'll also see about bumping CI to 10.1 while I'm at it, so that's done too.
@perdasilva no rush! Thanks a lot for the awesome work!!!
@stu1130 There's no cuDNN 7.3 package for CUDA 10.1, so I won't be able to update CI to 10.1 in my PR.
@stu1130 it's been merged! Feel free to take it away and let me know if I can help you =)
@perdasilva Awesome, thanks a lot!!!
Here is what I found:
This has since been fixed ^^ thanks to @stu1130
Description
Currently, the CI tests fail when running MXNet on CUDA 10 with cuDNN 7.5, as demonstrated in this PR.
The tests pass when using CUDA 10 and cuDNN 7.3.1.20, as demonstrated in this PR.
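Since the failure depends on which cuDNN build is actually loaded (7.3.1 passes, 7.5 fails), it can help to ask the library itself rather than trust the image tag. A minimal sketch using ctypes and cuDNN's cudnnGetVersion(), assuming the cuDNN 7.x encoding MAJOR*1000 + MINOR*100 + PATCHLEVEL (later cuDNN releases changed this scheme); the helper names are hypothetical. Note that the fourth component of "7.3.1.20" is not part of this integer.

```python
import ctypes
import ctypes.util


def decode_cudnn_version(v: int) -> tuple:
    """Split a cuDNN 7.x version integer (MAJOR*1000 + MINOR*100 + PATCH)."""
    return (v // 1000, (v % 1000) // 100, v % 100)


def loaded_cudnn_version():
    """Return the (major, minor, patch) of the libcudnn on the search path, or None."""
    name = ctypes.util.find_library("cudnn")
    if name is None:
        return None  # no libcudnn found on this machine
    lib = ctypes.CDLL(name)
    lib.cudnnGetVersion.restype = ctypes.c_size_t  # declared as size_t in cudnn.h
    return decode_cudnn_version(lib.cudnnGetVersion())
```

Usage inside the CI container might look like `v = loaded_cudnn_version()` followed by a check such as `v is not None and v >= (7, 5, 0)` to flag the failing combination.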
Environment info (Required)
g3.8xlarge with CUDA 10 and NVIDIA driver 410.73 installed.
The code runs inside the CI GPU container based on nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04.
Error Message:
Usually:
src/operator/./cudnn_rnn-inl.h:759: Check failed: e == CUDNN_STATUS_SUCCESS (6 vs. 0) cuDNN: CUDNN_STATUS_ARCH_MISMATCH
Here are some example logs:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/PR-14611/1/pipeline/
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-14611/12/pipeline/
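The "(6 vs. 0)" in the check failure is the raw cudnnStatus_t value compared against CUDNN_STATUS_SUCCESS. According to the cuDNN API reference, CUDNN_STATUS_ARCH_MISMATCH means the function requires a feature absent from the current GPU device. When triaging many CI logs, a small sketch like the following can pull the numeric status out of each failing line; the enum values are assumed from cudnn.h in cuDNN 7.x, and the regex and function name are hypothetical and tied to the message format shown in the error above.

```python
import re

# Leading values of the cudnnStatus_t enum (assumed from cudnn.h, cuDNN 7.x).
CUDNN_STATUS = {
    0: "CUDNN_STATUS_SUCCESS",
    1: "CUDNN_STATUS_NOT_INITIALIZED",
    2: "CUDNN_STATUS_ALLOC_FAILED",
    3: "CUDNN_STATUS_BAD_PARAM",
    4: "CUDNN_STATUS_INTERNAL_ERROR",
    5: "CUDNN_STATUS_INVALID_VALUE",
    6: "CUDNN_STATUS_ARCH_MISMATCH",
    7: "CUDNN_STATUS_MAPPING_ERROR",
    8: "CUDNN_STATUS_EXECUTION_FAILED",
    9: "CUDNN_STATUS_NOT_SUPPORTED",
}

# Matches the "Check failed: e == CUDNN_STATUS_SUCCESS (N vs. 0)" part of the log line.
CHECK_RE = re.compile(r"Check failed: e == CUDNN_STATUS_SUCCESS \((\d+) vs\. 0\)")


def cudnn_failure(line: str):
    """Return (code, name) for a failed cuDNN status check in a log line, else None."""
    match = CHECK_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, CUDNN_STATUS.get(code, "UNKNOWN")
```

Running it over every line of a downloaded Jenkins log and counting the results would show whether ARCH_MISMATCH is the only cuDNN failure or one of several.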
Steps to reproduce