[WIP] CUDA 10.0 and tensorflow 1.14 for docker install #682
Conversation
@@ -15,4 +15,4 @@ services:
     environment:
       NVIDIA_VISIBLE_DEVICES: all
       NVIDIA_DRIVER_CAPABILITIES: compute,utility
-      NVIDIA_REQUIRE_CUDA: "cuda>=9.0"
+      NVIDIA_REQUIRE_CUDA: "cuda>=10.0 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=410,driver<411"
Does the string work for "Tesla" only?
Good question. It's not clear whether a Tesla brand is required to use CUDA 10.0 itself: https://docs.nvidia.com/cuda/archive/10.0/cuda-toolkit-release-notes/index.html
It may be more of a requirement of running nvidia-docker: https://github.com/NVIDIA/nvidia-docker/wiki/CUDA
I took this line from https://gitlab.com/nvidia/container-images/cuda/blob/master/dist/ubuntu16.04/10.0/base/Dockerfile
components/cuda/install-cuda10-0.sh
Outdated
#
# cuda 10.0 base - https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/10.0/base/Dockerfile
# cuda 10.0 runtime - https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/10.0/runtime/Dockerfile
# cudnn7 - https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/10.0/runtime/cudnn7/Dockerfile
I see that you combined all these Dockerfiles together and put the instructions here. It will be difficult for us to support them, because NVIDIA can change the original files in the future and that could break our code.
@azhavoro , I'm thinking about a separate container which can execute some registered functions (e.g. TF annotation) using https://docs.python.org/3/library/xmlrpc.html. Could you please recommend something here?
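For concreteness, a minimal sketch of what such a registered-function service could look like, using only the Python standard library; the run_tf_annotation name, the port, and the return format are illustrative assumptions, not existing CVAT code:

```python
# Sketch of an XML-RPC server for a separate CUDA container: it registers
# functions that other containers can invoke remotely.
from xmlrpc.server import SimpleXMLRPCServer

def run_tf_annotation(image_paths):
    # Hypothetical entry point: a real implementation would run TF object
    # detection inside this CUDA-enabled container and return boxes per image.
    return {path: [] for path in image_paths}

server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
server.register_function(run_tf_annotation)
server.serve_forever()
```

Additional functions (e.g. a future training entry point) could be registered on the same server with further register_function calls.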
@nmanovic I agree. I was following the previous example: https://github.com/opencv/cvat/blob/develop/components/cuda/install.sh
It looked to me like the base, runtime, and cudnn Dockerfiles had been combined into one file there, and I was unsure whether that was intentional on your part; maybe having fewer Docker layers was desired.
If there is interest, I can write it as separate files instead. It may help with composing different CUDA versions.
I started this pull request not necessarily to get it merged, but to start a conversation about how you are thinking about support for new versions. My main motivation was to get a full round trip: tf_annotation, training using the TensorFlow Object Detection API, and manual adjustment.
Also, as of CUDA 9 and above, CUDA versions are backwards compatible:
https://docs.nvidia.com/deploy/cuda-compatibility/index.html
If Codacy flags my single-quote use with variables, this was intended, as I'm appending statements to variables.
@dustindorroh , internally we discussed a separate container for CUDA functionality a long time ago. It would make it easy to add similar features in the future or to modify the current one, and it is a good approach from many points of view. As a container, we can also add a "training" part. We just need to invent an interface using https://docs.python.org/3/library/xmlrpc.html. Do you think you have time to help us with the feature? It is a long way: I believe the discussion and implementation can take a couple of months, but the result should be promising.
@dustindorroh , we made our next release and now have more time to discuss the feature. Could you please help us move TF annotation into a separate container? In the separate container we can have any version of CUDA (the CUDA and TF versions can be build arguments). I hope you can come up with a proposal for how to do that based on the discussion in this thread. Don't hesitate to contact me directly if you have any questions. Do you think you can contribute the feature?
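To make the proposed interface concrete, a sketch of the calling side; the tf-annotation hostname and the run_tf_annotation method mirror the assumptions in the server sketch above and are not part of the actual codebase:

```python
# Hypothetical client: the CVAT web container calls the CUDA/TF container
# over XML-RPC instead of importing TensorFlow itself. "tf-annotation" is
# an assumed docker-compose service name.
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://tf-annotation:8000/", allow_none=True)
boxes = proxy.run_tf_annotation(["/data/task_1/frame_000001.jpg"])
print(boxes)
```

With this split, the CUDA and TF versions become an implementation detail of the annotation container, which is what makes build-argument versioning practical.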
@dustindorroh , once again thanks for the contribution. We are not going to accept the PR because it can lead to regressions. In the future we will move CUDA functionality into a separate container.