[WIP] CUDA 10.0 and tensorflow 1.14 for docker install #682

Closed
wants to merge 3 commits into from

dustindorroh
Contributor

No description provided.

@nmanovic nmanovic requested a review from azhavoro September 1, 2019 17:51
Dockerfile (outdated)
@@ -15,4 +15,4 @@ services:
     environment:
       NVIDIA_VISIBLE_DEVICES: all
       NVIDIA_DRIVER_CAPABILITIES: compute,utility
-      NVIDIA_REQUIRE_CUDA: "cuda>=9.0"
+      NVIDIA_REQUIRE_CUDA: "cuda>=10.0 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=410,driver<411"
Contributor

Does the string work for "Tesla" only?

Contributor Author

Good question. It's not clear whether a Tesla card is required to use CUDA 10.0: https://docs.nvidia.com/cuda/archive/10.0/cuda-toolkit-release-notes/index.html

But it may be more of a requirement for running nvidia-docker:
https://github.com/NVIDIA/nvidia-docker/wiki/CUDA

I got this line from: https://gitlab.com/nvidia/container-images/cuda/blob/master/dist/ubuntu16.04/10.0/base/Dockerfile
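
For readers wondering how that constraint string is read: as I understand the nvidia-container-runtime documentation, space-separated groups are ORed and comma-separated constraints inside a group are ANDed, so the `brand=tesla` groups look like extra alternatives rather than a Tesla-only restriction. Below is a minimal Python sketch of that reading. The `satisfies` helper and the `host` dictionaries are hypothetical names made up for illustration, and this is not NVIDIA's implementation.

```python
import operator
import re

# Comparison operators that can appear in a constraint such as "driver>=410".
# Two-character operators come first in the regex so ">=" is not read as ">".
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}
_CONSTRAINT = re.compile(r"^(\w+)(>=|<=|!=|=|>|<)(.+)$")


def _version(text):
    """Turn "10.0" into (10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def _check(constraint, host):
    """Evaluate one constraint, e.g. "cuda>=10.0" or "brand=tesla"."""
    match = _CONSTRAINT.match(constraint)
    if match is None:
        raise ValueError(f"cannot parse constraint: {constraint!r}")
    name, op, wanted = match.groups()
    actual = host[name]
    if name == "brand":
        # Brands are plain strings, compared case-insensitively here.
        return _OPS[op](actual.lower(), wanted.lower())
    return _OPS[op](_version(actual), _version(wanted))


def satisfies(requirement, host):
    """True if any space-separated group has all of its constraints met."""
    return any(
        all(_check(c, host) for c in group.split(","))
        for group in requirement.split()
    )


REQUIREMENT = (
    "cuda>=10.0 "
    "brand=tesla,driver>=384,driver<385 "
    "brand=tesla,driver>=410,driver<411"
)

# Hypothetical hosts; "cuda" is the highest CUDA version the driver supports.
geforce_new = {"cuda": "10.0", "driver": "410.48", "brand": "GeForce"}
tesla_old = {"cuda": "9.0", "driver": "384.81", "brand": "Tesla"}
geforce_old = {"cuda": "9.0", "driver": "384.81", "brand": "GeForce"}

for host in (geforce_new, tesla_old, geforce_old):
    print(host["brand"], host["driver"], satisfies(REQUIREMENT, host))
```

Under this reading, a non-Tesla card passes through the first group whenever its driver supports CUDA 10.0, while the Tesla groups only admit older driver branches on Tesla hardware.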

#
# cuda 10.0 base - https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/10.0/base/Dockerfile
# cuda 10.0 runtime - https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/10.0/runtime/Dockerfile
# cudnn7 - https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/10.0/runtime/cudnn7/Dockerfile
Contributor

I see that you combined all these Dockerfiles and put their instructions here. It will be difficult for us to support them, because NVIDIA can change the original files in the future and that could break our code.

@azhavoro , I'm thinking about a separate container which can execute some registered functions (e.g. TF annotation) using https://docs.python.org/3/library/xmlrpc.html. Could you please recommend something here?
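
To make the idea of a separate container that executes registered functions more concrete, here is a minimal sketch using only the standard-library `xmlrpc` package. Everything specific in it is an assumption for illustration: the function name `run_tf_annotation`, its arguments, and its fake return value are hypothetical, and nothing here reflects an agreed interface.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def run_tf_annotation(image_paths):
    """Hypothetical stand-in for the real TF annotation call.

    Returns one empty result per image so the round trip can be exercised
    without TensorFlow or a GPU.
    """
    return [{"image": path, "boxes": []} for path in image_paths]


def make_server(host="127.0.0.1", port=0):
    """Create an XML-RPC server exposing the registered functions.

    Port 0 asks the OS for any free port; a real container would pin one.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    server.register_function(run_tf_annotation, "run_tf_annotation")
    return server


def call_annotation(url, image_paths):
    """Client side: roughly what the main application container would do."""
    with ServerProxy(url, allow_none=True) as proxy:
        return proxy.run_tf_annotation(image_paths)


if __name__ == "__main__":
    server = make_server()
    host, port = server.server_address
    # Serve in a background thread so the same script can act as the client.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(call_annotation(f"http://{host}:{port}", ["frame_000001.jpg"]))
    server.shutdown()
    server.server_close()
```

In this arrangement the CUDA and TensorFlow dependencies would live only in the image that runs the server, and the main application would need nothing beyond the Python standard library to call it.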

Contributor Author

@dustindorroh dustindorroh Sep 3, 2019

@nmanovic I agree. I was following the previous example: https://github.com/opencv/cvat/blob/develop/components/cuda/install.sh
It looked to me like the base, runtime, and cudnn files were combined into one file. I was unsure whether this was intentional on your part; maybe you wanted fewer Docker layers.
If there is interest, I can write it as separate files instead. That may help with composing different CUDA versions.

Contributor Author

@dustindorroh dustindorroh left a comment

I started this pull request not necessarily just to get it merged, but to start a conversation about how you are thinking about support for new versions. My main motivation was to get a full round trip working: tf_annotation, training with the TensorFlow Object Detection API, and manual adjustment.

Also, as of CUDA 9 and above, multiple CUDA versions are backward compatible:
https://docs.nvidia.com/deploy/cuda-compatibility/index.html

@dustindorroh
Contributor Author

If Codacy flags my use of single quotes around variables, that is intentional: I'm appending statements that reference variables to ~/.bashrc, so they must not be expanded at the time they are written.

@nmanovic
Contributor

nmanovic commented Sep 5, 2019

@dustindorroh, internally we discussed a separate container for CUDA functionality a long time ago. With it, it would be easy to add similar features in the future or to modify the current one. It is a good approach from many points of view.

Also, as a separate container, we can add a "training" part. We just need to design an interface using https://docs.python.org/3/library/xmlrpc.html. Do you think you have time to help us with the feature? It is a long road: I believe discussion and implementation can take a couple of months, but the result should be promising.

@nmanovic nmanovic changed the title CUDA 10.0 and tensorflow 1.14 for docker install [WIP] CUDA 10.0 and tensorflow 1.14 for docker install Sep 5, 2019
@nmanovic
Contributor

@dustindorroh, we made our next release and now have more time to discuss the feature. Could you please help us move TF annotation into a separate container? In the separate container we can have any version of CUDA (the CUDA and TF versions can be build arguments). I hope you can come up with a proposal for how to do that based on the discussion in this thread. Don't hesitate to contact me directly if you have any questions. Do you think you can contribute the feature?
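
As a rough illustration of the "build arguments" idea, the sketch below assembles a `docker build` command line with one `--build-arg` per version. The argument names `CUDA_VERSION` and `TF_VERSION` and the image tag are hypothetical; the Dockerfile for the separate container would need matching `ARG` declarations for them to have any effect.

```python
import shlex


def docker_build_command(tag, context=".", **build_args):
    """Return the argv for `docker build`, one --build-arg per keyword."""
    argv = ["docker", "build", "--tag", tag]
    # Sort the names so the generated command is stable between runs.
    for name, value in sorted(build_args.items()):
        argv += ["--build-arg", f"{name}={value}"]
    argv.append(context)
    return argv


command = docker_build_command(
    "cvat-tf-annotation:cuda10.0-tf1.14",  # hypothetical image tag
    CUDA_VERSION="10.0",  # hypothetical ARG name
    TF_VERSION="1.14.0",  # hypothetical ARG name
)
print(shlex.join(command))
```

Building a different CUDA and TensorFlow pair would then only mean changing the two values, without editing the Dockerfile itself.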

@nmanovic
Contributor

@dustindorroh, once again, thanks for the contribution. We are not going to accept the PR because it can lead to regressions. In the future we will move CUDA functionality into a separate container.
