We recommend pytorch 1.7.1 with cuda 11.0. Older versions of pytorch should work, but we have not tested them and so cannot confirm this.
If you are using AWS then the latest AWS deep learning AMI already has cuda versions 10.0 through 11.1 installed, so you can select 11.0 as the version to use by setting these variables in the terminal:
export CUDA_HOME="/usr/local/cuda-11.0"
export CUDA_PATH="$CUDA_HOME"
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$CUDA_HOME/lib:$CUDA_HOME/extras/CUPTI/lib64:$CUDA_HOME/efa/lib:$LD_LIBRARY_PATH"
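These exports only last for the current shell session, so you may also want to add them to your ~/.bashrc. As a quick optional sanity check, you can confirm that the 11.0 toolkit is now the one your shell picks up:
which nvcc      # should point at /usr/local/cuda-11.0/bin/nvcc
nvcc --version  # should report "release 11.0"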
You can then skip ahead to the Python Environment section.
If you are not on the deep learning AMI, you can install cuda 11.0 from the official cuda toolkit archive: https://developer.nvidia.com/cuda-toolkit-archive . Version 11.0 is available here: https://developer.nvidia.com/cuda-11.0-update1-download-archive ; you can use the deb (network) installer.
For ---- Ubuntu 20.04 ---- this is just:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
For ---- Ubuntu 18.04 ---- this is:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
Then for both run:
sudo apt-get update
sudo apt-get -y install cuda-11-0
# Try running nvidia-smi afterward, if it returns 'Failed to initialize NVML: Driver/library version mismatch'
# then reboot before proceeding
nvidia-smi
For other OSes please check nvidia's page for instructions: https://developer.nvidia.com/cuda-11.0-update1-download-archive
You can install multiple cuda versions if you need another version for a different project, see: https://medium.com/@peterjussi/multicuda-multiple-versions-of-cuda-on-one-machine-4b6ccda6faae and https://stackoverflow.com/questions/41330798/install-multiple-versions-of-cuda-and-cudnn
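Each toolkit installs into its own /usr/local/cuda-X.Y directory, so switching between versions is mostly a matter of re-pointing the environment variables shown above. For example (11.1 here is just an illustration, assuming you have it installed):
export CUDA_HOME="/usr/local/cuda-11.1"
export CUDA_PATH="$CUDA_HOME"
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"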
If you're using the AWS deep learning AMI we recommend using the base conda environment,
as it avoids some of the pip issues listed below that can happen with the pytorch_latest_p37
environment. You can clone the base environment for this:
conda create --name siammot --clone base
conda activate siammot
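To confirm the cloned environment is active before installing anything (a quick optional check):
conda info --envs   # the siammot env should be marked with a *
which python3       # should resolve inside the siammot env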
Otherwise we advise using a virtualenv with something like python3 -m venv VENV_PATH && source VENV_PATH/bin/activate
In order to install the cuda 11.0 build of pytorch 1.7.1 we can use the instructions from the previous-versions page: https://pytorch.org/get-started/previous-versions/
# Removed torchaudio since it's not needed for this project
pip3 install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
Note that if the cuda version your pytorch installation was compiled with doesn't match your local cuda
version you might have issues building apex, so verify this. You can check the local cuda version with:
nvcc --version # try /usr/local/cuda/bin/nvcc if not found
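You can also check which cuda version your pytorch build was compiled against with a one-liner (assuming the pip install above succeeded):
python3 -c "import torch; print(torch.__version__, torch.version.cuda)"  # expect 1.7.1+cu110 and 11.0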
Install python requirements:
pip3 install -r requirements.txt
NOTE: If your install is taking a long time with a bunch of messages about "pip is looking at multiple versions",
you can try adding --use-deprecated=legacy-resolver to the install command to use the old pip behavior, see:
https://stackoverflow.com/questions/65122957/resolving-new-pip-backtracking-runtime-issue .
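For example (assuming your pip version still supports the flag):
pip3 install -r requirements.txt --use-deprecated=legacy-resolver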
If you get an error like AttributeError: type object 'Callable' has no attribute '_abc_registry',
then try pip3 uninstall typing, see here: https://stackoverflow.com/questions/55833509/attributeerror-type-object-callable-has-no-attribute-abc-registry
Note that the exact versions of software used were relaxed to avoid unnecessary conflicts, but if you want to see the
exact versions we used to validate results you can check requirements_exact.txt.
The project is built on maskrcnn-benchmark, which you can install by following the instructions from the official repo (see below first if using pytorch >= 1.5.0): https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/INSTALL.md
NOTE: When building apex on a deep learning AMI, remember to set CUDA_HOME as mentioned above if you get an error about mismatched pytorch and cuda versions.
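If you have not installed apex yet, a minimal sketch following the maskrcnn-benchmark INSTALL.md linked above (check that file for the authoritative steps, as apex's build flags change over time):
git clone https://github.com/NVIDIA/apex.git
cd apex
python3 setup.py install --cuda_ext --cpp_ext
cd ..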
NOTE: If using pytorch >= 1.5.0 you will need to patch maskrcnn-benchmark in order for it to compile. After cloning the repo change into the git root folder and run these commands in the terminal:
cuda_dir="maskrcnn_benchmark/csrc/cuda"
perl -i -pe 's/AT_CHECK/TORCH_CHECK/' $cuda_dir/deform_pool_cuda.cu $cuda_dir/deform_conv_cuda.cu
# You can then run the regular setup command
python3 setup.py build develop
Then you can proceed with the setup.py command as normal.
After setting up maskrcnn-benchmark and the python dependencies you should be able to run the project code. Make sure, though, that you can run nvidia-smi before trying to run the project code, and reboot if necessary.
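A final end-to-end sanity check (a minimal sketch combining the commands above):
nvidia-smi                                                    # driver should respond without errors
python3 -c "import torch; print(torch.cuda.is_available())"   # should print True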