TensorFlow 2.7 does not detect CUDA installed through conda #52988

Closed
drasmuss opened this issue Nov 8, 2021 · 34 comments
Labels: stale, stat:awaiting response, subtype: ubuntu/linux, TF 2.7, type:bug, type:build/install

Comments

@drasmuss
Contributor

drasmuss commented Nov 8, 2021

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.7.0
  • Python version: 3.8
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: 11.2/8.1
  • GPU model and memory: RTX 2080 Ti

Describe the current behavior

After installing cuda/cudnn through conda (conda install cudatoolkit=11.2 cudnn=8.1), TensorFlow 2.7 reports that it cannot find the cuda libraries.

2021-11-08 14:49:16.412959: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-08 14:49:16.413006: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-11-08 14:49:22.640508: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640617: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640698: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640776: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640853: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.640941: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.641022: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.641099: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-11-08 14:49:22.641120: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.

Installing TensorFlow 2.6 (or earlier) in the same environment, with the same cuda/cudnn installation, doesn't show any problem: it detects the libraries and GPU support works as expected.

The problem can be worked around by manually adding the conda lib directory to LD_LIBRARY_PATH (export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib). However, obviously this is not ideal, as it needs to be repeated/adjusted for every new conda environment. It would be better if TensorFlow just detected the conda installed libraries, as it did in TensorFlow < 2.7.
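For concreteness, the per-session form of that workaround looks like this (a minimal sketch; it assumes the libraries landed in the active environment's lib directory, as with the conda install above):

# one-off workaround: point the dynamic loader at the conda environment's libraries
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"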

Describe the expected behavior

TensorFlow should detect cuda/cudnn libraries installed through conda, as it did in TensorFlow<2.7.

Contributing

  • Do you want to contribute a PR? (yes/no): no
  • Briefly describe your candidate solution (if contributing):

Standalone code to reproduce the issue

conda create -n tmp python=3.8
conda activate tmp
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1
pip install "tensorflow==2.7.0"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  # displays []
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  # displays [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
pip install "tensorflow<2.7.0"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  # displays [[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]]
@drasmuss drasmuss added the type:bug Bug label Nov 8, 2021
@tilakrayal tilakrayal added TF 2.7 Issues related to TF 2.7.0 subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues type:build/install Build and install issues labels Nov 9, 2021
@tilakrayal
Contributor

@drasmuss,
We can see that you have installed TensorFlow from a conda environment. Installation issues within the Anaconda environment are tracked in the Anaconda repo. Please try to install it in a new virtual environment from this link and let us know if it is still an issue. Thanks!

@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Nov 9, 2021
@drasmuss
Contributor Author

drasmuss commented Nov 9, 2021

I'm not installing TensorFlow from conda, just cuda/cudnn. TensorFlow is being installed from pip as normal. And you can see in the reproduction steps I posted above that we're starting from a new virtual environment (repeated below for convenience).

conda create -n tmp python=3.8
conda activate tmp
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1
pip install "tensorflow==2.7.0"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  # displays []
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  # displays [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
pip install "tensorflow<2.7.0"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  # displays [[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]]

Also note that nothing has changed on the conda side of things; we're still using the exact same environment with the same cuda/cudnn libraries, but it works in TF 2.6 and fails in TF 2.7. So I don't think the issue is on the conda side; something has changed in TensorFlow that has made this stop working.

@tilakrayal tilakrayal removed the stat:awaiting response Status - Awaiting response from author label Nov 9, 2021
@tilakrayal tilakrayal assigned Saduf2019 and sanatmpa1 and unassigned Saduf2019 and tilakrayal Nov 9, 2021
@sanatmpa1 sanatmpa1 assigned jvishnuvardhan and unassigned sanatmpa1 Nov 9, 2021
@pradyyadav

pradyyadav commented Nov 12, 2021

Open the terminal and type

nano ~/.bashrc

At the end of the file, add the following two lines:

export PATH=$PATH:/usr/local/cuda-11.2/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2/lib64

Ensure there are no spaces on either side of the '=' sign.

If it still does not work, try adding the same lines for version 11.0:

export PATH=$PATH:/usr/local/cuda-11.0/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.0/lib64
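If you try this, a quick way to check whether the loader can now resolve the CUDA runtime is shown below (a sketch only; the library name libcudart.so.11.0 and the /usr/local/cuda install paths are assumptions based on the versions discussed in this thread):

# reload the shell configuration so the new PATH/LD_LIBRARY_PATH take effect
source ~/.bashrc
# dlopen (via ctypes) respects LD_LIBRARY_PATH, so this fails loudly if the path is still wrong
python3 -c "import ctypes; ctypes.CDLL('libcudart.so.11.0'); print('libcudart found')"
# then confirm TensorFlow sees the GPU
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"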

@drasmuss
Contributor Author

As mentioned, CUDA is being installed through conda, so /usr/local/cuda- is not the correct path (the correct path is given in the original post: $CONDA_PREFIX/lib). However, hard coding that into .bashrc isn't a solution, because $CONDA_PREFIX changes depending on which conda environment you have active.
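To see where conda actually put the libraries for the currently active environment (a minimal sketch; the exact .so file names depend on the cudatoolkit/cudnn builds installed):

# the prefix differs for every environment, which is why a hard-coded path in ~/.bashrc does not generalize
echo "$CONDA_PREFIX"
ls "$CONDA_PREFIX/lib" | grep -E 'libcudart|libcudnn'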

@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Nov 15, 2021
@mihaimaruseac
Collaborator

Conda installs are not officially supported by Google

@ddaspit

ddaspit commented Nov 29, 2021

I installed TensorFlow 2.7 on Windows with CUDA 11.2 and cuDNN 8.1 (no conda involved). I received the same "Could not load dynamic library" errors. I switched CUDA to 11.0 and it worked. I am guessing that the pip packages for TensorFlow 2.7 were accidentally built against CUDA 11.0 instead of 11.2.

@janniksinz

janniksinz commented Nov 29, 2021

Open the terminal and type

nano ~/.bashrc

At the end of the file, add the following two lines:

export PATH=$PATH:/usr/local/cuda-11.2/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2/lib64

Ensure there are no spaces on either side of the '=' sign.

If it still does not work, try adding the same lines for version 11.0:

export PATH=$PATH:/usr/local/cuda-11.0/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.0/lib64

Thank you, this also works with cuda-11.4. But how would you fix this issue in a Jupyter notebook, for the fairly niche use case where you need tf==2.7.0 features?

When I start a Jupyter server within an env that has these paths exported, it only shows the CPU. Exporting the paths in the notebook doesn't work either.

@jesusdpa1

jesusdpa1 commented Dec 1, 2021

This seems to solve the issue:

conda activate ENVNAME

cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh

Edit ./etc/conda/activate.d/env_vars.sh as follows:

#!/bin/sh

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib

Edit ./etc/conda/deactivate.d/env_vars.sh as follows:

#!/bin/sh

unset LD_LIBRARY_PATH

Source

https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#macos-and-linux
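To sanity-check that the hooks actually fire, something like the following should work (a sketch, assuming the env_vars.sh files above were created in the environment named ENVNAME):

# re-activating runs activate.d/env_vars.sh and extends LD_LIBRARY_PATH
conda deactivate
conda activate ENVNAME
echo "$LD_LIBRARY_PATH"   # should now end with .../ENVNAME/lib
# deactivating runs deactivate.d/env_vars.sh, which unsets the variable entirely
conda deactivate
echo "$LD_LIBRARY_PATH"   # empty, including anything it held before activation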

@holongate

holongate commented Dec 12, 2021

I don't want to be dismissive here, but there is a lack of understanding of the problem specifically introduced by TF 2.7:

  • A conda environment does install native libraries and does ensure they will be found by the OS dynamic loader for the programs that want them.
  • Until TF 2.7 this was the way it worked, like the gazillion other native apps (including CUDA ones).
  • TF 2.7, not conda, specifically broke that by ignoring the OS loading mechanism, for an unknown/undocumented reason.

This problem is not just a technical nitpick; it has deep implications for businesses that build real products.
This way of working is the only reliable one for teams that work on more than one TF project or require multiple TF/CUDA/Python combinations on the same workstation (without root access).
By the way, the CUDA stack from the official nvidia channel (nvcc, ptxas, etc.) works perfectly in conda and is recommended by NVIDIA itself.

For my suffering peers, if you don't have access to root, you can use this small poorly-documented feature in your environment.yml:

name: base-tf-cuda-env
channel:
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python=3.8
# Install cuda libs + ptxas compiler from nvidia channel
# This will accelerate the compilation of kernels for your specific card
  - cudatoolkit=11
  - cudnn=8
  - cupti=11
  - cuda-nvcc
...
  - pip
  - pip:
     - tensorflow==2.7.*
variables:
  # In case you want to see your own logs and tame the TF loggorrhea
  TF_CPP_MIN_LOG_LEVEL: 3
  # Adjust to point to your local env path:
  LD_LIBRARY_PATH: /home/me/.conda/envs/thisenvname/lib

Upon conda activate, the env variables will be set for you, and unset on deactivation.
Better than nothing, but might interfere with some other configuration...
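One possible way to use a file like the above (a sketch under the assumptions that it is saved as environment.yml and that the LD_LIBRARY_PATH line has been adjusted to your own env path):

# create the environment from the YAML and activate it; the variables: block is applied on activation
conda env create -f environment.yml
conda activate base-tf-cuda-env
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"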

@chainyo

chainyo commented Jan 19, 2022

Upon conda activate, the env variables will be set for you, and unset on deactivation.
Better than nothing, but might interfere with some other configuration...

Really appreciate the file you provided!
There is a typo in the channels part (it should be channels:, not channel:), but that's awesome, thanks 👍

@filippocastelli

Upon conda activate, the env variables will be set for you, and unset on deactivation. Better than nothing, but might interfere with some other configuration...

@holongate's env is a good workaround and solves the problem for me.

I'm quite astonished by how little thought has been given to the issue - which is clearly a problem with TF 2.7 itself, and not with conda - and by how much time you spend commenting that conda installs are not supported by Google.

@drasmuss
Contributor Author

For anyone looking for a one-liner solution, you can do

conda env config vars set LD_LIBRARY_PATH=$CONDA_PREFIX/lib

(with the environment you want to modify activated). This has a similar effect to @jesusdpa1's solution here #52988 (comment): it'll set LD_LIBRARY_PATH when the environment is activated and unset it when it's deactivated.

You still need to repeat that for every new conda environment though. It would be better if TensorFlow just detected the conda installed libraries, as it did in TensorFlow<=2.6.
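For reference, a quick way to confirm the variable is registered and applied (a minimal sketch, using the tmp environment from the reproduction steps above; conda only exports the variable after the environment is re-activated):

conda env config vars set LD_LIBRARY_PATH=$CONDA_PREFIX/lib
conda env config vars list                # should list LD_LIBRARY_PATH for this environment
conda deactivate && conda activate tmp    # re-activate so the variable is actually exported
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"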

@drasmuss
Contributor Author

drasmuss commented Nov 22, 2022

The official documentation suggests manually doing export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/ every time you want to use TensorFlow, which obviously isn't really a feasible solution.

This is discussed above, but I'll reiterate the main points here for anyone coming across this thread:

  1. Currently the best solution is to use the community-maintained TensorFlow installation from conda-forge (e.g. conda install -c conda-forge tensorflow). Generally speaking that should just work (a short sketch follows at the end of this comment).
  2. If 1. isn't possible or isn't working for some reason (e.g. because you need a very recent release of TensorFlow that isn't yet available on conda-forge), the easiest solution is #52988 (comment).
  3. However, sometimes 2. can cause problems with other system packages, since you're modifying the global LD_LIBRARY_PATH (note that this is also a problem with the approach recommended in the official documentation). If you run into issues like that, you can try the approach in #52988 (comment), with the caveat mentioned there that it might break in future updates.

I'll reiterate again that all of these solutions are a downgrade in the user experience from TensorFlow < 2.7, when TensorFlow just correctly detected the conda-installed CUDA libraries without any fiddling required from the user.
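As a rough illustration of option 1 above (a sketch only; the environment name and Python version are arbitrary, and whether the GPU-enabled conda-forge build gets selected depends on your driver setup and the package variants available for your platform):

# fresh environment using the community-maintained conda-forge build of TensorFlow
conda create -n tf-forge python=3.10
conda activate tf-forge
conda install -c conda-forge tensorflow
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"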

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Nov 22, 2022
@mohantym mohantym removed their assignment Nov 22, 2022
@TuanBC

TuanBC commented Nov 23, 2022

A kind-of semi-automated snippet that I am using to solve the cudatoolkit PATH problem in a conda environment:

conda activate tf_env
conda install -c conda-forge cudatoolkit cudnn

mkdir -p $CONDA_PREFIX/etc/conda/activate.d
mkdir -p $CONDA_PREFIX/etc/conda/deactivate.d

printf '#!/bin/sh\nexport OLD_LD_LIBRARY_PATH=$LD_LIBRARY_PATH\nexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/\n' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh 
printf '#!/bin/sh\nexport LD_LIBRARY_PATH=$OLD_LD_LIBRARY_PATH\nunset OLD_LD_LIBRARY_PATH\n' > $CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh 

This snippet automatically sets and unsets the necessary environment variables when you activate or deactivate the conda environment. It could be useful not only for TF users, but also for any other library that needs the CUDA dependencies when being built manually from source.
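To double-check what the printf commands above actually wrote, and that the hook takes effect (a minimal sketch, reusing the tf_env name from the snippet):

# inspect the generated hook scripts
cat $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
cat $CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh
# re-activate so the activate.d hook runs, then confirm TensorFlow can see the GPU
conda deactivate && conda activate tf_env
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"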

@SuryanarayanaY
Collaborator

Hi @drasmuss,
Could you please refer to this documentation source.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/

For your convenience it is recommended that you automate it with the following commands. The system paths will be automatically configured when you activate this conda environment.

mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

With the above two commands it is not required to run export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/ every time you want to use TensorFlow. It is a one-time setup, and after that you can use the environment any number of times.

I hope this addresses the issue. Please confirm if anything is still missing here. Thanks!

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Mar 1, 2023
@drasmuss
Contributor Author

drasmuss commented Mar 1, 2023

Hi @SuryanarayanaY,

See #52988 (comment) for a summary of the discussion in this thread. The short answer is that no, that solution doesn't address the issue.

Longer answer: The solution you describe from the docs is basically a worse version of idea 2 from that summary above. Worse in that it's more complicated, and it won't unset LD_LIBRARY_PATH when the environment is deactivated. But as mentioned above, idea 2 is not really a viable solution because LD_LIBRARY_PATH is a global environment variable, and modifying it has negative side effects on lots of other system packages besides TensorFlow.

And, to reiterate again, all of these "solutions" are downgrades from the behaviour prior to TensorFlow 2.7, where TensorFlow just correctly detected the CUDA libraries without requiring any manual intervention from users.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 1, 2023
@sachinprasadhs
Contributor

@drasmuss, I'm just curious whether you have observed the same behavior in the 2.11 version.
Also, since the 2.12 release is around the corner, you can wait a few days and check it, since we are bumping the supported CUDA version to 11.8. Thanks!

@sachinprasadhs sachinprasadhs added the stat:awaiting response Status - Awaiting response from author label Mar 13, 2023
@drasmuss
Contributor Author

drasmuss commented Mar 13, 2023

Yes, the behaviour is the same in 2.11 and 2.12.0rc1 (I wouldn't expect it to change between rc1 and the full 2.12 release).

Note that in 2.12 the error message has changed, so it displays

2023-03-13 14:41:41.580759: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-03-13 14:41:41.602435: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.

instead of the old "Could not load dynamic library..." errors, but it's the same issue.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 13, 2023
@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 14, 2023
@githubskiy

Did you guys solve this problem?

@Venkat6871
Contributor

Hi,

Thank you for opening this issue. Since this issue has been open for a long time, the code/debug information for this issue may not be relevant to the current state of the code base.

The TensorFlow team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TensorFlow version with the latest compatible hardware configuration, which could potentially resolve the issue. If you are still facing the issue, please create a new GitHub issue with your latest findings and all the debugging information that could help us investigate.

Please follow the release notes to stay up to date with the latest developments happening in the TensorFlow space.

@Venkat6871 Venkat6871 added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Oct 11, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Oct 19, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.
