
Docker installation error #646

Closed
AlexandderGorodetski opened this issue Nov 2, 2022 · 17 comments · Fixed by #653

Comments

@AlexandderGorodetski

Hi All,

After running the following steps, I get the error below. Could you please help?

  1. git clone https://github.com/k2-fsa/icefall
  2. cd docker/Ubuntu18.04-pytorch1.12.1-cuda11.3-cudnn8
  3. docker build -t icefall/pytorch1.12.1 .

During the running I got following warning and error:

CMake Warning at /opt/conda/lib/python3.7/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/opt/conda/lib/python3.7/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
cmake/torch.cmake:11 (find_package)
CMakeLists.txt:292 (include)

-- Found Torch: /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch.so
-- PyTorch version: 1.13.0
-- PyTorch cuda version: None
CMake Error at cmake/torch.cmake:52 (message):
PyTorch 1.13.0 is compiled with CUDA None.

But you are using CUDA 11.3 to compile k2.

Please try to use the same CUDA version for PyTorch and k2.

You can remove this check if you are sure this will not cause problems

Call Stack (most recent call first):
CMakeLists.txt:292 (include)
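The failing guard in cmake/torch.cmake compares the CUDA version PyTorch was built with (`torch.version.cuda`, which is `None` for CPU-only builds) against the CUDA version being used to compile k2. A minimal Python sketch of that check (the function name is mine, not k2's):

```python
def check_cuda_versions_match(torch_cuda, k2_cuda):
    """Sketch of k2's configure-time guard in cmake/torch.cmake.

    torch_cuda: CUDA version PyTorch was built with, or None for a
    CPU-only build. k2_cuda: CUDA version used to compile k2.
    """
    if torch_cuda != k2_cuda:
        raise RuntimeError(
            f"PyTorch is compiled with CUDA {torch_cuda}, "
            f"but you are using CUDA {k2_cuda} to compile k2."
        )

check_cuda_versions_match("11.3", "11.3")  # matching versions pass
# check_cuda_versions_match(None, "11.3") reproduces the error above:
# a CPU-only PyTorch (CUDA None) cannot be mixed with a CUDA 11.3 k2 build.
```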

@csukuangfj
Collaborator

Could you please post the complete logs?

@AlexandderGorodetski
Author

ubuntu@ip-10-12-2-14:/fsx_new/Alex/tmp1/icefall/docker/Ubuntu18.04-pytorch1.12.1-cuda11.3-cudnn8$ docker build -t icefall/pytorch1.12.1 .
Sending build context to Docker daemon 3.584kB
Step 1/10 : FROM pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel
---> fa50f7fed43a
Step 2/10 : RUN apt-get update && apt-get install -y --no-install-recommends g++ make automake autoconf bzip2 unzip wget sox libtool git subversion zlib1g-dev gfortran ca-certificates patch ffmpeg valgrind libssl-dev vim curl
---> Using cache
---> bf6960138e11
Step 3/10 : RUN wget -P /opt https://cmake.org/files/v3.18/cmake-3.18.0.tar.gz && cd /opt && tar -zxvf cmake-3.18.0.tar.gz && cd cmake-3.18.0 && ./bootstrap && make && make install && rm -rf cmake-3.18.0.tar.gz && find /opt/cmake-3.18.0 -type f \( -name "*.o" -o -name "*.la" -o -name "*.a" \) -exec rm {} \; && cd -
---> Using cache
---> fcf64904a68e
Step 4/10 : RUN wget -P /opt https://downloads.xiph.org/releases/flac/flac-1.3.2.tar.xz && cd /opt && xz -d flac-1.3.2.tar.xz && tar -xvf flac-1.3.2.tar && cd flac-1.3.2 && ./configure && make && make install && rm -rf flac-1.3.2.tar && find /opt/flac-1.3.2 -type f \( -name "*.o" -o -name "*.la" -o -name "*.a" \) -exec rm {} \; && cd -
---> Using cache
---> 109ffa0a5c19
Step 5/10 : RUN pip install kaldiio graphviz && conda install -y -c pytorch torchaudio
---> Using cache
---> f98e65e2b2ba
Step 6/10 : RUN git clone https://github.com/k2-fsa/k2.git /opt/k2 && cd /opt/k2 && python3 setup.py install && cd -
---> Running in 174b6c2243b7
Cloning into '/opt/k2'...
running install
running bdist_egg
running egg_info
creating k2.egg-info
writing k2.egg-info/PKG-INFO
writing dependency_links to k2.egg-info/dependency_links.txt
writing requirements to k2.egg-info/requires.txt
writing top-level names to k2.egg-info/top_level.txt
writing manifest file 'k2.egg-info/SOURCES.txt'
reading manifest file 'k2.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'k2.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/symbol_table.py -> build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/dense_fsa_vec.py -> build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/autograd_utils.py -> build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/utils.py -> build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/autograd.py -> build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/fsa_algo.py -> build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/mutual_information.py -> build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/nbest.py -> build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/rnnt_decode.py -> build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/init.py -> build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/ops.py -> build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/fsa_properties.py -> build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/ctc_loss.py -> build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/fsa.py -> build/lib.linux-x86_64-3.7/k2
copying k2/python/k2/rnnt_loss.py -> build/lib.linux-x86_64-3.7/k2
creating build/lib.linux-x86_64-3.7/k2/ragged
copying k2/python/k2/ragged/init.py -> build/lib.linux-x86_64-3.7/k2/ragged
creating build/lib.linux-x86_64-3.7/k2/sparse
copying k2/python/k2/sparse/autograd.py -> build/lib.linux-x86_64-3.7/k2/sparse
copying k2/python/k2/sparse/init.py -> build/lib.linux-x86_64-3.7/k2/sparse
creating build/lib.linux-x86_64-3.7/k2/version
copying k2/python/k2/version/main.py -> build/lib.linux-x86_64-3.7/k2/version
copying k2/python/k2/version/version.py -> build/lib.linux-x86_64-3.7/k2/version
copying k2/python/k2/version/init.py -> build/lib.linux-x86_64-3.7/k2/version
running build_ext
-- CMAKE_VERSION: 3.18.0
-- Enabled languages: CXX;CUDA
-- The CXX compiler identification is GNU 7.5.0
-- The CUDA compiler identification is NVIDIA 11.3.109
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- K2_OS:
-- Found Git: /usr/bin/git (found version "2.17.1")
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for C++ include execinfo.h
-- Looking for C++ include execinfo.h - found
-- Performing Test K2_COMPILER_SUPPORTS_CXX14
-- Performing Test K2_COMPILER_SUPPORTS_CXX14 - Success
-- C++ Standard version: 14
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
-- K2_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
-- K2_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75;80;86
-- Adding arch 35
-- Adding arch 50
-- Adding arch 60
-- Adding arch 61
-- Adding arch 70
-- Adding arch 75
-- Adding arch 80
-- Adding arch 86
-- K2_COMPUTE_ARCHS: 35;50;60;61;70;75;80;86
-- Found Valgrind: /usr/bin
-- Found Valgrind: /usr/bin/valgrind
-- To check memory, run ctest -R <NAME> -D ExperimentalMemCheck
-- Downloading pybind11
-- pybind11 is downloaded to /opt/k2/build/temp.linux-x86_64-3.7/_deps/pybind11-src
-- pybind11 v2.11.0 dev1
-- Found PythonInterp: /opt/conda/bin/python3 (found suitable version "3.7.13", minimum required is "3.6")
-- Found PythonLibs: /opt/conda/lib/libpython3.7m.so
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Python executable: /opt/conda/bin/python3
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
CMake Warning at /opt/conda/lib/python3.7/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/opt/conda/lib/python3.7/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
cmake/torch.cmake:11 (find_package)
CMakeLists.txt:292 (include)

-- Found Torch: /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch.so
-- PyTorch version: 1.13.0
-- PyTorch cuda version: None
CMake Error at cmake/torch.cmake:52 (message):
PyTorch 1.13.0 is compiled with CUDA None.

But you are using CUDA 11.3 to compile k2.

Please try to use the same CUDA version for PyTorch and k2.

You can remove this check if you are sure this will not cause problems

Call Stack (most recent call first):
CMakeLists.txt:292 (include)

-- Configuring incomplete, errors occurred!
See also "/opt/k2/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log".
See also "/opt/k2/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeError.log".
cat: k2/csrc/version.h: No such file or directory
make: *** No rule to make target '_k2'. Stop.
/opt/conda/lib/python3.7/site-packages/setuptools/dist.py:516: UserWarning: Normalizing '1.21.dev20221102+cudaNone.torch1.13.0' to '1.21.dev20221102+cudanone.torch1.13.0'
warnings.warn(tmpl.format(**locals()))
/opt/conda/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
setuptools.SetuptoolsDeprecationWarning,
/opt/conda/lib/python3.7/site-packages/setuptools/command/easy_install.py:147: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
EasyInstallDeprecationWarning,
cmake_path: /usr/local/bin/cmake
For fast compilation, run:
export K2_MAKE_ARGS="-j"; python setup.py install
Setting make_args to '-j4'
Setting PYTHON_EXECUTABLE to /opt/conda/bin/python3
build command is:

            cd build/temp.linux-x86_64-3.7

            cmake -DCMAKE_BUILD_TYPE=Release -DPYTHON_EXECUTABLE=/opt/conda/bin/python3 -DK2_ENABLE_BENCHMARK=OFF  -DK2_ENABLE_TESTS=OFF  -DCMAKE_INSTALL_PREFIX=/opt/k2/build/lib.linux-x86_64-3.7/k2  /opt/k2

            cat k2/csrc/version.h

            make  -j4  _k2 k2_torch_api install

Traceback (most recent call last):
File "setup.py", line 237, in
'Topic :: Scientific/Engineering :: Artificial Intelligence',
File "/opt/conda/lib/python3.7/site-packages/setuptools/init.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 148, in setup
return run_commands(dist)
File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
dist.run_commands()
File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
self.run_command(cmd)
File "/opt/conda/lib/python3.7/site-packages/setuptools/dist.py", line 1214, in run_command
super().run_command(command)
File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.7/site-packages/setuptools/command/install.py", line 74, in run
self.do_egg_install()
File "/opt/conda/lib/python3.7/site-packages/setuptools/command/install.py", line 123, in do_egg_install
self.run_command('bdist_egg')
File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.7/site-packages/setuptools/dist.py", line 1214, in run_command
super().run_command(command)
File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 165, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/opt/conda/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 151, in call_command
self.run_command(cmdname)
File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.7/site-packages/setuptools/dist.py", line 1214, in run_command
super().run_command(command)
File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.7/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/command/install_lib.py", line 107, in build
self.run_command('build_ext')
File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.7/site-packages/setuptools/dist.py", line 1214, in run_command
super().run_command(command)
File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 448, in build_extensions
self._build_extensions_serial()
File "/opt/conda/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 473, in _build_extensions_serial
self.build_extension(ext)
File "setup.py", line 179, in build_extension
raise Exception('Failed to build k2')
Exception: Failed to build k2
The command '/bin/sh -c git clone https://github.com/k2-fsa/k2.git /opt/k2 && cd /opt/k2 && python3 setup.py install && cd -' returned a non-zero code: 1
ubuntu@ip-10-12-2-14:/fsx_new/Alex/tmp1/icefall/docker/Ubuntu18.04-pytorch1.12.1-cuda11.3-cudnn8$

@csukuangfj
Collaborator

I think there might be some issues with the docker image.

What is the output of the following command when it is run from within the docker container before installing k2?

python3 -m torch.utils.collect_env

@AlexandderGorodetski
Author

Unfortunately, I cannot start the Docker container.

I do the following:

docker start funny_einstein

Now, I am trying to run

docker ps

And I do not see that the container funny_einstein was started.

@csukuangfj
Collaborator

Please try

docker pull pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel
docker run -it pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel

and it should present you with a terminal. Please enter the following command in the terminal:

python3 -m torch.utils.collect_env

@AlexandderGorodetski
Author

Before I do that, could you please look at the output of nvidia-smi? Maybe I should use another PyTorch docker image (something suitable for CUDA 11.4)?

ubuntu@ip-10-12-2-14:/fsx_new$ nvidia-smi
Wed Nov 2 13:56:14 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:1B.0 Off | 0 |
| N/A 26C P0 34W / 300W | 0MiB / 16160MiB | 0% E. Process |
| | | N/A |

@AlexandderGorodetski
Author

In any case, here is the output that you requested:

ubuntu@ip-10-12-2-14:/fsx_new/Alex/tmp1/icefall/docker/Ubuntu18.04-pytorch1.12.1-cuda11.3-cudnn8$ docker exec -it icefall /bin/bash
root@185d009a60c8:/workspace# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.12.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.17

Python version: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-1060-aws-x86_64-with-debian-buster-sid
Is CUDA available: False
CUDA runtime version: 11.3.109
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.5
[pip3] torch==1.12.1
[pip3] torchtext==0.13.1
[pip3] torchvision==0.13.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 ha36c431_9 nvidia
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py37h7f8727e_0
[conda] mkl_fft 1.3.1 py37hd3c417c_0
[conda] mkl_random 1.2.2 py37h51133e4_0
[conda] numpy 1.21.5 py37he7a7128_2
[conda] numpy-base 1.21.5 py37hf524024_2
[conda] pytorch 1.12.1 py3.7_cuda11.3_cudnn8.3.2_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchtext 0.13.1 py37 pytorch
[conda] torchvision 0.13.1 py37_cu113 pytorch
root@185d009a60c8:/workspace#

@csukuangfj
Collaborator

It is fine to use CUDA 11.3 as it is lower than the version displayed by nvidia-smi.

I think there are no PyTorch versions that support CUDA 11.4.
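The rule here is NVIDIA's driver backward compatibility: a CUDA toolkit whose version is at or below the CUDA version reported by nvidia-smi (11.4 in the output above) is supported by the driver. A small illustrative comparison (the function name is mine):

```python
def driver_supports_toolkit(toolkit_version, driver_cuda_version):
    # NVIDIA drivers are backward compatible: they can run any CUDA
    # toolkit whose version is <= the version reported by nvidia-smi.
    as_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return as_tuple(toolkit_version) <= as_tuple(driver_cuda_version)

print(driver_supports_toolkit("11.3", "11.4"))  # True: CUDA 11.3 is fine here
print(driver_supports_toolkit("11.5", "11.4"))  # False: too new for this driver
```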

@AlexandderGorodetski
Author

Is there some problem with the following output?

ubuntu@ip-10-12-2-14:/fsx_new/Alex/tmp1/icefall/docker/Ubuntu18.04-pytorch1.12.1-cuda11.3-cudnn8$ docker exec -it icefall /bin/bash
root@185d009a60c8:/workspace# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.12.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.17

Python version: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-1060-aws-x86_64-with-debian-buster-sid
Is CUDA available: False
CUDA runtime version: 11.3.109
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.5
[pip3] torch==1.12.1
[pip3] torchtext==0.13.1
[pip3] torchvision==0.13.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 ha36c431_9 nvidia
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py37h7f8727e_0
[conda] mkl_fft 1.3.1 py37hd3c417c_0
[conda] mkl_random 1.2.2 py37h51133e4_0
[conda] numpy 1.21.5 py37he7a7128_2
[conda] numpy-base 1.21.5 py37hf524024_2
[conda] pytorch 1.12.1 py3.7_cuda11.3_cudnn8.3.2_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchtext 0.13.1 py37 pytorch
[conda] torchvision 0.13.1 py37_cu113 pytorch
root@185d009a60c8:/workspace#

@csukuangfj
Collaborator

Please pay attention to the output

Is CUDA available: False

That may explain why it complained when installing k2.

@teowenshen
Contributor

Can you run the container that was built successfully, just before the Dockerfile hit the error?

docker ps -a

Get the ID of the topmost or second topmost container, which should have exited.

docker exec -it XXX bash

Then share your conda list output.

@AlexandderGorodetski
Author

How can I solve the problem of

Is CUDA available: False

?

@teowenshen
Contributor

One of the steps might have uninstalled the CUDA-enabled PyTorch and installed the CPU-only PyTorch instead. The image ships with CUDA-enabled PyTorch by default.

It could be the installation of torchaudio, pulled in as part of Lhotse's requirements, that uninstalled the CUDA version of PyTorch.
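One way to tell which flavor is installed is the conda build string: the CUDA build in the collect_env output above is py3.7_cuda11.3_cudnn8.3.2_0, whereas CPU-only builds typically carry "cpu" in the build string instead. A small illustrative check (not part of any official tool; the example CPU build string is an assumption):

```python
def is_cuda_pytorch_build(build_string):
    # conda encodes the variant in the build string, e.g.
    # "py3.7_cuda11.3_cudnn8.3.2_0" for a CUDA build; CPU-only
    # builds usually contain "cpu" instead of a "cudaX.Y" tag.
    return "cuda" in build_string

print(is_cuda_pytorch_build("py3.7_cuda11.3_cudnn8.3.2_0"))  # True
print(is_cuda_pytorch_build("py3.7_cpu_0"))                  # False
```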

In this Dockerfile I have hard-coded the torchaudio version to 0.12. If you still can't fix it, can you try this?

FROM pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel

# install normal source
RUN apt-get update && \
    apt-get install -yq --no-install-recommends \
        g++ \
        make \
        automake \
        autoconf \
        bzip2 \
        unzip \
        wget \
        sox \
        libtool \
        git \
        subversion \
        zlib1g-dev \
        gfortran \
        ca-certificates \
        patch \
        ffmpeg \
        valgrind \
        libssl-dev \
        vim \
        curl

# cmake
RUN wget -P /opt https://cmake.org/files/v3.18/cmake-3.18.0.tar.gz && \
    cd /opt && \
    tar -zxvf cmake-3.18.0.tar.gz && \
    cd cmake-3.18.0 && \
    ./bootstrap && \
    make && \
    make install && \
    rm -rf cmake-3.18.0.tar.gz && \
    find /opt/cmake-3.18.0 -type f \( -name "*.o" -o -name "*.la" -o -name "*.a" \) -exec rm {} \; && \
    cd -
	
# flac 
RUN wget -P /opt https://downloads.xiph.org/releases/flac/flac-1.3.2.tar.xz  && \
    cd /opt && \ 
    xz -d flac-1.3.2.tar.xz && \
    tar -xvf flac-1.3.2.tar && \
    cd flac-1.3.2 && \
    ./configure && \
    make && make install && \
    rm -rf flac-1.3.2.tar && \
    find /opt/flac-1.3.2  -type f \( -name "*.o" -o -name "*.la" -o -name "*.a" \) -exec rm {} \; && \
    cd - 

RUN conda install -y -c pytorch torchaudio=0.12 && \
    pip install kaldiio graphviz

#install k2 from source
RUN git clone https://github.com/k2-fsa/k2.git /opt/k2 && \
    cd /opt/k2 && \
    python3 setup.py install 

# install  lhotse
RUN pip install git+https://github.com/lhotse-speech/lhotse

RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
    cd /workspace/icefall && \
    pip install -r requirements.txt

RUN git clone https://github.com/Minami-Lab-UEC/sherpa.git /workspace/sherpa && \
    cd /workspace/sherpa && \
    pip install -r ./requirements.txt && \
    python3 setup.py install

ENV PYTHONPATH=/workspace/icefall:$PYTHONPATH

WORKDIR /workspace/icefall

@AlexandderGorodetski
Author

@teowenshen , thanks a lot.

It seems that your script above solved the problem.

Thank you so much.

@teowenshen
Contributor

Great. I will change the dockerfile tomorrow to close this issue.

There's another typo in the README.md that I've been meaning to correct too.

The correct syntax for docker run to mount a volume is {path/in/host/machine}:{path/in/container}.
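With that syntax, a mount flag can be assembled like this (the paths below are hypothetical, and the command is only printed rather than executed):

```shell
HOST_DIR=/fsx_new/data          # hypothetical path on the host machine
CONTAINER_DIR=/workspace/data   # hypothetical path inside the container
MOUNT_SPEC="${HOST_DIR}:${CONTAINER_DIR}"
# host path comes first, container path second, separated by a colon
echo "docker run -it -v ${MOUNT_SPEC} icefall/pytorch1.12.1 bash"
```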

@ahmedalbahnasawy
Contributor

ahmedalbahnasawy commented Nov 14, 2022

@csukuangfj Kindly add

pip3 install kaldifeat

to the Dockerfile

@csukuangfj
Collaborator

@csukuangfj Kindly add


pip3 install kaldifeat

to the Dockerfile

Could you please make a PR to add kaldifeat?
