This repository has been archived by the owner on Apr 1, 2021. It is now read-only.

Error Building torch_tvm [NGC Container] #112

Closed
SrivastavaKshitij opened this issue Sep 6, 2019 · 14 comments

Comments

@SrivastavaKshitij

I am trying to build torch_tvm inside the PyTorch NGC container [19.08-py3]. However, I am encountering the same error as in #77.

CMakeFiles/_torch_tvm.dir/build.make:218: recipe for target 'CMakeFiles/_torch_tvm.dir/torch_tvm/fusion_pass.cpp.o' failed
make[2]: *** [CMakeFiles/_torch_tvm.dir/torch_tvm/fusion_pass.cpp.o] Error 1
In file included from /tvm/torch_tvm/compiler.h:13:0,
                 from /tvm/torch_tvm/register.cpp:8:
/tvm/torch_tvm/memory_utils.h: In member function ‘void torch_tvm::utils::DLManagedTensorDeleter::operator()(DLManagedTensor*)’:
/tvm/torch_tvm/memory_utils.h:22:24: warning: deleting ‘void*’ is undefined [-Wdelete-incomplete]
       delete dl_tensor.data;
                        ^~~~
CMakeFiles/Makefile2:73: recipe for target 'CMakeFiles/_torch_tvm.dir/all' failed
make[1]: *** [CMakeFiles/_torch_tvm.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
  File "setup.py", line 273, in <module>
    url='https://github.com/pytorch/tvm',
  File "/opt/conda/lib/python3.6/site-packages/setuptools/__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "/opt/conda/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/opt/conda/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/opt/conda/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "setup.py", line 203, in run
    setuptools.command.install.install.run(self)
  File "/opt/conda/lib/python3.6/site-packages/setuptools/command/install.py", line 65, in run
    orig.install.run(self)
  File "/opt/conda/lib/python3.6/distutils/command/install.py", line 545, in run
    self.run_command('build')
  File "/opt/conda/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.6/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/opt/conda/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "setup.py", line 187, in run
    self.run_command('cmake_build')
  File "/opt/conda/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "setup.py", line 176, in run
    self._run_build()
  File "setup.py", line 165, in _run_build
    subprocess.check_call(build_args)
  File "/opt/conda/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/cmake', '--build', '.', '--', '-j', '12']' returned non-zero exit status 2.
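For what it's worth, the long Python traceback above is only the setup.py build wrapper surfacing the make failure: subprocess.check_call raises CalledProcessError for any non-zero child exit status, so the real diagnostics are in the compiler output, not in the traceback itself. A minimal sketch of that behavior, with a harmless command standing in for the cmake invocation:

```python
import subprocess
import sys

# check_call raises CalledProcessError whenever the child process exits
# non-zero, which is exactly how the failed `cmake --build` above turns
# into a Python traceback. A harmless stand-in that exits with status 2:
try:
    subprocess.check_call([sys.executable, "-c", "raise SystemExit(2)"])
except subprocess.CalledProcessError as e:
    print("build command failed with exit status", e.returncode)  # 2
```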

I tried the different methods described here, here, and here, but I haven't had any success.

How can this issue be fixed?

@SrivastavaKshitij
Author

I downloaded LLVM with wget http://releases.llvm.org/8.0.0/clang+llvm-8.0.0-x86_64-linux-gnu-ubuntu-16.04.tar.xz and made sure it was on the PATH when building torch_tvm.

@SrivastavaKshitij
Author

I tried the nightly build container from the PyTorch Docker Hub, pytorch/pytorch:nightly-devel-cuda10.0-cudnn7, and I encounter the same errors.

@kimishpatel
Contributor

Can you paste repro instructions? It is not clear where the error is coming from. The memory_utils.h output looks like a warning, and it does not appear that treating warnings as errors is enabled either.

@SrivastavaKshitij
Author

Repro instructions:

docker pull nvcr.io/nvidia/pytorch:19.06-py3
docker_image=nvcr.io/nvidia/pytorch:19.06-py3
docker run -e NVIDIA_VISIBLE_DEVICES=0 --gpus 0 -it --shm-size=1g --ulimit memlock=-1  --rm  -v $PWD:/workspace/work $docker_image

[Inside the container] I go to the base directory: cd /
wget http://releases.llvm.org/8.0.0/clang+llvm-8.0.0-x86_64-linux-gnu-ubuntu-16.04.tar.xz
tar -xf clang+llvm-8.0.0-x86_64-linux-gnu-ubuntu-16.04.tar.xz
export PATH=$PATH:/clang+llvm-8.0.0-x86_64-linux-gnu-ubuntu-16.04/bin/
ln -s /clang+llvm-8.0.0-x86_64-linux-gnu-ubuntu-16.04/bin/llvm-config /usr/bin/llvm-config
git clone --recursive https://github.com/pytorch/tvm.git
cd tvm/
python setup.py install --cmake

I have attached the full output:

build.txt

@kimishpatel
Contributor

@SrivastavaKshitij, the error seems to be coming from a change in the PyTorch API.

/tvm/torch_tvm/compiler.cpp: In static member function ‘static tvm::relay::Var TVMCompiler::convertToRelay(torch::jit::Value*, TVMContext)’:
/tvm/torch_tvm/compiler.cpp:130:39: error: ‘using element_type = struct c10::TensorType {aka struct c10::TensorType}’ has no member named ‘device’
     auto optional_device_type = pt_t->device();
                                       ^~~~~~

Maybe try with latest release?

@kimishpatel
Contributor

@bwasti ^^

@SrivastavaKshitij
Author

@kimishpatel: I tried the latest NGC container [19.08-py3] and hit the same error.

@SrivastavaKshitij
Author

I was wondering if there is any update?

@bwasti
Contributor

bwasti commented Sep 17, 2019

I'm not entirely sure what version of PT the NGC containers are shipping, but we've kept this repo up to date with PyTorch's master branch. Would you be able to try building PyTorch from source first? There is an API mismatch in the build that indicates you are using too old a version of PT.
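One quick way to tell a stable release wheel from a master build, as a hedged sketch: PyTorch source and nightly builds carry version strings with a pre-release/commit suffix such as 1.3.0a0+d5bf51b (the version strings below are made-up examples, not ones reported in this thread), while stable releases are bare major.minor.patch. This suffix heuristic is an assumption, not anything torch_tvm documents:

```python
import re

def looks_like_source_build(version: str) -> bool:
    """Return True for version strings with a pre-release or commit
    suffix, e.g. '1.3.0a0+d5bf51b' (hypothetical); bare stable releases
    like '1.2.0' return False."""
    return bool(re.search(r"a\d+|\+[0-9a-f]{7,}", version))

print(looks_like_source_build("1.2.0"))            # False - stable release
print(looks_like_source_build("1.3.0a0+d5bf51b"))  # True - master build
```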

@SrivastavaKshitij
Author

SrivastavaKshitij commented Sep 17, 2019

I have to try torch_tvm on different GPUs in different workstations, so the feasible way for me is to build one Docker image and pass it around. There is a new Docker image from PyTorch on Docker Hub, released 4 days ago. I used the 1.2-cuda10.0-cudnn7-devel tag and I still get the same error.

@bwasti
Contributor

bwasti commented Sep 17, 2019

That image ships with PT 1.2, which is unfortunately not compatible with torch_tvm. Can you build a Docker image with PT built from source at a recent master checkout instead?

@SrivastavaKshitij
Author

SrivastavaKshitij commented Sep 18, 2019

Hey @bwasti: I was able to create a Docker image as you suggested. It works. Here are the steps if anybody wants to install torch_tvm inside a container.

Also, is it possible to package torch_tvm as part of the PyTorch container in the future? Reason: it's a very cumbersome process to install torch_tvm inside a container, phew!!

@doublejtoh

doublejtoh commented Dec 19, 2019

Hi @SrivastavaKshitij,
Thanks for your steps to install torch_tvm. While following your suggestions I installed it successfully, but then I got the import error below, the same one you ran into earlier.

Can you tell me the exact version of PyTorch you built?

@SrivastavaKshitij
Author

I did it many months ago, but I think it was PyTorch 1.2 from master.
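For anyone retracing this thread later, a small guarded snippet that records exactly which PyTorch build an environment provides (torch.__version__ and torch.version.git_version are standard PyTorch attributes; the try/except just keeps the sketch runnable where torch is absent):

```python
def pytorch_build_info() -> str:
    """Report the installed PyTorch version and source commit, or a
    fallback string when PyTorch is not installed at all."""
    try:
        import torch
        commit = getattr(torch.version, "git_version", "unknown")
        return f"{torch.__version__} (commit {commit})"
    except ImportError:
        return "torch not installed"

print(pytorch_build_info())
```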
