Restoring TVMOp tests #18542

jinboci · 2020-06-12T03:51:39Z

Description

Restoring TVMOp tests. #18204 #18526 #17840


mxnet-bot · 2020-06-12T03:51:40Z

Hey @jinboci , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

To trigger all jobs: @mxnet-bot run ci [all]
To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [clang, centos-cpu, miscellaneous, sanity, windows-gpu, windows-cpu, unix-gpu, centos-gpu, unix-cpu, website, edge]

Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

jinboci · 2020-06-12T10:07:00Z

@mxnet-bot run ci [unix-cpu, unix-gpu]

mxnet-bot · 2020-06-12T10:07:09Z

Jenkins CI successfully triggered : [unix-gpu, unix-cpu]

jinboci · 2020-06-12T17:50:17Z

@mxnet-bot run ci [centos-cpu]

mxnet-bot · 2020-06-12T17:50:22Z

Jenkins CI successfully triggered : [centos-cpu]

leezu · 2020-06-12T20:01:40Z

You need to investigate why libcuda is not found in the container. Previously there was a hack of putting /usr/local/cuda/compat on the path, but that may not be the correct solution. AFAIK libcuda will be provided by https://github.com/NVIDIA/nvidia-docker/ inside the container based on the host system libcuda, typically only on a host system with gpus.

yzhliu · 2020-06-18T04:31:25Z

@leezu Just checking whether my understanding is correct: libcuda.so exists on the hosts that build mxnet, but not on the hosts that run the tests, while libcudart.so exists on both. Is that correct?

leezu · 2020-06-18T05:47:41Z

@yzhliu It should be the other way round. Let's open the CI Docker container: docker run -it mxnetci/build.ubuntu_gpu_cu102 /bin/bash and look at the shared libraries in /usr/local/cuda:

root@de49f0e1966c:/work/mxnet# find /usr/local/cuda-10.2 -name "*.so*"
/usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so.440.33.01
/usr/local/cuda-10.2/compat/libcuda.so
/usr/local/cuda-10.2/compat/libcuda.so.1
/usr/local/cuda-10.2/compat/libcuda.so.440.33.01
/usr/local/cuda-10.2/compat/libnvidia-fatbinaryloader.so.440.33.01
/usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so
/usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so.1.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppim.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppicc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcurand.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnpps.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppial.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvrtc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppist.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcuda.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppig.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppidei.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusolver.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppicom.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppif.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcufftw.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusolverMg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusparse.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvgraph.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvjpeg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppisu.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppitc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcufft.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvperf_target.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2.75
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvperf_host.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so.10.3.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so.10.1.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so.10.3.0.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so.10.1.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so.10.3.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so.1.0.0
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so.10.1.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so.10.3.0.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so.10.2.89
/usr/local/cuda-10.2/nvvm/lib64/libnvvm.so.3.3.0
/usr/local/cuda-10.2/nvvm/lib64/libnvvm.so
/usr/local/cuda-10.2/nvvm/lib64/libnvvm.so.3
/usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so.3.3.0
/usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so
/usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so.3
/usr/local/cuda-10.2/extras/Sanitizer/libsanitizer-public.so

Because we don't use the nvidia docker command to run the container, only stubs/libcuda.so is available. If we're on a host with GPUs, we can use docker run --gpus all -it mxnetci/build.ubuntu_gpu_cu102 /bin/bash and the libcuda.so from the host as well as the host GPUs will be available inside the container. But on a CPU host this just leads to:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled

The problem is that some part of the tvmop setup currently requires libcuda.so to be available (it's listed as a shared library dependency of some shared library that is opened). We need to check which library is introducing the dependency and consider how to fix it. Ideally there shouldn't be a dependency on libcuda.so, as it's only available on GPU hosts.

You can also refer to NVIDIA/nvidia-container-toolkit#185 for a little background. The problem with the compat/libcuda.so AFAIK is that it does not necessarily fit the driver version of the host system.
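One way to check which library introduces the dependency is to read the direct (DT_NEEDED) entries of each candidate shared library. The following is a minimal sketch, assuming GNU `readelf` is installed in the container; the helper names are hypothetical and the library paths in the usage comment are placeholders, not the actual CI paths.

```python
import re
import subprocess

# Matches lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libcuda.so.1]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def needed_libs(readelf_output):
    """Return the DT_NEEDED entries found in `readelf -d` output."""
    return NEEDED_RE.findall(readelf_output)


def direct_deps(path):
    """Run `readelf -d` on a shared library and return its direct dependencies."""
    result = subprocess.run(
        ["readelf", "-d", path], check=True, capture_output=True, text=True
    )
    return needed_libs(result.stdout)


def links_libcuda(deps):
    """True if the driver library (libcuda.so*, not libcudart) is listed."""
    return any(name.startswith("libcuda.so") for name in deps)


# Usage (placeholder paths, adjust to the actual build directory):
#   for lib in ["build/libmxnet.so", "build/libtvm.so", "build/libtvm_runtime.so"]:
#       print(lib, links_libcuda(direct_deps(lib)))
```

Unlike `ldd`, which also lists transitive dependencies, this only reports what each library itself requests, which makes it easier to tell libmxnet.so and libtvm.so apart.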

jinboci · 2020-06-18T16:31:34Z

@yzhliu @leezu Thank you for your suggestions. I tried to disable linking against libcuda.so directly with the following change:

diff --git a/cmake/modules/CUDA.cmake b/cmake/modules/CUDA.cmake
index 936bb681b..32d13de38 100644
--- a/cmake/modules/CUDA.cmake
+++ b/cmake/modules/CUDA.cmake
@@ -35,7 +35,7 @@ if(USE_CUDA)
 
   list(APPEND TVM_LINKER_LIBS ${CUDA_NVRTC_LIBRARY})
   list(APPEND TVM_RUNTIME_LINKER_LIBS ${CUDA_CUDART_LIBRARY})
-  list(APPEND TVM_RUNTIME_LINKER_LIBS ${CUDA_CUDA_LIBRARY})
+  #list(APPEND TVM_RUNTIME_LINKER_LIBS ${CUDA_CUDA_LIBRARY})
   list(APPEND TVM_RUNTIME_LINKER_LIBS ${CUDA_NVRTC_LIBRARY})
 
   if(USE_CUDNN)
diff --git a/cmake/util/FindCUDA.cmake b/cmake/util/FindCUDA.cmake
index f971c87f2..5e2118148 100644
--- a/cmake/util/FindCUDA.cmake
+++ b/cmake/util/FindCUDA.cmake
@@ -58,9 +58,9 @@ macro(find_cuda use_cuda)
   # additional libraries
   if(CUDA_FOUND)
     if(MSVC)
-      find_library(CUDA_CUDA_LIBRARY cuda
-        ${CUDA_TOOLKIT_ROOT_DIR}/lib/x64
-        ${CUDA_TOOLKIT_ROOT_DIR}/lib/Win32)
+      #find_library(CUDA_CUDA_LIBRARY cudart
+        #${CUDA_TOOLKIT_ROOT_DIR}/lib/x64
+        #${CUDA_TOOLKIT_ROOT_DIR}/lib/Win32)
       find_library(CUDA_NVRTC_LIBRARY nvrtc
         ${CUDA_TOOLKIT_ROOT_DIR}/lib/x64
         ${CUDA_TOOLKIT_ROOT_DIR}/lib/Win32)
@@ -74,13 +74,13 @@ macro(find_cuda use_cuda)
         ${CUDA_TOOLKIT_ROOT_DIR}/lib/x64
         ${CUDA_TOOLKIT_ROOT_DIR}/lib/Win32)
     else(MSVC)
-      find_library(_CUDA_CUDA_LIBRARY cuda
-        PATHS ${CUDA_TOOLKIT_ROOT_DIR}
-        PATH_SUFFIXES lib lib64 targets/x86_64-linux/lib targets/x86_64-linux/lib/stubs lib64/stubs
-        NO_DEFAULT_PATH)
-      if(_CUDA_CUDA_LIBRARY)
-        set(CUDA_CUDA_LIBRARY ${_CUDA_CUDA_LIBRARY})
-      endif()
+      #find_library(_CUDA_CUDA_LIBRARY cudart
+        #PATHS ${CUDA_TOOLKIT_ROOT_DIR}
+        #PATH_SUFFIXES lib lib64 targets/x86_64-linux/lib targets/x86_64-linux/lib/stubs lib64/stubs
+        #NO_DEFAULT_PATH)
+      #if(_CUDA_CUDA_LIBRARY)
+        #set(CUDA_CUDA_LIBRARY ${_CUDA_CUDA_LIBRARY})
+      #endif()
       find_library(CUDA_NVRTC_LIBRARY nvrtc
         PATHS ${CUDA_TOOLKIT_ROOT_DIR}
         PATH_SUFFIXES lib lib64 targets/x86_64-linux/lib targets/x86_64-linux/lib/stubs lib64/stubs lib/x86_64-linux-gnu

However, I get the following error when loading the resulting tvm build:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/Documents/tvm/python/tvm/__init__.py", line 25, in <module>
    from ._ffi.base import TVMError, __version__
  File "/home/ubuntu/Documents/tvm/python/tvm/_ffi/__init__.py", line 28, in <module>
    from .base import register_error
  File "/home/ubuntu/Documents/tvm/python/tvm/_ffi/base.py", line 62, in <module>
    _LIB, _LIB_NAME = _load_lib()
  File "/home/ubuntu/Documents/tvm/python/tvm/_ffi/base.py", line 50, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
  File "/home/ubuntu/anaconda3/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/ubuntu/Documents/tvm/build/libtvm.so: undefined symbol: cuLaunchKernel

It seems that cuLaunchKernel is a function that libtvm.so needs from libcuda.so (I am not sure about this). How could we call this function without linking against libcuda.so?
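To see which symbols libtvm.so expects another library to provide, one option is to list its undefined dynamic symbols and filter for names that look like CUDA driver API calls. This is a rough sketch, assuming GNU `nm` is available; the `cu` + capital letter filter is only a naming heuristic (runtime API names start with `cuda`), and the sample text below is made up for illustration.

```python
import re
import subprocess

# Heuristic: driver API functions are named cuXxx (e.g. cuLaunchKernel),
# while runtime API functions are named cudaXxx.
DRIVER_API_RE = re.compile(r"^cu[A-Z]")


def undefined_symbols(nm_output):
    """Extract undefined symbol names from `nm -D --undefined-only` output."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries have no address column: "U name[@VERSION]"
        if len(parts) == 2 and parts[0] == "U":
            symbols.append(parts[1].split("@")[0])
    return symbols


def driver_api_symbols(symbols):
    """Keep only names that look like CUDA driver API functions."""
    return [s for s in symbols if DRIVER_API_RE.match(s)]


def undefined_in(path):
    """Run nm on a shared library and return its undefined dynamic symbols."""
    result = subprocess.run(
        ["nm", "-D", "--undefined-only", path],
        check=True, capture_output=True, text=True,
    )
    return undefined_symbols(result.stdout)


# Illustrative (made-up) nm output:
sample = "                 U cuLaunchKernel\n                 U cudaMalloc\n"
print(driver_api_symbols(undefined_symbols(sample)))
```

Any name this reports must be resolved by some loaded library at import time, which is why removing libcuda from the link line without changing the code produces the undefined symbol error.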

leezu · 2020-06-18T18:11:08Z

@jinboci would it be possible to dlopen libcuda at runtime?
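The idea of opening libcuda at runtime can be illustrated from Python with ctypes. This is only a sketch of the pattern, not how TVM loads the driver: the helper names are hypothetical, and a C++ runtime would use dlopen/dlsym directly instead.

```python
import ctypes


def try_load(names):
    """Return a handle for the first library in `names` that loads, else None."""
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            # Library is missing or cannot be loaded on this host.
            continue
    return None


def has_symbol(lib, symbol):
    """True if `lib` was loaded and exports `symbol`."""
    return lib is not None and hasattr(lib, symbol)


# Try the versioned soname first, then the unversioned development name.
driver = try_load(["libcuda.so.1", "libcuda.so"])
if has_symbol(driver, "cuLaunchKernel"):
    print("CUDA driver found; GPU code paths can be enabled")
else:
    print("CUDA driver not available; staying on CPU-only code paths")
```

With this pattern a missing driver becomes a recoverable condition at runtime instead of a load-time failure of the whole shared library.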

jinboci · 2020-06-18T18:29:02Z

@leezu @yzhliu Hi, I am still unclear about:

Does the machine in CI that builds mxnet provide libcuda.so?
When USE_TVM_OP is OFF, does building mxnet still depend on libcuda.so?

I compiled mxnet with USE_TVM_OP set to OFF and USE_CUDA and USE_CUDNN set to ON, and got:

(base) ubuntu@ip-172-31-37-194:~/Documents/mxnet/build$ ldd libmxnet.so
        linux-vdso.so.1 (0x00007ffda2ae3000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f68de615000)
        libopenblas.so.0 => /usr/local/lib/libopenblas.so.0 (0x00007f68dd688000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f68dd480000)
        libomp.so => /home/ubuntu/Documents/mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x00007f68dd19a000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f68dcf7b000)
        libcudnn.so.7 => /usr/local/cuda/lib64/libcudnn.so.7 (0x00007f68c795c000)
        libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f68c6774000)
        libnvidia-ml.so.1 => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007f68c614e000)
        libnccl.so.2 => /usr/local/cuda/lib/libnccl.so.2 (0x00007f68bf6fa000)
        libopencv_imgcodecs.so.4.2 => /usr/local/lib/libopencv_imgcodecs.so.4.2 (0x00007f68bed0d000)
        libopencv_imgproc.so.4.2 => /usr/local/lib/libopencv_imgproc.so.4.2 (0x00007f68bd409000)
        libopencv_core.so.4.2 => /usr/local/lib/libopencv_core.so.4.2 (0x00007f68bc124000)
        libcudart.so.10.0 => /usr/local/cuda/lib64/libcudart.so.10.0 (0x00007f68bbeaa000)
        libcufft.so.10.0 => /usr/local/cuda/lib64/libcufft.so.10.0 (0x00007f68b59f6000)
        libcublas.so.10.0 => /usr/local/cuda/lib64/libcublas.so.10.0 (0x00007f68b1460000)
        libcusolver.so.10.0 => /usr/local/cuda/lib64/libcusolver.so.10.0 (0x00007f68a8d79000)
        libcurand.so.10.0 => /usr/local/cuda/lib64/libcurand.so.10.0 (0x00007f68a4c12000)
        libnvrtc.so.10.0 => /usr/local/cuda/lib64/libnvrtc.so.10.0 (0x00007f68a35f6000)
        libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007f68a33ed000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f68a3064000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f68a2cc6000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f68a2aae000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f68a26bd000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f6905ee6000)
        libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x00007f68a22de000)
        libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f68a20af000)
        libjpeg.so.8 => /usr/lib/x86_64-linux-gnu/libjpeg.so.8 (0x00007f68a1e47000)
        libpng16.so.16 => /usr/lib/x86_64-linux-gnu/libpng16.so.16 (0x00007f68a1c15000)
        libtiff.so.5 => /usr/lib/x86_64-linux-gnu/libtiff.so.5 (0x00007f68a199e000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f68a1781000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f68a1541000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f68a131b000)
        libjbig.so.0 => /usr/lib/x86_64-linux-gnu/libjbig.so.0 (0x00007f68a110d000)

jinboci · 2020-06-18T18:37:39Z

@leezu I set some breakpoints (I am not sure whether this is the right way to check). With a TVM-only build:

>>> import tvm
> /home/ubuntu/anaconda3/lib/python3.7/ctypes/__init__.py(365)__init__()
-> self._handle = _dlopen(self._name, mode)
(Pdb) c
> /home/ubuntu/Documents/tvm/python/tvm/_ffi/base.py(51)_load_lib()
-> lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
(Pdb) c
> /home/ubuntu/anaconda3/lib/python3.7/ctypes/__init__.py(365)__init__()
-> self._handle = _dlopen(self._name, mode)
(Pdb) _dlopen("libcuda.so")
94685554591904

leezu · 2020-06-18T20:22:43Z

@jinboci libmxnet.so currently has a libcuda dependency when compiled with nvrtc. This will be fixed eventually (#17858), but if it blocks the TVMOp tests, I suggest you simply disable the nvrtc feature in the tvmop builds. Then the dependency on libcuda.so.1 in libmxnet.so will disappear.

You need to check if the error is due to libmxnet.so or libtvm.so. Once you have identified the cause, the next step is to look into fixing it.

yzhliu · 2020-06-19T00:59:18Z

@leezu Is mxnet built without nvrtc in CI?

leezu · 2020-06-19T05:10:38Z

@yzhliu NVRTC is enabled by default and thus built by the CI unless disabled: https://github.com/apache/incubator-mxnet/blob/497bf7efb403a9174817f07ab3d2f9be033845ad/CMakeLists.txt#L82

If libmxnet's dependency is causing the issue, we can just disable this flag in the TVMOp builds until libmxnet.so is fixed. Based on the error logs posted in this issue, though, I'm not sure whether the error is due to libtvm or libmxnet.

Ubuntu and others added 13 commits May 7, 2020 06:37

6774474 fix the error message of reshape()
29ddefe Fixing issue apache#16655 reshape() error message
7627eaa test pr
087a100 fixing apache#17840
75f975b fixing issue apache#17840
aafaa99 Merge remote-tracking branch 'upstream/master' into new_branch
84c389a fixing issue apache#17840
47ba73f remove blankspace
060a3b1 fixing bugs
46256d4 fixing bugs
bb8727b fixing cpu errors
1e406c7 fixing review issues
c4dbc64 restoring tvmop tests
50bf350 Update multiarray.py
