
TVMError: src/runtime/cuda/cuda_module.cc:93: CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX #1027

Closed
arassadin opened this issue Mar 20, 2018 · 14 comments

@arassadin

Hi everyone.

I got the following error while reproducing the toy example from nnvm, but with my own model. Calling

m.run()

I get an error similar to #315 (comment):

TVMError: [09:11:33] src/runtime/cuda/cuda_module.cc:93: CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX

Can you clarify what might be going wrong here?

Thanks in advance!


BTW, I'm a bit confused by the tvm.gpu() docstring 😃:

Construct a CPU device

@arassadin
Author

@tqchen already commented in #315 (comment):

it is likely the gpu schedule for nchw did not work for your specific shape of conv2d and the nvcc compiler failed to compile

But it's still not really clear to me where to look further.

@eqy
Contributor

eqy commented Mar 21, 2018

Are you using a custom schedule for your model? Usually this is caused by a schedule not being able to handle a specific input shape, e.g., the input shape causes too much local memory to be used or too many threads per block to be allocated.
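As a rough sanity check of what a generated schedule has to fit into, the device limits can be queried from the TVM context. A minimal sketch, assuming a CUDA-enabled build and a TVM version whose context object exposes these attribute properties:

import tvm

ctx = tvm.gpu(0)
if ctx.exist:
    # limits that a generated CUDA kernel must respect
    print("max threads per block:", ctx.max_threads_per_block)
    print("warp size:", ctx.warp_size)

If a schedule tries to launch more threads per block than the device allows, or allocates too much local memory, the compiled module fails to load at runtime with errors like the one above.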

@arassadin
Author

Hi,

My code is exactly the same as here, with the only difference being that the model is my own. Its input is 288x512x3, which, I suppose, is not too much for a GTX 1070 with 8 GB.

@eqy
Contributor

eqy commented Mar 21, 2018

OK, then the issue is likely due to an operator not handling one or more of the shapes in your model correctly. One way to verify this is to temporarily try out more common shapes, e.g., those in ResNet-18 such as (224x224x3), and see whether that works.
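A minimal sketch of that experiment, reusing the build/run calls from the tutorial and assuming sym and params come from the same frontend converter as in the tutorial:

import numpy as np
import nnvm.compiler
import tvm
from tvm.contrib import graph_runtime

# sym and params are assumed to come from the tutorial's frontend converter
shape = {'data': (1, 3, 224, 224)}  # common ResNet-style input
with nnvm.compiler.build_config(opt_level=2):
    graph, lib, params = nnvm.compiler.build(sym, 'cuda', shape, params=params)

ctx = tvm.gpu(0)
m = graph_runtime.create(graph, lib, ctx)
m.set_input('data', tvm.nd.array(np.random.uniform(size=shape['data']).astype('float32')))
m.set_input(**params)
m.run()  # if this succeeds, the failure is specific to your model's shapes

If the common shape runs fine, the next step is to narrow down which layer shape in your own model triggers the bad schedule.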

@arassadin
Author

OK, thanks for the tip, I'll try it.

@arassadin
Author

Actually, following the original example with the Keras ResNet-50 model, I get an error even earlier, at

with nnvm.compiler.build_config(opt_level=2):
    graph, lib, params = nnvm.compiler.build(sym, 'cuda', {'data': (1, 3, 224, 224)}, params=params)
---------------------------------------------------------------------------
NNVMError                                 Traceback (most recent call last)
<ipython-input-10-3c6e3b9f11a9> in <module>()
      1 with nnvm.compiler.build_config(opt_level=2):
----> 2     graph, lib, params = nnvm.compiler.build(sym, 'cuda', {'data': (1, 3, 224, 224)}, params=params)

/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/compiler/build_module.pyc in build(graph, target, shape, dtype, params, target_host)
    235     # Precompute prune
    236     if params and cfg.pass_enabled("PrecomputePrune"):
--> 237         graph, params = precompute_prune(graph, params)
    238         shape, dtype = _update_shape_dtype(shape, dtype, params)
    239     # Operator Fusion and generation

/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/compiler/build_module.pyc in precompute_prune(graph, params)
    328         return graph, params
    329     with tvm.build_config(auto_unroll_max_step=0):
--> 330         out_arrs = _run_graph(pre_graph, params)
    331     return graph, dict(zip(out_names, out_arrs))

/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/compiler/build_module.pyc in _run_graph(graph, params)
    277     _, oshape = graph_util.infer_shape(graph, **shape)
    278     _, odtype = graph_util.infer_dtype(graph, **dtype)
--> 279     graph, libmod, _ = build(graph, target, shape, dtype)
    280     m = graph_runtime.create(graph, libmod, ctx)
    281     set_input, run, get_output = m["set_input"], m["run"], m["get_output"]

/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/compiler/build_module.pyc in build(graph, target, shape, dtype, params, target_host)
    249     graph = graph.apply("InferShape").apply("InferType")
    250     with target:
--> 251         graph = graph.apply("GraphFusePartition").apply("GraphFuseCompile")
    252     libmod = graph_attr._move_out_module(graph, "module")
    253     return graph, libmod, params

/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/graph.pyc in apply(self, passes)
    232         ghandle = GraphHandle()
    233         npass = nn_uint(len(passes))
--> 234         check_call(_LIB.NNGraphApplyPasses(self.handle, npass, cpass, ctypes.byref(ghandle)))
    235         return Graph(ghandle)
    236 

/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/_base.pyc in check_call(ret)
     70     """
     71     if ret != 0:
---> 72         raise NNVMError(py_str(_LIB.NNGetLastError()))
     73 
     74 def c_str(string):

NNVMError: TVMCall CFunc Error:
Traceback (most recent call last):
  File "tvm/_ffi/_cython/./function.pxi", line 39, in core.tvm_callback
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/compiler/build_module.py", line 119, in _build
    return tvm.build(funcs, target=target, target_host=target_host)
  File "/usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/build_module.py", line 471, in build
    mhost = codegen.build_module(fhost, str(target_host))
  File "/usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/codegen.py", line 20, in build_module
    return _Build(lowered_func, target)
  File "/usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/_ffi/function.py", line 280, in my_api_func
    return flocal(*args)
  File "tvm/_ffi/_cython/./function.pxi", line 264, in core.FunctionBase.__call__
  File "tvm/_ffi/_cython/./function.pxi", line 213, in core.FuncCall
  File "tvm/_ffi/_cython/./function.pxi", line 205, in core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 131, in core.CALL
TVMError: [12:04:22] src/codegen/codegen.cc:27: Check failed: bf != nullptr Target llvm is not enabled

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/libtvm.so(dmlc::StackTrace[abi:cxx11]()+0x5a) [0x7fbbd930701a]
[bt] (1) /usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/libtvm.so(tvm::codegen::Build(tvm::Array<tvm::LoweredFunc, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xdac) [0x7fbbd94e4e9c]
[bt] (2) /usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/libtvm.so(+0x341449) [0x7fbbd9491449]
[bt] (3) /usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/libtvm.so(TVMFuncCall+0x5e) [0x7fbbd96ab8fe]
[bt] (4) /usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/_ffi/_cy2/core.so(+0x18be7) [0x7fbbc59d8be7]
[bt] (5) /usr/bin/python2(PyObject_Call+0x3e) [0x4a577e]
[bt] (6) /usr/bin/python2(PyEval_EvalFrameEx+0x2f0d) [0x4bed3d]
[bt] (7) /usr/bin/python2(PyEval_EvalCodeEx+0x306) [0x4b9ab6]
[bt] (8) /usr/bin/python2(PyEval_EvalFrameEx+0x603f) [0x4c1e6f]
[bt] (9) /usr/bin/python2(PyEval_EvalFrameEx+0x553f) [0x4c136f]

but this is probably more about nnvm. What do you think? Would it be correct to answer "no, a common shape doesn't really work either"?

@eqy
Contributor

eqy commented Mar 22, 2018

This is confusing because the error is complaining about llvm not being enabled, though llvm should not be a requirement for CUDA codegen (https://github.com/dmlc/tvm/blob/master/docs/how_to/install.md).

Can you verify that the target is the CUDA GPU?
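A quick way to check both, as a minimal sketch assuming a TVM build from around that time where tvm.module.enabled is available:

import tvm

print("cuda enabled in libtvm:", tvm.module.enabled("cuda"))
print("llvm enabled in libtvm:", tvm.module.enabled("llvm"))
print("gpu(0) present:", tvm.gpu(0).exist)

If llvm comes back disabled, that would explain the "Target llvm is not enabled" check failure above; one common workaround is rebuilding TVM with LLVM support (LLVM_CONFIG in config.mk, or USE_LLVM in config.cmake), as described in the install docs.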

@arassadin
Author

The built-in example fails before the ctx = tvm.gpu(0)...

@arassadin
Author

BTW, with a CPU context and my own model, I got the following trace:

---------------------------------------------------------------------------
TVMError                                  Traceback (most recent call last)
<ipython-input-17-02f7defa23aa> in <module>()
----> 1 m.run()

/usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/contrib/graph_runtime.pyc in run(self, **input_dict)
    111         if input_dict:
    112             self.set_input(**input_dict)
--> 113         self._run()
    114 
    115     def get_input(self, index, out):

tvm/_ffi/_cython/./function.pxi in core.FunctionBase.__call__()

tvm/_ffi/_cython/./function.pxi in core.FuncCall()

tvm/_ffi/_cython/./function.pxi in core.FuncCall3()

tvm/_ffi/_cython/./base.pxi in core.CALL()

TVMError: [22:40:15] src/codegen/stack_vm/stack_vm.cc:287: Check failed: stack[sp].v_int64 device_type need to be 2

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/libtvm.so(dmlc::StackTrace[abi:cxx11]()+0x5a) [0x7f469448703a]
[bt] (1) /usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f4694487c28]
[bt] (2) /usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/libtvm.so(tvm::codegen::StackVM::Run(tvm::codegen::StackVM::State*) const+0x2109) [0x7f46946e4b69]
[bt] (3) /usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::codegen::StackVMModuleNode::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<tvm::runtime::ModuleNode> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x38) [0x7f46946e5f58]
[bt] (4) /usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/libtvm.so(+0x5abac7) [0x7f469487bac7]
[bt] (5) /usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/libtvm.so(+0x5aa267) [0x7f469487a267]
[bt] (6) /usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/libtvm.so(TVMFuncCall+0x5e) [0x7f469482b91e]
[bt] (7) /usr/local/lib/python2.7/dist-packages/tvm-0.2.0-py2.7-linux-x86_64.egg/tvm/_ffi/_cy2/core.so(+0x18be7) [0x7f468108abe7]
[bt] (8) /usr/bin/python2(PyEval_EvalFrameEx+0x578f) [0x4c15bf]
[bt] (9) /usr/bin/python2(PyEval_EvalCodeEx+0x306) [0x4b9ab6]

@tqchen
Member

tqchen commented Mar 25, 2018

The error for passing in a CPU context is correct, because we expect a GPU. Try the latest version in master; it might throw an error that tells you which graph causes the problem.
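For context, device_type 2 is the CUDA device type (kDLGPU) in the DLPack convention TVM follows, so that check fails when a module compiled for 'cuda' is handed a CPU context. A minimal sketch of keeping the build target and the runtime context consistent, where graph and lib are assumed to come from nnvm.compiler.build:

import tvm
from tvm.contrib import graph_runtime

target = 'cuda'  # must match the target passed to nnvm.compiler.build
ctx = tvm.gpu(0) if target == 'cuda' else tvm.cpu(0)
m = graph_runtime.create(graph, lib, ctx)  # graph, lib assumed from nnvm.compiler.build(sym, target, ...)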

@arassadin
Author

Hi,

Following your suggestion, I rebuilt the latest nnvm (c8832cc1c57fc35d8f1e8042c258ac32c0309ebc) with the latest tvm (567a10bb0947180b067f39a97c76d7fe7a3ca1f2), and the traceback hasn't changed:

src/runtime/cuda/cuda_module.cc:93: CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX

for my own model and

src/codegen/codegen.cc:27: Check failed: bf != nullptr Target llvm is not enabled

for Keras ResNet-50 example.

@tqchen
Member

tqchen commented May 19, 2018

Closing since we are not able to act further on this; discussions have moved to https://discuss.tvmlang.org/

@tqchen closed this as completed May 19, 2018
@expectopatronm

I get the exact same issue.

jetson@jetson:~/fast-depth/deploy$ python3 tx2_run_tvm.py --input-fp data/rgb.npy --output-fp data/pred.npy --model-dir ../results/tvm_compiled/tx2_gpu_mobilenet_nnconv5dw_skipadd_pruned/ --cuda True
=> [TVM on TX2] using model files in ../results/tvm_compiled/tx2_gpu_mobilenet_nnconv5dw_skipadd_pruned/
=> [TVM on TX2] loading model lib and ptx
=> [TVM on TX2] loading model graph and params
=> [TVM on TX2] creating TVM runtime module
=> [TVM on TX2] feeding inputs and params into TVM module
=> [TVM on TX2] running TVM module, saving output
Traceback (most recent call last):

File "tx2_run_tvm.py", line 91, in
main()

File "tx2_run_tvm.py", line 88, in main
run_model(args.model_dir, args.input_fp, args.output_fp, args.warmup, args.run, args.cuda, try_randin=args.randin)

File "tx2_run_tvm.py", line 36, in run_model
run() # not gmodule.run()

File "/home/jetson/tvm/python/tvm/_ffi/_ctypes/function.py", line 207, in call
raise get_last_ffi_error()

tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (3) /home/jetson/tvm/build/libtvm.so(TVMFuncCall+0x70) [0x7fad7ccec0]
[bt] (2) /home/jetson/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::detail::PackFuncVoidAddr<4, tvm::runtime::CUDAWrappedFunc>(tvm::runtime::CUDAWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0xe8) [0x7fad850b08]
[bt] (1) /home/jetson/tvm/build/libtvm.so(tvm::runtime::CUDAWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void**) const+0x6cc) [0x7fad85093c]
[bt] (0) /home/jetson/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4c) [0x7facfdebac]
File "/home/jetson/tvm/src/runtime/cuda/cuda_module.cc", line 110
File "/home/jetson/tvm/src/runtime/library_module.cc", line 91
CUDAError: Check failed: ret == 0 (-1 vs. 0) : cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX

Still haven't found a solution to it. I am running it on a Jetson Nano. Please help.
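One common cause of CUDA_ERROR_INVALID_PTX on Jetson boards is a compute-capability mismatch: the model directory here is a tx2_gpu_* build, and PTX generated for a TX2 (sm_62) will not load on a Nano (sm_53). A minimal sketch of checking the device and pinning the arch when recompiling the model, assuming a TVM version that exposes compute_version and autotvm's set_cuda_target_arch:

import tvm
from tvm.autotvm.measure.measure_methods import set_cuda_target_arch

print("compute capability:", tvm.gpu(0).compute_version)  # e.g. "5.3" on a Nano, "6.2" on a TX2

# on the machine doing the compilation, pin the CUDA arch to the deployment device
set_cuda_target_arch('sm_53')  # Jetson Nano

Recompiling the model for the Nano's architecture, rather than reusing the TX2 artifacts, should avoid the invalid-PTX load error.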

@tiandiao123
Contributor

> I get the exact same issue. [...] Still haven't found a solution to it. I am running it on a Jetson Nano. Please help.

Did you find a solution? I have the exact same issue. I don't know how to fix it; could you help me?
