[RUNTIME][PackedFunc][Registry::Manager] Segfault with Registry::Manager and PackedFunc on x86_64 GNU/Linux. #2067

Closed
denis0x0D opened this issue Nov 5, 2018 · 3 comments

Comments

@denis0x0D
Contributor

While working on the trace primitive (#1973), I found a strange bug related to the hash table that stores the "PACKED" functions.

1. Python example.

$ cat test_packed.py

import tvm
import numpy as np

@tvm.register_func
def my_debug(x):
    print("my_debug")
    return 0

x = tvm.placeholder((1024, 1024), name="x", dtype="float32")
xbuffer = tvm.decl_buffer(x.shape, dtype=x.dtype)
y = tvm.compute(x.shape, lambda i, j: tvm.call_packed("my_debug", xbuffer))
s = tvm.create_schedule(y.op)

f = tvm.build(s, [xbuffer, y], binds={x:xbuffer})
xnd = tvm.nd.array(np.ones((1024, 1024), dtype=x.dtype))
ynd = tvm.nd.array(np.zeros((1024, 1024), dtype=y.dtype))
f(xnd, ynd)

$ python3 test_packed.py
Output:
my_debug
...
Segmentation fault (core dumped)

2. C++ example.

$ cat test_packed.cc

#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>
#include <tvm/tvm.h>
#include <tvm/operation.h>
#include <tvm/tensor.h>
#include <tvm/build_module.h>
#include <topi/broadcast.h>
#include <topi/detail/extern.h>

using namespace std;
// tvm::TYPE(bits) expands to tvm::Int(32).
#define TYPE Int
#define bits 32

using namespace topi::detail;
using namespace tvm;

int main()
{
  auto n = tvm::var("n");
  tvm::Array<tvm::Expr> shape = {n, n};
  tvm::Tensor A = tvm::placeholder(shape, tvm::TYPE(bits), "A");
  tvm::Tensor B = tvm::placeholder(shape, tvm::TYPE(bits), "B");
  tvm::Buffer xbuffer = tvm::decl_buffer(shape, tvm::TYPE(bits), "buffer");
  // Arguments for the packed call: the function name, then the packed buffer.
  tvm::Array<Expr> call_args{std::string("my_debug"), pack_buffer(xbuffer)};

  // Every element of C is computed by calling the packed function "my_debug".
  auto C = tvm::compute(
      shape, FCompute([=](auto i) { return call_packed(call_args); }));

  tvm::Schedule s = tvm::create_schedule({C->op});
  tvm::BuildConfig config = tvm::build_config();
  // Bind placeholder A to the explicitly declared buffer.
  std::unordered_map<tvm::Tensor, tvm::Buffer> binds;
  binds.insert(pair<tvm::Tensor, tvm::Buffer>(A, xbuffer));
  auto args = tvm::Array<tvm::Tensor>({A, B, C});
  auto lowered = tvm::lower(s, args, "debug", binds, config);

  auto target = tvm::Target::create("llvm");
  auto target_host = tvm::Target::create("llvm");
  tvm::runtime::Module mod = tvm::build(lowered, target, target_host, config);
  // Print the generated assembly so it can be assembled into a shared library.
  cout << mod->GetSource("asm") << endl;
  return 0;
}

$ cat test_packed_runtime.cc

#include <iostream>
#include <cstdint>
#include <cstdio>
#include <dlpack/dlpack.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/registry.h>
#include <tvm/runtime/packed_func.h>

using namespace std;
using namespace tvm;
using namespace runtime;
// Register the packed function that the generated code looks up by name.
TVM_REGISTER_GLOBAL("my_debug").set_body([](TVMArgs args, TVMRetValue *rv) {
  std::cout << "my debug" << std::endl;
  return 0;
});

int main(void) {
  tvm::runtime::Module mod = tvm::runtime::Module::LoadFromFile("libtest.so");
  tvm::runtime::PackedFunc f = mod.GetFunction("debug");
  DLTensor *a, *b, *c;
  int ndim = 2;
  int dtype_code = kDLInt;
  int dtype_bits = 32;
  int dtype_lanes = 1;
  int device_type = kDLCPU;
  int device_id = 0;
  int64_t shape[2] = {1024, 1024};

  // Allocate three 1024 x 1024 int32 tensors on the CPU.
  TVMArrayAlloc(shape, ndim, dtype_code, dtype_bits, dtype_lanes, device_type,
                device_id, &a);
  TVMArrayAlloc(shape, ndim, dtype_code, dtype_bits, dtype_lanes, device_type,
                device_id, &b);
  TVMArrayAlloc(shape, ndim, dtype_code, dtype_bits, dtype_lanes, device_type,
                device_id, &c);

  using dtype = int32_t;

  // Fill the inputs; the row stride of a row-major 2-D array is shape[1].
  for (int i = 0; i < shape[0]; ++i) {
    for (int j = 0; j < shape[1]; ++j) {
      static_cast<dtype *>(a->data)[i * shape[1] + j] = 1;
      static_cast<dtype *>(b->data)[i * shape[1] + j] = 2;
    }
  }
  f(a, b, c);
  return 0;
}

$ g++ -o test test_packed.cc -std=c++14
$ ./test > test.s
$ g++ -o libtest.so test.s -shared
$ g++ -o run_packed test_packed_runtime.cc -std=c++14
$ ./run_packed
Output:
my_debug
...
Segmentation fault (core dumped)

The backtrace:

 AddressSanitizer:DEADLYSIGNAL
  =================================================================
==16154==ERROR: AddressSanitizer: stack-overflow on address 0x7ffe39107ff8 (pc 0x7f5083c80998 bp 0x7ffe39108890 sp 0x7ffe39108000 T0)
#0 0x7f5083c80997 in __interceptor_memcmp ../../../../libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:772
#1 0x7f508382aadf in std::char_traits<char>::compare(char const*, char const*, unsigned long) /usr/include/c++/7/bits/char_traits.h:310
#2 0x7f508382aadf in __gnu_cxx::__enable_if<std::__is_char<char>::__value, bool>::__type std::operator==<char>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/include/c++/7/bits/basic_string.h:6008
...
#9 0x7f5083829fff in std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::Registry*, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, tvm::runtime::Registry*> > >::find(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/include/c++/7/bits/unordered_map.h:923
#10 0x7f5083829fff in tvm::runtime::Registry::Get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /home//tvm/src/runtime/registry.cc:78
#11 0x7f5083817c87 in tvm::runtime::ModuleNode::GetFuncFromEnv(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /home//tvm/src/runtime/module.cc:91
#12 0x7f5083800a70 in TVMBackendGetFuncFromEnv /home//tvm/src/runtime/c_runtime_api.cc:175
#13 0x7f507abfdba7  (/home//tvm/test/libtest.so+0xba7)
#14 0x7f507abfd8c7 in debug (/home//tvm/test/libtest.so+0x8c7)
#15 0x7f508381a2ff in operator() /home//tvm/src/runtime/module_util.cc:52
#16 0x55f70de526a1 in std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const /usr/include/c++/7/bits/std_function.h:706
#17 0x55f70de52959 in tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<DLTensor*&, DLTensor*&, DLTensor*&>(DLTensor*&, DLTensor*&, DLTensor*&) const /home//tvm/include/tvm/runtime/packed_func.h:1111
#18 0x55f70de51840 in main /home//tvm/test/test_packed_runtime.cc:43
#19 0x7f5082233b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
#20 0x55f70de51389 in _start (/home//tvm/test/run_packed+0x1389)

The error happens even with the latest version of libstdc++ built from the GCC source tree.

Alternatively, I may be using the API incorrectly, so it would be good if someone else could confirm the bug.
Thanks.

@tqchen
Member

tqchen commented Nov 5, 2018

This is a pretty interesting bug, and I can reproduce it on my local machine. Interestingly, things are fine if we use a small array.

@tqchen
Member

tqchen commented Nov 5, 2018

To investigate this further, we could use the IR builder to build simple functions and run them through LLVM, to see whether there is a particular thing that we did wrong.

https://github.com/dmlc/tvm/blob/master/tests/python/unittest/test_codegen_vm_basic.py#L37

My guess is that something is wrong with the code we generate (which corrupts the memory) rather than with the runtime part, but I am not sure what it is.

@tqchen
Member

tqchen commented Nov 6, 2018

Should be resolved by #2070. This bug is caused by a combination of two problems:

  • We did not cache the result of the packed function lookup, which resulted in about 1M calls into EnvGetFunc in the 1k * 1k case.
  • We did not lift the alloca of the temporary result memory to the top of the function, which resulted in alloca being called 1k * 1k times and a stack overflow.
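
To make the two problems concrete, here is a small toy model in plain Python. It is only a sketch and does not use TVM: `REGISTRY`, `get_func_from_env`, `run_uncached`, `run_cached` and `stack_bytes` are hypothetical names made up for illustration, and the 16 bytes per alloca and the 8 MiB stack limit are assumptions, not values measured from the generated code.

```python
# Toy model of the two problems; none of these names are TVM APIs.
REGISTRY = {"my_debug": lambda: 0}   # stands in for the global function table
STATS = {"lookups": 0}               # counts how often the table is searched


def get_func_from_env(name):
    """Hypothetical stand-in for the runtime lookup by name."""
    STATS["lookups"] += 1
    return REGISTRY[name]


def run_uncached(n):
    # Looks the function up again on every one of the n * n iterations.
    for _ in range(n * n):
        get_func_from_env("my_debug")()


def run_cached(n):
    # Looks the function up once and reuses the handle inside the loop.
    f = get_func_from_env("my_debug")
    for _ in range(n * n):
        f()


def count_lookups(run, n):
    STATS["lookups"] = 0
    run(n)
    return STATS["lookups"]


def stack_bytes(n, bytes_per_alloca, hoisted):
    # Memory obtained with alloca is released only when the function returns,
    # so an alloca inside the loop body accumulates over all n * n iterations,
    # while a hoisted alloca is paid for once.
    return bytes_per_alloca if hoisted else n * n * bytes_per_alloca


if __name__ == "__main__":
    n = 1024
    assumed_stack_limit = 8 * 1024 * 1024   # assumption: 8 MiB default stack
    print(count_lookups(run_uncached, n))   # 1048576 lookups
    print(count_lookups(run_cached, n))     # 1 lookup
    print(stack_bytes(n, 16, hoisted=False) > assumed_stack_limit)   # True
    print(stack_bytes(32, 16, hoisted=False) > assumed_stack_limit)  # False
```

Under these assumptions the 1k * 1k case needs 16 MiB of stack when the alloca stays inside the loop, while a 32 * 32 case needs only 16 KiB, which is consistent with small arrays working fine.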

@tqchen tqchen closed this as completed Nov 6, 2018