[CI] [TEST] test_conv2d_int8_intrinsics #5455

Closed
tqchen opened this issue Apr 27, 2020 · 8 comments

@tqchen
Member

tqchen commented Apr 27, 2020

In the past week I attempted to upgrade the CPU docker image to bionic (Ubuntu 18.04). During that work, a new unit test error appeared in the int8 intrinsic test (note that the master CI was fine).

It would be great if we could look into it. To reproduce, use the docker image tvmai/ci-cpu:v0.62-t0

http://ci.tvm.ai:8080/job/temp-ci-docker-staging/job/ci-stage/30/execution/node/131/log/

@tqchen
Member Author

tqchen commented Apr 27, 2020

cc @anijain2305

@tqchen
Member Author

tqchen commented Apr 27, 2020

self = <tvm.runtime.packed_func.PackedFunc object at 0x7fe439b9f630>
args = (IRModuleNode( {GlobalVar(main): FunctionNode([Var(x, ty=TensorType([1, 1, 64, 64], uint8))], TensorType([1, 16, 64, 6...Type([1, 1, 64, 64], uint8), TensorType([16, 1, 3, 3], int8)]), [], (nullptr))}), {1: llvm -mcpu=skylake-avx512}, None)
temp_args = [{1: llvm -mcpu=skylake-avx512}]
values = <tvm._ffi._ctypes.packed_func.TVMValue_Array_3 object at 0x7fe43985c510>
tcodes = <tvm._ffi._ctypes.packed_func.c_int_Array_3 object at 0x7fe43986fea0>

    def __call__(self, *args):
        """Call the function with positional arguments
    
        args : list
           The positional arguments to the function call.
        """
        temp_args = []
        values, tcodes, num_args = _make_tvm_args(args, temp_args)
        ret_val = TVMValue()
        ret_tcode = ctypes.c_int()
        if _LIB.TVMFuncCall(
                self.handle, values, tcodes, ctypes.c_int(num_args),
                ctypes.byref(ret_val), ctypes.byref(ret_tcode)) != 0:
>           raise get_last_ffi_error()
E           tvm._ffi.base.TVMError: Traceback (most recent call last):
E             [bt] (8) /workspace/build/libtvm.so(+0x95e727) [0x7fe426761727]
E             [bt] (7) /workspace/build/libtvm.so(+0x96e3b4) [0x7fe4267713b4]
E             [bt] (6) /workspace/build/libtvm.so(+0x969334) [0x7fe42676c334]
E             [bt] (5) /workspace/build/libtvm.so(+0x96fd39) [0x7fe426772d39]
E             [bt] (4) /workspace/build/libtvm.so(+0x95e727) [0x7fe426761727]
E             [bt] (3) /workspace/build/libtvm.so(+0x96e1e9) [0x7fe4267711e9]
E             [bt] (2) /workspace/build/libtvm.so(+0x940f54) [0x7fe426743f54]
E             [bt] (1) /workspace/build/libtvm.so(+0x94c913) [0x7fe42674f913]
E             [bt] (0) /workspace/build/libtvm.so(+0xa9f52b) [0x7fe4268a252b]
E             File "/workspace/python/tvm/relay/backend/_backend.py", line 49, in lower
E               f = tvm.driver.lower(sch, inputs, name=func_name)
E             File "/workspace/python/tvm/driver/build_module.py", line 215, in lower
E               mod = optimize(mod)
E             File "/workspace/python/tvm/ir/transform.py", line 141, in __call__
E               return _ffi_transform_api.RunPass(self, mod)
E             File "/workspace/python/tvm/_ffi/_ctypes/packed_func.py", line 219, in __call__
E               raise get_last_ffi_error()
E             [bt] (8) /workspace/build/libtvm.so(tvm::tir::ExprVisitor::VisitExpr_(tvm::tir::LoadNode const*)+0x16) [0x7fe4261e0cf6]
E             [bt] (7) /workspace/build/libtvm.so(+0x4bf027) [0x7fe4262c2027]
E             [bt] (6) /workspace/build/libtvm.so(tvm::arith::ConstIntBoundAnalyzer::operator()(tvm::PrimExpr const&, std::unordered_map<tvm::PrimExprNode const*, tvm::arith::ConstIntBound, std::hash<tvm::PrimExprNode const*>, std::equal_to<tvm::PrimExprNode const*>, std::allocator<std::pair<tvm::PrimExprNode const* const, tvm::arith::ConstIntBound> > >*)+0x1b) [0x7fe42600882b]
E             [bt] (5) /workspace/build/libtvm.so(+0x20da92) [0x7fe426010a92]
E             [bt] (4) /workspace/build/libtvm.so(+0x20f1a9) [0x7fe4260121a9]
E             [bt] (3) /workspace/build/libtvm.so(+0x20da92) [0x7fe426010a92]
E             [bt] (2) /workspace/build/libtvm.so(+0x20e27b) [0x7fe42601127b]
E             [bt] (1) /workspace/build/libtvm.so(+0x20dd2f) [0x7fe426010d2f]
E             [bt] (0) /workspace/build/libtvm.so(+0x12a65c) [0x7fe425f2d65c]
E             File "/workspace/src/arith/const_int_bound.cc", line 157
E             File "/workspace/python/tvm/_ffi/_ctypes/packed_func.py", line 78, in cfun
E               rv = local_pyfunc(*pyargs)
E             File "/workspace/python/tvm/relay/backend/_backend.py", line 57, in lower
E               raise RuntimeError(msg)
E             File "/workspace/python/tvm/relay/backend/_backend.py", line 49, in lower
E               f = tvm.driver.lower(sch, inputs, name=func_name)
E             File "/workspace/python/tvm/driver/build_module.py", line 215, in lower
E               mod = optimize(mod)
E             File "/workspace/python/tvm/ir/transform.py", line 141, in __call__
E               return _ffi_transform_api.RunPass(self, mod)
E             File "/workspace/python/tvm/_ffi/_ctypes/packed_func.py", line 219, in __call__
E               raise get_last_ffi_error()
E             [bt] (8) /workspace/build/libtvm.so(tvm::tir::ExprVisitor::VisitExpr_(tvm::tir::LoadNode const*)+0x16) [0x7fe4261e0cf6]
E             [bt] (7) /workspace/build/libtvm.so(+0x4bf027) [0x7fe4262c2027]
E             [bt] (6) /workspace/build/libtvm.so(tvm::arith::ConstIntBoundAnalyzer::operator()(tvm::PrimExpr const&, std::unordered_map<tvm::PrimExprNode const*, tvm::arith::ConstIntBound, std::hash<tvm::PrimExprNode const*>, std::equal_to<tvm::PrimExprNode const*>, std::allocator<std::pair<tvm::PrimExprNode const* const, tvm::arith::ConstIntBound> > >*)+0x1b) [0x7fe42600882b]
E             [bt] (5) /workspace/build/libtvm.so(+0x20da92) [0x7fe426010a92]
E             [bt] (4) /workspace/build/libtvm.so(+0x20f1a9) [0x7fe4260121a9]
E             [bt] (3) /workspace/build/libtvm.so(+0x20da92) [0x7fe426010a92]
E             [bt] (2) /workspace/build/libtvm.so(+0x20e27b) [0x7fe42601127b]
E             [bt] (1) /workspace/build/libtvm.so(+0x20dd2f) [0x7fe426010d2f]
E             [bt] (0) /workspace/build/libtvm.so(+0x12a65c) [0x7fe425f2d65c]
E             File "/workspace/src/arith/const_int_bound.cc", line 157
E           TVMError: Check failed: (val->second->min_value == res.min_value && val->second->max_value == res.max_value) || (val->second->min_value == everything.min_value && val->second->max_value == everything.max_value): Detected bound for 15conflicts with memorization
E           During handling of the above exception, another exception occurred:
E           
E           TVMError: Check failed: (val->second->min_value == res.min_value && val->second->max_value == res.max_value) || (val->second->min_value == everything.min_value && val->second->max_value == everything.max_value): Detected bound for 15conflicts with memorization
E           Error during compile function
E           -----------------------------
E           v0.0.4
E           fn (%p0: Tensor[(1, 1, 64, 64, 4), uint8], %p1: Tensor[(1, 1, 3, 3, 1, 16, 4), int8], Primitive=1) -> Tensor[(1, 1, 64, 64, 16), int32] {
E             nn.contrib_conv2d_NCHWc(%p0, %p1, padding=[1, 1, 1, 1], channels=16, kernel_size=[3, 3], data_layout="NCHW4c", out_layout="NCHW16c", out_dtype="int32") /* ty=Tensor[(1, 1, 64, 64, 16), int32] */
E           }

The error has to do with a constant int bound being rebound to a different result.
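To make the failing check concrete, here is a minimal Python sketch modeled only on the condition quoted in the error message above. It is not TVM's code: `Bound`, `EVERYTHING`, `record_bound`, and the int64 limits used for the "everything" range are names and values assumed for illustration.

```python
from dataclasses import dataclass

# Assumed stand-ins for the widest possible range ("everything").
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


@dataclass(frozen=True)
class Bound:
    min_value: int
    max_value: int


EVERYTHING = Bound(INT64_MIN, INT64_MAX)


def record_bound(memo, key, res):
    """Store `res` for `key`, rejecting a conflicting re-binding.

    A previous entry is acceptable only if it equals the new result
    or if it was the unconstrained "everything" range.
    """
    old = memo.get(key)
    if old is not None:
        consistent = (old == res) or (old == EVERYTHING)
        if not consistent:
            raise RuntimeError(f"bound for {key!r} conflicts with memoized bound")
    memo[key] = res


memo = {}
record_bound(memo, "e", Bound(63, 63))
record_bound(memo, "e", Bound(63, 63))      # same result again: accepted
try:
    record_bound(memo, "e", Bound(15, 15))  # different result: rejected
except RuntimeError as err:
    print(err)
```

In this sketch the third call fails because the key already maps to [63, 63] and the new result is [15, 15], which is the shape of the failure reported in the log.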

@tqchen
Member Author

tqchen commented Apr 27, 2020

also cc @yongfeng-nv @hzfan since both of you have touched/reviewed the const int bound recently

@masahi
Member

masahi commented Apr 27, 2020

I also saw this error when running test_op_level2.py locally. It happens randomly.

@tqchen tqchen changed the title [CI] QNN Compilation Error During Bionic Docker Update [CI] [TEST] test_conv2d_int8_intrinsics Apr 27, 2020
@yongfeng-nv
Contributor

I ran the test in tvmai/ci-cpu:v0.62-t0, but couldn't reproduce the failure.
I cloned and built TVM myself, as I didn't find TVM in the docker image.

@anijain2305
Contributor

anijain2305 commented May 2, 2020

I was able to reproduce the failure, but I have not been able to solve it yet. The only pointer I have so far is that if I disable tensorize (this test uses tensorize to use Intel VNNI), the test progresses.

I am not familiar with const int bound. I will try to get familiar with it and see how tensorize impacts const int bounds.

[19:48:06] /home/ubuntu/workplace/tvm/t1/tvm/src/arith/const_int_bound.cc:153: Expr = 15
[19:48:06] /home/ubuntu/workplace/tvm/t1/tvm/src/arith/const_int_bound.cc:154: Bounds = ConstIntBound[63,63]

@anijain2305
Contributor

Dug a little deeper, but I am still not able to find the root cause.

I think the const int bounds are good.

https://github.com/apache/incubator-tvm/blob/7e88030a804357421b766d7309f9085ca2b83378/src/arith/const_int_bound.cc#L149-L161

The problem here is that two different PrimExprs have the same PrimExprNode* at L150

tvm/src/arith/const_int_bound.cc:166: Expr --> 63, op = 0x3b73800
tvm/src/arith/const_int_bound.cc:154: Expr --> 15, op = 0x3b73800

Therefore, the entry for Expr(63), whose ConstIntBound is correctly set to [63, 63], is used to check the bounds for Expr(15), causing the failure. I will pursue this direction; please let me know if you have any suggestions, @tqchen
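One way two different expressions can end up with the same node pointer is address reuse: a map keyed by raw address keeps an entry after the node is freed, and a later node is allocated at the same address. The thread does not confirm that this is what happened in TVM, so the following is only a toy model under that assumption. `ToyHeap` and the addresses are hypothetical and stand in for a real allocator.

```python
class ToyHeap:
    """Hands out integer 'addresses' and reuses freed ones first."""

    def __init__(self):
        self._free = []
        self._next = 0x1000

    def alloc(self):
        # Prefer a previously freed address, as a real allocator may.
        if self._free:
            return self._free.pop()
        addr = self._next
        self._next += 0x10
        return addr

    def free(self, addr):
        self._free.append(addr)


heap = ToyHeap()
memo = {}  # raw address -> (min, max); the key says nothing about the value

addr_63 = heap.alloc()
memo[addr_63] = (63, 63)  # bound recorded for the constant 63
heap.free(addr_63)        # node destroyed, but the memo entry stays behind

addr_15 = heap.alloc()    # node for the constant 15 lands on the same address
stale = memo.get(addr_15)
print(hex(addr_15), stale)
```

Here the lookup for the node holding 15 returns the stale [63, 63] entry, which matches the two log lines showing different expressions with the same `op` value.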

@anijain2305
Contributor

@tqchen I verified locally. This is resolved now. You can close this.

@anijain2305 anijain2305 assigned tqchen and unassigned anijain2305 May 10, 2020
@tqchen tqchen closed this as completed May 10, 2020