
[BugFix] fix nvptx not supported by device_enabled error #9585

Merged: 2 commits merged into apache:main on Nov 25, 2021

Conversation

ZQPei (Contributor) commented Nov 25, 2021

Fix Issue #9513.

I think this bug was mistakenly introduced by #8032.
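
For context, a minimal way to observe the symptom reported in #9513 (a sketch, not part of this PR; it assumes a machine with a working NVIDIA GPU and a TVM build whose LLVM has the NVPTX backend):

# Hypothetical reproduction sketch: before this fix, device_enabled("nvptx") reportedly
# returned False even on machines with a GPU, so nvptx test targets were silently skipped.
import tvm
import tvm.testing

print(tvm.device("cuda", 0).exist)          # True on a machine with a usable GPU
print(tvm.testing.device_enabled("nvptx"))  # was False before this fix, hiding nvptx tests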

junrushao (Member) left a comment

LGTM. Thanks!

junrushao linked an issue on Nov 25, 2021 that may be closed by this pull request
ZQPei requested a review from kparzysz-quic as a code owner on Nov 25, 2021 at 08:02
ZQPei (Contributor, Author) commented Nov 25, 2021

@junrushao1994
I think this PR needs a further review, because another error occurs in CI: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-9585/1/pipeline/267.

The quoted error is:

E File "/workspace/src/target/llvm/codegen_nvptx.cc", line 146
E TVMError: Do not support sync shared.dyn

This error is raised because the "nvptx" target does not support syncing shared.dyn memory.
It had gone unnoticed over the last few months because device_enabled("nvptx") returned False, so the test below was silently skipped.

# Imports needed to run this snippet standalone (the cast helper is assumed to come from tvm.topi).
import numpy as np
import tvm
import tvm.testing
from tvm import te
from tvm.topi import cast


@tvm.testing.requires_gpu
def test_dyn_shared():
    n = te.size_var("n")
    dtype = "float32"
    A = te.placeholder((n,), name="A")

    def test_device_ir(A, B):
        n = A.shape[0]
        ib = tvm.tir.ir_builder.create()
        tx = te.thread_axis("threadIdx.x")
        ib.scope_attr(tx, "thread_extent", n)
        temp = ib.allocate(dtype, (n,), scope="shared.dyn")  # n is a symbolic size
        Aptr = ib.buffer_ptr(A)
        Bptr = ib.buffer_ptr(B)
        temp[tx] = Aptr[tx]
        depth = tvm.tir.log2(cast(n, "float32"))
        with ib.for_range(0, depth) as i:
            ib.emit(tvm.tir.Call(None, "tir.tvm_storage_sync", tvm.runtime.convert(["shared"])))
            d = n >> (i + 1)
            with ib.if_scope(tx < d):
                temp[tx] += temp[tx + d]
        Bptr[0] = temp[0]
        return ib.get()

    B = te.extern(
        (1,),
        [A],
        lambda ins, outs: test_device_ir(ins[0], outs[0]),
        name="reduce",
        dtype=dtype,
    )
    s = te.create_schedule(B.op)

    def check_target(target):
        if not tvm.testing.device_enabled(target):
            return
        freduce = tvm.build(s, [A, B], target)
        dev = tvm.device(target, 0)
        for n in [512, 1024]:
            a = tvm.nd.array(np.random.uniform(size=n).astype(A.dtype), dev)
            b = tvm.nd.array(np.zeros(1, dtype=B.dtype), dev)
            freduce(a, b)
            tvm.testing.assert_allclose(b.numpy()[0], np.sum(a.numpy()), 1e-4, 1e-4)

    for target in ["cuda", "nvptx"]:
        check_target(target)

I have tried to fix it with commit a8d14b1, and it works fine on my 2080 Ti GPU, but I am not sure whether it is really OK.
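
For anyone who wants to double-check locally, here is a quick way to exercise the failing path (a sketch, not part of this PR; it assumes a CUDA-capable machine, an LLVM build with the NVPTX backend, and that it runs in the same file as test_dyn_shared above):

# Hypothetical local check: before commit a8d14b1 this aborted in codegen_nvptx.cc with
# "TVMError: Do not support sync shared.dyn"; after the fix it should run to completion.
import tvm
import tvm.testing

if tvm.testing.device_enabled("nvptx") and tvm.device("nvptx", 0).exist:
    test_dyn_shared()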

vinx13 (Member) left a comment

The codegen fix looks good.

junrushao (Member) commented

Yeah, I think it's reasonable to handle dynamic shared memory this way for now.

junrushao merged commit fb4b7e2 into apache:main on Nov 25, 2021

junrushao (Member) commented
Thank you so much for the fix! @ZQPei

dchauhan-arm pushed a commit to dchauhan-arm/tvm that referenced this pull request Nov 29, 2021
* [BugFix] fix nvptx not supported by device_enabled error

Signed-off-by: ZQPei <[email protected]>

* [BugFix] shared.dyn support for codegen_nvptx

Signed-off-by: ZQPei <[email protected]>
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Dec 1, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 11, 2022
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 12, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
Successfully merging this pull request may close these issues.

[BUG] NVPTX as a testable device seems to be not supported anymore
3 participants