Add workgroup size attribute to AMDGPU functions in codegen
When we do not set the workgroup size, LLVM uses too many registers
for kernel launches with many threads, which results in "invalid ISA"
errors. Here we set the maximum workgroup size (through the
"amdgpu-flat-work-group-size" function attribute) to the maximum number
of threads per block reported by the device API.

Of course, one could also look into letting configurations that launch
fewer threads at runtime use more registers.
t-vi committed Nov 14, 2019
1 parent 8cd5cce commit 9ccc89d
Showing 1 changed file with 26 additions and 0 deletions.
src/codegen/llvm/codegen_amdgpu.cc
@@ -36,13 +36,39 @@
namespace tvm {
namespace codegen {

namespace {

// calls the device api to get the max threads per block
static inline int DetectROCMmaxThreadsPerBlock() {
  TVMContext tvm_ctx;
  tvm_ctx.device_type = kDLROCM;
  tvm_ctx.device_id = 0;
  tvm::runtime::DeviceAPI* api = tvm::runtime::DeviceAPI::Get(tvm_ctx, true);
  if (api != nullptr) {
    TVMRetValue val;
    api->GetAttr(tvm_ctx, tvm::runtime::kExist, &val);
    if (val.operator int() == 1) {
      tvm::runtime::DeviceAPI::Get(tvm_ctx)->
        GetAttr(tvm_ctx, tvm::runtime::kMaxThreadsPerBlock, &val);
      return val.operator int();
    }
  }
  LOG(WARNING) << "Cannot get maximum number of threads for AMD codegen";
  return 1024;
}

} // namespace

// AMDGPU code generator.
class CodeGenAMDGPU : public CodeGenLLVM {
 public:
  void AddFunction(const LoweredFunc& f) final {
    // add function as void return value
    CodeGenLLVM::AddFunctionInternal(f, true);
    function_->setCallingConv(llvm::CallingConv::AMDGPU_KERNEL);
    std::ostringstream attr;
    attr << "1," << DetectROCMmaxThreadsPerBlock();
    function_->addFnAttr("amdgpu-flat-work-group-size", attr.str());
  }

  void VisitStmt_(const Allocate* op) final {