Add workgroup size attribute to AMDGPU functions in codegen
Without an explicit workgroup size, LLVM uses too many registers for kernels launched with many threads, which resulted in "invalid ISA" errors. This commit sets the maximum workgroup size to the maximum threads per block reported by the device API. A possible follow-up is to let configurations that use fewer threads at runtime use more registers.
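As a rough illustration of what such an attribute looks like in the generated IR, here is a minimal sketch. It assumes the attribute in question is LLVM's `amdgpu-flat-work-group-size` string attribute (the commit message does not name it), and the helper names `flat_work_group_size_attr` and `kernel_stub` are hypothetical, used only for this example. The value 256 stands in for whatever the device API reports.

```python
def flat_work_group_size_attr(max_threads_per_block: int, min_threads: int = 1) -> str:
    """Format the (assumed) AMDGPU work-group size attribute as "min,max"."""
    if min_threads < 1 or max_threads_per_block < min_threads:
        raise ValueError("expected 1 <= min_threads <= max_threads_per_block")
    return f'"amdgpu-flat-work-group-size"="{min_threads},{max_threads_per_block}"'


def kernel_stub(name: str, max_threads_per_block: int) -> str:
    """Emit textual LLVM IR for an empty kernel carrying the attribute."""
    attr = flat_work_group_size_attr(max_threads_per_block)
    return "\n".join([
        f"define amdgpu_kernel void @{name}() #0 {{",
        "  ret void",
        "}",
        f"attributes #0 = {{ {attr} }}",
    ])


# 256 is a placeholder for the device's maximum threads per block.
print(kernel_stub("my_kernel", 256))
```

With an upper bound on the workgroup size, the backend can plan register usage for that bound instead of an unconstrained launch size.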