[PASS] Add GPU IR verifier #1296
Conversation
include/tvm/ir_pass.h (outdated)

 * \return valid Whether it is a valid cuda ir
 *
 */
bool VerifyCuda(Stmt stmt,
Is this CUDA specific, or should we call it VerifyGPUCode?
@merrymercy please act on the comments and fix the CI error
src/pass/verify_gpu_code.cc (outdated)

class GPUCodeVerifier : public IRVisitor {
 public:
  bool verify(tvm::Stmt stmt, int max_shared_memory_per_block, int max_thread_per_block) {
Use CamelCase for functions.
src/pass/verify_gpu_code.cc (outdated)

if (shared_buffers_.count(op->buffer_var.get()) != 0) {
  int64_t size = op->type.bytes();
  for (auto dim : op->extents) {
    size *= dim.as<IntImm>()->value;
Use op->constant_allocation_size() instead.
src/pass/verify_gpu_code.cc (outdated)

// record the number of threads in a block
std::string name = var.get()->name_hint;
if (name == "threadIdx.x" || name == "threadIdx.y" || name == "threadIdx.z") {
  if (visited_threads_.find(name) == visited_threads_.end()) {
Prefer !visited_threads_.count(name).
src/pass/verify_gpu_code.cc (outdated)

size_t max_shared_memory_per_block_;
size_t max_thread_per_block_;

bool valid{true};
Rename to valid_ (member variables take a trailing underscore).
src/pass/verify_gpu_code.cc (outdated)

bool valid{true};

void reset_() {
Reset()
If this visitor is only used once, reset is not necessary
Reset is needed because there might be several GPU kernels in one Stmt.
src/pass/verify_gpu_code.cc (outdated)

}

if (op->is_producer) {
  nest_level_++;
Prefer pre-increment: ++nest_level_ (and --nest_level_).
@eqy can you also do a round of code review?
@tqchen @merrymercy if we also save the number of threads per dimension (x, y, z), perhaps we can also use this to capture per-dimension limits. EDIT: It does seem that CUDA devices can have a similar limit, which can be read with deviceQuery.
It would be a good idea to pass in as many constraints as possible, allowing defaults for non-constraints. One possible way to do so is to allow passing in a …
src/pass/verify_gpu_code.cc (outdated)

std::unordered_set<const tvm::Variable *> shared_buffers_;
std::unordered_set<std::string> visited_threads_;
size_t shared_memory_per_block_;
local_memory_per_block_ is also needed.
Ready for review.
Is the plan to skip checking threadId/workitem dimensions in this round?
Please fix the compiler warning: http://mode-gpu.cs.washington.edu:8080/blue/organizations/jenkins/dmlc%2Ftvm/detail/PR-1296/11/pipeline — currently we set compiler warnings as errors, so the build won't pass if there is a warning.
"""Test gpu code verifier"""
import tvm

global valid
Always avoid using global variables to carry state; you can use a closure to capture a list instead.
The test error likely indicates there is some problem with the current PR when importing the runtime-only DLL.
Force-pushed from aebd3fd to f50f040.
Add a pass to check whether a cuda ir is valid