-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bugfix] Fix Disco-CUDAGraph Integration #15870
[Bugfix] Fix Disco-CUDAGraph Integration #15870
Conversation
CC: @jinhongyii |
void Clear() { NCCL_CALL(ncclCommDestroy(comm)); } | ||
void Clear() { | ||
NCCL_CALL(ncclCommDestroy(comm)); | ||
if (default_stream != nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, default_stream is always nullptr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given in this case default_stream
is a member that can be changed later, i think having such code is better. Note it is not the same as the default stream in the CUDA terminology, but in the nccl runtime
|
||
deviceStream_t GetDefaultStream() { return nullptr; } | ||
deviceStream_t GetDefaultStream() { | ||
const auto* func = tvm::runtime::Registry::Get("runtime.get_" TVM_DISCO_DEVICE_NAME "_stream"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any stream initialization in this file , so I'm confused what this function returns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's initialized in cudagraph
This PR fixed a bug introduced in apache#15827 since which the cudagraph's stream is discarded.
88536e1
to
e79e582
Compare
This PR fixed a bug introduced in #15827 since which the cudagraph's stream is discarded.