Undocumented CUDA graphs requirement that kernels must use stream #114048
Labels
module: cuda graphs
Ability to capture and then replay streams of CUDA kernels
module: custom-operators
custom operators, custom ops, custom-operators, custom-ops
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🐛 Describe the bug
I have a function that calls some custom CUDA kernels interleaved with PyTorch operations. When I try to capture the function in a CUDA graph, the custom kernels become no-ops. That is, the graph captures fine and performs all operations except for those in the custom kernels. Do PyTorch's CUDA graphs work with custom kernels? Is there something special I need to do to enable custom kernels with graphs?
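For anyone trying to reproduce this, here is a minimal sketch of a capture-and-replay check, not the original code. `custom_op`, `pipeline`, and `replay_matches_eager` are hypothetical names; `custom_op` is a plain PyTorch stand-in that should be swapped for the real custom kernel binding. Going by the issue title, the thing to look at is the stream: `torch.cuda.graph` makes a side stream current while capturing, so a kernel launched on a different stream (for example the default stream) would presumably not be recorded.

```python
import torch


def custom_op(x):
    # Hypothetical stand-in for the custom CUDA kernel from the report.
    # Replace this with the real extension call to run the check on it.
    return x * 2.0


def pipeline(x):
    # A custom op interleaved with ordinary PyTorch operations.
    return torch.relu(custom_op(x) + 1.0)


def replay_matches_eager(fn, example):
    """Capture fn in a CUDA graph, replay it on fresh data, compare with eager."""
    static_in = example.clone()

    # Warm up on a side stream before capture, as the PyTorch docs recommend.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(3):
            fn(static_in)
    torch.cuda.current_stream().wait_stream(side)

    # Capture. Inside this block the current stream is a capture side stream,
    # so only work submitted to the current stream ends up in the graph.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_out = fn(static_in)

    # Replay on new data. A kernel that was skipped during capture would leave
    # static_out stale, which the comparison below is meant to catch.
    new_data = torch.randn_like(example)
    static_in.copy_(new_data)
    graph.replay()
    torch.cuda.synchronize()

    return torch.allclose(static_out, fn(new_data))


if __name__ == "__main__" and torch.cuda.is_available():
    ok = replay_matches_eager(pipeline, torch.randn(1024, device="cuda"))
    print("graph replay matches eager:", ok)
```

If the check fails only once the real kernel is plugged in, one thing worth verifying on the C++ side is that the launch passes the current stream (for example `at::cuda::getCurrentCUDAStream()`) rather than relying on the default stream.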
Versions
torch 2.1.0 on CUDA 11.8
cc @mcarilli @ezyang