
Undocumented CUDA graphs requirement that kernels must use stream #114048

Open
tsengalb99 opened this issue Nov 19, 2023 · 5 comments
Labels
module: cuda graphs (ability to capture and then replay streams of CUDA kernels)
module: custom-operators (custom operators, custom ops)
triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


tsengalb99 commented Nov 19, 2023

🐛 Describe the bug

I have a function that calls some custom CUDA kernels interleaved with PyTorch operations. When I capture the function with a CUDA graph, the custom kernels become no-ops: the graph captures without error and replays all operations except those inside the custom kernels. Does torch's CUDA graph support work with custom kernels? Is there something special I need to do to enable custom kernels with graphs? A sketch of the kind of code involved is below.
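A minimal sketch of the kind of extension wrapper that can produce this symptom (hypothetical kernel and function names, not the actual code from this report). The launch omits the stream argument, so the kernel goes to the default stream rather than the stream PyTorch is capturing on, and it never enters the graph:

```cpp
// buggy_ext.cu -- hypothetical PyTorch C++/CUDA extension
#include <torch/extension.h>

__global__ void scale_kernel(float* x, float s, int64_t n) {
  int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
  if (i < n) x[i] *= s;
}

void scale_inplace(torch::Tensor x, float s) {
  int64_t n = x.numel();
  const int threads = 256;
  const int blocks = (int)((n + threads - 1) / threads);
  // Launched with no stream argument -> default stream. Under
  // torch.cuda.graph capture this kernel is not recorded in the graph.
  scale_kernel<<<blocks, threads>>>(x.data_ptr<float>(), s, n);
}
```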

Versions

torch 2.1.0 on CUDA 11.8

cc @mcarilli @ezyang

@jbschlosser added the triaged, module: custom-operators, and module: cuda graphs labels on Nov 20, 2023

ezyang commented Nov 20, 2023

It does work with custom kernels, but there are a number of ways you could have messed up the kernels so that bad things happen. You'll probably have to share some code for more info.

tsengalb99 (Author) commented

I managed to get my kernel to work by passing the current CUDA stream to the kernel call in the C++ wrapper. I only figured this out by finding a kernel that did work and comparing my code against it. Is this requirement documented anywhere in PyTorch? Not passing in the stream works fine when the C++ wrapper is called outside of a graph.
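For reference, a sketch of the fix described above, assuming a PyTorch C++/CUDA extension (names are hypothetical): fetch the current stream with at::cuda::getCurrentCUDAStream() and pass it as the fourth launch parameter.

```cpp
// fixed_ext.cu -- same hypothetical extension, launching on the current stream
#include <torch/extension.h>
#include <ATen/cuda/CUDAContext.h>  // at::cuda::getCurrentCUDAStream

__global__ void scale_kernel(float* x, float s, int64_t n) {
  int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
  if (i < n) x[i] *= s;
}

void scale_inplace(torch::Tensor x, float s) {
  int64_t n = x.numel();
  const int threads = 256;
  const int blocks = (int)((n + threads - 1) / threads);
  // PyTorch's current stream is the capturing stream inside
  // torch.cuda.graph, so the kernel is recorded into the graph.
  cudaStream_t stream = at::cuda::getCurrentCUDAStream();
  scale_kernel<<<blocks, threads, 0, stream>>>(x.data_ptr<float>(), s, n);
}
```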


ezyang commented Nov 20, 2023

It's a requirement of CUDA graphs itself. We could remind users about it in the CUDA graph API docs. Wanna send a doc patch?

@ezyang ezyang changed the title torch cuda graph with custom cuda kernel possible bug Undocumented CUDA graphs requirement that kernels must use stream Nov 20, 2023

ngimel commented Dec 4, 2023

The CUDA graph API will warn if no kernels were captured at all (e.g., if capture started on stream S but all the kernels ran on the default stream). However, a mixture of kernels on the default stream and the capturing stream (without stream synchronizations in between) is an error we can't possibly catch: maybe the user did want those kernels to run eagerly and not be captured.
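Since that mixed case can't be caught automatically, one user-side option (a sketch, not a PyTorch API) is for a wrapper that still launches on the default stream to check cudaStreamIsCapturing and fail loudly instead of silently dropping work:

```cpp
#include <torch/extension.h>
#include <ATen/cuda/CUDAContext.h>

// Hypothetical guard: error out if the current stream is mid-capture,
// since a default-stream launch at this point would escape the graph.
void assert_not_capturing() {
  cudaStreamCaptureStatus status = cudaStreamCaptureStatusNone;
  cudaError_t err =
      cudaStreamIsCapturing(at::cuda::getCurrentCUDAStream(), &status);
  TORCH_CHECK(err == cudaSuccess,
              "cudaStreamIsCapturing failed: ", cudaGetErrorString(err));
  TORCH_CHECK(status == cudaStreamCaptureStatusNone,
              "current stream is capturing a CUDA graph; a default-stream "
              "kernel launch here would not be recorded");
}
```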

Abhishekghosh1998 commented

@tsengalb99, can you please share your code if possible? I'm just curious to know about your approach and where/how it went wrong.
