When the number of conv ops is small, Inductor disables layout optimization. We have to force it on with TORCHINDUCTOR_FORCE_LAYOUT_OPT; otherwise we may hit inefficient kernels such as the cat_layernorm kernel in this issue. See the sketch below.
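A minimal sketch of forcing layout optimization on, assuming the TORCHINDUCTOR_FORCE_LAYOUT_OPT environment variable mentioned above and an XPU (or CUDA) device; the model here is illustrative, not taken from the original report, and the variable must be set before Inductor's config is first read:

```python
# Force Inductor's layout optimization even for models with few convolutions.
# Set the env var before torch/Inductor reads its config.
import os
os.environ["TORCHINDUCTOR_FORCE_LAYOUT_OPT"] = "1"

import torch
import torch.nn as nn

device = "xpu"  # assumption: an XPU device; use "cuda" on NVIDIA GPUs

# Hypothetical small model where Inductor's heuristic would otherwise
# skip the channels-last layout optimization.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).to(device)
compiled = torch.compile(model)
out = compiled(torch.randn(8, 3, 224, 224, device=device))
```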
For both XPU and CUDA, when there are more nodes between convs, unnecessary transposes show up. For example: conv (channels-last) + fused bias and leaky_relu (back to channels-first) + avg_pool (back to channels-last) + conv. It seems Inductor does not propagate the layout. Fusing the bias and activation into the conv would mitigate this; a repro sketch follows.
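A minimal sketch of the pattern described above, assuming an XPU (or CUDA) device; the module names, channel counts, and shapes are illustrative only. If Inductor does not propagate the channels-last layout through the pointwise and pooling ops, the generated code ends up inserting extra transposes between them:

```python
# conv (channels-last) -> bias + LeakyReLU -> avg_pool -> conv
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(16, 32, 3, padding=1)
        self.act = nn.LeakyReLU(0.2)
        self.pool = nn.AvgPool2d(2)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)

    def forward(self, x):
        x = self.conv1(x)   # channels-last friendly
        x = self.act(x)     # bias + activation; may fall back to channels-first
        x = self.pool(x)    # back to channels-last before the next conv
        return self.conv2(x)

device = "xpu"  # assumption: an XPU device; use "cuda" on NVIDIA GPUs
model = ConvBlock().to(device).to(memory_format=torch.channels_last)
compiled = torch.compile(model)
x = torch.randn(8, 16, 64, 64, device=device).to(memory_format=torch.channels_last)
out = compiled(x)
```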
🚀 The feature, motivation and pitch
Analyze Triton kernel data and report the findings to Triton XPU.
Alternatives
No response
Additional context
No response