switch of backends #865
-
Hi, we use Triton in our OSS package (Kernl), and our understanding is that the backend will soon change to rely on MLIR. We are wondering how the switch to the new backend will happen. Our main concern is that, to keep output precision high, we rely on a bunch of workarounds, and we wonder how they will behave with the new backend, which ones we can drop, whether we can enable some optimizations we can't use right now, etc. Any info regarding the future release would be very welcome. Btw... thanks for all the great work! (and the rewrite 🎉)
-
Hey! The `master` branch is barely supported now, as we are very focused on the `triton-mlir` branch, and the plan is indeed to completely deprecate it when the merge happens. However, we don't want the merge to break anything fundamentally (there will be very minor changes, such as `view` -> `reshape`). Do you have a link to the specific workarounds you're concerned about? Does Kernl have some sort of test suite we could run before the merge happens?
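To illustrate the kind of minor change mentioned above: a kernel that currently flattens a block with `tl.view` would spell the same operation `tl.reshape` after the merge. A minimal sketch, assuming power-of-two block sizes; the kernel itself is hypothetical, only the renamed intrinsic comes from the reply:

```python
import triton
import triton.language as tl


@triton.jit
def flatten_block_kernel(x_ptr, out_ptr, BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # Load a 2D block of values (hypothetical example kernel).
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    x = tl.load(x_ptr + offs_m[:, None] * BLOCK_N + offs_n[None, :])
    # Old backend: x = tl.view(x, (BLOCK_M * BLOCK_N,))
    # New (MLIR) backend: the same operation is spelled tl.reshape.
    x = tl.reshape(x, (BLOCK_M * BLOCK_N,))
    tl.store(out_ptr + tl.arange(0, BLOCK_M * BLOCK_N), x)
```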
-
Thank you, that's very clear. We do not track all the workarounds we have put in place; they may be strange things like declaring the same variable twice with the same name and value to avoid a segfault, avoiding certain block shapes that crash for no obvious reason (at least none we have understood), or just things we do a certain way because other apparently legitimate ways to program the same thing do not work as expected. (We have a -very- simple "debugger" that takes Triton code and executes it on PyTorch to check whether an issue is on our side or Triton's.) We do have unit tests: just type pytest in the package context and more than 2K tests run. Executing all of them is quite slow, around 1h30, so maybe you want to let us run them :-) Just ping me when you think it's time for it. I am very sorry for my next question... but I have to ask: is there a date for the release? (or at least for a feature-complete version of Triton)
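In spirit, that kind of cross-check boils down to comparing a Triton kernel's output against an eager PyTorch reference for the same computation. A rough, simplified sketch of the idea, not Kernl's actual tool (their debugger interprets the Triton code itself with PyTorch ops); the kernel and names here are illustrative and it assumes a CUDA device:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_one_kernel(x_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    # Trivial kernel used only to illustrate the cross-check.
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + 1.0, mask=mask)


def check_against_pytorch(x: torch.Tensor) -> None:
    # Run the Triton kernel.
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    add_one_kernel[grid](x, out, n, BLOCK=1024)
    # Run the same computation in eager PyTorch as the reference.
    expected = x + 1.0
    # If this fails, the bug is likely on the Triton side (or in the kernel itself).
    torch.testing.assert_close(out, expected)


check_against_pytorch(torch.randn(4096, device="cuda"))
```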
-
Ah, I see. Yeah, the hope is that these workarounds won't be needed anymore when the merge happens (or shortly after), and that they won't break anything on the new backend. We're planning to merge mid-December. We've been very hard at work lately and it's starting to look good, but we still need a few changes to reach performance parity on dense matmul and to get flash attention working.