I recently evaluated Profile-Guided Optimization (PGO) on multiple projects. The results are here. LLVM-related results are here.
PGO shows measurable improvements on compiler-like workloads (CPython, Clang, Clangd, clang-format, GCC, Rustc, etc.), so I think it would be useful to evaluate PGO on Triton as well.
We need to run PGO benchmarks on Triton. If they show improvements, a note could be added to the documentation about the possible gains in Triton's performance with PGO. Providing an easier way to build Triton with PGO (e.g. a build option) would also be useful for end users and for maintainers who rebuild packages. Testing Post-Link Optimization techniques (like LLVM BOLT from Facebook) could be interesting too, but I recommend starting with regular PGO.
I think a good starting point could be recompiling third-party dependencies such as LLVM with PGO, if you have that option (that's usually a good thing to do when possible).
Another possible caveat is profile collection. For C++ binaries it shouldn't be a problem, since binaries built with instrumentation support dump their PGO profiles on exit (you can read more about that here). If you are using C++ libraries from Python code, dumping the profiles could be tricky. One option is to write a C++ "wrapper", run it on near real-life workloads, collect the profiles, recompile the libraries with those profiles, and then use the optimized libraries from Python. You can also check how Pydantic integrated PGO into their build pipeline (they have a similar project structure, but use Rust instead of C++, which shouldn't make a big difference here): pydantic/pydantic-core#741.
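To make the collect-then-recompile loop above a bit more concrete, here is a minimal sketch in Python of the command pieces involved, assuming a Clang-based toolchain. The flags `-fprofile-generate` and `-fprofile-use`, the `LLVM_PROFILE_FILE` environment variable, and `llvm-profdata merge` come from Clang/LLVM; the helper names, the `triton-%p.profraw` file pattern, and the directory layout are hypothetical and not part of Triton's build system. How these flags would be passed into Triton's actual build is not shown here.

```python
import os
from pathlib import Path


def instrumented_flags() -> list[str]:
    """Compiler/linker flags for the first, instrumented build (Clang)."""
    return ["-fprofile-generate"]


def workload_env(profile_dir: Path) -> dict[str, str]:
    """Environment for running a training workload against the instrumented build.

    LLVM_PROFILE_FILE tells the profiling runtime where to write raw profiles;
    %p expands to the process ID, so parallel runs do not overwrite each other.
    The "triton-" prefix is only an illustrative naming choice.
    """
    env = dict(os.environ)
    env["LLVM_PROFILE_FILE"] = str(profile_dir / "triton-%p.profraw")
    return env


def merge_command(profile_dir: Path, merged: Path) -> list[str]:
    """Command that merges all collected raw profiles into one indexed profile."""
    raw_profiles = sorted(str(p) for p in profile_dir.glob("*.profraw"))
    return ["llvm-profdata", "merge", f"-output={merged}", *raw_profiles]


def optimized_flags(merged: Path) -> list[str]:
    """Compiler/linker flags for the second, profile-optimized build (Clang)."""
    return [f"-fprofile-use={merged}"]


if __name__ == "__main__":
    # Hypothetical locations; adjust to the real build tree.
    profiles = Path("pgo-profiles")
    merged = Path("merged.profdata")
    print("1. build with:  ", " ".join(instrumented_flags()))
    print("2. run with:     LLVM_PROFILE_FILE=" + workload_env(profiles)["LLVM_PROFILE_FILE"])
    print("3. merge with:  ", " ".join(merge_command(profiles, merged)))
    print("4. rebuild with:", " ".join(optimized_flags(merged)))
```

The environment from `workload_env` could be passed to `subprocess.run(..., env=...)` when launching either a C++ wrapper or a Python workload. Whether an instrumented shared library loaded into a Python process reliably writes its profile at interpreter exit is something that would need to be verified for Triton specifically.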