Evaluate Profile-Guided Optimization (PGO) #2288

zamazan4ik · 2023-09-13T01:44:04Z

Hi!

Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects. The results are here. LLVM-related results are here.

PGO shows measurable improvements in compiler-like workloads (CPython, Clang, Clangd, clang-format, GCC, Rustc, etc.). I think it would be useful to check PGO on Triton as well.

We need to run PGO benchmarks on Triton. If they show improvements, add a note to the documentation about the possible gains in Triton's performance with PGO. Providing an easier way to build Triton with PGO (e.g. a build option) could be useful for end users too, as well as for maintainers who rebuild packages. Testing Post-Link Optimization techniques (like LLVM BOLT from Facebook) might also be interesting, but I recommend starting with regular PGO.

I think a good starting point could be recompiling third-party dependencies such as LLVM with PGO, if you have that option (that's usually a good thing to do when possible).

Another possible caveat is profile collection. For C++-based binaries it shouldn't be a problem, since binaries built with instrumentation support dump their PGO profiles on exit (you can read more about that here). If you are using C++ libraries from Python code, dumping the profiles could be tricky. One option is to write a C++ "wrapper", run it on near real-life workloads, collect the profiles, recompile the libraries with those profiles, and then use the result from Python. You can also check how Pydantic integrated PGO into their build pipeline (they have a similar project structure, but use Rust instead of C++, which shouldn't make a big difference here): pydantic/pydantic-core#741.
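To make the collect-then-recompile cycle above more concrete, here is a minimal sketch of the three stages as Clang command lines, assembled in Python. It is not Triton's build setup (Triton builds through CMake); the function name `pgo_commands`, the file names, and the `pgo-profiles` directory are all hypothetical. The only real pieces are Clang's `-fprofile-generate` / `-fprofile-use` flags and `llvm-profdata merge`.

```python
def pgo_commands(source, output, profile_dir="pgo-profiles"):
    """Return the command lines for an instrumentation-based PGO cycle.

    Assumption: a single-file Clang build; a real project would pass the
    same flags through its build system instead.
    """
    merged = f"{profile_dir}/merged.profdata"
    return {
        # 1. Instrumented build: the binary writes raw profiles
        #    into profile_dir when it exits.
        "instrument": [
            "clang++", "-O2", f"-fprofile-generate={profile_dir}",
            source, "-o", output,
        ],
        # 2. After running the instrumented binary on a representative
        #    workload, merge the raw profiles into one indexed profile.
        "merge": [
            "llvm-profdata", "merge", f"-output={merged}", profile_dir,
        ],
        # 3. Optimized build: recompile using the merged profile.
        "optimize": [
            "clang++", "-O2", f"-fprofile-use={merged}",
            source, "-o", output,
        ],
    }


if __name__ == "__main__":
    # Hypothetical wrapper source and binary names, for illustration only.
    for stage, cmd in pgo_commands("wrapper.cpp", "wrapper").items():
        print(stage, "->", " ".join(cmd))
```

The workload run between stages 1 and 2 is left out on purpose: it is the project-specific part, and the quality of the profile depends on how close that workload is to real usage.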

Jokeren · 2023-09-13T02:19:57Z

I don't think existing instrumentation-based PGO supports GPU applications. Correct me if I'm wrong (I had a talk with Tipp a long time ago).

zamazan4ik · 2023-09-13T02:23:15Z

> I don't think existing instrumentation-based PGO supports GPU applications. Correct me if I'm wrong (I had a talk with Tipp a long time ago).

I haven't heard of PGO being used for code executed on a GPU. But I thought Triton also has a large amount of CPU-only code, doesn't it?

Jokeren · 2023-09-13T02:24:14Z

> I haven't heard of PGO being used for code executed on a GPU. But I thought Triton also has a large amount of CPU-only code, doesn't it?

It does, but much of that overhead can be mitigated through CUDA graphs.

Jokeren closed this as completed Sep 18, 2023
