1. Reduce compile time by 22%+ 2. Fix compile linking error on Ubuntu 22.04 gcc/g++ 11.4 with Cuda 12.4 #171
This PR resolves two issues: 1) a compile linking failure on my env, and 2) a ~22% reduction in compile time. I expect the diff to be even larger if the underlying IO storage is slow (spinners/SSDs).

Test Env:
Ubuntu 22.04, gcc/g++ 11.4, CUDA 12.4
-std=c++17

Linking Error Stacktrace:
The test was performed on a Zen3 2x9334 system with 48 cores (96 threads), but for the test I limited it via an LXD container to a max of 80 cores/threads. I ran an earlier version of the PR test with the full 96 cores/threads and the diff % was the same. Between each test I removed the python/build and python/csrc/generated dirs.
Main:
PR:
The diff is ~22%, but I expect the value to be even larger if the underlying IO system is a slow spinner/SSD, since thread contention would kill small IO during compilation.
The cause of the slow compilation, based on my monitoring of the compilation processes on the main branch, is as follows:
The primary cause is thread contention (resource over-subscription): Ninja using all cores by default is not optimal. Fix: use half the cores, but increase the number of threads nvcc can spawn from 1 to 8; a sketch of how this could be wired up follows.
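For illustration only, here is a minimal sketch of that idea, assuming a torch.utils.cpp_extension-based setup.py; the module name, source list, and exact flags are hypothetical and not this PR's actual diff:

```python
# Hypothetical setup.py sketch: extension name and source paths are illustrative.
import os

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

# Cap Ninja parallelism at half the logical cores to avoid over-subscription;
# torch's BuildExtension reads the MAX_JOBS environment variable.
os.environ.setdefault("MAX_JOBS", str(max(1, (os.cpu_count() or 2) // 2)))

ext = CUDAExtension(
    name="example_kernels",        # hypothetical extension name
    sources=["csrc/example.cu"],   # hypothetical source list
    extra_compile_args={
        "cxx": ["-O3", "-std=c++17"],
        # Allow each nvcc process to spawn up to 8 internal compile threads
        # (supported since CUDA 11.2 via --threads / -t).
        "nvcc": ["-O3", "-std=c++17", "--threads", "8"],
    },
)

setup(
    name="example-kernels",
    ext_modules=[ext],
    cmdclass={"build_ext": BuildExtension},
)
```

The intent is to trade process-level parallelism (fewer Ninja jobs) for nvcc's internal parallelism, which is the contention fix described above.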
Also reduce the unnecessary compilation of minor archs. For example, Ampere has sm_80 and sm_86; the only differences are hardware resources such as SM count and cache size. I did not find any documentation from NVIDIA that nvcc actually compiles differently for the same arch with different cache sizes or SM counts, so just compile for the base archs: sm_80 for Ampere, sm_89 for Ada, sm_90 for Hopper. A sketch of the trimmed arch list follows.
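As a hedged illustration (the helper name and the way the flags are assembled below are mine, not necessarily how this PR expresses it), the trimmed -gencode list could look like:

```python
# Illustrative helper: emit nvcc -gencode flags only for the base arch of each
# generation (sm_80 Ampere, sm_89 Ada, sm_90 Hopper), skipping minor variants.
BASE_CUDA_ARCHS = ["80", "89", "90"]


def gencode_flags(archs=BASE_CUDA_ARCHS):
    # A cubin built for compute capability X.0 also runs on later minor
    # revisions of the same major arch (e.g. sm_80 binaries run on sm_86),
    # so the minor variants are not compiled separately here.
    return [f"-gencode=arch=compute_{sm},code=sm_{sm}" for sm in archs]


if __name__ == "__main__":
    # These flags would be appended to the nvcc args, e.g. in the setup.py
    # sketch above: extra_compile_args["nvcc"] += gencode_flags()
    print(" ".join(gencode_flags()))
```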
ref: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html
@yzh119