Heuristic: Run jobs with large inputs before jobs with small inputs #1175
In the LLVM build, we have some files which take much longer to compile than others. The filename of one of these -- a 30,000-line C++ file (don't ask) -- is pretty late alphabetically, and it often ends up spinning long after most other targets have finished building.

It would be cool if ninja could, as a heuristic, look at the combined sizes of a rule's inputs and run rules with larger inputs first.
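Not from the thread itself: a minimal Python sketch of the rule being proposed, purely for illustration (ninja's actual scheduler is C++ and does not work this way). The `Job` objects with an `inputs` attribute and the `ready_jobs` list are hypothetical names:

```python
# Hypothetical sketch of "run jobs with large inputs first": when several
# jobs are ready to run, start the one whose input files are biggest on disk.
import os

def total_input_size(job):
    # job.inputs is an assumed attribute: the paths of the job's input files.
    return sum(os.path.getsize(p) for p in job.inputs if os.path.exists(p))

def pick_next(ready_jobs):
    # Among jobs whose dependencies are satisfied, prefer the "heaviest" one.
    return max(ready_jobs, key=total_input_size)
```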
I'm not sure compile time is uniquely a function of file size. I can imagine a small file that's a tough nut to crack for the compiler, and a large file that is trivial.
It's not. Nonetheless, compilation time is certainly positively correlated with file size. Thus the benefit of this heuristic.
Also, maybe ninja can do some sort of PGO (profile-guided optimization) 👍: collect information on how long it took to compile each file and use that information to prioritize files in the next build.
I don't think any heuristic based on source files will make sense. For instance, we have a less-than-thousand-line C++ file which compiles for almost a minute (don't ask). Ultimately it depends on how hard the file is for the preprocessor and how many templates it will instantiate during compilation. Of course one can assume that a 30k-line file will compile longer than a 1k one, but that's not a decision that ninja should make, in my opinion.
This is already done in .ninja_log. You can use this script https://github.com/nico/ninjatracing to parse it and load it up in Chrome to see how the build process went.
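For concreteness, here is a rough sketch (an editor's illustration, not part of ninja or ninjatracing) of pulling those historical durations out of .ninja_log, assuming the v5 log format: a `# ninja log v5` header, then one tab-separated record per target of start time (ms), end time (ms), restat mtime, output path, and command hash:

```python
# Sketch: compute per-target build durations from ninja's .ninja_log (v5).
# Each non-header line is assumed to be:
#   start_ms <TAB> end_ms <TAB> mtime <TAB> output <TAB> cmd_hash
def load_durations(log_path=".ninja_log"):
    durations = {}
    with open(log_path) as f:
        for line in f:
            if line.startswith("#"):  # skip the "# ninja log v5" header
                continue
            start_ms, end_ms, _mtime, output, _cmd_hash = line.rstrip("\n").split("\t")
            durations[output] = int(end_ms) - int(start_ms)  # later entries win
    return durations

# Example: print the ten slowest targets from the previous build.
for output, ms in sorted(load_durations().items(), key=lambda kv: -kv[1])[:10]:
    print(f"{ms:8d} ms  {output}")
```

These durations are exactly what a "longest jobs first" priority on the next build could be seeded with.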
So since we already have the build time info, ninja could indeed use that info in subsequent builds. Could actually help. EDIT: OK, I have actually taken a look at compilation times. The file which compiles the longest is https://github.com/llvm-mirror/clang/blob/master/lib/ASTMatchers/Dynamic/Registry.cpp and it has only 600 lines. So the file size metric doesn't work at all for this particular file. Also, I don't see any bottlenecks in the process. But it might help that I build with 13 threads.
Hi, in case it's not clear, I actually work on LLVM and clang. I spent the past few weeks optimizing the speed of the clang/llvm backend, and in fact it was during this process that I ran into this problem. You're correct that a clean rebuild of clang packs well. But if you touch certain headers, you can easily end up waiting for one of these very long compiles, such as x86 isel lowering (which also happens to appear late in the alphabet, exacerbating the problem).

The analysis above is not statistically meaningful because it looks at just one file. Here's a scatterplot of cpp file size (x-axis) versus compile time (y-axis) from a clean build of clang on my machine. If there were no correlation between file size and compilation time, we would expect this to look like random noise. But as you can see, there is a decent correlation between file size and compilation time.

Anyway, I think using past compilation times to inform the ordering is a great idea.
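A sketch of the kind of analysis behind such a scatterplot, joining source-file sizes with durations from .ninja_log (`load_durations` as sketched above; the .cpp-from-.o path mapping below is a naive assumption that depends on how your generator names object files):

```python
# Sketch: Pearson correlation between .cpp size and compile time.
import os
from statistics import correlation  # Python 3.10+

pairs = []
for output, ms in load_durations().items():  # from the sketch above
    src = output.removesuffix(".o")          # e.g. "foo.cpp.o" -> "foo.cpp" (naive)
    if src.endswith(".cpp") and os.path.exists(src):
        pairs.append((os.path.getsize(src), ms))

if len(pairs) > 1:
    sizes, times = zip(*pairs)
    print(f"n = {len(pairs)}, Pearson r = {correlation(sizes, times):.2f}")
```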
Maybe I'm not being clear with my suggestion. The suggestion is: if you are going to make an arbitrary choice between building A and B, then maybe use some heuristic to figure out which target is going to take longer to build. The heuristic of "look at how long they took to build last time" is great, way better than my suggestion of looking at file sizes. But in the absence of historical build times, I'd expect file sizes still to be useful.

Again, if a better heuristic (such as one of the ones you mentioned) applies, you should use that. But since I am only suggesting that this heuristic be applied as a last resort, I don't think the fact that it's not "universal" makes a difference. We were going to make an arbitrary choice, and now we make a slightly less arbitrary choice. Anyway, looking at historical build times is so much better that I don't really care if you look at the file sizes. :)
Ninja has a heuristic for making arbitrary choices between building A and B: it builds whichever was first in the manifest. Could you put your heuristic in your build.ninja generator instead? |
@colincross That is fine for the initial ordering, but subsequent orderings can use the timing to be more accurate than generators could ever hope to be. Personally, I'm not too worried about initial build time, but generators could optimize that way. |
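Since ninja breaks such ties in manifest order (per the comment above), a generator could apply the size heuristic itself when emitting build.ninja. A hedged sketch of that generator-side approach, where `edges` is a hypothetical list of (output, rule, inputs) tuples the generator already has; real generators emit much richer manifests:

```python
# Sketch: emit large-input build edges first, so ninja's
# "first in the manifest" tie-breaking starts them earlier.
import os

def input_weight(edge):
    _output, _rule, inputs = edge
    return sum(os.path.getsize(p) for p in inputs if os.path.exists(p))

def write_manifest(edges, path="build.ninja"):
    with open(path, "w") as f:
        for output, rule, inputs in sorted(edges, key=input_weight, reverse=True):
            f.write(f"build {output}: {rule} {' '.join(inputs)}\n")
```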
I think using the file size may not be enough for "profiling". For instance, a reeeally small cpp file with a lot of includes could take really long to process. And non-compilation steps (parsing, copying, file generation…) are not related to file size.
Ninja does save the times for jobs, so it has that information. The first build may not be optimal, but that's already the case since the ninja deps log is not populated either. |
@advancedwebdeveloper indeed, there's sort of an assumption in Ninja that you have enough RAM to compile everything in parallel with the …
I came here looking for this feature, but I found this open issue instead. I hope it's clear that this "prioritize based on historical build times" feature request is the sensible one to follow here. Can we change the title to attract more attention and avoid dups? Has anyone experimented with this yet?

I'm not trying to continue the argument about whether input size is a more useful heuristic than random guessing in the absence of historical records. It clearly is! I only want to press for using the log of previous builds intelligently, and I don't want to create another issue to request it since this one already exists.
I think so!!