Add ccache to base-builder. #12675
Conversation
This installs clang wrappers at /ccache/bin and sets up a build cache at /ccache/cache. To use this, inside the project container we just need to do:

```
export PATH=/ccache/bin:$PATH
```

In another PR, we can store /ccache/cache somewhere we can pull down at runtime.

Some results:

Fresh compile:
real 0m49.249s
user 10m41.818s
sys 1m2.097s

With ccache cache:
real 0m9.877s
user 0m6.278s
sys 0m19.966s

Fresh compile:
real 1m17.214s
user 0m49.454s
sys 0m27.963s

With ccache:
real 0m34.962s
user 0m18.092s
sys 0m17.083s
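A minimal sketch of what this looks like from inside a project container, assuming the wrapper directory and cache location described above; the `ccache -z`/`ccache -s` statistics calls are just illustrative and not part of this PR:

```
# Put the ccache clang wrappers first on PATH so "clang"/"clang++" hit the cache.
export PATH=/ccache/bin:$PATH

# Point ccache at the shared cache directory (assumption: this matches where
# the base image keeps it, per the description above).
export CCACHE_DIR=/ccache/cache

# Optional: zero the statistics so we can see how many hits this build gets.
ccache -z

# Run the normal OSS-Fuzz build entrypoint.
compile

# Show hit/miss statistics for this build.
ccache -s
```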
Note: this is likely still complementary to #12608, as a baseline fallback that should always work. The downside here is that a lot of the bigger projects spend a fair bit of time downloading/configuring dependencies (or doing other weird things) which can't be cached by this mechanism. An example is poppler, which takes 16 minutes for a clean build; with ccache, this is only reduced to 11 minutes.
LGTM
Nice! I suppose the savings will be larger for projects that take longer to compile (probably where most of our time is spent)
This is neat! So if I get this right, in this case we wouldn't want to use any cached containers, but rather rely on the normal OSS-Fuzz approach and the existing base images? If that's the case, then what's the higher-level architecture we'd deploy this in?

For example, I think at this point we have multiple viable solutions to the problem:

1. relying on manually generated rebuilder scripts, which need a cached container (https://github.com/google/oss-fuzz-gen/tree/main/fuzzer_build_script);
2. using the existing Chronos approach with a cached container;
3. using auto-generated scripts based on #12608 with a cached container;
4. using ccache with the original base images and no cached containers.

We need some form of reasoning capability around which process to take, or some kind of reasoning around the ordering. The current caching from OFG (1 above) works by using a cached container and overwriting the build script.

The logic for deciding which technique to use could perhaps be put in infra/build/functions/target_experiment.py, but I am not sure it's smart to make that more complex. An alternative is to not have the selection of which technique to use happen during an actual OFG run, but rather asynchronously in some manner.

In either case we need to address how to auto-evaluate whether a given solution is actually correctly building the updated harness, while still preserving that we can show failed build errors in the OFG experiments (but not failures due to issues in cache rebuilding, I guess -- or at least that's not the most interesting thing to debug when doing an OFG run).

Considering that this technique uses just the base images + a cache, why not just go ahead and use it by default in all OSS-Fuzz builds?
To answer my own questions above: I think I'd prefer to use the ccache-based approach here by default.
Agreed, this approach will serve as a very useful baseline (and I think it will always work, in that it shouldn't cause breakages or make anything worse). We should just start with this.

I think it will still be extremely beneficial to have an approach that enables rebuilds on the order of seconds for most projects. This would enable LLM applications/agents that need a tighter feedback loop. Ultimately, I think we should have a combination of this as the baseline plus a saved-container approach with an auto-generated recompile script.

Is there some way we can determine ahead of time whether the auto-generated recompile script will work? Perhaps this is something we can precompute for every OSS-Fuzz project.
Then, from the user's perspective, they just need to pull the saved container and run "compile". Under the hood, this could either use the recompile script or the ccache cache.
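A rough sketch of what that combined flow could look like inside a saved container, assuming a hypothetical auto-generated /usr/local/bin/recompile script and the /ccache layout from this PR; none of these paths are finalized in this thread:

```
#!/bin/bash -eu
# Hypothetical "compile" entrypoint for a saved container: prefer the fast
# auto-generated recompile script, fall back to a ccache-backed full build.

export PATH=/ccache/bin:$PATH
export CCACHE_DIR=/ccache/cache

if [ -x /usr/local/bin/recompile ] && /usr/local/bin/recompile; then
  echo "Rebuilt via auto-generated recompile script."
else
  echo "Recompile script missing or failed; falling back to full build with ccache."
  compile
fi
```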
So this is what I was getting at above. The main issue we need to ensure is that changes to the source of a harness are applied in the actual build afterwards. I'm not sure we need a definite yes in order to determine whether a rebuild works, since we would need to know the location of the harnesses to do this, which may not be something we want to pull into this approach.

Alternatively, we could validate whether the contents of "OUT" are similar, and if so declare the rebuild successful. I'm not sure whether checksums are too restrictive here; if not, that would be great. Otherwise, perhaps simply size-check all executables and ensure the same set of executable names is present in OUT. Or we could simply say it's successful if the rebuild script didn't crash.

Should we then do this on a regular basis, or as part of the existing build infra? The rest of the approach sounds good, though.
+1. We can likely get by with a very simple heuristic -- clear the binaries in $OUT and check whether those filenames come back after calling "compile".
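A minimal sketch of that heuristic, assuming the standard $OUT layout and the "compile" entrypoint; treating every executable regular file in $OUT as a fuzz target is an approximation, not something specified in this thread:

```
#!/bin/bash -eu
# Record the executable names currently in $OUT, wipe them, rebuild, and
# verify that the same set of names reappears.

before=$(find "$OUT" -maxdepth 1 -type f -executable -printf '%f\n' | sort)

# Remove the old binaries so stale artifacts can't mask a broken rebuild.
find "$OUT" -maxdepth 1 -type f -executable -delete

compile

after=$(find "$OUT" -maxdepth 1 -type f -executable -printf '%f\n' | sort)

if [ "$before" = "$after" ]; then
  echo "Rebuild check passed: all fuzz target names came back."
else
  echo "Rebuild check failed: target sets differ." >&2
  diff <(echo "$before") <(echo "$after") || true
  exit 1
fi
```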
+1. I think we can just do this as part of OSS-Fuzz infra.
Careful, I think we need to be smart about how we use this with jcc: https://ccache.dev/manual/3.2.5.html#_using_ccache_with_other_compiler_wrappers
I don't think we need it, nor would it probably work with our auth situation, but Mozilla has a version of ccache that can save to cloud storage: https://github.com/mozilla/sccache
Yep, this is captured here: google/oss-fuzz-gen#682. I think the best way is for ccache to wrap jcc?
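One possible arrangement (an assumption, not something decided in this PR): have the wrapper scripts in /ccache/bin invoke ccache with jcc as the underlying "compiler", so ccache sits in front and jcc still runs on cache misses. The /path/to/jcc placeholder below is hypothetical:

```
#!/bin/bash
# Hypothetical contents of /ccache/bin/clang: put ccache in front of jcc so
# cache hits skip compilation entirely while misses still go through jcc.
# "/path/to/jcc" is a placeholder; the real location isn't specified here.
exec ccache /path/to/jcc "$@"
```

The ccache manual section linked above also describes CCACHE_PREFIX for composing with other wrappers, which may be an alternative depending on which layer should run first.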
Yeah, I think we can just save this in the image and push it to the registry to avoid any additional syncing.
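A rough sketch of that flow using plain Docker commands; the image and registry names are made up for illustration, and the real pipeline would presumably live in the OSS-Fuzz build infra rather than be run by hand:

```
# Build the project image from the ccache-enabled base-builder, run one full
# build to populate /ccache/cache, then snapshot and push the warm image.
docker build -t gcr.io/example/project-builder projects/example-project

docker run --name warm-build gcr.io/example/project-builder \
  bash -c 'export PATH=/ccache/bin:$PATH && compile'

# Persist the populated cache by committing the container as a new image tag.
docker commit warm-build gcr.io/example/project-builder:ccache-warm
docker push gcr.io/example/project-builder:ccache-warm
```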