Common Corpus is used to build coverage-minimized corpus data sets for fuzzing.
- Follow the initial setup instructions at "How to Build a Fuzzing Corpus" on the Isosceles blog (steps 1 through 7).
- Compile your target binary with SanitizerCoverage enabled (e.g. with `-fsanitize=address -fsanitize-coverage=trace-pc-guard`).
- Set up the configuration variables in the header of `common_corpus.py`. This includes information about the file format, the target command line, and the access keys that are used for reading Common Crawl data on S3.
- Run the `common_corpus.py` script and supply the CSV file created above as the first argument.
Corpus files will be created in the `out` directory. The tool will output a "+" for each interesting file added to the corpus, and a "." for tests that did not result in new code coverage.
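
The "+" and "." progress output can be illustrated with a minimal sketch of greedy coverage-based selection. This is not the code in `common_corpus.py`: the names `minimize_corpus`, `get_coverage`, and `toy_coverage` are hypothetical, and the coverage function here is a stand-in for running the instrumented target and collecting the coverage points it reports.

```python
import sys
from typing import Callable, Iterable, List, Set, Tuple


def minimize_corpus(
    candidates: Iterable[Tuple[str, bytes]],
    get_coverage: Callable[[bytes], Set[int]],
) -> List[str]:
    """Keep a file only if it reaches at least one coverage point
    that no previously kept file reached (greedy, order-dependent)."""
    seen: Set[int] = set()   # coverage points reached by the corpus so far
    kept: List[str] = []     # names of files added to the corpus
    for name, data in candidates:
        new_points = get_coverage(data) - seen
        if new_points:
            seen |= new_points
            kept.append(name)
            sys.stdout.write("+")   # interesting: new coverage
        else:
            sys.stdout.write(".")   # no new coverage
        sys.stdout.flush()
    sys.stdout.write("\n")
    return kept


def toy_coverage(data: bytes) -> Set[int]:
    # Hypothetical stand-in for the instrumented target:
    # treats each distinct byte value as one coverage point.
    return set(data)


if __name__ == "__main__":
    samples = [
        ("a.bin", b"\x01\x02"),
        ("b.bin", b"\x02"),
        ("c.bin", b"\x02\x03"),
    ]
    print(minimize_corpus(samples, toy_coverage))
```

In this sketch the result depends on the order in which files are tested: a file that adds nothing new at the time it is seen is skipped, even if it would have been kept under a different ordering.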