cmd/go: support corpus minimization #49290

katiehockman · 2021-11-02T15:34:12Z

We should support corpus minimization: the ability to remove items from the on-disk corpus that either 1) don't have the types supported by the fuzz test (e.g. they are left over from a previous version of the test that took different parameters), or 2) don't add any coverage that isn't already provided by other entries in the corpus.

(1) should be pretty straightforward. We can just unmarshal the contents of the file and check whether its values match the fuzz test's parameter types. If they don't, delete it.
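A minimal sketch of check (1), assuming the v1 corpus file layout: a `go test fuzz v1` header line followed by one Go expression per fuzz argument, such as `[]byte("abc")` or `int(42)`. It compares only the type name in front of each value. `matchesSignature` and `valueType` are hypothetical helpers, not the `internal/fuzz` API, and treating `math.Float64frombits(...)` / `math.Float32frombits(...)` as `float64` / `float32` is an assumption about how special float values are encoded.

```go
package main

import (
	"fmt"
	"strings"
)

// corpusHeader is the first line of a version-1 corpus file.
const corpusHeader = "go test fuzz v1"

// valueType returns the Go type named by one encoded value line, e.g.
// "[]byte" for `[]byte("abc")`. Assumed: special floats are written via
// math.Float64frombits / math.Float32frombits.
func valueType(line string) string {
	i := strings.IndexByte(line, '(')
	if i < 0 {
		return ""
	}
	switch name := line[:i]; name {
	case "math.Float64frombits":
		return "float64"
	case "math.Float32frombits":
		return "float32"
	default:
		return name
	}
}

// matchesSignature reports whether a corpus file's values have exactly the
// types the fuzz target currently accepts, in order (hypothetical helper).
func matchesSignature(contents string, want []string) bool {
	lines := strings.Split(strings.TrimSpace(contents), "\n")
	if strings.TrimSpace(lines[0]) != corpusHeader {
		return false
	}
	var got []string
	for _, l := range lines[1:] {
		if l = strings.TrimSpace(l); l != "" {
			got = append(got, valueType(l))
		}
	}
	if len(got) != len(want) {
		return false
	}
	for i := range got {
		if got[i] != want[i] {
			return false
		}
	}
	return true
}

func main() {
	old := "go test fuzz v1\n[]byte(\"abc\")\n"
	fmt.Println(matchesSignature(old, []string{"[]byte"}))        // true
	fmt.Println(matchesSignature(old, []string{"[]byte", "int"})) // false: stale entry
}
```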

(2) will be a bit more involved, but not necessarily that complicated. We should take a look at how libFuzzer implements this. One potential solution would be to maintain a coverage map and run each corpus item against the fuzz test in turn, updating the map as it runs. If an item doesn't expand coverage, delete it. A potential pitfall of this: if we have 20 corpus entries that each expand 1 line, and 1 corpus entry that covers all 20 of those lines at once, which do we choose?

At least (2) will be needed for OSS-Fuzz integration, if OSS-Fuzz ends up supporting native Go fuzzing.

bcmills · 2021-11-02T18:37:12Z

A thought regarding (2): if we have an effective minimization algorithm, is it actually necessary to prune the corpus on disk? It probably is worthwhile to prune the inputs stored in the Go build cache, but arguably the former-crashers in the testdata/fuzz directory should be retained as regression tests.

Perhaps we could instead start each fuzzing run by spending some fraction of the time budget re-minimizing the corpus. That would also avoid losing coverage if (for example) the code is modified in a way that significantly changes coverage before refactoring back to something closer to a previous approach (for which the pruned-out inputs might actually become relevant again).

katiehockman · 2021-11-02T18:55:51Z

> It probably is worthwhile to prune the inputs stored in the Go build cache, but arguably the former-crashers in the testdata/fuzz directory should be retained as regression tests.

Yes agreed. When I talk about "corpus minimization" here, I'm only referring to the corpus in the build cache. We shouldn't touch the seed corpus.

> Perhaps we could instead start each fuzzing run by spending some fraction of the time budget re-minimizing the corpus.

We could do this, but I would argue it should be opt-in, e.g. go test -fuzz=Fuzz -fuzzminimizecorpus
Otherwise, I'm imagining a scenario where someone has a fuzz target that accepts []byte. While experimenting with their test, they change the target to accept []byte, int. If we prune the corpus by default, then when they run go test -fuzz Fuzz, every cached corpus entry that contains only a []byte will be deleted before the user realizes what happened.
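A sketch of the opt-in behavior described above, assuming each cached entry's value types have already been decoded. `cachedEntry`, `planCorpus`, and the `optIn` parameter (standing in for the proposed, not existing, `-fuzzminimizecorpus` flag) are hypothetical. By default, stale entries are skipped for this run but left on disk, so switching the target from `[]byte` to `[]byte, int` and back does not destroy the old corpus.

```go
package main

import "fmt"

// cachedEntry is one file in the fuzz cache plus the value types decoded
// from it (hypothetical type; decoding is not shown).
type cachedEntry struct {
	path  string
	types []string
}

// sameTypes reports whether two type lists are identical, in order.
func sameTypes(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// planCorpus splits the cache into entries to run now and entries to delete.
// Mismatched entries are deleted only when optIn is set; otherwise they are
// just skipped and stay on disk.
func planCorpus(cache []cachedEntry, want []string, optIn bool) (run, del []string) {
	for _, e := range cache {
		switch {
		case sameTypes(e.types, want):
			run = append(run, e.path)
		case optIn:
			del = append(del, e.path)
		}
	}
	return run, del
}

func main() {
	cache := []cachedEntry{
		{"old1", []string{"[]byte"}},
		{"old2", []string{"[]byte"}},
		{"new1", []string{"[]byte", "int"}},
	}
	want := []string{"[]byte", "int"}

	run, del := planCorpus(cache, want, false)
	fmt.Println(run, del) // [new1] []
	run, del = planCorpus(cache, want, true)
	fmt.Println(run, del) // [new1] [old1 old2]
}
```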

katiehockman · 2021-11-02T20:45:09Z

@dgryski pointed me to https://arxiv.org/pdf/1905.13055.pdf. Commenting here so we can find it later.

ianlancetaylor · 2022-06-24T19:21:04Z

Moving to Backlog.

morehouse · 2023-03-30T21:11:50Z

Bump.

Because corpus minimization/merging features are missing, it is difficult for major Go projects to maintain public corpora. PRs adding new inputs to the corpus can't easily be evaluated to determine which seeds actually increase coverage, so we essentially have to accept all of the new inputs together, based on comparing coverprofiles from before and after the PR. Over time, this can lead to major bloat of the corpora.

Those serious about fuzzing Go end up resorting to libFuzzer build mode and doing hacky things to measure code coverage.

It would be so much nicer if we could just use the native fuzzing tools.
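For the PR-review case, a sketch of an incremental merge, similar in spirit to libFuzzer's `-merge=1` mode: start from the coverage of the existing corpus and accept only the candidate inputs that add new edges. The `input` type and the precomputed edge IDs are assumptions; collecting per-input coverage is not modeled here.

```go
package main

import "fmt"

// input is a corpus file and the coverage edges it hits (hypothetical IDs).
type input struct {
	name  string
	cover []int
}

// acceptNew returns the candidates that hit at least one edge not already hit
// by the existing corpus or by a previously accepted candidate.
func acceptNew(existing, candidates []input) []string {
	seen := map[int]bool{}
	for _, in := range existing {
		for _, edge := range in.cover {
			seen[edge] = true
		}
	}
	var accepted []string
	for _, c := range candidates {
		added := false
		for _, edge := range c.cover {
			if !seen[edge] {
				seen[edge] = true
				added = true
			}
		}
		if added {
			accepted = append(accepted, c.name)
		}
	}
	return accepted
}

func main() {
	existing := []input{{"seed1", []int{1, 2}}}
	fromPR := []input{{"pr1", []int{2}}, {"pr2", []int{3}}, {"pr3", []int{3}}}
	fmt.Println(acceptNew(existing, fromPR)) // [pr2]
}
```

Here `pr1` adds nothing over the existing seed and `pr3` only repeats what `pr2` already contributed, so a reviewer would see that just one of the three new inputs is worth committing.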

katiehockman added the NeedsFix and fuzz labels Nov 2, 2021

katiehockman added this to the Go1.19 milestone Nov 2, 2021

katiehockman mentioned this issue Dec 15, 2021

cmd/go: support continuously running fuzzing with OSS-Fuzz #50192 (closed)

joedian added this to Release Blockers Jun 15, 2022

ianlancetaylor modified the milestones: Go1.19, Backlog Jun 24, 2022

capnspacehook mentioned this issue Jul 14, 2022

internal/fuzz: deduplicate interesting inputs #48303 (open)

julieqiu added this to Go Security Sep 8, 2022
