go-fuzz: Worker.crasherQueue can grow without bounds #303

Open
rogpeppe opened this issue Sep 28, 2020 · 0 comments


I started investigating this because my go-fuzz process was OOM-killed within 30 minutes two times in a row.

After spending a while puzzling over the heap profile results (before I found the MemProfileRate=0 assignment), I acquired a reasonable profile and a few logs of what happened when it started using lots of memory.
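For readers unfamiliar with the setting mentioned above: `runtime.MemProfileRate` controls how often heap allocations are sampled, and a value of 0 turns heap profiling off, which is why a profile taken with that assignment in place comes out empty. The following is a minimal sketch of restoring a non-zero rate and capturing a heap profile in a standalone program. The `heapProfile` helper and the `heap.pprof` file name are illustrative assumptions, not part of go-fuzz.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// heapProfile is a hypothetical helper: it forces a GC so the statistics
// are up to date, then serializes the current heap profile into memory.
func heapProfile() ([]byte, error) {
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// 512 KiB is the Go default sampling rate. A value of 0 disables heap
	// profiling entirely. The rate should be set once, as early as possible.
	runtime.MemProfileRate = 512 * 1024

	p, err := heapProfile()
	if err != nil {
		fmt.Fprintln(os.Stderr, "heap profile:", err)
		os.Exit(1)
	}
	// The file name is an assumption made for this example.
	if err := os.WriteFile("heap.pprof", p, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d bytes to heap.pprof\n", len(p))
}
```

The resulting file can be inspected with `go tool pprof heap.pprof`.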

The process in question does crash a lot (a restart rate of over 1/50), which is obviously a big contributor to the problem, but I believe that go-fuzz should continue without using arbitrary memory even in that situation.

Here's a screenshot of the heap profile from one such run (unfortunately I lost the profile from that run), where over 2GB of memory is kept around in Worker.crasherQueue:

[Screenshot: heap profile showing over 2GB of memory retained by Worker.crasherQueue]

Although code inspection pointed towards crasherQueue as a possible culprit, I wasn't entirely sure that's what was happening until I reproduced the issue with a log statement added that showed the current size of the queue (including its associated data) whenever the queue slice is grown.

The final line that it printed before I dumped the heap profile was:

crasherQueue 0xc0000ca380 len 37171; space 465993686 (data 430941433; error 26019700; suppression 9032553)

That 466MB was 65% of the total current heap size of 713MB. In previous runs, I observed the total alloc size to rise to more than 8GB, although I wasn't able to obtain a heap profile at that time.

This problem does not always happen! It seems to depend heavily on the current workload. It might be a starvation problem, because only one of the worker queues grows in this way.

Here's the whole log printed by that run: https://gist.github.com/rogpeppe/ad97d2c83834c24b0777a4009d71d120

The crasherQueue log lines were produced by this patch to the Worker.noteCrasher method:

+++ b/go-fuzz/worker.go
@@ -628,6 +628,15 @@ func (w *Worker) noteCrasher(data, output []byte, hanged bool) {
 	if _, ok := ro.suppressions[hash(supp)]; ok {
 		return
 	}
+	if len(w.crasherQueue) == cap(w.crasherQueue) {
+		totalData, totalError, totalSuppression := 0, 0, 0
+		for _, a := range w.crasherQueue {
+			totalData += len(a.Data)
+			totalError += len(a.Error)
+			totalSuppression += len(a.Suppression)
+		}
+		log.Printf("crasherQueue %p len %d; space %d (data %d; error %d; suppression %d)", w, len(w.crasherQueue), totalData+totalError+totalSuppression, totalData, totalError, totalSuppression)
+	}
 	w.crasherQueue = append(w.crasherQueue, NewCrasherArgs{
 		Data:        makeCopy(data),
 		Error:       output,
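
As an illustration of what "continuing without using arbitrary memory" could look like, here is a minimal sketch of a queue that tracks the same three byte counts as the log statement above and refuses new entries once a byte budget is exceeded. This is not go-fuzz code and not a proposed patch: the `crasher` type is a hypothetical stand-in for `NewCrasherArgs`, and `boundedQueue`, `maxBytes`, and the drop-newest policy are all assumptions made for the example.

```go
package main

import "fmt"

// crasher is a hypothetical stand-in for go-fuzz's NewCrasherArgs,
// reduced to the three fields measured by the log statement.
type crasher struct {
	Data, Error, Suppression []byte
}

// size returns the payload bytes held by one queue entry.
func (c crasher) size() int {
	return len(c.Data) + len(c.Error) + len(c.Suppression)
}

// boundedQueue is an assumed design: a FIFO with a byte budget.
type boundedQueue struct {
	items    []crasher
	bytes    int // payload bytes currently queued
	maxBytes int // budget; entries that would exceed it are dropped
	dropped  int // number of entries rejected so far
}

// push appends c if it fits in the budget and reports whether it was kept.
func (q *boundedQueue) push(c crasher) bool {
	if q.bytes+c.size() > q.maxBytes {
		q.dropped++
		return false
	}
	q.items = append(q.items, c)
	q.bytes += c.size()
	return true
}

// pop removes and returns the oldest entry, if any.
func (q *boundedQueue) pop() (crasher, bool) {
	if len(q.items) == 0 {
		return crasher{}, false
	}
	c := q.items[0]
	q.items[0] = crasher{} // drop references so the payload can be collected
	q.items = q.items[1:]
	q.bytes -= c.size()
	return c, true
}

func main() {
	q := &boundedQueue{maxBytes: 10}
	fmt.Println(q.push(crasher{Data: make([]byte, 6)})) // fits in the budget
	fmt.Println(q.push(crasher{Data: make([]byte, 6)})) // would exceed it
	fmt.Println(q.dropped, q.bytes)
}
```

Dropping the newest crasher is only one possible policy; blocking the worker until the queue drains, or deduplicating by suppression before queueing, would trade memory for throughput differently.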