-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: SIGILL in runtime.mapassign1 unmarshalling large json document #11286
Comments
Another variant:
|
Here we go.
Different crashes, e.g.:
|
Bisected to 54af9a3. |
CC @RLH @aclements |
I ran it for 10000 crashes over lunch. 6 bad map state Every single crash either (1) had both SetMapIndex and hashmap.go in its stack trace or (2) failed the "want hello world test". I'm proceeding on the assumption that I want to exclude (or not) escape analysis, even though bisection strongly suggests GC. |
A refined test case:
|
The reason for the SIGILL is that the b.tophash[i] on hashmap.go:446 constructs and attempts to dereference a non-canonical address. We're not yet sure where that bad address came from. One suspicious thing, if lldb is to be believed, is that in mapassign1, t.typ.gcdata is {1,2,3,4,5,6,7,8,9,10,...}. kindGCProg is set, and this is not a reasonable GC program. t should be the type structure of the map[string]struct{} type. So, we may be building bad GC bitmaps, or corrupting them. Checkmark mode does not complain and the test still fails with gcstoptheworld=1 and with gcstackbarrieroff=1. It does not fail with GOGC=0. All of this, as well as the bisect result, is consistent with bad bitmaps (but not the only possible explanation, of course). |
This was a red herring. kindGCProg is not set, and the value is simply a pointer, so t.type.gcdata is actually, effectively, {1}. It looks so funny because it points to the very beginning of the gcbits section, which apparently has been sorted.
|
The hmap header is corrupted, which is how we're getting to the bad pointer that's causing the crash.
count, flags, and B are definitely wrong. Since B is wrong, we computed a bad bucket index and went off into the woods. The buckets field, however, appears to be fine. It points to an array of bmaps that appear to have the expected strings in them. The fields after that also look reasonable. |
Not corrupted. Just freed.
But it has nothing to do with missing write barriers (because it still fails with gcstoptheworld=1) or bad stack barriers (because it still fails with gcstackbarrieroff=1), which would be the most obvious culprits. |
I understand the problem. Our heap bitmap unroller incorrectly lays out the heap bitmap for repetitions of a type with pointers at the beginning and a large enough scalar tail, which is exactly what the backing store for []C is. I've got a fix. I'm working on figuring out if there's a better fix and I'll put together a CL. |
CL https://golang.org/cl/11423 mentions this issue. |
This adds a GC bitmap test of a type with many pointer bits and a large scalar tail, such as the one in issue #11286. This test would have failed prior to the fix in a8ae93f. This test is a more direct version of the test introduced in that commit (which was distilled from the failing test in the issue). Change-Id: I2e716cd1000b49bde237f5da6d857e8983fe7e7a Reviewed-on: https://go-review.googlesource.com/11423 Reviewed-by: Russ Cox <[email protected]>
With current tip (
go version devel +75ce330 Fri Jun 19 06:14:38 2015 +0000 darwin/amd64
), I see memory corruption issues when unmarshalling a large json document. Usually, I seeSIGILL
, sometimes just corrupted data. Tweaking the number of procs or garbage collection didn't seem to have reliable effects in either direction.My test code is below. Unfortunately, I haven't been able to reduce the struct definition and the json document to something I'm comfortable sharing publicly so far.
The test works fine with go1.4.
I'll keep trying to reduce the test case, but let me know if you have any other suggestions/questions.
The text was updated successfully, but these errors were encountered: