Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: SIGILL in runtime.mapassign1 unmarshalling large json document #11286

Closed
robx opened this issue Jun 19, 2015 · 12 comments
Closed

runtime: SIGILL in runtime.mapassign1 unmarshalling large json document #11286

robx opened this issue Jun 19, 2015 · 12 comments
Milestone

Comments

@robx
Copy link

robx commented Jun 19, 2015

With current tip (go version devel +75ce330 Fri Jun 19 06:14:38 2015 +0000 darwin/amd64), I see memory corruption issues when unmarshalling a large json document. Usually, I see SIGILL, sometimes just corrupted data. Tweaking the number of procs or garbage collection didn't seem to have reliable effects in either direction.

My test code is below. Unfortunately, I haven't been able to reduce the struct definition and the json document to something I'm comfortable sharing publicly so far.

The test works fine with go1.4.

I'll keep trying to reduce the test case, but let me know if you have any other suggestions/questions.

package campaigns

import (
        "encoding/json"
        "fmt"
        "io/ioutil"
        "testing"
)

type ConfigState struct{
...
}

func TestUnmarshal(t *testing.T) {
        content, err := ioutil.ReadFile("campaigns_zero.json")
        if err != nil {
                t.Error(err)
        }

        var cs []ConfigState
        if err := json.Unmarshal(content, &cs); err != nil {
                t.Error(err)
        }
        for _, c := range cs {
                fmt.Printf("campaign %q", c.CampaignID) // prints garbage when no crash
        }
}
SIGILL: illegal instruction
PC=0x97da m=3

goroutine 8 [running]:
runtime.mapassign1(0x2a1d80, 0xc8206c14d0, 0xc820a2f2d0, 0xc82002e068)
    /s/go/src/runtime/hashmap.go:446 +0x1fa fp=0xc8205e6b28 sp=0xc8205e6a80
reflect.mapassign(0x2a1d80, 0xc8206c14d0, 0xc820a2f2d0, 0xc82002e068)
    /s/go/src/runtime/hashmap.go:955 +0x3f fp=0xc8205e6b50 sp=0xc8205e6b28
reflect.Value.SetMapIndex(0x2a1d80, 0xc820a495f8, 0xd5, 0x2a3e80, 0xc820a2f2d0, 0x58, 0x2a2000, 0xc82002e068, 0xd5)
    /s/go/src/reflect/value.go:1469 +0x31a fp=0xc8205e6c00 sp=0xc8205e6b50
encoding/json.(*decodeState).object(0xc820092000, 0x2a1d80, 0xc820a495f8, 0xd5)
    /s/go/src/encoding/json/decode.go:613 +0xe23 fp=0xc8205e6fa8 sp=0xc8205e6c00
encoding/json.(*decodeState).value(0xc820092000, 0x2a1d80, 0xc820a495f8, 0xd5)
    /s/go/src/encoding/json/decode.go:289 +0x3bd fp=0xc8205e7080 sp=0xc8205e6fa8
encoding/json.(*decodeState).object(0xc820092000, 0x33cb00, 0xc820a493b8, 0xd9)
    /s/go/src/encoding/json/decode.go:606 +0x1151 fp=0xc8205e7428 sp=0xc8205e7080
encoding/json.(*decodeState).value(0xc820092000, 0x33cb00, 0xc820a493b8, 0xd9)
    /s/go/src/encoding/json/decode.go:289 +0x3bd fp=0xc8205e7500 sp=0xc8205e7428
encoding/json.(*decodeState).array(0xc820092000, 0x2952a0, 0xc82000a2c0, 0xd7)
    /s/go/src/encoding/json/decode.go:440 +0xa6b fp=0xc8205e7708 sp=0xc8205e7500
encoding/json.(*decodeState).value(0xc820092000, 0x2863c0, 0xc82000a2c0, 0x16)
    /s/go/src/encoding/json/decode.go:286 +0x3dd fp=0xc8205e77e0 sp=0xc8205e7708
encoding/json.(*decodeState).unmarshal(0xc820092000, 0x2863c0, 0xc82000a2c0, 0x0, 0x0)
    /s/go/src/encoding/json/decode.go:150 +0x196 fp=0xc8205e78b8 sp=0xc8205e77e0
encoding/json.Unmarshal(0xc8200e8000, 0x4e1750, 0x4e1950, 0x2863c0, 0xc82000a2c0, 0x0, 0x0)
    /s/go/src/encoding/json/decode.go:78 +0x12b fp=0xc8205e7928 sp=0xc8205e78b8
realzeit.io/rtb/util/campaigns.TestUnmarshal(0xc8200b6090)
    /go/src/realzeit.io/rtb/util/campaigns/unmarshal_test.go:17 +0x1c0 fp=0xc8205e7f48 sp=0xc8205e7928
testing.tRunner(0xc8200b6090, 0x556030)
    /s/go/src/testing/testing.go:455 +0x98 fp=0xc8205e7f80 sp=0xc8205e7f48
runtime.goexit()
    /s/go/src/runtime/asm_amd64.s:1691 +0x1 fp=0xc8205e7f88 sp=0xc8205e7f80
created by testing.RunTests
    /s/go/src/testing/testing.go:560 +0x86d

goroutine 1 [chan receive]:
testing.RunTests(0x4445c0, 0x556030, 0x1, 0x1, 0xc800000001)
    /s/go/src/testing/testing.go:561 +0x8ad
testing.(*M).Run(0xc82003df08, 0xc8200e25e8)
    /s/go/src/testing/testing.go:493 +0x70
main.main()
    realzeit.io/rtb/util/campaigns/_test/_testmain.go:54 +0x116

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /s/go/src/runtime/asm_amd64.s:1691 +0x1

goroutine 7 [chan receive]:
github.com/golang/glog.(*loggingT).flushDaemon(0x55e3e0)
    /go/src/github.com/golang/glog/glog.go:879 +0x67
created by github.com/golang/glog.init.1
    /go/src/github.com/golang/glog/glog.go:410 +0x297

rax    0x0
rbx    0x0
rcx    0x2fb8e8bb4121f470
rdx    0xc820074300
rdi    0xc8206c14d0
rsi    0xc81fffcac5
rbp    0x2fb8e8bb4121f470
rsp    0xc8205e6a80
r8     0x2ed3e
r9     0xf1
r10    0x0
r11    0x0
r12    0x1a
r13    0x444058
r14    0x8
r15    0x0
rip    0x97da
rflags 0x10293
cs     0x2b
fs     0x0
gs     0x0
exit status 2
@robx
Copy link
Author

robx commented Jun 19, 2015

Another variant:

unexpected fault address 0xb01dfacedebac1e
fatal error: fault
[signal 0xb code=0x1 addr=0xb01dfacedebac1e pc=0xae64]

goroutine 18 [running]:
runtime.throw(0x39cc50, 0x5)
    /s/go/src/runtime/panic.go:527 +0x96 fp=0xc8200d0950 sp=0xc8200d0938
runtime.sigpanic()
    /s/go/src/runtime/sigpanic_unix.go:27 +0x2ba fp=0xc8200d09a0 sp=0xc8200d0950
runtime.evacuate(0x2a1d80, 0xc821b6c1e0, 0x331bb269043)
    /s/go/src/runtime/hashmap.go:803 +0x94 fp=0xc8200d0a60 sp=0xc8200d09a0
runtime.growWork(0x2a1d80, 0xc821b6c1e0, 0x331bb269043)
    /s/go/src/runtime/hashmap.go:791 +0x5e fp=0xc8200d0a80 sp=0xc8200d0a60
runtime.mapassign1(0x2a1d80, 0xc821b6c1e0, 0xc821b68450, 0xc8200ee1d8)
    /s/go/src/runtime/hashmap.go:433 +0x175 fp=0xc8200d0b28 sp=0xc8200d0a80
reflect.mapassign(0x2a1d80, 0xc821b6c1e0, 0xc821b68450, 0xc8200ee1d8)
    /s/go/src/runtime/hashmap.go:955 +0x3f fp=0xc8200d0b50 sp=0xc8200d0b28
reflect.Value.SetMapIndex(0x2a1d80, 0xc821427740, 0xd5, 0x2a3e80, 0xc821b68450, 0x58, 0x2a2000, 0xc8200ee1d8, 0xd5)
    /s/go/src/reflect/value.go:1469 +0x31a fp=0xc8200d0c00 sp=0xc8200d0b50
encoding/json.(*decodeState).object(0xc8200bc000, 0x2a1d80, 0xc821427740, 0xd5)
    /s/go/src/encoding/json/decode.go:613 +0xe23 fp=0xc8200d0fa8 sp=0xc8200d0c00
encoding/json.(*decodeState).value(0xc8200bc000, 0x2a1d80, 0xc821427740, 0xd5)
    /s/go/src/encoding/json/decode.go:289 +0x3bd fp=0xc8200d1080 sp=0xc8200d0fa8
encoding/json.(*decodeState).object(0xc8200bc000, 0x33cb00, 0xc821427500, 0xd9)
    /s/go/src/encoding/json/decode.go:606 +0x1151 fp=0xc8200d1428 sp=0xc8200d1080
encoding/json.(*decodeState).value(0xc8200bc000, 0x33cb00, 0xc821427500, 0xd9)
    /s/go/src/encoding/json/decode.go:289 +0x3bd fp=0xc8200d1500 sp=0xc8200d1428
encoding/json.(*decodeState).array(0xc8200bc000, 0x2952a0, 0xc820094140, 0xd7)
    /s/go/src/encoding/json/decode.go:440 +0xa6b fp=0xc8200d1708 sp=0xc8200d1500
encoding/json.(*decodeState).value(0xc8200bc000, 0x2863c0, 0xc820094140, 0x16)
    /s/go/src/encoding/json/decode.go:286 +0x3dd fp=0xc8200d17e0 sp=0xc8200d1708
encoding/json.(*decodeState).unmarshal(0xc8200bc000, 0x2863c0, 0xc820094140, 0x0, 0x0)
    /s/go/src/encoding/json/decode.go:150 +0x196 fp=0xc8200d18b8 sp=0xc8200d17e0
encoding/json.Unmarshal(0xc8200fa000, 0x79bddb, 0x79bfdb, 0x2863c0, 0xc820094140, 0x0, 0x0)
    /s/go/src/encoding/json/decode.go:78 +0x12b fp=0xc8200d1928 sp=0xc8200d18b8
realzeit.io/rtb/util/campaigns.TestUnmarshal(0xc82009e120)
    /go/src/realzeit.io/rtb/util/campaigns/unmarshal_test.go:17 +0x1c0 fp=0xc8200d1f48 sp=0xc8200d1928
testing.tRunner(0xc82009e120, 0x556030)
    /s/go/src/testing/testing.go:455 +0x98 fp=0xc8200d1f80 sp=0xc8200d1f48
runtime.goexit()
    /s/go/src/runtime/asm_amd64.s:1691 +0x1 fp=0xc8200d1f88 sp=0xc8200d1f80
created by testing.RunTests
    /s/go/src/testing/testing.go:560 +0x86d

goroutine 1 [chan receive]:
testing.RunTests(0x4445a0, 0x556030, 0x1, 0x1, 0xc8200e5c01)
    /s/go/src/testing/testing.go:561 +0x8ad
testing.(*M).Run(0xc82003df08, 0xc8200e8f38)
    /s/go/src/testing/testing.go:493 +0x70
main.main()
    realzeit.io/rtb/util/campaigns/_test/_testmain.go:54 +0x116

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /s/go/src/runtime/asm_amd64.s:1691 +0x1

goroutine 5 [chan receive]:
github.com/golang/glog.(*loggingT).flushDaemon(0x55e3e0)
    /go/src/github.com/golang/glog/glog.go:879 +0x67
created by github.com/golang/glog.init.1
    /go/src/github.com/golang/glog/glog.go:410 +0x297
exit status 2
FAIL    realzeit.io/rtb/util/campaigns  0.226s

@robx
Copy link
Author

robx commented Jun 19, 2015

Here we go.

package crash

import (
        "encoding/json"
        "strconv"
        "testing"
)

type C struct {
        A string
        X [200]float64
        B map[string]struct{}
        Y [200]float64
}

func TestUnmarshal(t *testing.T) {

        c := C{
                A: "hello world",
                B: map[string]struct{}{},
        }
        for i := 0; i < 1000; i++ {
                c.B[strconv.Itoa(i)] = struct{}{}
        }

        var cs []C
        for i := 0; i < 250; i++ {
                cs = append(cs, c)
        }

        data, err := json.Marshal(cs)
        if err != nil {
                t.Error(err)
        }

        cs = nil
        if err := json.Unmarshal(data, &cs); err != nil {
                t.Error(err)
        }
        for _, c := range cs {
                if have, want := c.A, "hello world"; have != want {
                        t.Errorf("have %s, want %s", have, want)
                }
        }
}

Different crashes, e.g.:

unexpected fault address 0xb01dfacedebac1e
fatal error: fault
[signal 0xb code=0x1 addr=0xb01dfacedebac1e pc=0xac24]

goroutine 19 [running]:
runtime.throw(0x1ac1f0, 0x5)
    /s/go/src/runtime/panic.go:527 +0x96 fp=0x82044e800 sp=0x82044e7e8
runtime.sigpanic()
    /s/go/src/runtime/sigpanic_unix.go:27 +0x2ba fp=0x82044e850 sp=0x82044e800
runtime.evacuate(0x136bc0, 0x8203668d0, 0x88502c794dbcc7a2)
    /s/go/src/runtime/hashmap.go:803 +0x94 fp=0x82044e910 sp=0x82044e850
runtime.growWork(0x136bc0, 0x8203668d0, 0x88502c794dbcc7a2)
    /s/go/src/runtime/hashmap.go:791 +0x5e fp=0x82044e930 sp=0x82044e910
runtime.mapassign1(0x136bc0, 0x8203668d0, 0x820862800, 0x2b5c70)
    /s/go/src/runtime/hashmap.go:433 +0x175 fp=0x82044e9d8 sp=0x82044e930
reflect.mapassign(0x136bc0, 0x8203668d0, 0x820862800, 0x2b5c70)
    /s/go/src/runtime/hashmap.go:955 +0x3f fp=0x82044ea00 sp=0x82044e9d8
reflect.Value.SetMapIndex(0x136bc0, 0x8206a2830, 0xd5, 0x137ac0, 0x820862800, 0x58, 0x1361e0, 0x2b5c70, 0xd9)
    /s/go/src/reflect/value.go:1469 +0x31a fp=0x82044eab0 sp=0x82044ea00
encoding/json.(*decodeState).object(0x82043e000, 0x136bc0, 0x8206a2830, 0xd5)
    /s/go/src/encoding/json/decode.go:613 +0xe23 fp=0x82044ee58 sp=0x82044eab0
encoding/json.(*decodeState).value(0x82043e000, 0x136bc0, 0x8206a2830, 0xd5)
    /s/go/src/encoding/json/decode.go:289 +0x3bd fp=0x82044ef30 sp=0x82044ee58
encoding/json.(*decodeState).object(0x82043e000, 0x181ce0, 0x8206a21e0, 0xd9)
    /s/go/src/encoding/json/decode.go:606 +0x1151 fp=0x82044f2d8 sp=0x82044ef30
encoding/json.(*decodeState).value(0x82043e000, 0x181ce0, 0x8206a21e0, 0xd9)
    /s/go/src/encoding/json/decode.go:289 +0x3bd fp=0x82044f3b0 sp=0x82044f2d8
encoding/json.(*decodeState).array(0x82043e000, 0x12e860, 0x820372020, 0xd7)
    /s/go/src/encoding/json/decode.go:440 +0xa6b fp=0x82044f5b8 sp=0x82044f3b0
encoding/json.(*decodeState).value(0x82043e000, 0x126fa0, 0x820372020, 0x16)
    /s/go/src/encoding/json/decode.go:286 +0x3dd fp=0x82044f690 sp=0x82044f5b8
encoding/json.(*decodeState).unmarshal(0x82043e000, 0x126fa0, 0x820372020, 0x0, 0x0)
    /s/go/src/encoding/json/decode.go:150 +0x196 fp=0x82044f768 sp=0x82044f690
encoding/json.Unmarshal(0x8209fe000, 0x251c01, 0x412881, 0x126fa0, 0x820372020, 0x0, 0x0)
    /s/go/src/encoding/json/decode.go:78 +0x12b fp=0x82044f7d8 sp=0x82044f768
realzeit.io/rtb/util/campaigns.TestUnmarshal(0x8203a0000)
    /go/src/realzeit.io/rtb/util/campaigns/unmarshal_test.go:37 +0x3ba fp=0x820451f38 sp=0x82044f7d8
testing.tRunner(0x8203a0000, 0x28c3a0)
    /s/go/src/testing/testing.go:455 +0x98 fp=0x820451f70 sp=0x820451f38
runtime.goexit()
    /s/go/src/runtime/asm_amd64.s:1691 +0x1 fp=0x820451f78 sp=0x820451f70
created by testing.RunTests
    /s/go/src/testing/testing.go:560 +0x86d

goroutine 1 [chan receive]:
testing.RunTests(0x1ffb00, 0x28c3a0, 0x1, 0x1, 0x1)
    /s/go/src/testing/testing.go:561 +0x8ad
testing.(*M).Run(0x820337f08, 0xffffffffffffffff)
    /s/go/src/testing/testing.go:493 +0x70
main.main()
    realzeit.io/rtb/util/campaigns/_test/_testmain.go:54 +0x116
exit status 2

@robx
Copy link
Author

robx commented Jun 19, 2015

Bisected to 54af9a3.

@ianlancetaylor
Copy link
Member

CC @RLH @aclements

@ianlancetaylor ianlancetaylor added this to the Go1.5Maybe milestone Jun 19, 2015
@dr2chase
Copy link
Contributor

I ran it for 10000 crashes over lunch.

6 bad map state
1022 fatal error: evacuation not done in time
30 fatal error: unexpected signal during runtime execution
135 have NUL...NUL, want hello world
166 yucky variations on not getting hello world.
5905 SIGILL: illegal instruction
2737 unexpected fault address 0xb01dfacedebac1e

Every single crash either (1) had both SetMapIndex and hashmap.go in its stack trace or (2) failed the "want hello world test".

I'm proceeding on the assumption that I want to exclude (or not) escape analysis, even though bisection strongly suggests GC.

@mikioh mikioh changed the title SIGILL in runtime.mapassign1 unmarshalling large json document runtime: SIGILL in runtime.mapassign1 unmarshalling large json document Jun 19, 2015
@dr2chase
Copy link
Contributor

A refined test case:

package main

import (
    "encoding/json"
    "fmt"
    "strconv"
)

const D = 55 // 54 good, 55 bad

type C struct {
    A string              // 16 bytes
    X [D]float64          // 8*D bytes
    B map[string]struct{} // 48 bytes
    Y [D]float64          // 8*D bytes
    // total is 64 + 16*D bytes.
}

func main() {

    c := C{
        A: "hello world",
        B: map[string]struct{}{},
    }
    for i := 0; i < 1000; i++ {
        c.B[strconv.Itoa(i)] = struct{}{}
    }

    var cs []C
    for i := 0; i < 250; i++ {
        cs = append(cs, c)
    }

    data, err := json.Marshal(cs)
    if err != nil {
        fmt.Println(err)
    }

    cs = nil
    if err := json.Unmarshal(data, &cs); err != nil {
        fmt.Println(err)
    }
    for _, c := range cs {
        if have, want := c.A, "hello world"; have != want {
            fmt.Printf("have %s, want %s\n", have, want)
        }
    }
}

@aclements
Copy link
Member

The reason for the SIGILL is that the b.tophash[i] on hashmap.go:446 constructs and attempts to dereference a non-canonical address. We're not yet sure where that bad address came from.

One suspicious thing, if lldb is to be believed, is that in mapassign1, t.typ.gcdata is {1,2,3,4,5,6,7,8,9,10,...}. kindGCProg is set, and this is not a reasonable GC program. t should be the type structure of the map[string]struct{} type. So, we may be building bad GC bitmaps, or corrupting them.

Checkmark mode does not complain and the test still fails with gcstoptheworld=1 and with gcstackbarrieroff=1. It does not fail with GOGC=0. All of this, as well as the bisect result, is consistent with bad bitmaps (but not the only possible explanation, of course).

@aclements
Copy link
Member

t.typ.gcdata is {1,2,3,4,5,6,7,8,9,10,...}

This was a red herring. kindGCProg is not set, and the value is simply a pointer, so t.type.gcdata is actually, effectively, {1}. It looks so funny because it points to the very beginning of the gcbits section, which apparently has been sorted.

(gdb) print *t
$5 = {typ = {size = 8, ptrdata = 8, hash = 3780268472, _unused = 0 '\000', 
    align = 8 '\b', fieldalign = 8 '\b', kind = 53 '5', 
    alg = 0x5f1fe0 <runtime.algarray+192>, 
    gcdata = 0x57c8fd <runtime.gcbits> "\001\002\003\004\005\006\a\b\t\n\v\r\016\017\020\021\022\025\026\031\032\033\037,568>?Ur\236\237\325\365\377", 
    _string = "map[string]struct {}", x = 0x0, ptrto = 0x0, 
    zero = 0x57d720 <runtime.zerovalue> ""}, key = 0x4e5520, elem = 0x4e45c0, 
  bucket = 0x515e40, hmap = 0x522380, keysize = 16 '\020', 
  indirectkey = false, valuesize = 0 '\000', indirectvalue = false, 
  bucketsize = 144, reflexivekey = true}

@aclements
Copy link
Member

The hmap header is corrupted, which is how we're getting to the bad pointer that's causing the crash.

(gdb) print *h
$7 = {count = 859530937904, flags = 173 '\255', B = 222 '\336', 
  hash0 = 3735936685, buckets = 0xc82038a000, oldbuckets = 0x0, 
  nevacuate = 128, overflow = 0x0}

count, flags, and B are definitely wrong. Since B is wrong, we computed a bad bucket index and went off into the woods. The buckets field, however, appears to be fine. It points to an array of bmaps that appear to have the expected strings in them. The fields after that also look reasonable.

@aclements
Copy link
Member

Not corrupted. Just freed.

(gdb) print/x *h
$3 = {count = 0xc8200125a0, flags = 0xad, B = 0xde, hash0 = 0xdeaddead, 
  buckets = 0xc82023a000, oldbuckets = 0x0, nevacuate = 0x40, overflow = 0x0}
(gdb) x/6xg h
0xc820012570:   0x000000c8200125a0  0xdeaddeaddeaddead
0xc820012580:   0x000000c82023a000  0x0000000000000000
0xc820012590:   0x0000000000000040  0x0000000000000000

But it has nothing to do with missing write barriers (because it still fails with gcstoptheworld=1) or bad stack barriers (because it still fails with gcstackbarrieroff=1), which would be the most obvious culprits.

@aclements
Copy link
Member

I understand the problem. Our heap bitmap unroller incorrectly lays out the heap bitmap for repetitions of a type with pointers at the beginning and a large enough scalar tail, which is exactly what the backing store for []C is. I've got a fix. I'm working on figuring out if there's a better fix and I'll put together a CL.

@mikioh mikioh modified the milestones: Go1.5, Go1.5Maybe Jun 23, 2015
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/11423 mentions this issue.

aclements added a commit that referenced this issue Jun 26, 2015
This adds a GC bitmap test of a type with many pointer bits and a
large scalar tail, such as the one in issue #11286. This test would
have failed prior to the fix in a8ae93f. This test is a more direct
version of the test introduced in that commit (which was distilled
from the failing test in the issue).

Change-Id: I2e716cd1000b49bde237f5da6d857e8983fe7e7a
Reviewed-on: https://go-review.googlesource.com/11423
Reviewed-by: Russ Cox <[email protected]>
@golang golang locked and limited conversation to collaborators Jun 25, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants