
cockroachdb hit SIGSEGV in Go runtime during test run #1144

Closed
davepacheco opened this issue Jun 1, 2022 · 3 comments
Labels
Test Flake Tests that work. Wait, no. Actually yes. Hang on. Something is broken.

Comments

@davepacheco
Collaborator

davepacheco commented Jun 1, 2022

While trying to reproduce #1130, I ran into a new problem where CockroachDB appears to have hit a SIGSEGV in the Go runtime.

@davepacheco
Collaborator Author

I was running this:

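#!/bin/bash

# Give this run its own temp directory, and export it so the test binary
# (and the CockroachDB processes it spawns) use it too.
export TMPDIR="${TMPDIR:-/tmp}/try_repro.$$"
mkdir -p "$TMPDIR"

cd nexus || exit 1
for ((i = 0; i >= 0; i++)); do
	echo "ATTEMPT $i"
	../target/debug/deps/test_all-d586ea57740e3382 \
	    test_disk_create_disk_that_already_exists_fails || break
done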

like this:

nohup ./try_repro.sh > try_repro2.out 2>&1 &

It ultimately failed like this:

ATTEMPT 7507

running 1 test
test integration_tests::disks::test_disk_create_disk_that_already_exists_fails ... FAILED

failures:

---- integration_tests::disks::test_disk_create_disk_that_already_exists_fails stdout ----
log file: "/dangerzone/omicron_tmp/try_repro.18532/test_all-d586ea57740e3382-test_disk_create_disk_that_already_exists_fails.18947.0.log"
note: configured to log to "/dangerzone/omicron_tmp/try_repro.18532/test_all-d586ea57740e3382-test_disk_create_disk_that_already_exists_fails.18947.0.log"
thread 'integration_tests::disks::test_disk_create_disk_that_already_exists_fails' panicked at 'called `Result::unwrap()` on an `Err` value: Exited', /home/dap/omicron/test-utils/src/dev/mod.rs:141:42
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    integration_tests::disks::test_disk_create_disk_that_already_exists_fails

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 74 filtered out; finished in 0.31s

That panic is pretty early in the test setup process. We were waiting for CockroachDB to start, but it exited instead. The entire test log file consists of:
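The start-up wait that failed here can be sketched in bash: start `cockroach` in the background with `--listening-url-file`, then poll until either the file contains a URL (startup finished) or the process is gone (it exited first, as in this failure). This is a hypothetical illustration, not omicron's implementation, which lives in Rust under `test-utils/src/dev/`; the function name `wait_for_listen_url`, the 0.1-second poll interval, and the 10-second default timeout are all assumptions.

```shell
#!/bin/bash
# Hypothetical sketch: wait for a background server to write its URL to a
# file, or detect that it exited first.
#   $1: PID of the background process
#   $2: path that was passed to --listening-url-file
#   $3: timeout in seconds (default 10, an assumption)
# Prints the URL and returns 0 on success; returns 1 if the process exited
# before writing the file, 2 on timeout.
wait_for_listen_url() {
    local pid=$1 url_file=$2 timeout=${3:-10}
    local deadline=$((SECONDS + timeout))
    while ((SECONDS < deadline)); do
        # A non-empty URL file means the server finished starting.
        if [[ -s $url_file ]]; then
            cat -- "$url_file"
            return 0
        fi
        # kill -0 delivers no signal; it only checks whether the PID exists.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "process $pid exited before writing $url_file" >&2
            return 1
        fi
        sleep 0.1
    done
    echo "timed out waiting for $url_file" >&2
    return 2
}
```

For example, after `cockroach start-single-node ... --listening-url-file "$dir/listen-url" &` (with `$dir` a placeholder), calling `wait_for_listen_url $! "$dir/listen-url"` would return 1 in the scenario above, where CockroachDB died during startup.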

[2022-06-01T06:08:00.749907538Z]  INFO: test_disk_create_disk_that_already_exists_fails/18947 on ivanova: cockroach temporary directory: /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P
[2022-06-01T06:08:00.750287178Z]  INFO: test_disk_create_disk_that_already_exists_fails/18947 on ivanova: cockroach: copying from seed directory (/home/dap/omicron/target/debug/build/nexus-test-utils-308521ed0d0eed98/out/crdb-base) to storage directory (/dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data)
[2022-06-01T06:08:00.759684722Z]  INFO: test_disk_create_disk_that_already_exists_fails/18947 on ivanova: cockroach command line: cockroach start-single-node --insecure --http-addr=:0 --store=path=/dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data,ballast-size=0 --listen-addr 127.0.0.1:0 --listening-url-file /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/listen-url

Here are all the files in the CockroachDB directory, sorted in ascending order of modification time (most recently modified last):

$ find /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P -type f | xargs ls -lrt
-rw-r-----   1 dap      staff       2271 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/OPTIONS-000003
-rw-r-----   1 dap      staff          8 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/STORAGE_MIN_VERSION
-rw-r-----   1 dap      staff          0 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/marker.format-version.000003.004
-rw-r-----   1 dap      staff         15 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/cockroach.advertise-addr
-rw-r-----   1 dap      staff         16 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/CURRENT
-rw-r-----   1 dap      staff         15 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/cockroach.sql-addr
-rw-r-----   1 dap      staff       2393 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-security.ivanova.dap.2022-05-27T21_22_09Z.027878.log
-rw-r-----   1 dap      staff     235792 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach.log
-rw-r-----   1 dap      staff       2823 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-pebble.ivanova.dap.2022-05-27T21_22_08Z.027878.log
-rw-r-----   1 dap      staff       2376 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-health.ivanova.dap.2022-05-27T21_22_09Z.027878.log
-rw-r-----   1 dap      staff       1147 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-stderr.ivanova.dap.2022-05-27T21_22_07Z.027878.log
-rw-r-----   1 dap      staff       2393 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-security.log
-rw-r-----   1 dap      staff     235792 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach.ivanova.dap.2022-05-27T21_22_07Z.027878.log
-rw-r-----   1 dap      staff      46720 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-sql-schema.ivanova.dap.2022-05-27T21_22_08Z.027878.log
-rw-r-----   1 dap      staff       1147 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-stderr.log
-rw-r-----   1 dap      staff       2823 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-pebble.log
-rw-r-----   1 dap      staff     103773 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/heap_profiler/memprof.2022-05-27T21_22_18.339.34793904.pprof
-rw-r-----   1 dap      staff       2376 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-health.log
-rw-r-----   1 dap      staff      46720 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-sql-schema.log
-rw-r-----   1 dap      staff    1756535 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/000006.log
-rw-r-----   1 dap      staff          0 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/LOCK
-rw-r-----   1 dap      staff          0 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/temp-dirs-record.txt
-rw-r--r--   1 dap      staff          0 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/cockroachdb_stdout
-rw-r-----   1 dap      staff         15 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/cockroach.listen-addr
-rw-r-----   1 dap      staff     448388 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/000004.log
-rw-r-----   1 dap      staff     879384 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/000005.log
-rw-r-----   1 dap      staff          0 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/marker.manifest.000001.MANIFEST-000001
-rw-r-----   1 dap      staff     227063 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/000002.log
-rw-r-----   1 dap      staff         15 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/cockroach.http-addr
-rw-r-----   1 dap      staff         15 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/cockroach.advertise-sql-addr
-rw-r-----   1 dap      staff    1404754 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/000007.log
-rw-r-----   1 dap      staff         44 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/MANIFEST-000001
-rw-r--r--   1 dap      staff      10999 May 31 23:08 /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/cockroachdb_stderr

Let's take a look at the CockroachDB stderr:

dap@ivanova omicron $ cat /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/cockroachdb_stderr 
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0xb1 pc=0x1091550]

runtime stack:
runtime.throw(0x5a6fcb6, 0x2a)
	/opt/ooce/go-1.16/src/runtime/panic.go:1117 +0x72
runtime.sigpanic()
	/opt/ooce/go-1.16/src/runtime/signal_unix.go:718 +0x2ef
runtime.gcDrain(0xc000081698, 0x2)
	/opt/ooce/go-1.16/src/runtime/mgcmark.go:1023 +0x1b0
runtime.gcBgMarkWorker.func2()
	/opt/ooce/go-1.16/src/runtime/mgc.go:1999 +0x12d
runtime.systemstack(0xfffffc7fed42b140)
	/opt/ooce/go-1.16/src/runtime/asm_amd64.s:379 +0x73
runtime.mstart()
	/opt/ooce/go-1.16/src/runtime/proc.go:1246

goroutine 1 [runnable, locked to thread]:
bufio.(*Writer).WriteByte(0xc00191f700, 0x59ded0a, 0xf, 0xf)
	/opt/ooce/go-1.16/src/bufio/bufio.go:658 +0xc5
encoding/csv.(*Writer).Write(0xc00191f6f0, 0xc000a91ed0, 0x1, 0x1, 0x0, 0x200)
	/opt/ooce/go-1.16/src/encoding/csv/writer.go:116 +0x40d
github.com/spf13/pflag.writeAsCSV(0xc000a91ed0, 0x1, 0x1, 0x8, 0x10, 0xfffffc7fef1df5b8, 0x10)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/pflag/string_slice.go:34 +0x125
github.com/spf13/pflag.(*stringSliceValue).String(0xc00039c000, 0x107e858, 0x10)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/pflag/string_slice.go:61 +0x49
github.com/spf13/pflag.(*FlagSet).VarPF(0xc0010e4700, 0x6fb5c40, 0xc00039c000, 0x59c970d, 0x8, 0x0, 0x0, 0xc0019864d0, 0x6d, 0xc00198c0a0)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/pflag/flag.go:829 +0x35
github.com/spf13/pflag.(*FlagSet).VarP(...)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/pflag/flag.go:837
github.com/spf13/pflag.(*FlagSet).StringSliceVar(0xc0010e4700, 0x9568f90, 0x59c970d, 0x8, 0xc000a91ed0, 0x1, 0x1, 0xc0019864d0, 0x6d)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/pflag/string_slice.go:105 +0xcf
github.com/cockroachdb/cockroach/pkg/cli.stringSliceFlag(0xc0010e4700, 0x9568f90, 0x59c970d, 0x8, 0x0, 0x0, 0x59ebb77, 0x12, 0x5aecbed, 0x40)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/flags.go:144 +0xb0
github.com/cockroachdb/cockroach/pkg/cli.init.8()
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/cli/flags.go:956 +0x5dc5

goroutine 36 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc00019dd40)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 37 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc00019de10)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 38 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc00019dee0)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 39 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc000484000)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 40 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc0004840d0)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 41 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc0004841a0)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 42 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc000484270)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 43 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc000484340)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 44 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc000484410)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 45 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc0004844e0)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 46 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc0004845b0)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 47 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc000484680)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 48 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc000484750)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 49 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc000484820)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 50 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc0004848f0)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 51 [chan receive]:
github.com/klauspost/compress/zstd.(*blockDec).startDecoder(0xc0004849c0)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:215 +0x149
created by github.com/klauspost/compress/zstd.newBlockDec
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/github.com/klauspost/compress/zstd/blockdec.go:118 +0x173

goroutine 52 [chan receive]:
github.com/cockroachdb/cockroach/pkg/util/log.flushDaemon()
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_flush.go:75 +0x74
created by github.com/cockroachdb/cockroach/pkg/util/log.init.5
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_flush.go:41 +0x35

goroutine 53 [chan receive]:
github.com/cockroachdb/cockroach/pkg/util/log.signalFlusher()
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_flush.go:98 +0x12c
created by github.com/cockroachdb/cockroach/pkg/util/log.init.5
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/log/log_flush.go:42 +0x4d

goroutine 55 [syscall]:
os/signal.signal_recv(0x0)
	/opt/ooce/go-1.16/src/runtime/sigqueue.go:168 +0xa5
os/signal.loop()
	/opt/ooce/go-1.16/src/os/signal/signal_unix.go:23 +0x25
created by os/signal.Notify.func1.1
	/opt/ooce/go-1.16/src/os/signal/signal.go:151 +0x45

goroutine 11 [chan receive]:
github.com/cockroachdb/cockroach/pkg/util/goschedstats.init.0.func1()
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/goschedstats/runnable.go:165 +0x16b
created by github.com/cockroachdb/cockroach/pkg/util/goschedstats.init.0
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/pkg/util/goschedstats/runnable.go:157 +0x35

goroutine 69 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc001096000)
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/go.opencensus.io/stats/view/worker.go:276 +0xcd
created by go.opencensus.io/stats/view.init.0
	/ws/gc/cockroach/cache/gopath/src/github.com/cockroachdb/cockroach/vendor/go.opencensus.io/stats/view/worker.go:34 +0x68

I believe this is the key piece of output from that:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0xb1 pc=0x1091550]

runtime stack:
runtime.throw(0x5a6fcb6, 0x2a)
	/opt/ooce/go-1.16/src/runtime/panic.go:1117 +0x72
runtime.sigpanic()
	/opt/ooce/go-1.16/src/runtime/signal_unix.go:718 +0x2ef
runtime.gcDrain(0xc000081698, 0x2)
	/opt/ooce/go-1.16/src/runtime/mgcmark.go:1023 +0x1b0
runtime.gcBgMarkWorker.func2()
	/opt/ooce/go-1.16/src/runtime/mgc.go:1999 +0x12d
runtime.systemstack(0xfffffc7fed42b140)
	/opt/ooce/go-1.16/src/runtime/asm_amd64.s:379 +0x73
runtime.mstart()
	/opt/ooce/go-1.16/src/runtime/proc.go:1246

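It seems that we hit a SIGSEGV inside the Go garbage collector: the faulting frame is runtime.gcDrain, called from the background mark worker (runtime.gcBgMarkWorker.func2). Unfortunately, the Go runtime catches the signal itself, prints the traceback, and exits, rather than letting the SIGSEGV terminate the process and dump core, so we don't have a lot to go on. I didn't find any core files.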
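By default, when the Go runtime hits a fatal error like this one, it prints the traceback and exits with status 2, so no core file is written. Go's documented `GOTRACEBACK=crash` setting instead makes the runtime raise SIGABRT after printing the traceback (on Unix systems), which lets the OS dump core if core files are enabled; `runtime/debug.SetTraceback` cannot lower the level below what the environment variable sets. A minimal bash sketch, with `run_with_core` as a hypothetical helper name; on illumos, where the core file lands is also governed by coreadm(1M), so that configuration is worth checking too.

```shell
#!/bin/bash
# Hypothetical helper: run a Go program so that a fatal runtime error
# aborts with a core dump instead of a plain exit(2).
#   usage: run_with_core COMMAND [ARGS...]
run_with_core() {
    (
        # Raise the soft core-file size limit; this only affects the subshell.
        ulimit -c unlimited 2>/dev/null ||
            echo "warning: could not raise the core file size limit" >&2
        # GOTRACEBACK=crash: after printing the traceback, the Go runtime
        # raises SIGABRT (on Unix) so the OS can write a core file.
        export GOTRACEBACK=crash
        exec "$@"
    )
}
```

For the test suite itself, exporting `GOTRACEBACK=crash` and raising `ulimit -c` in the shell that runs the test binary should have the same effect on the `cockroach` child processes, assuming the harness passes its environment through.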

I've created a tarball with what seems like all the information I have, including the CockroachDB directory and the log file:

$ tar cjvf issue-1144.tgz /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P /dangerzone/omicron_tmp/try_repro.18532/test_all-d586ea57740e3382-test_disk_create_disk_that_already_exists_fails.18947.0.log
Compressing 'issue-1144.tgz' with '/usr/bin/bzip2'...
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/ 0K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/ 0K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/cockroach.sql-addr 1K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/OPTIONS-000003 3K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/ 0K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach.log 231K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-security.ivanova.dap.2022-05-27T21_22_09Z.027878.log 3K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-health.log 3K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-sql-schema.log 46K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/inflight_trace_dump/ 0K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/goroutine_dump/ 0K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-pebble.ivanova.dap.2022-05-27T21_22_08Z.027878.log 3K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-pebble.log 3K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-stderr.log 2K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-security.log 3K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-stderr.ivanova.dap.2022-05-27T21_22_07Z.027878.log 2K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-health.ivanova.dap.2022-05-27T21_22_09Z.027878.log 3K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach-sql-schema.ivanova.dap.2022-05-27T21_22_08Z.027878.log 46K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/cockroach.ivanova.dap.2022-05-27T21_22_07Z.027878.log 231K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/heap_profiler/ 0K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs/heap_profiler/memprof.2022-05-27T21_22_18.339.34793904.pprof 102K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/000006.log 1716K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/marker.format-version.000003.004 0K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/CURRENT 1K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/cockroach.advertise-addr 1K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/STORAGE_MIN_VERSION 1K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/LOCK 0K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/temp-dirs-record.txt 0K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/marker.manifest.000001.MANIFEST-000001 0K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/auxiliary/ 0K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/000007.log 1372K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/000004.log 438K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/000002.log 222K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/000005.log 859K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/MANIFEST-000001 1K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/cockroach.listen-addr 1K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/cockroach.http-addr 1K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/cockroach.advertise-sql-addr 1K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/cockroachdb_stderr 11K
a /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/cockroachdb_stdout 0K
a /dangerzone/omicron_tmp/try_repro.18532/test_all-d586ea57740e3382-test_disk_create_disk_that_already_exists_fails.18947.0.log 2K

I've attached it to this issue as issue-1144.tgz.zip (a tarball inside a ZIP file, because GitHub doesn't support attaching tarballs directly).
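Despite the `.tgz` name, the `j` flag in `tar cjvf` (and the "Compressing ... with '/usr/bin/bzip2'" line) means the archive is bzip2-compressed rather than gzip-compressed, so `tar xzf` would fail on it. A small sketch that checks the magic bytes before extracting: bzip2 streams begin with `BZh` and gzip streams with the bytes 0x1f 0x8b. The function names `archive_compression` and `extract_archive` are hypothetical.

```shell
#!/bin/bash
# Print the compression format of a file based on its leading magic bytes.
archive_compression() {
    local magic
    # Read the first three bytes as hex, e.g. "425a68" for "BZh".
    magic=$(od -An -N3 -tx1 "$1" | tr -d ' \n')
    case $magic in
        425a68) echo bzip2 ;;
        1f8b*) echo gzip ;;
        *) echo unknown ;;
    esac
}

# Extract an archive using the tar flag that matches its actual compression.
extract_archive() {
    case $(archive_compression "$1") in
        bzip2) tar xjf "$1" ;;
        gzip) tar xzf "$1" ;;
        *) echo "unrecognized compression: $1" >&2; return 1 ;;
    esac
}
```

Presumably `unzip issue-1144.tgz.zip && extract_archive issue-1144.tgz` would then recover the files listed above.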

I would not expect this to be very reproducible, but just to check, I reran the CockroachDB command line reported in the log, with exactly the same parameters and storage directory. It did not crash:

$ cockroach start-single-node --insecure --http-addr=:0 --store=path=/dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data,ballast-size=0 --listen-addr 127.0.0.1:0 --listening-url-file /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/listen-url
*
* WARNING: ALL SECURITY CONTROLS HAVE BEEN DISABLED!
* 
* This mode is intended for non-production testing only.
* 
* In this mode:
* - Your cluster is open to any client that can access 127.0.0.1.
* - Intruders with access to your machine or network can observe client-server traffic.
* - Intruders can log in without password and read or write any data in the cluster.
* - Intruders can consume all your server's resources and cause unavailability.
*
*
* INFO: To start a secure server without mandating TLS for clients,
* consider --accept-sql-without-tls instead. For other options, see:
* 
* - https://go.crdb.dev/issue-v/53404/v21.2
* - https://www.cockroachlabs.com/docs/v21.2/secure-a-cluster.html
*
CockroachDB node starting at 2022-06-01 18:23:38.276374802 +0000 UTC (took 0.1s)
build:               OSS v21.2.9 @ 2022/04/28 04:02:42 (go1.16.10)
webui:               http://127.0.0.1:48965
sql:                 postgresql://root@127.0.0.1:49190/defaultdb?sslmode=disable
sql (JDBC):          jdbc:postgresql://127.0.0.1:49190/defaultdb?sslmode=disable&user=root
RPC client flags:    cockroach <client cmd> --host=127.0.0.1:49190 --insecure
logs:                /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/logs
temp dir:            /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/cockroach-temp051753485
external I/O path:   /dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data/extern
store[0]:            path=/dangerzone/omicron_tmp/try_repro.18532/.tmpVZ8h8P/data
storage engine:      pebble
status:              restarted pre-existing node
clusterID:           4c142fad-dcf8-4bbe-a07b-0348ea3fd2ba
nodeID:              1
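A cheaper way to hunt for this crash may exist. In the traceback, goroutine 1 is running `github.com/cockroachdb/cockroach/pkg/cli.init.8`, a package initializer, so the process was still in Go package initialization (before CockroachDB had parsed its command line) when the GC worker faulted. If that reading is right, every `cockroach` invocation, for example `cockroach version`, runs the same startup code, and a tight loop over a short-lived command could exercise it far more often than restarting the whole test. This is a hypothetical sketch (`repeat_until_failure` and its arguments are illustrative names), and since the crash looks timing-dependent there is no guarantee it recurs this way.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to MAX times, stopping at the first
# failure. Each attempt's stderr overwrites LOG, so after a failure LOG holds
# the failing attempt's output (e.g. a Go runtime traceback).
#   usage: repeat_until_failure MAX LOG COMMAND [ARGS...]
repeat_until_failure() {
    local max=$1 log=$2 i rc
    shift 2
    for ((i = 1; i <= max; i++)); do
        "$@" >/dev/null 2>"$log"
        rc=$?
        if ((rc != 0)); then
            echo "attempt $i failed with status $rc; stderr saved in $log"
            return "$rc"
        fi
    done
    echo "no failure in $max attempts"
    return 0
}
```

For example, `repeat_until_failure 100000 /tmp/crdb-stderr env GOTRACEBACK=crash cockroach version` (the count and log path are arbitrary) would also request a core dump on the failing attempt.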

I'm probably not going to dig deeper into this any time soon.

@davepacheco davepacheco changed the title SIGSEGV raised in cockroachdb during test run cockroachdb hit SIGSEGV in Go runtime during test run Jun 2, 2022
@davepacheco davepacheco added the Test Flake Tests that work. Wait, no. Actually yes. Hang on. Something is broken. label Jul 15, 2022
@davepacheco
Collaborator Author

This may well be caused by the same issue as #1146, but I don't see how we'll ever know unless we hit it again with core dumps enabled.

@davepacheco
Collaborator Author

Closing this since we haven't seen it since that issue was resolved. We can reopen if we start seeing it again.
