roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed [fix pending] #60812

Closed
cockroach-teamcity opened this issue Feb 19, 2021 · 12 comments · Fixed by #61094
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.

Comments

@cockroach-teamcity
Member

(roachtest).tpccbench/nodes=9/cpu=4/chaos/partition failed on master@83e70ce84b740e27e721c3b73c38a4b8b515094a:

		  |
		  | goroutine 7 [chan receive, 598 minutes]:
		  | runtime.gopark(0x515bc60, 0xc00010e118, 0x515170e, 0x2)
		  | 	/usr/local/go/src/runtime/proc.go:306 +0xe5 fp=0xc00009ae90 sp=0xc00009ae70 pc=0x48e245
		  | runtime.chanrecv(0xc00010e0c0, 0xc00009afc0, 0x8412101, 0x515c048)
		  | 	/usr/local/go/src/runtime/chan.go:577 +0x36f fp=0xc00009af20 sp=0xc00009ae90 pc=0x45a3cf
		  | runtime.chanrecv2(0xc00010e0c0, 0xc00009afc0, 0x1)
		  | 	/usr/local/go/src/runtime/chan.go:444 +0x2b fp=0xc00009af50 sp=0xc00009af20 pc=0x45a04b
		  | github.com/cockroachdb/cockroach/pkg/util/log.signalFlusher()
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log_flush.go:98 +0x12c fp=0xc00009afe0 sp=0xc00009af50 pc=0xf740ec
		  | runtime.goexit()
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc00009afe8 sp=0xc00009afe0 pc=0x4c4681
		  | created by github.com/cockroachdb/cockroach/pkg/util/log.init.5
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log_flush.go:42 +0x4d
		  |
		  | goroutine 18 [select, 598 minutes, locked to thread]:
		  | runtime.gopark(0x515bec8, 0x0, 0x1809, 0x1)
		  | 	/usr/local/go/src/runtime/proc.go:306 +0xe5 fp=0xc00008ee08 sp=0xc00008ede8 pc=0x48e245
		  | runtime.selectgo(0xc00008ef78, 0xc00008ef70, 0x2, 0x8, 0xc00010e001)
		  | 	/usr/local/go/src/runtime/select.go:338 +0xcef fp=0xc00008ef30 sp=0xc00008ee08 pc=0x49e3af
		  | runtime.ensureSigM.func1()
		  | 	/usr/local/go/src/runtime/signal_unix.go:897 +0x1fa fp=0xc00008efe0 sp=0xc00008ef30 pc=0x4bc9ba
		  | runtime.goexit()
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc00008efe8 sp=0xc00008efe0 pc=0x4c4681
		  | created by runtime.ensureSigM
		  | 	/usr/local/go/src/runtime/signal_unix.go:880 +0xd5
		  |
		  | goroutine 34 [syscall, 598 minutes]:
		  | runtime.notetsleepg(0x8463280, 0xffffffffffffffff, 0x0)
		  | 	/usr/local/go/src/runtime/lock_futex.go:235 +0x34 fp=0xc000088798 sp=0xc000088768 pc=0x460214
		  | os/signal.signal_recv(0x0)
		  | 	/usr/local/go/src/runtime/sigqueue.go:147 +0x9d fp=0xc0000887c0 sp=0xc000088798 pc=0x4c0bdd
		  | os/signal.loop()
		  | 	/usr/local/go/src/os/signal/signal_unix.go:23 +0x25 fp=0xc0000887e0 sp=0xc0000887c0 pc=0xf5ac45
		  | runtime.goexit()
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc0000887e8 sp=0xc0000887e0 pc=0x4c4681
		  | created by os/signal.Notify.func1.1
		  | 	/usr/local/go/src/os/signal/signal.go:150 +0x45
		  |
		  | goroutine 8 [GC worker (idle)]:
		  | runtime.gopark(0x515bd00, 0xc000580000, 0x1418, 0x0)
		  | 	/usr/local/go/src/runtime/proc.go:306 +0xe5 fp=0xc00008f760 sp=0xc00008f740 pc=0x48e245
		  |
		  | stdout:
		Wraps: (8) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (9) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *main.withCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Artifacts: /tpccbench/nodes=9/cpu=4/chaos/partition
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Feb 19, 2021
@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/chaos/partition failed on master@64c4aef909f4382523cd9248341ca9f4448d841a:

		  | main.runTPCCBench.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:894
		  | github.com/cockroachdb/cockroach/pkg/util/search.searchWithSearcher
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:43
		  | github.com/cockroachdb/cockroach/pkg/util/search.(*lineSearcher).Search
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:173
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:803
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:617
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 4: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:140,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2699361-1613891087-74-n10cpu4 --oneshot --ignore-empty-nodes: exit status 1 10: skipped
		4: dead
		9: 10181
		3: 9783
		8: 10364
		2: 9181
		6: 9903
		5: 8783
		7: 8692
		1: 9821
		Error: UNCLASSIFIED_PROBLEM: 4: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 4: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError

Artifacts: /tpccbench/nodes=9/cpu=4/chaos/partition
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/chaos/partition failed on master@bf9744bad5a416a4b06907f0f3dd42896f7342f3:

		  | main.runTPCCBench.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:894
		  | github.com/cockroachdb/cockroach/pkg/util/search.searchWithSearcher
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:43
		  | github.com/cockroachdb/cockroach/pkg/util/search.(*lineSearcher).Search
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:173
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:803
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:617
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 2: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:140,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2702231-1613977007-79-n10cpu4 --oneshot --ignore-empty-nodes: exit status 1 10: skipped
		9: 8808
		3: 8495
		2: dead
		4: 9331
		8: 8284
		5: 9433
		6: 9904
		7: 8281
		1: 9648
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError

Artifacts: /tpccbench/nodes=9/cpu=4/chaos/partition
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/chaos/partition failed on master@5cfd7e5553a3072a1490d392390dddf968844215:

		  | main.runTPCCBench.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:894
		  | github.com/cockroachdb/cockroach/pkg/util/search.searchWithSearcher
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:43
		  | github.com/cockroachdb/cockroach/pkg/util/search.(*lineSearcher).Search
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:173
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:803
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:617
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 9: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:140,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2707822-1614064242-75-n10cpu4 --oneshot --ignore-empty-nodes: exit status 1 10: skipped
		9: dead
		7: 9538
		2: 9223
		3: 9724
		8: 9498
		1: 9054
		4: 8869
		6: 9237
		5: 8502
		Error: UNCLASSIFIED_PROBLEM: 9: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 9: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError

Artifacts: /tpccbench/nodes=9/cpu=4/chaos/partition
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@jordanlewis
Member

NPE in relocateOne, seemingly. cc @tbg

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1e0c45d]

goroutine 622648 [running]:
panic(0x43f4d40, 0x7a4d0a0)
	/usr/local/go/src/runtime/panic.go:1064 +0x545 fp=0xc01625f7c0 sp=0xc01625f6f8 pc=0x48b205
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc0009d2e00, 0x58b9340, 0xc0372da420)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:233 +0x126 fp=0xc01625f820 sp=0xc01625f7c0 pc=0x14d2866
runtime.call32(0x0, 0x518b370, 0xc0270b3c08, 0x1800000018)
	/usr/local/go/src/runtime/asm_amd64.s:540 +0x3e fp=0xc01625f850 sp=0xc01625f820 pc=0x4c2c1e
runtime.reflectcallSave(0xc01625f990, 0x518b370, 0xc0270b3c08, 0x18)
	/usr/local/go/src/runtime/panic.go:881 +0x58 fp=0xc01625f880 sp=0xc01625f850 pc=0x48abf8
runtime.runOpenDeferFrame(0xc0103bf980, 0xc0270b3bc0, 0x0)
	/usr/local/go/src/runtime/panic.go:855 +0x2cd fp=0xc01625f910 sp=0xc01625f880 pc=0x48aaad
panic(0x43f4d40, 0x7a4d0a0)
	/usr/local/go/src/runtime/panic.go:969 +0x1b9 fp=0xc01625f9d8 sp=0xc01625f910 pc=0x48ae79
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc0009d2e00, 0x58b9340, 0xc0372daf30)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:233 +0x126 fp=0xc01625fa38 sp=0xc01625f9d8 pc=0x14d2866
runtime.call32(0x0, 0x518b370, 0xc0270b3c08, 0x1800000018)
	/usr/local/go/src/runtime/asm_amd64.s:540 +0x3e fp=0xc01625fa68 sp=0xc01625fa38 pc=0x4c2c1e
runtime.reflectcallSave(0xc01625fba8, 0x518b370, 0xc0270b3c08, 0xc000000018)
	/usr/local/go/src/runtime/panic.go:881 +0x58 fp=0xc01625fa98 sp=0xc01625fa68 pc=0x48abf8
runtime.runOpenDeferFrame(0xc0103bf980, 0xc0270b3bc0, 0x0)
	/usr/local/go/src/runtime/panic.go:855 +0x2cd fp=0xc01625fb28 sp=0xc01625fa98 pc=0x48aaad
panic(0x43f4d40, 0x7a4d0a0)
	/usr/local/go/src/runtime/panic.go:969 +0x1b9 fp=0xc01625fbf0 sp=0xc01625fb28 pc=0x48ae79
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).Send.func1(0xc016261a68, 0xc016261b08, 0xc00162ce00, 0xc016261b00)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_send.go:100 +0x247 fp=0xc01625fc18 sp=0xc01625fbf0 pc=0x1ebf9e7
runtime.call32(0x0, 0x5186998, 0xc016261398, 0x2000000020)
	/usr/local/go/src/runtime/asm_amd64.s:540 +0x3e fp=0xc01625fc48 sp=0xc01625fc18 pc=0x4c2c1e
panic(0x43f4d40, 0x7a4d0a0)
	/usr/local/go/src/runtime/panic.go:975 +0x47a fp=0xc01625fd10 sp=0xc01625fc48 pc=0x48b13a
runtime.panicmem(...)
	/usr/local/go/src/runtime/panic.go:212
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:742 +0x413 fp=0xc01625fd40 sp=0xc01625fd10 pc=0x4a1f53
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).relocateOne(0xc00162ce00, 0x58b9340, 0xc0372daf90, 0xc02894cbd0, 0xc002e1df60, 0x3, 0x3, 0x84b94d0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:2929 +0x85d fp=0xc016260368 sp=0xc01625fd40 pc=0x1e0c45d
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).relocateReplicas(0xc00162ce00, 0x58b9340, 0xc0372daf90, 0x9d, 0xc0025df090, 0x8, 0x8, 0xc0025df098, 0x8, 0x8, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:2771 +0x50f fp=0xc016260698 sp=0xc016260368 pc=0x1e0b58f

@tbg
Member

tbg commented Feb 23, 2021

@tbg
Member

tbg commented Feb 23, 2021

... which it's allowed to do:

return nil, ""

@tbg
Member

tbg commented Feb 23, 2021

@aayushshah15 it looks like you most recently touched most of the involved code. Could you take a look to see if this is just a matter of adding a nil check or indicative of something else?

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/chaos/partition failed on master@ec011620c7cf299fdbb898db692b36454defc4a2:

		  | main.runTPCCBench.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:894
		  | github.com/cockroachdb/cockroach/pkg/util/search.searchWithSearcher
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:43
		  | github.com/cockroachdb/cockroach/pkg/util/search.(*lineSearcher).Search
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:173
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:803
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:617
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 6: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:140,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2712399-1614149800-21-n10cpu4 --oneshot --ignore-empty-nodes: exit status 1 10: skipped
		6: dead
		2: 17751
		1: 20303
		8: 17387
		3: 18516
		7: 18160
		4: 17680
		5: 17445
		9: 17568
		Error: UNCLASSIFIED_PROBLEM: 6: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 6: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError

Artifacts: /tpccbench/nodes=9/cpu=4/chaos/partition
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@nvanbenschoten
Member

Same stack trace in this failure.

@aayushshah15
Contributor

> Could you take a look to see if this is just a matter of adding a nil check or indicative of something else?

Looks like we have never checked whether that returned targetStore is nil, so #59403 has likely changed some behavior that shouldn't have changed. I'll dig into this later this week.

@aayushshah15
Contributor

Oh, looks like I was dreaming. Indeed, it looks like I removed a nil check that should've been there. Fixing now.

aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Feb 24, 2021
e924d91 introduced a bug by spuriously
removing a nil check over the result of a call to
`allocateTargetFromList`. This commit re-adds the check.

The bug could cause a panic when `AdminRelocateRange` was called by the
`StoreRebalancer` or the `mergeQueue` if one (or more) of the stores
that are supposed to receive a replica for a range becomes unfit for
receiving the replica (due to balancing reasons or shifting
constraints) _between when the rebalancing decision is made and when it's
executed_.

Resolves cockroachdb#60812

Release justification: fixes bug that causes a panic
Release note: None
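
The failure mode described in this commit message can be illustrated with a minimal Go sketch. It is not the actual kvserver code: `Store`, `pickTarget`, and `relocateOne` below are hypothetical stand-ins, assuming only what the thread states, namely that the allocation helper is allowed to return a nil target and that the caller has to check for it before dereferencing.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a hypothetical stand-in for a candidate store descriptor.
type Store struct {
	ID int
}

// pickTarget is a hypothetical allocator. It returns the first candidate
// accepted by fit, or nil plus a reason when no candidate qualifies.
func pickTarget(candidates []Store, fit func(Store) bool) (*Store, string) {
	for i := range candidates {
		if fit(candidates[i]) {
			return &candidates[i], ""
		}
	}
	return nil, "no candidate passed the fitness check"
}

var errNoTarget = errors.New("no suitable target store")

// relocateOne checks the allocator result before dereferencing it, so a
// store that became unfit after the decision was made yields an ordinary
// error instead of a nil pointer panic.
func relocateOne(candidates []Store, fit func(Store) bool) (int, error) {
	target, reason := pickTarget(candidates, fit)
	if target == nil {
		return 0, fmt.Errorf("%w: %s", errNoTarget, reason)
	}
	return target.ID, nil
}

func main() {
	stores := []Store{{ID: 1}, {ID: 2}}

	// A candidate is still fit: the relocation picks it.
	id, err := relocateOne(stores, func(s Store) bool { return s.ID == 2 })
	fmt.Println(id, err)

	// Every candidate became unfit in the meantime: an error, not a panic.
	_, err = relocateOne(stores, func(Store) bool { return false })
	fmt.Println(err)
}
```

Without the `target == nil` branch, the second call would dereference a nil pointer, which is the class of crash shown in the stack trace earlier in this thread.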
@tbg tbg changed the title roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed [fix pending] Feb 25, 2021
@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/chaos/partition failed on master@c7e088826bc079620dfd3b5ae75d1c15cd9cd16d:

		  | main.runTPCCBench.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:894
		  | github.com/cockroachdb/cockroach/pkg/util/search.searchWithSearcher
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:43
		  | github.com/cockroachdb/cockroach/pkg/util/search.(*lineSearcher).Search
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:173
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:803
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:617
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 9: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:140,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2716822-1614236552-21-n10cpu4 --oneshot --ignore-empty-nodes: exit status 1 10: skipped
		9: dead
		4: 22075
		3: 20835
		8: 20970
		7: 21508
		2: 20310
		6: 20543
		1: 23015
		5: 20970
		Error: UNCLASSIFIED_PROBLEM: 9: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 9: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError

Artifacts: /tpccbench/nodes=9/cpu=4/chaos/partition

See this test on roachdash
powered by pkg/cmd/internal/issues

craig bot pushed a commit that referenced this issue Feb 25, 2021
57183: importccl,sql: support importing non-public schemas from pgdump r=adityamaru a=adityamaru

Previously, a PGDUMP import did not support any non-public schema
statements. Now that CRDB has user defined schemas, bundle format
IMPORTs need to be taught how to parse, create and cleanup schema
related PGDUMP operations.

Note this PR only adds support for `CREATE SCHEMA` and usage of the
schema in `CREATE TABLE/SEQUENCE` PGDUMP statements. `ALTER SCHEMA`
statements are ignored and support might be added in a follow up.

Release justification: low risk, high reward. Import PGDUMP does not support user
defined schemas.

Release note (sql change): IMPORT PGDUMP can now import dump files with
non-public schemas.

60922: sql,cli: add payloads_for_trace builtin r=knz a=angelapwen

Possibly the final step to #55733. Resolves #58608 😄 

Previously it was quite cumbersome to view all payloads for a given
trace: we needed to join on the `node_inflight_trace_spans` vtable
to filter for span IDs that match a trace ID, then apply the
`payloads_for_span()` builtin to each span ID. This patch adds
syntactic sugar to the above query.

Instead of

```
WITH spans AS (
  SELECT span_id
  FROM crdb_internal.node_inflight_trace_spans
  WHERE trace_id = $TRACE_ID
) SELECT *
  FROM spans, LATERAL crdb_internal.payloads_for_span(spans.span_id);
```

we can now simply use:
```
crdb_internal.payloads_for_trace($TRACE_ID);
```

and achieve the same result. The patch also adds all payloads for all
long-running spans to the `crdb_internal.node_inflight_trace_spans`
table of the debug.zip file.

Release note (sql change): Add `payloads_for_trace()` builtin so that
all payloads attached to all spans for a given trace ID will be
displayed, utilizing the `crdb_internal.payloads_for_span()`
builtin under the hood. All payloads for long-running spans are also
added to debug.zip in the `crdb_internal.node_inflight_trace_spans`
table dump.

Co-authored-by: Tobias Grieger <[email protected]>

Release justification: This patch is safe for release because it
adds syntactic sugar to an internal observability feature.

61094: kvserver: re-add spuriously removed nil check in `relocateOne` r=aayushshah15 a=aayushshah15

bce8317 introduced a bug by spuriously
removing a nil check over the result of a call to
`allocateTargetFromList`. This commit re-adds the check.

The bug could cause a panic when `AdminRelocateRange` was called by the
`StoreRebalancer` or the `mergeQueue` if one (or more) of the stores
that are supposed to receive a replica for a range becomes unfit for
receiving the replica (due to balancing reasons or shifting
constraints) _between when the rebalancing decision is made and when it's
executed_.

Resolves #60812

Release justification: fixes bug that causes a panic
Release note: None

61097: opt: use computed columns to improve FDs and remove uniqueness checks r=rytaft a=rytaft

**opt: use computed columns to build functional dependencies**

This commit updates `MakeTableFuncDep` so that it adds equivalencies
or synthesized columns to the table FDs for each of the computed
columns available in the metadata. This will be necessary to support
removing uniqueness checks in some cases in a future commit.

Release justification: This commit is a low risk, high benefit change
to existing functionality.

Release note (performance improvement): The optimizer now infers
additional functional dependencies based on computed columns in tables.
This may enable additional optimizations and lead to better query plans.

**opt: remove uniqueness checks when uniqueness inferred through FDs**

This commit removes uniqueness checks for columns that can be
inferred to be unique through functional dependencies. This is
relevant in particular for `REGIONAL BY ROW` tables with a computed
region column that depends on the primary key. In this case,
uniqueness checks are never needed on the primary key, since
uniqueness is already guaranteed by the primary index.

Fixes #57720

Release justification: This commit is a low-risk, high benefit
update to new functionality.

Release note (performance improvement): Removed uniqueness checks
on the primary key for REGIONAL BY ROW tables with a computed
region column that is a function of the primary key columns.
Uniqueness checks are not necessary in this case since uniqueness
can be suitably guaranteed by the primary index. Removing these
checks improves performance of INSERT, UPDATE, and UPSERT
statements.

61100: diagnostics: lock while populating hardware information r=andy-kimball a=andy-kimball

The shirou/gopsutil/host library that we use to gather hardware information
during diagnostics reporting is not multi-thread safe. As one example, it
lazily initializes a global map the first time the Virtualization function
is called, but takes no lock while doing so. Work around this limitation by
taking our own lock.

This code never triggered race conditions before, but is doing so after recent
changes I made to the diagnostics reporting code. Previously, we were using a
single background goroutine to do both diagnostics reporting and checking for
updates. Now we are doing each of those on different goroutines, which triggers
race detection.

Fixes #61091

Release justification: fixes for high-priority or high-severity bugs in existing
functionality
Release note: None

61105: builtins: implement ST_MakePoint and ST_MakePointM r=otan,rafiss a=andyyang890

This patch implements the geometry builtins `ST_MakePoint`
and `ST_MakePointM`.

Resolves #60857.
Resolves #60858.
Resolves #60859.

Release justification: low-risk update to new functionality
Release note (sql change): The geometry builtins `ST_MakePoint`
and `ST_MakePointM` have been implemented and provide a mechanism
for easily creating new points.

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: angelapwen <[email protected]>
Co-authored-by: Aayush Shah <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
Co-authored-by: Andrew Kimball <[email protected]>
Co-authored-by: Andy Yang <[email protected]>
@craig craig bot closed this as completed in 72be631 Feb 25, 2021

5 participants