roachtest: tpccbench/nodes=3/cpu=16 failed #39013

Closed
cockroach-teamcity opened this issue Jul 21, 2019 · 21 comments · Fixed by #40910

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/7dab0dcfd37c389af357c302c073b9611b5ada25

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=3/cpu=16 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1398203&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190721-1398203/tpccbench/nodes=3/cpu=16/run_1
	cluster.go:2090,tpcc.go:842,tpcc.go:559,test_runner.go:691: unexpected node event: 3: dead

@cockroach-teamcity cockroach-teamcity added this to the 19.2 milestone Jul 21, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Jul 21, 2019
@nvanbenschoten
Member

Node 3 was killed by the OOM killer, which is also what we see in #37163. I'm running three versions of that roachtest now to see if anything obvious jumps out.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1ad0ecc8cbddf82c9fedb5a5c5e533e72a657ff7

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=3/cpu=16 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1399004&tab=buildLog

The test failed on branch=master, cloud=aws:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190722-1399004/tpccbench/nodes=3/cpu=16/run_1
	test_runner.go:706: test timed out (10h0m0s)
	cluster.go:2090,tpcc.go:842,tpcc.go:559,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1563776259-07-n4cpu16:4 -- ./cockroach workload fixtures import tpcc --warehouses=2500 --split --scatter --checks=false {pgurl:1} returned:
		stderr:
		
		stdout:
		ows, 0 index entries, took 7.790861805s, 0.02 MiB/s)
		I190722 08:14:54.571990 66 ccl/workloadccl/fixture.go:396  imported 2.5 MiB in district table (25000 rows, 0 index entries, took 33.2477079s, 0.07 MiB/s)
		I190722 08:14:54.572801 71 ccl/workloadccl/fixture.go:396  imported 7.8 MiB in item table (100000 rows, 0 index entries, took 33.248559976s, 0.23 MiB/s)
		I190722 08:18:58.829750 70 ccl/workloadccl/fixture.go:396  imported 319 MiB in new_order table (22500000 rows, 0 index entries, took 4m37.50548591s, 1.15 MiB/s)
		I190722 08:26:12.959946 69 ccl/workloadccl/fixture.go:396  imported 3.3 GiB in order table (75000000 rows, 75000000 index entries, took 11m51.635709897s, 4.80 MiB/s)
		I190722 08:32:29.270275 68 ccl/workloadccl/fixture.go:396  imported 11 GiB in history table (75000000 rows, 150000000 index entries, took 18m7.945958194s, 10.05 MiB/s)
		I190722 08:38:58.782045 67 ccl/workloadccl/fixture.go:396  imported 43 GiB in customer table (75000000 rows, 75000000 index entries, took 24m37.457791382s, 29.90 MiB/s)
		: signal: killed

@nvanbenschoten
Member

This last failure was #39022.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/65055d6c16bf9386d8c4f4f9cd23e0a848814dc9

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=3/cpu=16 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1411161&tab=buildLog

The test failed on branch=master, cloud=aws:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190730-1411161/tpccbench/nodes=3/cpu=16/run_1
	test_runner.go:706: test timed out (10h0m0s)
	cluster.go:2090,tpcc.go:842,tpcc.go:559,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1564466983-07-n4cpu16:4 -- ./cockroach workload fixtures import tpcc --warehouses=2500 --split --scatter --checks=false {pgurl:1} returned:
		stderr:
		
		stdout:
		s, 0 index entries, took 9.882970263s, 0.01 MiB/s)
		I190730 08:09:46.915165 60 ccl/workloadccl/fixture.go:432  imported 7.8 MiB in item table (100000 rows, 0 index entries, took 30.621113561s, 0.25 MiB/s)
		I190730 08:09:50.250534 55 ccl/workloadccl/fixture.go:432  imported 2.5 MiB in district table (25000 rows, 0 index entries, took 33.956620368s, 0.07 MiB/s)
		I190730 08:14:37.840342 59 ccl/workloadccl/fixture.go:432  imported 319 MiB in new_order table (22500000 rows, 0 index entries, took 5m21.54640979s, 0.99 MiB/s)
		I190730 08:20:16.805987 58 ccl/workloadccl/fixture.go:432  imported 3.3 GiB in order table (75000000 rows, 75000000 index entries, took 11m0.511910088s, 5.18 MiB/s)
		I190730 08:26:16.104629 57 ccl/workloadccl/fixture.go:432  imported 11 GiB in history table (75000000 rows, 150000000 index entries, took 16m59.810678023s, 10.72 MiB/s)
		I190730 08:31:12.225201 56 ccl/workloadccl/fixture.go:432  imported 43 GiB in customer table (75000000 rows, 75000000 index entries, took 21m55.931278667s, 33.57 MiB/s)
		: signal: killed

@nvanbenschoten
Member

I190730 08:09:16.290250 1 ccl/workloadccl/fixture.go:316  starting import of 9 tables
I190730 08:09:26.176903 54 ccl/workloadccl/fixture.go:432  imported 133 KiB in warehouse table (2500 rows, 0 index entries, took 9.882970263s, 0.01 MiB/s)
I190730 08:09:46.915165 60 ccl/workloadccl/fixture.go:432  imported 7.8 MiB in item table (100000 rows, 0 index entries, took 30.621113561s, 0.25 MiB/s)
I190730 08:09:50.250534 55 ccl/workloadccl/fixture.go:432  imported 2.5 MiB in district table (25000 rows, 0 index entries, took 33.956620368s, 0.07 MiB/s)
I190730 08:14:37.840342 59 ccl/workloadccl/fixture.go:432  imported 319 MiB in new_order table (22500000 rows, 0 index entries, took 5m21.54640979s, 0.99 MiB/s)
I190730 08:20:16.805987 58 ccl/workloadccl/fixture.go:432  imported 3.3 GiB in order table (75000000 rows, 75000000 index entries, took 11m0.511910088s, 5.18 MiB/s)
I190730 08:26:16.104629 57 ccl/workloadccl/fixture.go:432  imported 11 GiB in history table (75000000 rows, 150000000 index entries, took 16m59.810678023s, 10.72 MiB/s)
I190730 08:31:12.225201 56 ccl/workloadccl/fixture.go:432  imported 43 GiB in customer table (75000000 rows, 75000000 index entries, took 21m55.931278667s, 33.57 MiB/s)
18:08:46 test.go:290: test failure: 	test_runner.go:706: test timed out (10h0m0s)

Very likely #39022.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/93860e69f96aa3a86bd8bb42f310fb2629d53f39

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=3/cpu=16 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1447036&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447036/tpccbench/nodes=3/cpu=16/run_1
	test_runner.go:688: test timed out (10h0m0s)

@nvanbenschoten
Member

/home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447036/tpccbench/nodes=3/cpu=16/run_1/1.logs/goroutine_dump/goroutine_dump.2019-08-21T14_23_04.545.double_since_last_dump.000005468.txt.gz: No space left on device
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447036/tpccbench/nodes=3/cpu=16/run_1/1.logs/goroutine_dump/goroutine_dump.2019-08-21T12_40_02.154.double_since_last_dump.000002453.txt.gz: No space left on device
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447036/tpccbench/nodes=3/cpu=16/run_1/1.logs/goroutine_dump/goroutine_dump.2019-08-21T13_00_50.028.double_since_last_dump.000006178.txt.gz: No space left on device
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447036/tpccbench/nodes=3/cpu=16/run_1/1.logs/goroutine_dump/goroutine_dump.2019-08-21T14_12_54.201.double_since_last_dump.000002277.txt.gz: No space left on device
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447036/tpccbench/nodes=3/cpu=16/run_1/1.logs/goroutine_dump/goroutine_dump.2019-08-21T12_04_03.259.double_since_last_dump.000003631.txt.gz: No space left on device
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447036/tpccbench/nodes=3/cpu=16/run_1/1.logs/goroutine_dump/goroutine_dump.2019-08-21T13_41_55.242.double_since_last_dump.000002320.txt.gz: No space left on device
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447036/tpccbench/nodes=3/cpu=16/run_1/1.logs/goroutine_dump/goroutine_dump.2019-08-21T12_44_22.178.double_since_last_dump.000005148.txt.gz: No space left on device
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447036/tpccbench/nodes=3/cpu=16/run_1/1.logs/goroutine_dump/goroutine_dump.2019-08-21T13_33_56.167.double_since_last_dump.000005192.txt.gz: No space left on device
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447036/tpccbench/nodes=3/cpu=16/run_1/1.logs/goroutine_dump/goroutine_dump.2019-08-21T13_10_58.803.double_since_last_dump.000002370.txt.gz: No space left on device
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447036/tpccbench/nodes=3/cpu=16/run_1/1.logs/goroutine_dump/goroutine_dump.2019-08-21T13_47_05.272.double_since_last_dump.000004734.txt.gz: No space left on device
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447036/tpccbench/nodes=3/cpu=16/run_1/1.logs/goroutine_dump/goroutine_dump.2019-08-21T12_24_30.425.double_since_last_dump.000002463.txt.gz: No space left on device
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447036/tpccbench/nodes=3/cpu=16/run_1/1.logs/goroutine_dump/goroutine_dump.2019-08-21T14_02_12.575.double_since_last_dump.000004905.txt.gz: No space left on device

We seem to be hitting issues with the roachtest coordinator node running out of space. I think this is because we're not deleting the test logs from passing tests anymore. I opened https://github.com/scaledata/rksql/pull/4 to address the biggest offender of this test log bloat, but I think the real fix is to go back and delete the testing logs themselves.

@nvanbenschoten
Member

I think #39810 is also the cause or at least a contributor to this problem.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/09acaf49a587d5a79f8ceab568247b0ba6e60fae

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=3/cpu=16 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1494403&tab=artifacts#/tpccbench/nodes=3/cpu=16

The test failed on branch=master, cloud=aws:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190918-1494403/tpccbench/nodes=3/cpu=16/run_1
	cluster.go:2114,tpcc.go:877,tpcc.go:588,test_runner.go:689: unexpected node event: 3: dead

@nvanbenschoten
Member

Node 3 crashed due to a nil pointer dereference:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1f449b6]

goroutine 19990 [running]:
panic(0x3e77e40, 0x75fac40)
	/usr/local/go/src/runtime/panic.go:565 +0x2c5 fp=0xc00a7f8718 sp=0xc00a7f8688 pc=0x78d565
runtime.panicmem(...)
	/usr/local/go/src/runtime/panic.go:82
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:390 +0x411 fp=0xc00a7f8748 sp=0xc00a7f8718 pc=0x7a30c1
github.com/cockroachdb/cockroach/pkg/sql/row.(*Fetcher).GetRangesInfo(0xc01a12f680, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:1414 +0x26 fp=0xc00a7f8778 sp=0xc00a7f8748 pc=0x1f449b6
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*tableReader).generateMeta(0xc00f54f100, 0x4ce7c80, 0xc01d541c40, 0xcb1971, 0x4105760, 0x42205c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/tablereader.go:273 +0x410 fp=0xc00a7f8950 sp=0xc00a7f8778 pc=0x2053610
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*tableReader).generateTrailingMeta(0xc00f54f100, 0x4ce7c80, 0xc01d541c40, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/tablereader.go:144 +0x43 fp=0xc00a7f89a8 sp=0xc00a7f8950 pc=0x2052323
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*tableReader).generateTrailingMeta-fm(0x4ce7c80, 0xc01d541c40, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/tablereader.go:143 +0x3e fp=0xc00a7f89e8 sp=0xc00a7f89a8 pc=0x20651ae
github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBase).moveToTrailingMeta(0xc00f54f100)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:687 +0x136 fp=0xc00a7f8ae8 sp=0xc00a7f89e8 pc=0x2004576
github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBase).MoveToDraining(0xc00f54f100, 0x4c71b40, 0xc013dd8560)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:589 +0x195 fp=0xc00a7f8be0 sp=0xc00a7f8ae8 pc=0x2003f75
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*tableReader).Start(0xc00f54f100, 0x4ce7c80, 0xc01d541c40, 0xc000e0ffa8, 0x1b1e28e)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/tablereader.go:184 +0x35f fp=0xc00a7f8f20 sp=0xc00a7f8be0 pc=0x20526ef
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*samplerProcessor).Run(0xc0082e8a80, 0x4ce7c80, 0xc01d541c40)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/sampler.go:178 +0x5b fp=0xc00a7f8fa0 sp=0xc00a7f8f20 pc=0x203f59b
github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).startInternal.func1(0xc01a12f560, 0xc019ac7380, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:284 +0x65 fp=0xc00a7f8fc8 sp=0xc00a7f8fa0 pc=0x201bc05
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc00a7f8fd0 sp=0xc00a7f8fc8 pc=0x7bd2f1
created by github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).startInternal
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:283 +0x35b

@jordanlewis could someone on SQL execution take a look at this? It looks like an issue in row.Fetcher.

@nvanbenschoten
Member

This looks like #39350 (see #36570 (comment)), but it's not the same.

@jordanlewis
Member

Thanks. Will take a look.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/93c030e0677283cdea2c9b97a5c91dfe78dc63c1

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=3/cpu=16 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1495889&tab=artifacts#/tpccbench/nodes=3/cpu=16

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190918-1495889/tpccbench/nodes=3/cpu=16/run_1
	cluster.go:2114,tpcc.go:877,tpcc.go:588,test_runner.go:689: unexpected node event: 3: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/7c8323ad186dc4c4aef43882a08c5a75c0648695

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=3/cpu=16 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1496043&tab=artifacts#/tpccbench/nodes=3/cpu=16

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190918-1496043/tpccbench/nodes=3/cpu=16/run_1
	cluster.go:2114,tpcc.go:877,tpcc.go:588,test_runner.go:689: unexpected node event: 3: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/7c8323ad186dc4c4aef43882a08c5a75c0648695

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=3/cpu=16 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1496041&tab=artifacts#/tpccbench/nodes=3/cpu=16

The test failed on branch=master, cloud=aws:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190918-1496041/tpccbench/nodes=3/cpu=16/run_1
	cluster.go:2114,tpcc.go:877,tpcc.go:588,test_runner.go:689: unexpected node event: 3: dead

@jordanlewis
Member

@yuzefovich could this be related to the refactor we did recently? Maybe not, but it would be good to double-check if you have a moment.

@yuzefovich
Member

Hm, I think it is possible but unlikely. I'll take a look tomorrow.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/c6342c90a7fa4ceb1b674faa47a95e1726d05e79

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=3/cpu=16 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1496387&tab=artifacts#/tpccbench/nodes=3/cpu=16

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190919-1496387/tpccbench/nodes=3/cpu=16/run_1
	cluster.go:2114,tpcc.go:877,tpcc.go:588,test_runner.go:689: unexpected node event: 2: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/c6342c90a7fa4ceb1b674faa47a95e1726d05e79

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=3/cpu=16 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1496385&tab=artifacts#/tpccbench/nodes=3/cpu=16

The test failed on branch=master, cloud=aws:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190919-1496385/tpccbench/nodes=3/cpu=16/run_1
	cluster.go:2114,tpcc.go:877,tpcc.go:588,test_runner.go:689: unexpected node event: 3: dead

@jordanlewis
Member

I'm adding this to the release blocker list. Thanks @nvanbenschoten for uncovering this issue.

@yuzefovich
Member

Yes, I'm at fault here (although it was not the epic distsqlrun refactor but rather the CFetcher refactor). It should be an easy fix.

@craig craig bot closed this as completed in 809e4c2 Sep 19, 2019