roachtest: tpccbench/nodes=9/cpu=4/multi-region failed #41876

Closed
cockroach-teamcity opened this issue Oct 23, 2019 · 118 comments · Fixed by #46585
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/5612fc5f44e34cf10f60e63ed5a53b6dfa867190

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1554254&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191023-1554254/tpccbench/nodes=9/cpu=4/multi-region/run_1
	test_runner.go:704: test timed out (10h0m0s)

@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Oct 23, 2019
@cockroach-teamcity cockroach-teamcity added this to the 19.2 milestone Oct 23, 2019
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/f9a102814bdce90d687f6215acadf10a9d784c29

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1555992&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191024-1555992/tpccbench/nodes=9/cpu=4/multi-region/run_1
	test_runner.go:704: test timed out (10h0m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1a940ddc06876a1d6511e614391fcffcbe42f664

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1557715&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191025-1557715/tpccbench/nodes=9/cpu=4/multi-region/run_1
	test_runner.go:704: test timed out (10h0m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/10cc0bbe7ee37af42782f7cf904efa15acef223f

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1560587&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191027-1560587/tpccbench/nodes=9/cpu=4/multi-region/run_1
	cluster.go:2163,tpcc.go:720,tpcc.go:561,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1560587-1572213535-79-n12cpu4-geo:4 -- ./workload fixtures load tpcc --warehouses=5000 --scatter --checks=false --partitions=3 --zones="us-east1-b,us-west1-b,europe-west2-b" {pgurl:1} returned:
		stderr:
		
		stdout:
		3.476902051s (45000000 rows, 0 index entries, 461 KiB)
		I191028 04:05:45.444319 58 ccl/workloadccl/fixture.go:547  loaded 21 GiB table history in 34m16.683601409s (150000000 rows, 300000000 index entries, 11 MiB)
		I191028 04:15:27.335107 59 ccl/workloadccl/fixture.go:547  loaded 6.7 GiB table order in 43m58.57395722s (150000000 rows, 150000000 index entries, 2.6 MiB)
		I191028 06:55:56.931877 57 ccl/workloadccl/fixture.go:547  loaded 86 GiB table customer in 3h24m28.170839942s (150000000 rows, 150000000 index entries, 7.2 MiB)
		I191028 07:26:45.518893 62 ccl/workloadccl/fixture.go:547  loaded 157 GiB table stock in 3h55m16.7578643s (500000000 rows, 500000000 index entries, 11 MiB)
		Error: restoring fixture: pq: importing 4044 ranges: change replicas of r16937 failed: descriptor changed: expected r16937:/Table/54/1/42{29/4/-507/7-54/1/-898/5} [(n3,s3):1, (n5,s5):2, (n8,s8):4VOTER_OUTGOING, (n7,s7):6VOTER_INCOMING, next=7, gen=468, sticky=1572237091.384240317,0] != [actual] nil (range subsumed)
		Error:  exit status 1
		: exit status 1

@ajwerner
Contributor

This last failure seems like it would be fixed by #41392. We should also add a retry loop around any calls to change replicas.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/a57647381a4714b48f6ec6dec0bf766eaa6746dd

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1561660&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191029-1561660/tpccbench/nodes=9/cpu=4/multi-region/run_1
	cluster.go:2163,tpcc.go:720,tpcc.go:561,test_runner.go:697: write tcp 172.17.0.2:48900->34.73.2.2:26257: write: connection timed out

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/a4d88c2c5ab6131878d2b4552446d94fd93b1553

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1563612&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=release-19.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191030-1563612/tpccbench/nodes=9/cpu=4/multi-region/run_1
	cluster.go:2163,tpcc.go:720,tpcc.go:561,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1563612-1572415545-77-n12cpu4-geo:4 -- ./workload fixtures load tpcc --warehouses=5000 --scatter --checks=false --partitions=3 --zones="us-east1-b,us-west1-b,europe-west2-b" {pgurl:1} returned:
		stderr:
		
		stdout:
		0000 rows, 300000000 index entries, 10 MiB)
		I191030 12:47:53.215523 78 ccl/workloadccl/fixture.go:547  loaded 86 GiB table customer in 1h27m19.92065554s (150000000 rows, 150000000 index entries, 17 MiB)
		I191030 13:42:18.375359 84 ccl/workloadccl/fixture.go:547  loaded 115 GiB table order_line in 2h21m45.080671519s (1500013787 rows, 1500013787 index entries, 14 MiB)
		I191030 14:13:40.042852 83 ccl/workloadccl/fixture.go:547  loaded 157 GiB table stock in 2h53m6.747700009s (500000000 rows, 500000000 index entries, 16 MiB)
		I191030 14:13:45.432524 1 ccl/workloadccl/cliccl/fixtures.go:286  restored 387 GiB bytes in 9 tables (took 2h53m12.279052532s, 38.16 MiB/s)
		Error: Could not postload: could not partition tables: Couldn't exec "\n\t\t\tALTER INDEX item@replicated_idx_2\n\t\t\tCONFIGURE ZONE USING num_replicas = COPY FROM PARENT, constraints = '{\"+zone=europe-west2-b\":1}', lease_preferences = '[[+zone=europe-west2-b]]'": pq: could not validate zone config: at least one replica is required
		Error:  exit status 1
		: exit status 1

@ajwerner ajwerner assigned ajwerner and unassigned andreimatei Oct 30, 2019
@ajwerner
Contributor

I’ll take this one. I’ve been mucking around over here.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/262e6f2499e34eb4373d0450fa9f6a820a609b2c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1565222&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=provisional_201910301435_v19.2.0-rc.3, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191030-1565222/tpccbench/nodes=9/cpu=4/multi-region/run_1
	test_runner.go:712: test timed out (10h0m0s)

@solongordon solongordon mentioned this issue Oct 31, 2019
18 tasks
@ajwerner
Contributor

This most recent timeout seems reasonable. It took ~4hr to import the data. We then waited 1h40m for rebalancing, which in practice was about 2hr. Each iteration of the line search takes ~20 minutes. The line search was closing in on a number on its 12th iteration, which is roughly 3h40m plus a bit. That adds up to about the 10h timeout. I think we expect the import to take less time, but we changed the import size not too long ago and didn't adjust the timeout.

Let's leave this open to adjust the test timing. Perhaps we don't need to wait so long for rebalancing and perhaps also we could improve the import speed by using some of the tricks we've learned recently.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/0f473848083559c8a98be032949df9428068c223

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1565997&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191031-1565997/tpccbench/nodes=9/cpu=4/multi-region/run_1
	test_runner.go:712: test timed out (10h0m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/62801ce77d9055c00b0e30010f5998ea2cd86686

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1571210&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=provisional_201911010137_v19.2.0-rc.3, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191102-1571210/tpccbench/nodes=9/cpu=4/multi-region/run_1
	test_runner.go:712: test timed out (10h0m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8b9f54761adc58eb9aecbf9b26f1a7987d8a01e5

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1573251&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191105-1573251/tpccbench/nodes=9/cpu=4/multi-region/run_1
	test_runner.go:712: test timed out (10h0m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/239513342a2d23f683bbc1d386f87ff59cc78d10

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1575479&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=provisional_201910141814_v19.2.0-rc.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191106-1575479/tpccbench/nodes=9/cpu=4/multi-region/run_1
	cluster.go:2163,tpcc.go:720,tpcc.go:561,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1575479-1573004318-83-n12cpu4-geo:4 -- ./workload fixtures load tpcc --warehouses=5000 --scatter --checks=false --partitions=3 --zones="us-east1-b,us-west1-b,europe-west2-b" {pgurl:1} returned:
		stderr:
		
		stdout:
		ies, 5.0 MiB)
		I191106 09:25:39.610850 75 ccl/workloadccl/fixture.go:547  loaded 86 GiB table customer in 2h34m47.938731103s (150000000 rows, 150000000 index entries, 9.5 MiB)
		I191106 10:38:28.384347 81 ccl/workloadccl/fixture.go:547  loaded 115 GiB table order_line in 3h47m36.712637848s (1500013787 rows, 1500013787 index entries, 8.7 MiB)
		I191106 10:56:00.926603 80 ccl/workloadccl/fixture.go:547  loaded 157 GiB table stock in 4h5m9.25455371s (500000000 rows, 500000000 index entries, 11 MiB)
		I191106 10:56:07.591565 1 ccl/workloadccl/cliccl/fixtures.go:286  restored 387 GiB bytes in 9 tables (took 4h5m16.731735591s, 26.95 MiB/s)
		Error: Could not postload: could not partition tables: Couldn't exec "\n\t\t\tALTER INDEX item@replicated_idx_0\n\t\t\tCONFIGURE ZONE USING num_replicas = COPY FROM PARENT, constraints = '{\"+zone=us-east1-b\":1}', lease_preferences = '[[+zone=us-east1-b]]'": pq: could not validate zone config: when per-replica constraints are set, num_replicas must be set as well
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/33b96613ae532b25a1b6b716453bece9b60ba2d6

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1583742&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191109-1583742/tpccbench/nodes=9/cpu=4/multi-region/run_1
	test_runner.go:712: test timed out (10h0m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/35e138aa3c2be545fb4e17a85ea6f1b8d6525e53

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1584763&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191110-1584763/tpccbench/nodes=9/cpu=4/multi-region/run_1
	test_runner.go:712: test timed out (10h0m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/64162f27783bd59799531ff977f0fd1d0fd5ae86

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1587139&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=release-19.2, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191112-1587139/tpccbench/nodes=9/cpu=4/multi-region/run_1
	test_runner.go:712: test timed out (10h0m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/0e9dd73f803247cdcfd06f51ce6b23396af1b9f5

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1587121&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191112-1587121/tpccbench/nodes=9/cpu=4/multi-region/run_1
	test_runner.go:712: test timed out (10h0m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/35e138aa3c2be545fb4e17a85ea6f1b8d6525e53

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1588906&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=provisional_201911111508_v20.1.0-alpha.20191118, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191112-1588906/tpccbench/nodes=9/cpu=4/multi-region/run_1
	test_runner.go:712: test timed out (10h0m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8622fad01478fb4a4f05b5579eb0b8561c02e491

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1591079&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191114-1591079/tpccbench/nodes=9/cpu=4/multi-region/run_1
	cluster.go:2163,tpcc.go:720,tpcc.go:561,test_runner.go:697: unexpected node event: 5: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/53eef0857d14cc3af720e136ddaff4eeab026fd0

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1599080&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=provisional_201911182308_v19.2.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191119-1599080/tpccbench/nodes=9/cpu=4/multi-region/run_1
	test_runner.go:712: test timed out (10h0m0s)

@thoszhang thoszhang mentioned this issue Nov 20, 2019
18 tasks
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1c3fce73cd25fa69e5a8c05ce8e215d66b4f49e4

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/multi-region PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1599833&tab=artifacts#/tpccbench/nodes=9/cpu=4/multi-region

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191120-1599833/tpccbench/nodes=9/cpu=4/multi-region/run_1
	test_runner.go:712: test timed out (10h0m0s)

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@5570c01402796edb7cd06eb8ce7f615371f22d42:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200311-1801614/tpccbench/nodes=9/cpu=4/multi-region/run_1
	tpcc.go:858,tpcc.go:570,test_runner.go:747: error with attached stack trace:
		    main.(*monitor).WaitE
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2356
		    main.runTPCCBench.func3
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:835
		    github.com/cockroachdb/cockroach/pkg/util/search.searchWithSearcher
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:43
		    github.com/cockroachdb/cockroach/pkg/util/search.(*lineSearcher).Search
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:173
		    main.runTPCCBench
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:746
		    main.registerTPCCBenchSpec.func1
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:570
		    main.(*testRunner).runTest.func2
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:747
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - monitor failure:
		  - error with attached stack trace:
		    main.(*monitor).wait.func3
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2416
		    main.(*monitor).wait.func4
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2445
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - monitor command failure:
		  - signal: interrupt

	cluster.go:2050,cluster.go:2069,cluster.go:2173,cluster.go:1470,context.go:135,cluster.go:1467,test_runner.go:797: context canceled


Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@69dc87d68addedf2fabfb2b14c098cfb35b5f3d0:

		          538.0s        0            3.0            4.7   1006.6   2952.8   2952.8   2952.8 stockLevel
		          539.0s        0            5.0            4.8   7516.2  42949.7  42949.7  42949.7 delivery
		          539.0s        0           27.0           45.6  45097.2 103079.2 103079.2 103079.2 newOrder
		          539.0s        0            4.0            4.8    704.6  20401.1  20401.1  20401.1 orderStatus
		          539.0s        0           26.0           46.8  11274.3  68719.5 103079.2 103079.2 payment
		          539.0s        0            4.0            4.7   1006.6   7516.2   7516.2   7516.2 stockLevel
		          540.0s        0            1.0            4.8  42949.7  42949.7  42949.7  42949.7 delivery
		          540.0s        0           30.0           45.5  24696.1  94489.3 103079.2 103079.2 newOrder
		          540.0s        0            4.0            4.8    872.4  11811.2  11811.2  11811.2 orderStatus
		          540.0s        0           21.0           46.8  14495.5  66572.0 103079.2 103079.2 payment
		          540.0s        0            2.0            4.7   1543.5  22548.6  22548.6  22548.6 stockLevel
		        _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		          541.0s        0            4.0            4.8   8053.1  21474.8  21474.8  21474.8 delivery
		          541.0s        0           30.0           45.5  20401.1 103079.2 103079.2 103079.2 newOrder
		          541.0s        0            3.0            4.8    570.4    704.6    704.6    704.6 orderStatus
		          541.0s        0           28.0           46.7  13958.6  85899.3 103079.2 103079.2 payment
		          541.0s        0            4.0            4.7   3221.2  64424.5  64424.5  64424.5 stockLevel
		          542.0s        0            2.0            4.8   4563.4   7516.2   7516.2   7516.2 delivery
		          542.0s        0           30.0           45.5  21474.8  85899.3 103079.2 103079.2 newOrder
		          542.0s        0            2.0            4.8    704.6    872.4    872.4    872.4 orderStatus
		          542.0s        0           20.0           46.7  20401.1  51539.6  73014.4  73014.4 payment
		          542.0s        0            1.0            4.7  73014.4  73014.4  73014.4  73014.4 stockLevel
		          543.0s        0            2.0            4.8   4563.4   6979.3   6979.3   6979.3 delivery
		          543.0s        0           56.0           45.5  22548.6 103079.2 103079.2 103079.2 newOrder
		          543.0s        0            3.0            4.8   1409.3   4831.8   4831.8   4831.8 orderStatus
		          543.0s        0           37.0           46.7  16106.1  77309.4  90194.3  90194.3 payment
		          543.0s        0            2.0            4.7  20401.1  38654.7  38654.7  38654.7 stockLevel
		          544.0s        0            6.0            4.8   7516.2  25769.8  25769.8  25769.8 delivery
		          544.0s        0           42.0           45.5  31138.5 103079.2 103079.2 103079.2 newOrder
		          544.0s        0            5.0            4.8    906.0  11811.2  11811.2  11811.2 orderStatus
		          544.0s        0           37.0           46.6  11811.2  98784.2 103079.2 103079.2 payment
		          544.0s        0            7.0            4.7  12348.0  47244.6  47244.6  47244.6 stockLevel
		        _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		          545.0s        0            0.0            4.8      0.0      0.0      0.0      0.0 delivery
		          545.0s        0           10.0           45.4  33286.0 103079.2 103079.2 103079.2 newOrder
		          545.0s        0            3.0            4.8    738.2   1275.1   1275.1   1275.1 orderStatus
		          545.0s        0           27.0           46.6  13958.6 103079.2 103079.2 103079.2 payment
		          545.0s        0            4.0            4.7  10200.5  66572.0  66572.0  66572.0 stockLevel:
		      - context canceled
		    error running tpcc load generator
		    main.runTPCCBench.func3.1
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:818
		    main.(*monitor).Go.func1
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2344
		    github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357

	cluster.go:2050,cluster.go:2069,cluster.go:2173,cluster.go:1470,context.go:135,cluster.go:1467,test_runner.go:803: context canceled


Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@33d71472dc01cbc5064b3c5e1fcd666a33f606de:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200315-1808695/tpccbench/nodes=9/cpu=4/multi-region/run_1
	tpcc.go:858,tpcc.go:570,test_runner.go:753: error with attached stack trace:
		    main.(*monitor).WaitE
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2356
		    main.runTPCCBench.func3
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:835
		    github.com/cockroachdb/cockroach/pkg/util/search.searchWithSearcher
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:43
		    github.com/cockroachdb/cockroach/pkg/util/search.(*lineSearcher).Search
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:173
		    main.runTPCCBench
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:746
		    main.registerTPCCBenchSpec.func1
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:570
		    main.(*testRunner).runTest.func2
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:753
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - monitor failure:
		  - error with attached stack trace:
		    main.(*monitor).wait.func3
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2416
		    main.(*monitor).wait.func4
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2445
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - monitor command failure:
		  - signal: interrupt

	cluster.go:2050,cluster.go:2069,cluster.go:2173,cluster.go:1470,context.go:135,cluster.go:1467,test_runner.go:803: context canceled


Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@5a3d0c9539a671f0e55b680d3021b18dde9d190d:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200317-1811809/tpccbench/nodes=9/cpu=4/multi-region/run_1
	tpcc.go:858,tpcc.go:570,test_runner.go:753: error with attached stack trace:
		    main.(*monitor).WaitE
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2356
		    main.runTPCCBench.func3
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:835
		    github.com/cockroachdb/cockroach/pkg/util/search.searchWithSearcher
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:43
		    github.com/cockroachdb/cockroach/pkg/util/search.(*lineSearcher).Search
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:173
		    main.runTPCCBench
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:746
		    main.registerTPCCBenchSpec.func1
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:570
		    main.(*testRunner).runTest.func2
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:753
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - monitor failure:
		  - error with attached stack trace:
		    main.(*monitor).wait.func3
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2416
		    main.(*monitor).wait.func4
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2445
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - monitor command failure:
		  - signal: interrupt

	cluster.go:2050,cluster.go:2069,cluster.go:2173,cluster.go:1470,context.go:135,cluster.go:1467,test_runner.go:803: context canceled


Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@b5f030223fbcf22e806c48a3c46e74a73a54f50f:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200318-1814553/tpccbench/nodes=9/cpu=4/multi-region/run_1
	tpcc.go:858,tpcc.go:570,test_runner.go:753: error with attached stack trace:
		    main.(*monitor).WaitE
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2356
		    main.runTPCCBench.func3
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:835
		    github.com/cockroachdb/cockroach/pkg/util/search.searchWithSearcher
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:43
		    github.com/cockroachdb/cockroach/pkg/util/search.(*lineSearcher).Search
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:173
		    main.runTPCCBench
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:746
		    main.registerTPCCBenchSpec.func1
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:570
		    main.(*testRunner).runTest.func2
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:753
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - monitor failure:
		  - error with attached stack trace:
		    main.(*monitor).wait.func3
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2416
		    main.(*monitor).wait.func4
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2445
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - monitor command failure:
		  - signal: interrupt

	cluster.go:2050,cluster.go:2069,cluster.go:2173,cluster.go:1470,context.go:135,cluster.go:1467,test_runner.go:821: context canceled


Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@ajwerner
Contributor

#46184

@ajwerner
Contributor

Oops, meant to use this one as the master issue.

@cockroach-teamcity
Member Author

(roachtest).tpccbench/nodes=9/cpu=4/multi-region failed on master@055561809b95488bff2cad19422e7f4a7472e3a2:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200324-1823985/tpccbench/nodes=9/cpu=4/multi-region/run_1
	tpcc.go:858,tpcc.go:570,test_runner.go:753: error with attached stack trace:
		    main.(*monitor).WaitE
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2356
		    main.runTPCCBench.func3
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:835
		    github.com/cockroachdb/cockroach/pkg/util/search.searchWithSearcher
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:43
		    github.com/cockroachdb/cockroach/pkg/util/search.(*lineSearcher).Search
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:173
		    main.runTPCCBench
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:746
		    main.registerTPCCBenchSpec.func1
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:570
		    main.(*testRunner).runTest.func2
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:753
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - monitor failure:
		  - error with attached stack trace:
		    main.(*monitor).wait.func3
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2416
		    main.(*monitor).wait.func4
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2445
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - monitor command failure:
		  - signal: interrupt

	cluster.go:2050,cluster.go:2069,cluster.go:2173,cluster.go:1470,context.go:135,cluster.go:1467,test_runner.go:821: context canceled


Artifacts: /tpccbench/nodes=9/cpu=4/multi-region
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@nvanbenschoten nvanbenschoten self-assigned this Mar 25, 2020
@nvanbenschoten
Member

It doesn't look like these tests are actually hitting any issues; they just take a very long time to run and are being killed by the test runner.

Specifically, we see that the import (workload fixtures load tpcc --warehouses=5000 --scatter --checks=false --partitions=3 --zones="us-east1-b,us-west1-b,europe-west2-b" {pgurl:1}) of 5000 warehouses in a multi-region cluster takes ~4 hours. We then wait for the cluster to rebalance: waiting 1h40m0s for rebalancing. By the time that's done, we only have about an hour and a half until the roachtest times out. The tests begin performing a few tpccbench iterations, each of which takes ~20 minutes. The tpccbench search is not able to complete before the test runner hits its time limit.

There are a few interesting takeaways here:

  1. we should bump the nightly roachtest timeout from 1000 minutes (16h40m) to 1200 minutes (20h).
  2. we should recalibrate this test to start with a better initial estimate (EstimatedMax). Right now it starts at 2200, but that seems to be an underestimate. The last time this test passed successfully, it hit 3087 warehouses. We must have made some big strides here!
  3. we should update tpccbench to more intelligently wait for correct balancing instead of conservatively guessing that it needs to wait for 1h40m after the import.

I'm going to address the first two now and file an issue for the third. With those addressed, we should be able to close this issue.

@nvanbenschoten
Member

we should bump the nightly roachtest timeout from 1000 minutes (16h40m) to 1200 minutes (20h).

Done.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 25, 2020
See cockroachdb#41876.

Searching from 2200 up to around 3000 each time this test runs takes a long
time and can lead to test timeouts. Now that we're more efficient, we can
bump the estimated max and limit the max warehouse search.

Release justification: testing only
@nvanbenschoten
Member

we should recalibrate this test to start with a better initial estimate (EstimatedMax).

Done in #46585.
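For intuition on why the starting estimate matters, here is a deliberately simplified sketch. It is not the algorithm in `pkg/util/search`; `probeUp`, the step of 100 warehouses, and the limit of 5000 are assumptions chosen for illustration. It just counts how many ~20 minute probes an upward-only search needs from two different starting points, using 3087 (the last passing result mentioned above) as the pretend capacity.

```go
package main

import (
	"fmt"
	"time"
)

// probeUp is a hypothetical, simplified stand-in for a line search. It starts
// at start and raises the candidate by step until pass reports failure or the
// candidate exceeds limit. It returns the largest passing value (0 if none
// passed) and the number of probes performed.
func probeUp(start, step, limit int, pass func(int) bool) (best, probes int) {
	if step <= 0 {
		return 0, 0 // avoid looping forever on a non-positive step
	}
	for w := start; w <= limit; w += step {
		probes++
		if !pass(w) {
			break
		}
		best = w
	}
	return best, probes
}

func main() {
	const capacity = 3087 // warehouses reached on the last passing run
	pass := func(w int) bool { return w <= capacity }
	perProbe := 20 * time.Minute // rough cost of one tpccbench iteration

	for _, start := range []int{2200, 3000} {
		best, n := probeUp(start, 100, 5000, pass)
		fmt.Printf("start=%d best=%d probes=%d time=%v\n",
			start, best, n, time.Duration(n)*perProbe)
	}
}
```

Under these assumptions, starting near the real capacity cuts the number of probes several times over, which is the motivation for raising the estimate.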

@nvanbenschoten
Member

we should update tpccbench to more intelligently wait for correct balancing instead of conservatively guessing that it needs to wait for 1h40m after the import.

This is already filed as #44999.
