roachtest: scrub/index-only/tpcc-1000 failed #33151

Closed
cockroach-teamcity opened this issue Dec 14, 2018 · 30 comments · Fixed by #34548
Labels: C-test-failure (Broken test, automatically or manually discovered), O-roachtest, O-robot (Originated from a bot)

Comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/859214b81838a4ba33048b81497442ce5774baa7

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1053297&tab=buildLog

The test failed on master:
	test.go:630,schemachange.go:329,scrub.go:75,cluster.go:1465,errgroup.go:57: dial tcp 104.154.79.202:26257: connect: connection refused
	test.go:630,cluster.go:1486,tpcc.go:120,scrub.go:71: signal: interrupt

@cockroach-teamcity added this to the 2.2 milestone Dec 14, 2018
@cockroach-teamcity added the C-test-failure (Broken test, automatically or manually discovered) and O-robot (Originated from a bot) labels Dec 14, 2018
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/0c87b11cb99ba5c677c95ded55dcba385928474e

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1054703&tab=buildLog

The test failed on release-2.1:
	test.go:628,cluster.go:1139,tpcc.go:97,tpcc.go:101,scrub.go:71: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1054703-scrub-index-only-tpcc-1000:5 -- ./workload fixtures load tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I181214 07:07:16.894415 82 ccl/workloadccl/fixture.go:493  loaded item (1m48s, 100000 rows, 0 index entries, 7.8 MiB)
		I181214 07:07:25.627730 44 ccl/workloadccl/fixture.go:493  loaded warehouse (1m57s, 1000 rows, 0 index entries, 53 KiB)
		I181214 07:07:57.119545 45 ccl/workloadccl/fixture.go:493  loaded district (2m28s, 10000 rows, 0 index entries, 1006 KiB)
		I181214 07:08:30.471942 49 ccl/workloadccl/fixture.go:493  loaded new_order (3m1s, 9000000 rows, 0 index entries, 126 MiB)
		: signal: interrupt

@thoszhang self-assigned this Dec 14, 2018
@thoszhang
Contributor

Same situation as #33149 (comment).

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/7efc92a4dec689efc855ecd382a6f6b6065b98ec

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1055192&tab=buildLog

The test failed on master:
	test.go:628,schemachange.go:329,scrub.go:75,cluster.go:1465,errgroup.go:57: dial tcp 35.188.110.95:26257: connect: connection refused
	test.go:628,cluster.go:1486,tpcc.go:120,scrub.go:71: signal: interrupt

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/f524717e66973da0c11655c860d4b131f82409b9

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1056506&tab=buildLog

The test failed on master:
	test.go:628,cluster.go:1139,tpcc.go:110,cluster.go:1465,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1056506-scrub-index-only-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=3h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		     3.5  11274.3  34359.7  34359.7  34359.7 delivery
		2h15m25s    55237           39.0           35.0   9126.8  13958.6 103079.2 103079.2 newOrder
		2h15m25s    55237            5.0            3.5    184.5    352.3    352.3    352.3 orderStatus
		2h15m25s    55237           29.0           34.7   7247.8  60129.5 103079.2 103079.2 payment
		2h15m25s    55237            3.0            3.5    906.0   1275.1   1275.1   1275.1 stockLevel
		E181215 12:22:36.369270 1 workload/cli/run.go:402  error in payment: dial tcp 10.128.0.35:26257: connect: connection refused
		2h15m26s    70677            4.0            3.5  10200.5  25769.8  25769.8  25769.8 delivery
		2h15m26s    70677           25.0           35.0   7784.6  10737.4  12348.0  12348.0 newOrder
		2h15m26s    70677            1.0            3.5    176.2    176.2    176.2    176.2 orderStatus
		2h15m26s    70677           31.0           34.7   5905.6  24696.1 103079.2 103079.2 payment
		2h15m26s    70677            2.0            3.5    318.8    369.1    369.1    369.1 stockLevel
		: signal: killed
	test.go:628,schemachange.go:329,scrub.go:75,cluster.go:1465,errgroup.go:57: pq: initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.128.0.47:26257: connect: connection refused"
	test.go:628,cluster.go:1486,tpcc.go:120,scrub.go:71: unexpected node event: 4: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/334cce5d61b32b0bb4a300668522c38fb9d6b96d

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1056651&tab=buildLog

The test failed on master:
	test.go:628,schemachange.go:329,scrub.go:75,cluster.go:1465,errgroup.go:57: dial tcp 35.224.163.58:26257: connect: connection refused
	test.go:628,cluster.go:1486,tpcc.go:120,scrub.go:71: signal: interrupt

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e7dc507fa0ecc7dc5ed597ca5c6cdeb48086428c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1057350&tab=buildLog

The test failed on master:
	test.go:628,schemachange.go:329,scrub.go:75,cluster.go:1465,errgroup.go:57: dial tcp 104.198.65.143:26257: connect: connection refused
	test.go:628,cluster.go:1486,tpcc.go:120,scrub.go:71: signal: interrupt

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8367c883c5db0f4b5aea949530e41a068f25530d

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1058060&tab=buildLog

The test failed on master:
	test.go:628,schemachange.go:329,scrub.go:75,cluster.go:1465,errgroup.go:57: dial tcp 35.224.98.124:26257: connect: connection refused
	test.go:628,cluster.go:1486,tpcc.go:120,scrub.go:71: signal: interrupt

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/5ea89330c569d100dc7356ce7b61204202f02273

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1066316&tab=buildLog

The test failed on master:
	test.go:703,scrub.go:71,cluster.go:1463,errgroup.go:57: pq: communication error: rpc error: code = Canceled desc = context canceled
	test.go:703,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1066316-scrub-index-only-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		l
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   11m9s      132            5.0            2.6   2818.6   3892.3   3892.3   3892.3 delivery
		   11m9s      132           26.0           21.7   5100.3 103079.2 103079.2 103079.2 newOrder
		   11m9s      132            1.0            2.6     10.0     10.0     10.0     10.0 orderStatus
		   11m9s      132           34.9           20.2   3355.4 103079.2 103079.2 103079.2 payment
		   11m9s      132            1.0            2.6    121.6    121.6    121.6    121.6 stockLevel
		  11m10s      132            3.0            2.6   4160.7  10200.5  10200.5  10200.5 delivery
		  11m10s      132            6.0           21.7   6710.9 103079.2 103079.2 103079.2 newOrder
		  11m10s      132            2.0            2.6   1140.9   4563.4   4563.4   4563.4 orderStatus
		  11m10s      132            4.0           20.2   3758.1 103079.2 103079.2 103079.2 payment
		  11m10s      132            0.0            2.6      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	test.go:703,cluster.go:1484,tpcc.go:120,scrub.go:56: Goexit() was called
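
For anyone comparing these runs by hand, the per-operation rows in the `workload run tpcc` stdout above follow the column header `_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)` plus a trailing operation name. A minimal Python sketch for pulling those rows out of a saved log might look like the following. The names `HistogramRow` and `parse_row` are hypothetical helpers, not part of roachtest or workload, and the field layout is inferred only from the output pasted in this thread.

```python
import re
from typing import NamedTuple, Optional


class HistogramRow(NamedTuple):
    """One per-operation row of the workload output (hypothetical type)."""
    elapsed: str      # e.g. "11m9s", kept as the raw string
    errors: int       # cumulative error count
    ops_inst: float   # ops/sec(inst)
    ops_cum: float    # ops/sec(cum)
    p50_ms: float
    p95_ms: float
    p99_ms: float
    pmax_ms: float
    op: str           # e.g. "newOrder"


# Nine whitespace-separated fields: elapsed, errors, six numbers, op name.
_ROW = re.compile(
    r"^\s*(\S+)\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)"
    r"\s+(\w+)\s*$"
)


def parse_row(line: str) -> Optional[HistogramRow]:
    """Parse one output line; return None for headers, error lines,
    and rows truncated at the start of the captured tail."""
    m = _ROW.match(line)
    if m is None:
        return None
    g = m.groups()
    return HistogramRow(g[0], int(g[1]), *(float(x) for x in g[2:8]), g[8])
```

Rows cut off at the top of the captured output, the header line, and interleaved `E...` error lines do not match the pattern and are skipped rather than guessed at.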

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/b024b461265a7ca3cc1d156fef459818d127b065

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1074018&tab=buildLog

The test failed on release-2.1:
	test.go:703,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1074018-scrub-index-only-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		Error: read tcp 10.128.0.45:51874->10.128.0.6:26257: read: connection reset by peer
		Error:  exit status 1
		: exit status 1
	test.go:703,cluster.go:1484,tpcc.go:120,scrub.go:56: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/f5e3c29b2eed92868cf3d449575283e2e383f199

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1088848&tab=buildLog

The test failed on master:
	test.go:696,cluster.go:1164,tpcc.go:120,cluster.go:1490,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1088848-scrub-index-only-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		    2.4      0.0      0.0      0.0      0.0 delivery
		 1h18m1s    57954            3.0           23.1   9126.8  13958.6  13958.6  13958.6 newOrder
		 1h18m1s    57954            0.0            2.4      0.0      0.0      0.0      0.0 orderStatus
		 1h18m1s    57954            7.0           22.4  10737.4  13421.8  13421.8  13421.8 payment
		 1h18m1s    57954            1.0            2.4    113.2    113.2    113.2    113.2 stockLevel
		E190111 15:35:42.154520 1 workload/cli/run.go:402  error in newOrder: dial tcp 10.128.0.33:26257: connect: connection refused
		 1h18m2s   102528            0.0            2.4      0.0      0.0      0.0      0.0 delivery
		 1h18m2s   102528            7.0           23.1   8589.9  81604.4  81604.4  81604.4 newOrder
		 1h18m2s   102528            0.0            2.4      0.0      0.0      0.0      0.0 orderStatus
		 1h18m2s   102528            5.0           22.4  10200.5  13958.6  13958.6  13958.6 payment
		 1h18m2s   102528            0.0            2.4      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	test.go:696,cluster.go:1511,tpcc.go:130,scrub.go:58: unexpected node event: 1: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/5058e4a6b1a20ec208505029eddb5b5e25cb7d65

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1089946&tab=buildLog

The test failed on provisional_201901110710_v2.2.0-alpha-20190114:
	test.go:696,cluster.go:1164,tpcc.go:120,cluster.go:1490,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1089946-scrub-index-only-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		: signal: killed
	test.go:696,cluster.go:1248,cluster.go:1267,cluster.go:1362,scrub.go:72,cluster.go:1490,errgroup.go:57: context canceled
	test.go:696,cluster.go:1511,tpcc.go:130,scrub.go:58: unexpected node event: 3: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/fe6fbbb99f51f414804daaeb704635ee0ff17b28

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1091924&tab=buildLog

The test failed on master:
	test.go:696,cluster.go:1164,tpcc.go:132,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1091924-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190114 16:01:45.066839 1 workload/tpcc/tpcc.go:290  check 3.3.2.1 took 6.727721917s
		Error: check failed: 3.3.2.1: 13 rows returned, expected zero
		Error:  exit status 1
		: exit status 1
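
For context, the failing check number appears to correspond to clause 3.3.2.1 of the TPC-C specification (consistency condition 1): for every warehouse, `W_YTD` must equal the sum of `D_YTD` over that warehouse's districts. The query that `workload check tpcc` actually runs is not shown in this thread, so the following is only an in-memory Python sketch of that invariant under that assumption. `violating_warehouses` is a hypothetical helper, and treating a warehouse with no district rows as having a district total of zero is a choice made for this sketch.

```python
from collections import defaultdict
from decimal import Decimal


def violating_warehouses(warehouses, districts):
    """Return the sorted w_ids whose w_ytd differs from the summed d_ytd.

    warehouses: mapping of w_id -> w_ytd (string or int amounts)
    districts:  iterable of (d_w_id, d_ytd) pairs
    """
    # Sum d_ytd per warehouse; Decimal avoids float rounding on money values.
    sums = defaultdict(Decimal)
    for d_w_id, d_ytd in districts:
        sums[d_w_id] += Decimal(d_ytd)
    # A warehouse violates the condition when its own total disagrees
    # with the total accumulated across its districts.
    return sorted(
        w_id for w_id, w_ytd in warehouses.items()
        if Decimal(w_ytd) != sums[w_id]
    )
```

Under this reading, "13 rows returned, expected zero" would mean 13 warehouses whose totals disagree with their districts, and a consistent database would give an empty result.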

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/6885730c58f9a45511f92be95e94129005d6b875

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1099559&tab=buildLog

The test failed on master:
	test.go:727,cluster.go:1203,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1099559-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190118 16:12:05.523504 1 workload/tpcc/tpcc.go:290  check 3.3.2.1 took 5.744697126s
		Error: check failed: 3.3.2.1: 191 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/0e848c9ef932f8f2a2b016580feedf2275c8e505

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1100545&tab=buildLog

The test failed on master:
	test.go:727,cluster.go:1203,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1100545-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190119 16:02:08.968413 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 5.241623213s
		Error: check failed: 3.3.2.1: 77 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/b52b7d1382b454ce1bb43f2187088aef9c557ed5

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1101227&tab=buildLog

The test failed on master:
	test.go:727,cluster.go:1203,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1101227-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190120 16:10:27.560231 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 4.851466031s
		Error: check failed: 3.3.2.1: 47 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/792e95bf111fc544b06b471d6ed5b0d8fe3acf5e

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1102546&tab=buildLog

The test failed on master:
	test.go:727,cluster.go:1203,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1102546-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190121 15:55:45.487296 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 4.634519592s
		Error: check failed: 3.3.2.1: 79 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/798304879367166c8954825f40c404ba100cea0a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1103521&tab=buildLog

The test failed on master:
	test.go:727,cluster.go:1203,tpcc.go:118,cluster.go:1541,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1103521-scrub-index-only-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		Error: read tcp 10.128.0.54:40530->10.128.0.57:26257: read: connection reset by peer
		Error:  exit status 1
		: exit status 1
	test.go:727,cluster.go:1279,cluster.go:1298,cluster.go:1402,scrub.go:72,cluster.go:1541,errgroup.go:57: context canceled
	test.go:727,cluster.go:1562,tpcc.go:128,scrub.go:58: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/f37d09cd3cdd32f4d4894611cfd60caf25c10fff

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1105096&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1195,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1105096-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190123 16:09:06.997407 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 4.520870659s
		Error: check failed: 3.3.2.1: 232 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/295b6ae3142518f04a5771c79ce171043697ee1f

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1107301&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1195,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1107301-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190124 16:30:06.763357 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 5.566227729s
		Error: check failed: 3.3.2.1: 72 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/2952f08ba7260967d7dfd10addbfe80b51d2b8ed

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1109027&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1195,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1109027-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190125 16:36:27.095971 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 7.748697403s
		Error: check failed: 3.3.2.1: 96 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/dc2fbcdc0dccb8cc676fc67370375bab36b3cff0

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1110068&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1195,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1110068-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190126 17:17:14.107063 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 3.677611098s
		Error: check failed: 3.3.2.1: 1 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8cbeb534432b81c57564956ed7d645b854b426be

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1111300&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1195,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1111300-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190127 16:12:37.934260 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 3.709970366s
		Error: check failed: 3.3.2.1: 4 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/9f084ad576e85756c5c5a7e41335d9aa2d3eee30

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1112101&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1195,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1112101-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190128 16:53:57.994503 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 3.636875703s
		Error: check failed: 3.3.2.1: 1 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e10fb557b11b5ff1b8609aa963da23c37a1143c8

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1113854&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1113854-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190129 16:15:18.545896 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 3.559499614s
		Error: check failed: 3.3.2.1: 6 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/395d842feb97c5bd8cad2b32b71a5156c03061eb

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1115923&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:118,cluster.go:1564,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1115923-scrub-index-only-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		0      0.0      0.0 orderStatus
		   1m32s    40806            0.0           66.8      0.0      0.0      0.0      0.0 payment
		   1m32s    40806            0.0            8.7      0.0      0.0      0.0      0.0 stockLevel
		E190130 15:25:38.789339 1 workload/cli/run.go:402  error in newOrder: dial tcp 10.128.0.23:26257: connect: connection refused
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   1m33s    96981            1.0            7.9  28991.0  28991.0  28991.0  28991.0 delivery
		   1m33s    96981            0.0           73.7      0.0      0.0      0.0      0.0 newOrder
		   1m33s    96981            2.0            8.2      9.4     11.0     11.0     11.0 orderStatus
		   1m33s    96981            0.0           66.1      0.0      0.0      0.0      0.0 payment
		   1m33s    96981            1.0            8.6     37.7     37.7     37.7     37.7 stockLevel
		E190130 15:25:39.789420 1 workload/cli/run.go:402  error in newOrder: dial tcp 10.128.0.23:26257: connect: connection refused
		: signal: killed
	test.go:743,cluster.go:1302,cluster.go:1321,cluster.go:1425,scrub.go:72,cluster.go:1564,errgroup.go:57: context canceled
	test.go:743,cluster.go:1585,tpcc.go:128,scrub.go:58: unexpected node event: 4: dead

@nvanbenschoten
Member

Previous failure was

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x190242d]

goroutine 719576 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc0001c5830, 0x3898dc0, 0xc00f811050)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:183 +0x11f
panic(0x2d24dc0, 0x546ff60)
	/usr/local/go/src/runtime/panic.go:513 +0x1b9
github.com/cockroachdb/cockroach/pkg/storage.(*truncateDecision).raftSnapshotsForIndex(0xc0060625f0, 0x0, 0xc0054edb00)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_log_queue.go:226 +0x3d

Fixed by #34399.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/407932a95f3ad53d61481e5a7493fc4ed468faa9

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1117776&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1117776-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190131 16:59:43.382885 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 3.771742354s
		Error: check failed: 3.3.2.1: 1 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/fc3ea118c87ae1a9d2ed6f4974f2296766607666

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1119860&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1119860-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190201 16:46:57.579943 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 3.666064396s
		Error: check failed: 3.3.2.1: 7 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1b8689c0b4df102e1bf4e271913c4bb096ca8ffe

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1121356&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1121356-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190202 16:15:23.621161 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 3.720390729s
		Error: check failed: 3.3.2.1: 1 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/b9bd958fccddc699d47eccbbec80db75c10eab46

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1122796&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1122796-scrub-index-only-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190204 17:12:11.838092 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 4.030364759s
		Error: check failed: 3.3.2.1: 1 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 5, 2019
Fixes cockroachdb#34025.
Fixes cockroachdb#33624.
Fixes cockroachdb#33335.
Fixes cockroachdb#33151.
Fixes cockroachdb#33149.
Fixes cockroachdb#34159.
Fixes cockroachdb#34293.
Fixes cockroachdb#32813.
Fixes cockroachdb#30886.
Fixes cockroachdb#34228.
Fixes cockroachdb#34321.

It is rare but possible for a replica to become a leaseholder but not
learn about this until it applies a snapshot. Immediately upon the
snapshot application's `ReplicaState` update, the replica will begin
operating as a standard leaseholder.

Before this change, leases acquired in this way did not trigger their
in-memory side effects. This could result in a regression
in the new leaseholder's timestamp cache compared to the previous
leaseholder, allowing write-skew like we saw in cockroachdb#34025. This could
presumably result in other anomalies as well, because all of the
steps in `leasePostApply` were skipped.

This PR fixes this bug by detecting lease updates when applying
snapshots and making sure to react correctly to them. It also likely
fixes the referenced issue. The new test demonstrated that without
this fix, the serializable violation speculated about in the issue
was possible.

Release note (bug fix): Fix a bug where lease transfers passed through
snapshots could fail to update in-memory state on the new leaseholder,
allowing write-skew between read-modify-write operations.
craig bot pushed a commit that referenced this issue Feb 5, 2019
34548: storage: apply lease change side-effects on snapshot recipients r=nvanbenschoten a=nvanbenschoten

Fixes #34025.
Fixes #33624.
Fixes #33335.
Fixes #33151.
Fixes #33149.
Fixes #34159.
Fixes #34293.
Fixes #32813.
Fixes #30886.
Fixes #34228.
Fixes #34321.

It is rare but possible for a replica to become a leaseholder but not learn about this until it applies a snapshot. Immediately upon the snapshot application's `ReplicaState` update, the replica will begin operating as a standard leaseholder.

Before this change, leases acquired in this way did not trigger their in-memory side effects. This could result in a regression in the new leaseholder's timestamp cache compared to the previous leaseholder's cache, allowing write-skew like we saw in #34025. This could presumably result in other anomalies as well, because all of the steps in `leasePostApply` were skipped (as theorized by #34025 (comment)).

This PR fixes this bug by detecting lease updates when applying snapshots and making sure to react correctly to them. It also likely fixes the referenced issue. The new test demonstrates that without this fix, the serializable violation speculated about in the issue was possible.
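The mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code from the PR: the `Lease` and `Replica` types, the single low-water-mark timestamp cache, and the `runSideEffects` flag are all hypothetical simplifications, and only the name `leasePostApply` is taken from the commit message. It shows a replica that learns of its own lease through a snapshot, and how skipping the side effects leaves its timestamp cache behind the previous leaseholder's.

```go
package main

import "fmt"

// Lease is a simplified, hypothetical stand-in for a range lease.
type Lease struct {
	Holder   int   // ID of the replica holding the lease
	Sequence int   // changes whenever the lease changes hands
	Start    int64 // logical timestamp at which the lease begins
}

// Replica is a toy model; a real replica tracks far more state.
type Replica struct {
	ID          int
	lease       Lease
	tsCacheLow  int64 // low-water mark: reads at or below it are assumed served
}

// leasePostApply models the in-memory side effects of a lease change.
// Here the only side effect is raising the timestamp cache low-water
// mark to the lease start when this replica is the new leaseholder.
func (r *Replica) leasePostApply(newLease Lease) {
	if newLease.Holder == r.ID && newLease.Start > r.tsCacheLow {
		r.tsCacheLow = newLease.Start
	}
}

// applySnapshot installs the lease carried by a snapshot. The
// runSideEffects flag (an assumption of this sketch) toggles between
// the behavior before the fix (false) and after it (true).
func (r *Replica) applySnapshot(snapLease Lease, runSideEffects bool) {
	prev := r.lease
	r.lease = snapLease
	if runSideEffects && snapLease.Sequence != prev.Sequence {
		r.leasePostApply(snapLease)
	}
}

// canWriteAt reports whether a write at ts is allowed without being
// pushed above the timestamp cache low-water mark.
func (r *Replica) canWriteAt(ts int64) bool {
	return ts > r.tsCacheLow
}

func main() {
	// The previous leaseholder may have served reads up to ts=100.
	newLease := Lease{Holder: 2, Sequence: 5, Start: 100}

	before := &Replica{ID: 2}
	before.applySnapshot(newLease, false)
	fmt.Println("without side effects, write at ts=50 allowed:", before.canWriteAt(50))

	after := &Replica{ID: 2}
	after.applySnapshot(newLease, true)
	fmt.Println("with side effects, write at ts=50 allowed:", after.canWriteAt(50))
}
```

In this model, the replica that skips the side effects accepts a write at a timestamp below reads the old leaseholder may already have served, which is the kind of timestamp cache regression the commit message says could allow write-skew.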

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in #34548 Feb 5, 2019