roachtest: clearrange/checks=false failed #38772

Closed
cockroach-teamcity opened this issue Jul 9, 2019 · 66 comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/8c6fdc64908a13291e4ddc5d233bbbaa379e71a2

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1378458&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190709-1378458/clearrange/checks=false/run_1
	test_runner.go:685: test timed out (6h30m0s)
	cluster.go:1724,clearrange.go:56,clearrange.go:35,test_runner.go:670: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1562652995-24-n10cpu4:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		I190709 11:35:27.249688 1 ccl/workloadccl/cliccl/fixtures.go:324  starting import of 1 tables
		: signal: killed

@cockroach-teamcity cockroach-teamcity added this to the 19.2 milestone Jul 9, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Jul 9, 2019
@nvanbenschoten
Member

Tons of RocksDB stalls like:

W190709 11:38:41.369612 17 storage/engine/rocksdb.go:116  [rocksdb] [/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/db/column_family.cc:779] [default] Stalling writes because we have 3 immutable memtables (waiting for flush), max_write_buffer_number is set to 4 rate 16777216

@ajkr noticed this in #38095 (comment). I closed that issue because part of it was fixed, but this still needs to be tracked.
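
For reference, the condition behind that warning can be summarized with a small, hypothetical Go sketch (this is not RocksDB's actual code; the function name and modeling are illustrative only): RocksDB begins slowing writes once the number of immutable memtables waiting for flush reaches max_write_buffer_number - 1, i.e. flushes are no longer keeping up with ingest.

```go
package main

import "fmt"

// shouldStallWrites is an illustrative approximation of the write-slowdown
// trigger referenced in the warning above: with max_write_buffer_number
// memtables allowed, writes are throttled once the immutable (unflushed)
// memtable count reaches max_write_buffer_number - 1.
func shouldStallWrites(immutableMemtables, maxWriteBufferNumber int) bool {
	return immutableMemtables >= maxWriteBufferNumber-1
}

func main() {
	// Values from the log line above: 3 immutable memtables, max_write_buffer_number=4.
	fmt.Println(shouldStallWrites(3, 4)) // true: flushes are not keeping up
}
```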

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1ca35fc4a0e2665e7f6efd945e65a0db97984fa7

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1396096&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190719-1396096/clearrange/checks=false/run_1
	cluster.go:1726,clearrange.go:56,clearrange.go:35,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1563517204-16-n10cpu4:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		I190719 09:08:25.176389 1 ccl/workloadccl/cliccl/fixtures.go:324  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/22d48caaa7d39efdcef7b3c87a99fc421e1473af

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1397412&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190720-1397412/clearrange/checks=false/run_1
	test_runner.go:706: test timed out (6h30m0s)
	cluster.go:1788,cluster.go:1807,cluster.go:1911,clearrange.go:110,clearrange.go:159,cluster.go:2069,errgroup.go:57: context canceled
	cluster.go:2090,clearrange.go:187,clearrange.go:35,test_runner.go:691: Goexit() was called

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1ad0ecc8cbddf82c9fedb5a5c5e533e72a657ff7

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1399000&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190722-1399000/clearrange/checks=false/run_1
	cluster.go:1726,clearrange.go:56,clearrange.go:35,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1563776264-15-n10cpu4:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		I190722 09:39:37.537357 1 ccl/workloadccl/cliccl/fixtures.go:324  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/7111a67b2ea3a19c2f312f8d214b8823f431cac0

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1400942&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190723-1400942/clearrange/checks=false/run_1
	test_runner.go:706: test timed out (6h30m0s)
	cluster.go:2090,clearrange.go:187,clearrange.go:35,test_runner.go:691: context canceled

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/86eab2ff0a1a4c2d9b5f7e7a45deda74c98c6c37

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1402541&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190724-1402541/clearrange/checks=false/run_1
	test_runner.go:706: test timed out (6h30m0s)
	cluster.go:1788,cluster.go:1807,cluster.go:1911,clearrange.go:110,clearrange.go:159,cluster.go:2069,errgroup.go:57: context canceled
	cluster.go:2090,clearrange.go:187,clearrange.go:35,test_runner.go:691: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/26edea51118a0e16b61748c08068bfa6f76543ca

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1404886&tab=buildLog

The test failed on branch=provisional_201907241708_v19.2.0-alpha.20190729, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190725-1404886/clearrange/checks=false/run_1
	cluster.go:1726,clearrange.go:56,clearrange.go:35,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1564034590-17-n10cpu4:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		I190725 09:16:41.993259 1 ccl/workloadccl/cliccl/fixtures.go:324  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/9078c4e63c1bff1c3d220ee216000b0903dd4d65

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1406479&tab=buildLog

The test failed on branch=provisional_201907252112_v19.2.0-alpha.20190729, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190726-1406479/clearrange/checks=false/run_1
	test_runner.go:706: test timed out (6h30m0s)
	cluster.go:1726,clearrange.go:56,clearrange.go:35,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1564100376-15-n10cpu4:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		I190726 03:59:30.358940 1 ccl/workloadccl/cliccl/fixtures.go:324  starting import of 1 tables
		: signal: killed

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/cfdaadc3514e7e8660f6c009ba159fdfd604f0a8

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1409070&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190727-1409070/clearrange/checks=false/run_1
	cluster.go:1726,clearrange.go:56,clearrange.go:35,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1564208378-15-n10cpu4:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		I190727 10:21:58.287597 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/65055d6c16bf9386d8c4f4f9cd23e0a848814dc9

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1411157&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190730-1411157/clearrange/checks=false/run_1
	test_runner.go:706: test timed out (6h30m0s)
	cluster.go:1788,cluster.go:1807,cluster.go:1911,clearrange.go:110,clearrange.go:159,cluster.go:2069,errgroup.go:57: context canceled
	cluster.go:2090,clearrange.go:187,clearrange.go:35,test_runner.go:691: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/92fef12128c997233d985d1c19e11faac005073f

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1413388&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190731-1413388/clearrange/checks=false/run_1
	test_runner.go:706: test timed out (6h30m0s)
	cluster.go:2090,clearrange.go:187,clearrange.go:35,test_runner.go:691: context canceled

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/da56c792e968574b8f1d9ef3fdb45d56a530221a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1415578&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190801-1415578/clearrange/checks=false/run_1
	cluster.go:1726,clearrange.go:56,clearrange.go:35,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1564640260-17-n10cpu4:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		I190801 10:32:20.045149 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/5bd37e8eb58ca66b9293c234bc572411057fec3a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1417287&tab=buildLog

The test failed on branch=provisional_201908012151_v19.2.0-alpha.20190729, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190802-1417287/clearrange/checks=false/run_1
	cluster.go:2090,clearrange.go:187,clearrange.go:35,test_runner.go:691: dial tcp 104.154.157.162:26257: connect: connection refused

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/175c5ada040fd0cbbf178636b1c551d5c2229ec4

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1417597&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190802-1417597/clearrange/checks=false/run_1
	test_runner.go:706: test timed out (6h30m0s)
	cluster.go:1726,clearrange.go:56,clearrange.go:35,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1564726582-16-n10cpu4:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		I190802 09:31:09.129042 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		: signal: killed

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/3b9a95bd7eb2cfa6d544fe7217852a85ec3b76f4

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1422703&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190805-1422703/clearrange/checks=false/run_1
	cluster.go:1726,clearrange.go:56,clearrange.go:35,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1564984076-17-n10cpu4:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		I190805 09:30:39.198110 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/3db89b230b0c41e399354cbeb78c1e82c8e30004

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1424320&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190806-1424320/clearrange/checks=false/run_1
	test_runner.go:706: test timed out (6h30m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/51a6fdedf0ce1d1329d40d801a7deaf8206b6b07

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1428934&tab=buildLog

The test failed on branch=provisional_201908060405_v19.1.4, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190807-1428934/clearrange/checks=false/run_1
	cluster.go:1735,clearrange.go:56,clearrange.go:35,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1565218672-16-n10cpu4:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		I190808 03:09:28.999008 1 ccl/workloadccl/cliccl/fixtures.go:324  starting import of 1 tables
		I190808 06:32:00.807565 14 ccl/workloadccl/fixture.go:516  imported bank (3h22m32s, 0 rows, 0 index entries, 0 B)
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/51a6fdedf0ce1d1329d40d801a7deaf8206b6b07

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1436116&tab=buildLog

The test failed on branch=provisional_201908060405_v19.1.4, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190812-1436116/clearrange/checks=false/run_1
	test_runner.go:706: test timed out (6h30m0s)
	cluster.go:2099,clearrange.go:187,clearrange.go:35,test_runner.go:691: context canceled

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/01ee0704865391599abef3bbc89f462117f8007a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1445527&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190820-1445527/clearrange/checks=false/run_1
	test_runner.go:688: test timed out (6h30m0s)

@petermattis
Collaborator

I backported cockroachdb/rocksdb#43 and cockroachdb/rocksdb#42 to RocksDB 5.17.2 and the import is chugging along now. #43 in particular could explain the consistency checker issue on 5.17.2. I'll need to run this a bunch of times to be sure, though.

@nvanbenschoten
Member

I backported cockroachdb/rocksdb#43 and cockroachdb/rocksdb#42 to RocksDB 5.17.2 and the import is chugging along now.

It's possible that there was some surprising interaction between ingested ssts and ssts that came originally from the memtable, but note that neither here nor in #40213 did the dropped range deletion tombstone come from an ingested sst.

@petermattis
Collaborator

It's possible that there was some surprising interaction between ingested ssts and ssts that came originally from the memtable, but note that neither here nor in #40213 did the dropped range deletion tombstone come from an ingested sst.

I think I might have confused the consistencyChecker complaining about stats needing to be refreshed with an actual consistency failure.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/62b1678f652461bbc1aaf6bc2c0dd03105ce0ebe

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1488785&tab=buildLog

The test failed on branch=40765, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190914-1488785/clearrange/checks=false/run_1
	cluster.go:2114,clearrange.go:187,clearrange.go:35,test_runner.go:688: pq: batch timestamp 1568507626.955369258,0 must be after replica GC threshold 1568508464.174751679,0
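
For context, that error indicates the request's timestamp was not strictly after the replica's GC threshold, so the MVCC history it would need may already have been garbage collected. A minimal, hypothetical sketch of the check (not CockroachDB's actual types or code path; timestamps are modeled as plain seconds for illustration):

```go
package main

import "fmt"

// checkGCThreshold is a simplified stand-in for the check implied by the
// error above: a batch whose timestamp is at or below the replica GC
// threshold must be rejected.
func checkGCThreshold(batchTS, gcThreshold float64) error {
	if batchTS <= gcThreshold {
		return fmt.Errorf("batch timestamp %.9f must be after replica GC threshold %.9f",
			batchTS, gcThreshold)
	}
	return nil
}

func main() {
	// Values lifted from the failure message above (truncated to float64 precision).
	fmt.Println(checkGCThreshold(1568507626.955369258, 1568508464.174751679))
}
```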

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/62b1678f652461bbc1aaf6bc2c0dd03105ce0ebe

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1489712&tab=buildLog

The test failed on branch=40765, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190915-1489712/clearrange/checks=false/run_1
	test_runner.go:703: test timed out (6h30m0s)

ajkr added a commit to ajkr/cockroach that referenced this issue Sep 18, 2019
Picks up cockroachdb/rocksdb#56.

Release justification: This feature can cause a corruption where keys
deleted by range tombstones reappear (see cockroachdb#38772 and cockroachdb#40213), so it's
important we revert it.

Release note: None
craig bot pushed a commit that referenced this issue Sep 19, 2019
40899: c-deps: bump rocksdb to revert compaction snapshot refresh r=ajkr a=ajkr

Picks up cockroachdb/rocksdb#56.

Release justification: This feature can cause a corruption where keys
deleted by range tombstones reappear (see #38772 and #40213), so it's
important we revert it.

Release note: None

Co-authored-by: Andrew Kryczka <[email protected]>
@ajkr
Contributor

ajkr commented Sep 19, 2019

Thanks for narrowing down the consistency check failure to a range deletion bug, @nvanbenschoten. I never would have figured that out.

@nvanbenschoten
Member

Thank you for taking it the rest of the way! It feels good to knock down two release blockers with one stone.

ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occurring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID, it never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corollary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Fix crashes by preventing replica ID change.
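
To make the first invariant above concrete, here is a heavily simplified, hypothetical sketch (these are not CockroachDB's actual types or methods): once a Replica object has a replica ID, traffic addressed to a higher replica ID means this incarnation was removed from the range, so the stale object is destroyed rather than having its ID changed.

```go
package main

import "fmt"

// replica is a toy stand-in for the in-memory Replica object discussed in the
// commit message above; only the fields needed for the illustration exist.
type replica struct {
	replicaID int
	destroyed bool
}

// maybeDestroyOnHigherID illustrates the invariant: a raft message or snapshot
// addressed to a higher replica ID implies this replica was removed and
// re-added, so the old object must stop applying commands entirely.
func (r *replica) maybeDestroyOnHigherID(incomingID int) {
	if incomingID > r.replicaID {
		r.destroyed = true
	}
}

func main() {
	r := &replica{replicaID: 3}
	r.maybeDestroyOnHigherID(5) // e.g. a snapshot addressed to replica ID 5 arrives
	fmt.Println("destroyed:", r.destroyed)
}
```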
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occurring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID, it never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corollary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 24, 2019