roachtest: scaledata/distributed_semaphore/nodes=6 failed #41735

Closed
cockroach-teamcity opened this issue Oct 19, 2019 · 18 comments
Labels: C-test-failure (Broken test, automatically or manually discovered), O-roachtest, O-robot (Originated from a bot)

Comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/fb8e58c062970c0e49c21b16b2be0af83ca7ee54

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1546763&tab=artifacts#/scaledata/distributed_semaphore/nodes=6

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191019-1546763/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2159,scaledata.go:121,scaledata.go:48,test_runner.go:689: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1546763-1571461908-63-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.169:26257,10.128.0.157:26257,10.128.0.170:26257,10.128.0.156:26257,10.128.0.155:26257,10.128.0.171:26257'  returned:
		stderr:
		54396833 +0000 UTC m=+88.455372790, took 4.053836998s
		2019/10/19 09:26:45 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/10/19 09:26:45 pq error - Error code : 58C01, Error class : 58
		2019/10/19 09:26:45 ExecuteTx retry attempt 1 failed, started at 2019-10-19 09:26:43.683156249 +0000 UTC m=+86.784132196, now = 2019-10-19 09:26:45.354440368 +0000 UTC m=+88.455416341, took 1.671284145s
		2019/10/19 09:26:45 pq error - Error code : 58C01, Error class : 58
		2019/10/19 09:26:45 pq error - Error code : 58C01, Error class : 58
		2019/10/19 09:26:45 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/10/19 09:26:45 postgres error code is 58C01 and class is 58
		2019/10/19 09:26:45 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		rt transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1571477116.912140917,0 encountered previous write with future timestamp 1571477116.924516584,0 within uncertainty interval `t <= 1571477116.926851286,0`; observed timestamps: [{2 1571477116.912140917,0} {6 1571477116.926851286,0}]
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1571477116.910506567,0 encountered previous write with future timestamp 1571477116.914352153,0 within uncertainty interval `t <= 1571477116.922464279,0`; observed timestamps: [{1 1571477116.910506567,0} {6 1571477116.922464279,0}]
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1571477116.914617385,0 encountered previous write with future timestamp 1571477116.925127015,11 within uncertainty interval `t <= 1571477116.927919347,0`; observed timestamps: [{4 1571477116.914617385,0} {6 1571477116.927919347,0}]
		: exit status 1
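
The `ReadWithinUncertaintyIntervalError` lines in stdout each carry three timestamps: the read time, the conflicting write, and the upper bound of the uncertainty interval. A minimal Python sketch for pulling them out of a log line and checking that the write falls between the read and the bound is shown here. It assumes the `seconds.nanoseconds,logical` layout seen in these messages; the helper names `parse_uncertainty_error` and `write_in_uncertainty_window` are hypothetical and not part of CockroachDB or the test harness.

```python
import re

# One timestamp as printed in the log: seconds.nanoseconds,logical
TS = r"(\d+)\.(\d{9}),(\d+)"
PATTERN = re.compile(
    r"read at time " + TS
    + r" encountered previous write with future timestamp " + TS
    + r" within uncertainty interval `t <= " + TS + r"`"
)


def parse_uncertainty_error(message):
    """Return (read, write, limit) as (wall_nanos, logical) tuples, or None."""
    m = PATTERN.search(message)
    if m is None:
        return None
    g = [int(x) for x in m.groups()]

    def hlc(sec, nanos, logical):
        # Integer arithmetic keeps full nanosecond precision.
        return (sec * 1_000_000_000 + nanos, logical)

    return hlc(*g[0:3]), hlc(*g[3:6]), hlc(*g[6:9])


def write_in_uncertainty_window(read, write, limit):
    """True when the write is after the read but not past the upper bound."""
    return read < write <= limit


# One of the messages from the stdout excerpt above.
sample = (
    "pq: restart transaction: TransactionRetryWithProtoRefreshError: "
    "ReadWithinUncertaintyIntervalError: read at time 1571477116.910506567,0 "
    "encountered previous write with future timestamp 1571477116.914352153,0 "
    "within uncertainty interval `t <= 1571477116.922464279,0`"
)
read, write, limit = parse_uncertainty_error(sample)
print(write_in_uncertainty_window(read, write, limit))
print("write is", write[0] - read[0], "ns after the read")
```

Comparing `(wall_nanos, logical)` tuples orders timestamps by wall time first and by the logical counter second, which matters for messages where the read and the write differ only in the logical component.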

@cockroach-teamcity added the C-test-failure, O-roachtest, and O-robot labels on Oct 19, 2019
@cockroach-teamcity added this to the 19.2 milestone on Oct 19, 2019
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/f9a102814bdce90d687f6215acadf10a9d784c29

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1555992&tab=artifacts#/scaledata/distributed_semaphore/nodes=6

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191024-1555992/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2159,scaledata.go:121,scaledata.go:48,test_runner.go:689: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1555992-1571899205-63-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.101:26257,10.128.0.114:26257,10.128.0.165:26257,10.128.0.115:26257,10.128.0.224:26257,10.128.0.216:26257'  returned:
		stderr:
		57:38 ExecuteTx retry attempt 1 failed, started at 2019-10-24 10:57:35.026759843 +0000 UTC m=+84.339935750, now = 2019-10-24 10:57:38.895245609 +0000 UTC m=+88.208421539, took 3.868485789s
		2019/10/24 10:57:38 pq error - Error code : 58C01, Error class : 58
		2019/10/24 10:57:38 pq error - Error code : 58C01, Error class : 58
		2019/10/24 10:57:38 pq error - Error code : 58C01, Error class : 58
		2019/10/24 10:57:38 pq error - Error code : 58C01, Error class : 58
		2019/10/24 10:57:38 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/10/24 10:57:38 postgres error code is 58C01 and class is 58
		2019/10/24 10:57:38 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/10/24 10:57:38 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1571914570.702388315,0 encountered previous write with future timestamp 1571914570.702388315,1 within uncertainty interval `t <= 1571914570.709815902,0`; observed timestamps: [{5 1571914570.709815902,0} {6 1571914570.702388315,0}]
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1a940ddc06876a1d6511e614391fcffcbe42f664

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1557715&tab=artifacts#/scaledata/distributed_semaphore/nodes=6

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191025-1557715/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2159,scaledata.go:121,scaledata.go:48,test_runner.go:689: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1557715-1571984935-57-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.48:26257,10.128.0.164:26257,10.128.0.19:26257,10.128.0.61:26257,10.128.0.87:26257,10.128.0.173:26257'  returned:
		stderr:
		lapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/10/25 11:42:17 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/10/25 11:42:17 ExecuteTx retry attempt 1 failed, started at 2019-10-25 11:42:16.234641992 +0000 UTC m=+573.913388395, now = 2019-10-25 11:42:17.574547451 +0000 UTC m=+575.253293863, took 1.339905468s
		2019/10/25 11:42:17 pq error - Error code : 58C01, Error class : 58
		2019/10/25 11:42:17 pq error - Error code : 58C01, Error class : 58
		2019/10/25 11:42:17 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/10/25 11:42:17 postgres error code is 58C01 and class is 58
		2019/10/25 11:42:17 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		pq: restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1572003162.336070817,0 too old; wrote at 1572003162.349534796,1
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1572003162.336122098,0 encountered previous write with future timestamp 1572003162.336122098,1 within uncertainty interval `t <= 1572003162.341096973,3`; observed timestamps: [{2 1572003162.341096973,3} {6 1572003162.336122098,0}]
		: exit status 1
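
The stderr excerpts show the client giving up on SQLSTATE `58C01` ("not retryable"), while the `restart transaction` errors in stdout are the kind a client is expected to retry. A rough Python sketch of tallying the reported codes and applying a retry policy follows. The policy here is an assumption made for illustration: it treats only SQLSTATE `40001` (serialization failure, which CockroachDB uses for `restart transaction` errors) as retryable, and it is not the logic of the `distributed_semaphore` client. The names `count_codes` and `is_retryable` are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "pq error - Error code : 58C01, Error class : 58".
CODE_LINE = re.compile(r"Error code : (\w{5}), Error class : (\w{2})")

# Assumption: only SQLSTATE 40001 (serialization failure) is safe to retry.
RETRYABLE_CODES = {"40001"}


def is_retryable(code):
    """Apply the assumed retry policy to a five-character SQLSTATE code."""
    return code in RETRYABLE_CODES


def count_codes(log_text):
    """Count SQLSTATE codes reported in a chunk of client stderr."""
    return Counter(m.group(1) for m in CODE_LINE.finditer(log_text))


stderr_excerpt = """\
2019/10/25 11:42:17 pq error - Error code : 58C01, Error class : 58
2019/10/25 11:42:17 pq error - Error code : 58C01, Error class : 58
"""
for code, n in count_codes(stderr_excerpt).items():
    verdict = "retryable" if is_retryable(code) else "not retryable"
    print(code, n, verdict)  # prints: 58C01 2 not retryable
```

Run over a full stderr capture, a tally like this makes it easy to see whether a failure was dominated by communication errors or by retryable transaction restarts.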

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/33b96613ae532b25a1b6b716453bece9b60ba2d6

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1583742&tab=artifacts#/scaledata/distributed_semaphore/nodes=6

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191109-1583742/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1583742-1573284576-60-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.64:26257,10.128.0.58:26257,10.128.0.72:26257,10.128.0.73:26257,10.128.0.69:26257,10.128.0.62:26257'  returned:
		stderr:
		pt failed with error dial tcp 10.128.0.69:26257: connect: connection refused: ... Retrying after sleeping 10ns
		2019/11/09 11:54:17 RobustDB.RandomDB chose DB at index 5
		2019/11/09 11:54:17 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/tx.go:43 dial tcp 10.128.0.69:26257: connect: connection refused]
		2019/11/09 11:54:17 ExecuteTx retry attempt 1 failed, started at 2019-11-09 11:54:16.741390452 +0000 UTC m=+209.771064911, now = 2019-11-09 11:54:17.123082601 +0000 UTC m=+210.152757079, took 381.692168ms
		2019/11/09 11:54:17 pq error - Error code : 58C01, Error class : 58
		2019/11/09 11:54:17 pq error - Error code : 58C01, Error class : 58
		2019/11/09 11:54:17 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/11/09 11:54:17 postgres error code is 58C01 and class is 58
		2019/11/09 11:54:17 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1573300246.994398836,0 encountered previous write with future timestamp 1573300246.994398836,1 within uncertainty interval `t <= 1573300247.002134503,0`; observed timestamps: [{1 1573300246.994398836,0} {6 1573300247.002134503,0}]
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1573300246.983691538,0 encountered previous write with future timestamp 1573300247.004295815,1 within uncertainty interval `t <= 1573300247.007930777,0`; observed timestamps: [{3 1573300246.983691538,0} {6 1573300247.007930777,0}]
		pq: restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1573300246.983226073,0 too old; wrote at 1573300247.007412140,2
		pq: restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1573300247.004659257,0 too old; wrote at 1573300247.007412140,2
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/35e138aa3c2be545fb4e17a85ea6f1b8d6525e53

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1584763&tab=artifacts#/scaledata/distributed_semaphore/nodes=6

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191110-1584763/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1584763-1573370983-60-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.21:26257,10.128.0.39:26257,10.128.0.23:26257,10.128.0.22:26257,10.128.0.55:26257,10.128.0.25:26257'  returned:
		stderr:
		led]
		2019/11/10 12:04:12 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/11/10 12:04:12 ExecuteTx retry attempt 1 failed, started at 2019-11-10 12:04:11.71887077 +0000 UTC m=+574.778961384, now = 2019-11-10 12:04:12.414454163 +0000 UTC m=+575.474544787, took 695.583403ms
		2019/11/10 12:04:12 pq error - Error code : 58C01, Error class : 58
		2019/11/10 12:04:12 pq error - Error code : 58C01, Error class : 58
		2019/11/10 12:04:12 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/11/10 12:04:12 RobustDB.RandomDB chose DB at index 3
		2019/11/10 12:04:12 RobustDB.RandomDB chose DB at index 5
		2019/11/10 12:04:12 postgres error code is 58C01 and class is 58
		2019/11/10 12:04:12 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1573386876.970138631,32 encountered previous write with future timestamp 1573386876.970138631,58 within uncertainty interval `t <= 1573386876.972268794,26`; observed timestamps: [{2 1573386876.970138631,32} {6 1573386876.972268794,26}]
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1573386876.958370686,14 encountered previous write with future timestamp 1573386876.958370686,15 within uncertainty interval `t <= 1573386876.966469026,0`; observed timestamps: [{2 1573386876.958370686,14} {6 1573386876.966469026,0}]
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/0e9dd73f803247cdcfd06f51ce6b23396af1b9f5

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1587121&tab=artifacts#/scaledata/distributed_semaphore/nodes=6

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191112-1587121/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1587121-1573544175-60-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.188:26257,10.128.0.216:26257,10.128.0.205:26257,10.128.0.182:26257,10.128.0.186:26257,10.128.0.208:26257'  returned:
		stderr:
		9:33 pq error - Error code : 58C01, Error class : 58
		2019/11/12 11:49:33 pq error - Error code : 58C01, Error class : 58
		2019/11/12 11:49:33 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/11/12 11:49:33 ExecuteTx retry attempt 2 failed, started at 2019-11-12 11:49:33.527557889 +0000 UTC m=+575.399152956, now = 2019-11-12 11:49:33.528167838 +0000 UTC m=+575.399762932, took 609.976µs
		2019/11/12 11:49:33 Attempt failed with error dial tcp 10.128.0.186:26257: connect: connection refused: ... Retrying after sleeping 10ns
		2019/11/12 11:49:33 postgres error code is 58C01 and class is 58
		2019/11/12 11:49:33 pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/11/12 11:49:33 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		Error:  exit status 255
		
		stdout:
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1573558798.151297032,0 encountered previous write with future timestamp 1573558798.151297032,1 within uncertainty interval `t <= 1573558798.151297032,1`; observed timestamps: [{1 1573558798.151297032,0} {4 1573558798.151297032,1}]
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/35e138aa3c2be545fb4e17a85ea6f1b8d6525e53

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1587967&tab=artifacts#/scaledata/distributed_semaphore/nodes=6

The test failed on branch=provisional_201911111508_v20.1.0-alpha.20191118, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191112-1587967/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1587967-1573572266-63-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.149:26257,10.128.0.50:26257,10.128.0.151:26257,10.128.0.150:26257,10.128.0.46:26257,10.128.0.45:26257'  returned:
		stderr:
		sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/11/12 19:47:44 ExecuteTx retry attempt 1 failed, started at 2019-11-12 19:47:41.922754913 +0000 UTC m=+85.963346979, now = 2019-11-12 19:47:44.335833597 +0000 UTC m=+88.376425668, took 2.413078689s
		2019/11/12 19:47:44 pq error - Error code : 58C01, Error class : 58
		2019/11/12 19:47:44 pq error - Error code : 58C01, Error class : 58
		2019/11/12 19:47:44 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/11/12 19:47:44 postgres error code is 58C01 and class is 58
		2019/11/12 19:47:44 pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/11/12 19:47:44 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		Error:  exit status 255
		
		stdout:
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1573587975.991861820,0 encountered previous write with future timestamp 1573587975.991861820,1 within uncertainty interval `t <= 1573587975.997076881,0`; observed timestamps: [{3 1573587975.991861820,0} {4 1573587975.997076881,0}]
		pq: restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1573587975.978426187,0 too old; wrote at 1573587975.998434922,2
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8622fad01478fb4a4f05b5579eb0b8561c02e491

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1591079&tab=artifacts#/scaledata/distributed_semaphore/nodes=6

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191114-1591079/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1591079-1573717670-64-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.37:26257,10.128.0.27:26257,10.128.0.32:26257,10.128.0.42:26257,10.128.0.38:26257,10.128.0.43:26257'  returned:
		stderr:
		qlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/11/14 12:30:32 ExecuteTx retry attempt 1 failed, started at 2019-11-14 12:30:29.370588771 +0000 UTC m=+449.991334012, now = 2019-11-14 12:30:32.772245738 +0000 UTC m=+453.392990992, took 3.40165698s
		2019/11/14 12:30:32 pq error - Error code : 58C01, Error class : 58
		2019/11/14 12:30:32 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/11/14 12:30:32 pq error - Error code : 58C01, Error class : 58
		2019/11/14 12:30:32 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/11/14 12:30:32 postgres error code is 58C01 and class is 58
		2019/11/14 12:30:32 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		pq: restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1573734179.411047696,0 too old; wrote at 1573734179.413384683,1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/f97dc13163020a032b098ef3eb88e4d9f54a04ba

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1613952&tab=artifacts#/scaledata/distributed_semaphore/nodes=6

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191127-1613952/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1613952-1574840180-66-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.166:26257,10.128.0.131:26257,10.128.0.30:26257,10.128.0.186:26257,10.128.0.59:26257,10.128.0.193:26257'  returned:
		stderr:
		/sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/11/27 11:59:28 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/11/27 11:59:28 ExecuteTx retry attempt 1 failed, started at 2019-11-27 11:59:28.483607249 +0000 UTC m=+88.414893040, now = 2019-11-27 11:59:28.509978593 +0000 UTC m=+88.441264415, took 26.371375ms
		2019/11/27 11:59:28 pq error - Error code : 58C01, Error class : 58
		2019/11/27 11:59:28 pq error - Error code : 58C01, Error class : 58
		2019/11/27 11:59:28 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/11/27 11:59:28 postgres error code is 58C01 and class is 58
		2019/11/27 11:59:28 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1574855880.106019065,0 encountered previous write with future timestamp 1574855880.108975204,25 within uncertainty interval `t <= 1574855880.111999648,0`; observed timestamps: [{3 1574855880.106019065,0} {5 1574855880.111999648,0}]
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/d3574ad671bd3631e780510235485681720c2b8f

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1622074&tab=artifacts#/scaledata/distributed_semaphore/nodes=6

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191203-1622074/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1622074-1575358204-61-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.134:26257,10.128.0.149:26257,10.128.0.140:26257,10.128.0.148:26257,10.128.0.170:26257,10.128.0.141:26257'  returned:
		stderr:
		on error: rpc error: code = Canceled desc = context canceled]
		2019/12/03 11:53:15 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/12/03 11:53:15 RobustDB.RandomDB chose DB at index 1
		2019/12/03 11:53:15 ExecuteTx retry attempt 1 failed, started at 2019-12-03 11:53:13.528221996 +0000 UTC m=+208.388155720, now = 2019-12-03 11:53:15.234882297 +0000 UTC m=+210.094816039, took 1.706660319s
		2019/12/03 11:53:15 pq error - Error code : 58C01, Error class : 58
		2019/12/03 11:53:15 pq error - Error code : 58C01, Error class : 58
		2019/12/03 11:53:15 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/12/03 11:53:15 postgres error code is 58C01 and class is 58
		2019/12/03 11:53:15 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		e timestamp 1575373785.156921042,5 within uncertainty interval `t <= 1575373785.165647113,0`; observed timestamps: [{2 1575373785.156921042,4} {6 1575373785.165647113,0}]
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1575373785.153553669,0 encountered previous write with future timestamp 1575373785.156921042,5 within uncertainty interval `t <= 1575373785.165559508,0`; observed timestamps: [{2 1575373785.153553669,0} {6 1575373785.165559508,0}]
		pq: restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1575373785.165400297,0 too old; wrote at 1575373785.170913319,2
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1575373785.154514520,0 encountered previous write with future timestamp 1575373785.156921042,5 within uncertainty interval `t <= 1575373785.169207314,5`; observed timestamps: [{3 1575373785.154514520,0} {6 1575373785.169207314,5}]
		: exit status 1

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/1da69d917105a0280aad10e86a7ee8eb2059cc92

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1623285&tab=artifacts#/scaledata/distributed_semaphore/nodes=6

The test failed on branch=provisional_201912031738_v20.1.0-alpha20191209, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191203-1623285/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1623285-1575402335-63-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.197:26257,10.128.0.196:26257,10.128.0.182:26257,10.128.0.201:26257,10.128.0.68:26257,10.128.0.200:26257'  returned:
		stderr:
		 : 58C01, Error class : 58
		2019/12/04 00:24:58 pq error - Error code : 58C01, Error class : 58
		2019/12/04 00:24:58 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/12/04 00:24:58 postgres error code is 58C01 and class is 58
		2019/12/04 00:24:58 pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/12/04 00:24:58 ExecuteTx retry attempt 1 failed, started at 2019-12-04 00:24:54.288770495 +0000 UTC m=+205.826868670, now = 2019-12-04 00:24:58.601380564 +0000 UTC m=+210.139478749, took 4.312610079s
		2019/12/04 00:24:58 pq error - Error code : 58C01, Error class : 58
		2019/12/04 00:24:58 pq error - Error code : 58C01, Error class : 58
		2019/12/04 00:24:58 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/12/04 00:24:58 postgres error code is 58C01 and class is 58
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/ed717cbaf741e3a32c76db25b16a59dc2a8221d7

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1624103&tab=artifacts#/scaledata/distributed_semaphore/nodes=6

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191204-1624103/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1624103-1575445328-63-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.15:26257,10.128.0.114:26257,10.128.0.9:26257,10.128.0.102:26257,10.128.0.103:26257,10.128.0.7:26257'  returned:
		stderr:
		ad connection: ... Retrying after sleeping 5ns
		2019/12/04 12:19:43 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/tx.go:37 driver: bad connection]
		2019/12/04 12:19:43 ExecuteTx retry attempt 1 failed, started at 2019-12-04 12:19:43.543081318 +0000 UTC m=+453.707661418, now = 2019-12-04 12:19:43.591053183 +0000 UTC m=+453.755633303, took 47.971885ms
		2019/12/04 12:19:43 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 driver: bad connection]
		2019/12/04 12:19:43 pq error - Error code : 58C01, Error class : 58
		2019/12/04 12:19:43 pq error - Error code : 58C01, Error class : 58
		2019/12/04 12:19:43 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/12/04 12:19:43 postgres error code is 58C01 and class is 58
		2019/12/04 12:19:43 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1575461529.848758404,4 encountered previous write with future timestamp 1575461529.855554057,0 within uncertainty interval `t <= 1575461529.856737568,0`; observed timestamps: [{2 1575461529.848758404,4} {6 1575461529.856737568,0}]
		: exit status 1

@tbg

tbg commented Dec 4, 2019

The failures consistently end with:

	2019/12/04 12:19:43 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled

This test runs chaos, so the error could be a side effect of that.

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/28f216e1bd53da872a759a98779144c7f70f33a3

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1629573&tab=artifacts#/scaledata/distributed_semaphore/nodes=6

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191206-1629573/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1629573-1575617269-66-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.115:26257,10.128.0.119:26257,10.128.0.114:26257,10.128.0.117:26257,10.128.0.122:26257,10.128.0.120:26257'  returned:
		stderr:
		lapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/12/06 12:01:54 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/12/06 12:01:54 ExecuteTx retry attempt 1 failed, started at 2019-12-06 12:01:53.918703967 +0000 UTC m=+574.464880001, now = 2019-12-06 12:01:54.761160416 +0000 UTC m=+575.307336476, took 842.456475ms
		2019/12/06 12:01:54 pq error - Error code : 58C01, Error class : 58
		2019/12/06 12:01:54 pq error - Error code : 58C01, Error class : 58
		2019/12/06 12:01:54 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/12/06 12:01:54 postgres error code is 58C01 and class is 58
		2019/12/06 12:01:54 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		pq: restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1575633139.484170550,0 too old; wrote at 1575633139.484170550,2
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1575633139.478526839,5 encountered previous write with future timestamp 1575633139.482890775,8 within uncertainty interval `t <= 1575633139.485940878,15`; observed timestamps: [{4 1575633139.485940878,15} {5 1575633139.478526839,5}]
		: exit status 1

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/9d697080f227324220c09d1b63da7bbf969133ac

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1633532&tab=artifacts#/scaledata/distributed_semaphore/nodes=6

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191209-1633532/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1633532-1575876499-63-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.117:26257,10.128.0.14:26257,10.128.0.11:26257,10.128.0.145:26257,10.128.0.168:26257,10.128.0.174:26257'  returned:
		stderr:
		7055, took 333.776654ms
		2019/12/09 11:58:40 pq error - Error code : 58C01, Error class : 58
		2019/12/09 11:58:40 pq error - Error code : 58C01, Error class : 58
		2019/12/09 11:58:40 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/12/09 11:58:40 postgres error code is 58C01 and class is 58
		2019/12/09 11:58:40 pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/12/09 11:58:40 ExecuteTx retry attempt 1 failed, started at 2019-12-09 11:58:39.557758391 +0000 UTC m=+574.440067684, now = 2019-12-09 11:58:40.173596387 +0000 UTC m=+575.055905732, took 615.838048ms
		2019/12/09 11:58:40 pq error - Error code : 58C01, Error class : 58
		2019/12/09 11:58:40 pq error - Error code : 58C01, Error class : 58
		2019/12/09 11:58:40 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		pq: restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1575892145.132372051,8 too old; wrote at 1575892145.132714009,7
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1575892145.132062574,5 encountered previous write with future timestamp 1575892145.143971776,0 within uncertainty interval `t <= 1575892145.143971776,10`; observed timestamps: [{2 1575892145.132062574,5} {4 1575892145.143971776,10}]
		: exit status 1

@cockroach-teamcity

(roachtest).scaledata/distributed_semaphore/nodes=6 failed on master@e81faed54ee90bdfed1dddc63bb13d1ecf8806da:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191212-1640161/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:700: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1640161-1576136845-58-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.125:26257,10.128.0.118:26257,10.128.1.29:26257,10.128.1.28:26257,10.128.0.177:26257,10.128.0.19:26257'  returned:
		stderr:
		sql/src/go/src/rubrik/sqlapp/tx.go:37 driver: bad connection]
		2019/12/12 11:43:01 RobustDB.RandomDB chose DB at index 1
		2019/12/12 11:43:01 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/12/12 11:43:01 ExecuteTx retry attempt 1 failed, started at 2019-12-12 11:43:00.571392582 +0000 UTC m=+330.542542357, now = 2019-12-12 11:43:01.978869762 +0000 UTC m=+331.950019610, took 1.407477253s
		2019/12/12 11:43:01 pq error - Error code : 58C01, Error class : 58
		2019/12/12 11:43:01 pq error - Error code : 58C01, Error class : 58
		2019/12/12 11:43:01 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/12/12 11:43:01 postgres error code is 58C01 and class is 58
		2019/12/12 11:43:01 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1576150650.053859330,16 encountered previous write with future timestamp 1576150650.053859330,17 within uncertainty interval `t <= 1576150650.054588877,30`; observed timestamps: [{5 1576150650.054588877,30} {6 1576150650.053859330,16}]
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1576150650.040452609,0 encountered previous write with future timestamp 1576150650.060662879,6 within uncertainty interval `t <= 1576150650.062982971,87`; observed timestamps: [{3 1576150650.040452609,0} {5 1576150650.062982971,87}]
		: exit status 1

details

Artifacts: /scaledata/distributed_semaphore/nodes=6

make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

powered by pkg/cmd/internal/issues

@cockroach-teamcity

(roachtest).scaledata/distributed_semaphore/nodes=6 failed on master@3078ef8af2f797d77869af469964a320c28c395e:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191215-1644115/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:700: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1644115-1576395566-63-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.143:26257,10.128.0.136:26257,10.128.0.139:26257,10.128.0.138:26257,10.128.0.142:26257,10.128.0.135:26257'  returned:
		stderr:
		,1 min=1576411303.856955284,0 seq=5} rw=true stat=PENDING rts=1576411304.721234870,1 wto=false max=1576411303.858944629,0]
		2019/12/15 12:01:45 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/12/15 12:01:45 ExecuteTx retry attempt 1 failed, started at 2019-12-15 12:01:43.538558964 +0000 UTC m=+86.470426925, now = 2019-12-15 12:01:45.073702422 +0000 UTC m=+88.005570455, took 1.53514353s
		2019/12/15 12:01:45 pq error - Error code : 58C01, Error class : 58
		2019/12/15 12:01:45 pq error - Error code : 58C01, Error class : 58
		2019/12/15 12:01:45 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/12/15 12:01:45 postgres error code is 58C01 and class is 58
		2019/12/15 12:01:45 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1576411217.104407009,0 encountered previous write with future timestamp 1576411217.104407009,1 within uncertainty interval `t <= 1576411217.104658752,15`; observed timestamps: [{5 1576411217.104407009,0} {6 1576411217.104658752,15}]
		: exit status 1
Repro

Artifacts: /scaledata/distributed_semaphore/nodes=6

make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

powered by pkg/cmd/internal/issues

@cockroach-teamcity

(roachtest).scaledata/distributed_semaphore/nodes=6 failed on master@beb69e089b6eadc0fde6c92eb533b08d248938fe:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191216-1644933/scaledata/distributed_semaphore/nodes=6/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:700: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1644933-1576481346-64-n7cpu4:7 -- ./distributed_semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.58:26257,10.128.0.92:26257,10.128.0.59:26257,10.128.0.5:26257,10.128.0.68:26257,10.128.0.73:26257'  returned:
		stderr:
		go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 pq: communication error: rpc error: code = Canceled desc = context canceled]
		2019/12/16 11:50:00 [/Users/nathan/Go/src/github.com/scaledata/rksql/src/go/src/rubrik/sqlapp/distributed_semaphore/cockroach.go:54 pq: rpc error: code = Unavailable desc = transport is closing]
		2019/12/16 11:50:00 ExecuteTx retry attempt 1 failed, started at 2019-12-16 11:50:00.74052254 +0000 UTC m=+331.356257707, now = 2019-12-16 11:50:00.854842927 +0000 UTC m=+331.470578137, took 114.32043ms
		2019/12/16 11:50:00 pq error - Error code : 58C01, Error class : 58
		2019/12/16 11:50:00 pq error - Error code : 58C01, Error class : 58
		2019/12/16 11:50:00 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/12/16 11:50:00 postgres error code is 58C01 and class is 58
		2019/12/16 11:50:00 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		: exit status 1
Repro

Artifacts: /scaledata/distributed_semaphore/nodes=6

make stressrace TESTS=scaledata/distributed_semaphore/nodes=6 PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

powered by pkg/cmd/internal/issues

nvanbenschoten added a commit to nvanbenschoten/rksql that referenced this issue Dec 17, 2019
Fixes cockroachdb/cockroach#36981.
Fixes cockroachdb/cockroach#39618.
Fixes cockroachdb/cockroach#40552.
Fixes cockroachdb/cockroach#41735.

cockroachdb/cockroach#41451 switched two forms
of errors that can be thrown during chaos events over to a new error code
class - 58, internal system errors. This commit updates `pqConnectionError`
to consider this error code class as retry-worthy.
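
For readers unfamiliar with the fix, here is a rough sketch of what treating class 58 as retry-worthy might look like. It is not the actual rksql code: `sqlStateClass` and `isRetryableSQLState` are hypothetical names, and also retrying class 08 (connection exceptions) and code 40001 (serialization failure, reported as "restart transaction" in the stdout above) is an assumption made for illustration. Only the class 58 rule comes from the commit message.

```go
package main

import "fmt"

// sqlStateClass returns the two-character class prefix of a SQLSTATE code,
// or "" if the code is too short to have one.
func sqlStateClass(code string) string {
	if len(code) < 2 {
		return ""
	}
	return code[:2]
}

// isRetryableSQLState is a hypothetical classifier, not the rksql
// implementation. It reports whether an error with the given SQLSTATE
// code is worth retrying.
func isRetryableSQLState(code string) bool {
	switch sqlStateClass(code) {
	case "08": // connection exceptions (assumed retryable for this sketch)
		return true
	case "58": // class 58, e.g. the 58C01 seen in the stderr logs above
		return true
	}
	// 40001 is serialization_failure (assumed retryable for this sketch).
	return code == "40001"
}

func main() {
	for _, code := range []string{"58C01", "40001", "08006", "23505"} {
		fmt.Printf("%s retryable=%t\n", code, isRetryableSQLState(code))
	}
}
```

With a rule like this, the `58C01` errors in the logs above would be retried rather than ending the run with "Aborting Retries".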
@nvanbenschoten
