ccl/partitionccl: TestRepartitioning failed under stress #37883

Closed
cockroach-teamcity opened this issue May 28, 2019 · 18 comments · Fixed by #43214

@cockroach-teamcity (Member)

SHA: https://github.com/cockroachdb/cockroach/commits/c280de40c2bcab93c41fe82bef8353a5ecd95ac4

Parameters:

TAGS=
GOFLAGS=-parallel=4

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestRepartitioning PKG=github.com/cockroachdb/cockroach/pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log
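
A narrower, non-race variant of the same loop, as the template comment suggests (assembled from the invocations later in this thread; the -p value is illustrative):

make stress PKG=./pkg/ccl/partitionccl TESTS=TestRepartitioning STRESSFLAGS='-p 4 -maxtime 20m' 2>&1 | tee /tmp/stress.log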

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1312022&tab=buildLog

              [n1,client=127.0.0.1:55340,user=root] planning starts: SELECT
              [n1,client=127.0.0.1:55340,user=root] generating optimizer plan
              [n1,client=127.0.0.1:55340,user=root] added table 'data.public."single col range partitioning - MAXVALUE"' to table collection
              [n1,client=127.0.0.1:55340,user=root] query cache hit
              [n1,client=127.0.0.1:55340,user=root] planning ends
              [n1,client=127.0.0.1:55340,user=root] checking distributability
              [n1,client=127.0.0.1:55340,user=root] will distribute plan: true
              [n1,client=127.0.0.1:55340,user=root] execution starts: distributed engine
               === SPAN START: consuming rows ===
              [n1,client=127.0.0.1:55340,user=root] creating DistSQL plan with isLocal=false
              [n1,client=127.0.0.1:55340,user=root] querying next range at /Table/76/1/6
              [n1,client=127.0.0.1:55340,user=root] running DistSQL plan
               === SPAN START: flow ===
              [n1,client=127.0.0.1:55340,user=root] starting (0 processors, 0 startables)
               === SPAN START: table reader ===
            cockroach.processorid: 0
            cockroach.stat.tablereader.bytes.read: 0 B
            cockroach.stat.tablereader.input.rows: 0
            cockroach.stat.tablereader.stalltime: 0s
              [n1,client=127.0.0.1:55340,user=root] starting scan with limitBatches true
              [n1,client=127.0.0.1:55340,user=root] Scan /Table/76/{1/6-2}
               === SPAN START: txn coordinator send ===
               === SPAN START: dist sender send ===
              [n1,client=127.0.0.1:55340,user=root,txn=e7753557] querying next range at /Table/76/1/6
              [n1,client=127.0.0.1:55340,user=root,txn=e7753557] r91: sending batch 1 Scan to (n1,s1):1
              [n1,client=127.0.0.1:55340,user=root,txn=e7753557] sending request to local client
               === SPAN START: /cockroach.roachpb.Internal/Batch ===
              [n1] 1 Scan
              [n1,s1] executing 1 requests
              [n1,s1,r91/1:/Table/76/{1/5-2}] read-only path
              [n1,s1,r91/1:/Table/76/{1/5-2}] read has no clock uncertainty
              [n1,s1,r91/1:/Table/76/{1/5-2}] acquire latches
              [n1,s1,r91/1:/Table/76/{1/5-2}] waited 4.819µs to acquire latches
              [n1,s1,r91/1:/Table/76/{1/5-2}] waiting for read lock
              [n1,s1,r91/1:/Table/76/{1/5-2}] read completed
               === SPAN START: count rows ===
            cockroach.processorid: 1
            cockroach.stat.aggregator.input.rows: 0
            cockroach.stat.aggregator.mem.max: 0 B
            cockroach.stat.aggregator.stalltime: 37µs
              [n1,client=127.0.0.1:55340,user=root] execution ends
              [n1,client=127.0.0.1:55340,user=root] rows affected: 1
              [n1,client=127.0.0.1:55340,user=root] AutoCommit. err: <nil>
              [n1,client=127.0.0.1:55340,user=root] releasing 1 tables
               === SPAN START: exec cmd: exec stmt ===
              [n1,client=127.0.0.1:55340,user=root] [NoTxn pos:7134] executing ExecStmt: SET TRACING = off
              [n1,client=127.0.0.1:55340,user=root] executing: SET TRACING = off in state: NoTxn
            goroutine 457667 [running]:
            runtime/debug.Stack(0xa7a358200, 0xc003937050, 0x3876e80)
            	/usr/local/go/src/runtime/debug/stack.go:24 +0xa7
            github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x38e26a0, 0xc0014b4500, 0xc003937020)
            	/go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:49 +0x103
            github.com/cockroachdb/cockroach/pkg/ccl/partitionccl.TestRepartitioning.func1(0xc0014b4500)
            	/go/src/github.com/cockroachdb/cockroach/pkg/ccl/partitionccl/partition_test.go:1356 +0xabc
            testing.tRunner(0xc0014b4500, 0xc005a12120)
            	/usr/local/go/src/testing/testing.go:827 +0xbf
            created by testing.(*T).Run
            	/usr/local/go/src/testing/testing.go:878 +0x35c

cockroach-teamcity added this to the 19.2 milestone May 28, 2019
cockroach-teamcity added the C-test-failure (Broken test, automatically or manually discovered) and O-robot (Originated from a bot) labels May 28, 2019
@tbg (Member) commented Jun 3, 2019

    partition_test.go:1356: condition failed to evaluate within 45s: expected to scan on n3: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a > 5
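
The assertion behind this message is a testutils.SucceedsSoon poll (default timeout 45s, matching the message): the test repeatedly runs the count query with tracing enabled and checks which node served the scan. A minimal sketch of the pattern, assuming a hypothetical scansOnNode helper in place of the test's real trace inspection:

package partitionccl_test

import (
	gosql "database/sql"
	"fmt"
	"testing"

	"github.com/cockroachdb/cockroach/pkg/testutils"
)

// scansOnNode is a hypothetical stand-in for the test's real check, which
// runs the query under SET TRACING = on and inspects the trace for scan
// events served by the expected node (elided here).
func scansOnNode(db *gosql.DB, query string, node int) bool {
	return false
}

// verifyScansOnNode sketches the assertion at partition_test.go:1356:
// SucceedsSoon retries the closure with backoff until it returns nil or
// the ~45s deadline expires, then fails the test with the last error.
func verifyScansOnNode(t *testing.T, db *gosql.DB, query string, node int) {
	testutils.SucceedsSoon(t, func() error {
		if !scansOnNode(db, query, node) {
			return fmt.Errorf("expected to scan on n%d: %s", node, query)
		}
		return nil
	})
}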

tbg unassigned danhhz Jun 3, 2019
@tbg (Member) commented Jun 3, 2019

3004 runs so far, 0 failures, over 25m35s

with make stress PKG=./pkg/ccl/partitionccl TESTS=TestRepartitioning/single_col_range_partitioning/single_col_range_partitioning_-_MAXVALUE STRESSFLAGS='-p 6' | tee -a /tmp/stress.log

@tbg (Member) commented Jun 3, 2019

@darinpp, could you take a look? This flake is roughly in the allocator/replication area. There's some chance that the machine was overloaded, but I'm not seeing the typical signs (failed heartbeats, and this wasn't a race build).

darinpp self-assigned this Jun 3, 2019
@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/23cc6c0c62c0a1ca9ca0558b71bc9bd32b325b46

Parameters:

TAGS=
GOFLAGS=-parallel=4

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestRepartitioning PKG=github.com/cockroachdb/cockroach/pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1354074&tab=buildLog


I190622 05:51:40.154574 701836 storage/replica_raftstorage.go:816  [n3,s3,r195/?:/{Table/94/2-Max}] applied PREEMPTIVE snapshot in 0ms [clear=0ms batch=0ms entries=0ms commit=0ms]
I190622 05:51:40.154955 701861 storage/replica_command.go:935  [n1,s1,r195/2:/{Table/94/2-Max}] change replicas (ADD_REPLICA (n3,s3):3): existing descriptor r195:/{Table/94/2-Max} [(n1,s1):2, next=3, gen=114]
I190622 05:51:40.156809 701602 storage/replica_raft.go:397  [n2,s2,r191/1:/Table/94/1/{4-5}] proposing ADD_REPLICA((n1,s1):2): updated=[(n2,s2):1 (n1,s1):2] next=3
I190622 05:51:40.157510 701832 storage/store.go:2523  [n1,replicaGC,s1,r193/2:/Table/94/{1/6-2}] removing replica r193/2
I190622 05:51:40.157797 701832 storage/replica_destroy.go:148  [n1,replicaGC,s1,r193/2:/Table/94/{1/6-2}] removed 7 (0+7) keys in 0ms [clear=0ms commit=0ms]
E190622 05:51:40.159708 701868 storage/queue.go:1035  [n1,replicate,s1,r191/2:/Table/94/1/{4-5}] no removable replicas from range that needs a removal: [1:0, 2*:25]
I190622 05:51:40.159923 701861 storage/replica_raft.go:397  [n1,s1,r195/2:/{Table/94/2-Max}] proposing ADD_REPLICA((n3,s3):3): updated=[(n1,s1):2 (n3,s3):3] next=4
E190622 05:51:40.162332 701944 storage/queue.go:1035  [n3,replicate,s3,r195/3:/{Table/94/2-Max}] no removable replicas from range that needs a removal: [2:0, 3*:24]
I190622 05:51:40.183256 387833 sql/sqlbase/structured.go:1118  [n1,client=127.0.0.1:36416,user=root,scExec] publish: descID=94 (unpartitioned) version=6 mtime=2019-06-22 05:51:40.182698005 +0000 UTC
I190622 05:51:40.207256 702446 storage/replica_command.go:935  [n1,s1,r191/2:/Table/94/1/{4-5}] change replicas (REMOVE_REPLICA (n2,s2):1): existing descriptor r191:/Table/94/1/{4-5} [(n2,s2):1, (n1,s1):2, next=3, gen=112]
I190622 05:51:40.212687 702446 storage/replica_raft.go:397  [n1,s1,r191/2:/Table/94/1/{4-5}] proposing REMOVE_REPLICA((n2,s2):1): updated=[(n1,s1):2] next=3
I190622 05:51:40.214760 701593 storage/replica_command.go:557  [n1,merge,s1,r190/2:/Table/94{-/1/4}] initiating a merge of r191:/Table/94/1/{4-5} [(n1,s1):2, next=3, gen=112] into this range (lhs+rhs has (size=0 B+0 B qps=0.00+0.00 --> 0.00qps) below threshold (size=0 B, qps=0.00))
I190622 05:51:40.219336 387833 sql/event_log.go:132  [n1,client=127.0.0.1:36416,user=root] Event: "create_database", target: 95, info: {DatabaseName:data Statement:CREATE DATABASE data User:root}
I190622 05:51:40.223255 702137 storage/store.go:2523  [n2,replicaGC,s2,r191/1:/Table/94/1/{4-5}] removing replica r191/1
I190622 05:51:40.223572 702137 storage/replica_destroy.go:148  [n2,replicaGC,s2,r191/1:/Table/94/1/{4-5}] removed 9 (0+9) keys in 0ms [clear=0ms commit=0ms]
I190622 05:51:40.224900 387546 storage/store.go:2523  [n1,s1,r190/2:/Table/94{-/1/4}] removing replica r191/2
I190622 05:51:40.226551 702962 storage/store_snapshot.go:784  [n2,s2,r192/3:/Table/94/1/{5-6}] sending PREEMPTIVE snapshot efe4238b at applied index 33
I190622 05:51:40.226745 702962 storage/store_snapshot.go:827  [n2,s2,r192/3:/Table/94/1/{5-6}] streamed snapshot to (n1,s1):?: kv pairs: 14, log entries: 0, rate-limit: 8.0 MiB/sec, 0.00s
I190622 05:51:40.227258 703077 storage/replica_raftstorage.go:810  [n1,s1,r192/?:{-}] applying PREEMPTIVE snapshot at index 33 (id=efe4238b, encoded size=1273, 1 rocksdb batches, 0 log entries)
I190622 05:51:40.227515 703077 storage/replica_raftstorage.go:816  [n1,s1,r192/?:/Table/94/1/{5-6}] applied PREEMPTIVE snapshot in 0ms [clear=0ms batch=0ms entries=0ms commit=0ms]
I190622 05:51:40.227914 702962 storage/replica_command.go:935  [n2,s2,r192/3:/Table/94/1/{5-6}] change replicas (ADD_REPLICA (n1,s1):4): existing descriptor r192:/Table/94/1/{5-6} [(n2,s2):3, next=4, gen=113]
I190622 05:51:40.229117 387833 sql/event_log.go:132  [n1,client=127.0.0.1:36416,user=root] Event: "create_table", target: 96, info: {TableName:data.public."single col range partitioning" Statement:CREATE TABLE "single col range partitioning" (a INT8 PRIMARY KEY) PARTITION BY RANGE (a) (PARTITION p3 VALUES FROM (minvalue) TO (3), PARTITION p4 VALUES FROM (3) TO (4)) User:root}

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/badc7f4851bcc0d2f799a9ab4e5871753d7fe693

Parameters:

TAGS=
GOFLAGS=-parallel=4

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestRepartitioning PKG=github.com/cockroachdb/cockroach/pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1354929&tab=buildLog


I190624 05:30:09.594176 670605 storage/replica_command.go:935  [n3,s3,r174/3:/Table/98/1{-/4}] change replicas (REMOVE_REPLICA (n1,s1):2): existing descriptor r174:/Table/98/1{-/4} [(n1,s1):2, (n3,s3):3, next=4, gen=128]
I190624 05:30:09.596771 670605 storage/replica_raft.go:397  [n3,s3,r174/3:/Table/98/1{-/4}] proposing REMOVE_REPLICA((n1,s1):2): updated=[(n3,s3):3] next=4
I190624 05:30:09.598391 669884 storage/replica_command.go:557  [n3,merge,s3,r173/1:/Table/98{-/1}] initiating a merge of r174:/Table/98/1{-/4} [(n3,s3):3, next=4, gen=128] into this range (lhs+rhs has (size=0 B+0 B qps=0.00+0.00 --> 0.00qps) below threshold (size=0 B, qps=0.00))
I190624 05:30:09.598430 670430 storage/replica_command.go:557  [n2,merge,s2,r175/2:/Table/98/1/{4-5}] initiating a merge of r176:/Table/98/{1/5-2} [(n3,s3):1, (n2,s2):2, next=3, gen=130] into this range (lhs+rhs has (size=0 B+0 B qps=0.00+0.00 --> 0.00qps) below threshold (size=0 B, qps=0.00))
I190624 05:30:09.604848 670587 storage/store.go:2523  [n1,replicaGC,s1,r174/2:/Table/98/1{-/4}] removing replica r174/2
I190624 05:30:09.605178 670587 storage/replica_destroy.go:148  [n1,replicaGC,s1,r174/2:/Table/98/1{-/4}] removed 8 (0+8) keys in 0ms [clear=0ms commit=0ms]
I190624 05:30:09.605452 341288 server/status/runtime.go:500  [n3] runtime stats: 336 MiB RSS, 698 goroutines, 122 MiB/31 MiB/205 MiB GO alloc/idle/total, 95 MiB/122 MiB CGO alloc/total, 7394.0 CGO/sec, 57.2/10.1 %(u/s)time, 0.0 %gc (5x), 15 MiB/15 MiB (r/w)net
I190624 05:30:09.606189 341254 storage/store.go:2523  [n3,s3,r173/1:/Table/98{-/1}] removing replica r174/3
I190624 05:30:09.607130 340525 sql/sqlbase/structured.go:1118  [n1,client=127.0.0.1:42348,user=root] publish: descID=98 (unpartitioned) version=5 mtime=2019-06-24 05:30:09.587008205 +0000 UTC
I190624 05:30:09.619908 340525 sql/event_log.go:132  [n1,client=127.0.0.1:42348,user=root] Event: "drop_database", target: 97, info: {DatabaseName:data Statement:DROP DATABASE IF EXISTS data User:root DroppedSchemaObjects:[data.public.unpartitioned]}
E190624 05:30:09.624047 671298 storage/queue.go:1035  [n3,replicate,s3,r175/1:/Table/98/1/{4-5}] no removable replicas from range that needs a removal: [1*:24, 2:0]
I190624 05:30:09.625378 671151 storage/replica_command.go:935  [n3,s3,r175/1:/Table/98/1/{4-5}] change replicas (REMOVE_REPLICA (n2,s2):2): existing descriptor r175:/Table/98/1/{4-5} [(n3,s3):1, (n2,s2):2, next=3, gen=129]
I190624 05:30:09.627705 340525 sql/sqlbase/structured.go:1118  [n1,client=127.0.0.1:42348,user=root,scExec] publish: descID=98 (unpartitioned) version=6 mtime=2019-06-24 05:30:09.627229803 +0000 UTC
I190624 05:30:09.628337 671151 storage/replica_raft.go:397  [n3,s3,r175/1:/Table/98/1/{4-5}] proposing REMOVE_REPLICA((n2,s2):2): updated=[(n3,s3):1] next=3
I190624 05:30:09.631849 670993 storage/replica_command.go:557  [n3,merge,s3,r173/1:/Table/98{-/1/4}] initiating a merge of r175:/Table/98/1/{4-5} [(n3,s3):1, next=3, gen=129] into this range (lhs+rhs has (size=0 B+0 B qps=0.00+0.00 --> 0.00qps) below threshold (size=0 B, qps=0.00))
I190624 05:30:09.636473 671350 storage/store.go:2523  [n2,replicaGC,s2,r175/2:/Table/98/1/{4-5}] removing replica r175/2
I190624 05:30:09.636786 671350 storage/replica_destroy.go:148  [n2,replicaGC,s2,r175/2:/Table/98/1/{4-5}] removed 6 (0+6) keys in 0ms [clear=0ms commit=0ms]
I190624 05:30:09.638074 341217 storage/store.go:2523  [n3,s3,r173/1:/Table/98{-/1/4}] removing replica r175/1
I190624 05:30:09.657252 340525 sql/event_log.go:132  [n1,client=127.0.0.1:42348,user=root] Event: "create_database", target: 99, info: {DatabaseName:data Statement:CREATE DATABASE data User:root}
I190624 05:30:09.665747 340525 sql/event_log.go:132  [n1,client=127.0.0.1:42348,user=root] Event: "create_table", target: 100, info: {TableName:data.public."secondary index - unpartitioned" Statement:CREATE TABLE "secondary index - unpartitioned" (a INT8 PRIMARY KEY, b INT8, INDEX b_idx (b)) User:root}
I190624 05:30:09.666175 672343 storage/replica_command.go:261  [n3,split,s3,r177/1:/{Table/98/2-Max}] initiating a split of this range at key /Table/100 [r178] (zone config)

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/6534f6c0ff94187a284f99bedbe601689f5ef714

Parameters:

TAGS=
GOFLAGS=-parallel=4

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestRepartitioning PKG=github.com/cockroachdb/cockroach/pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1357316&tab=buildLog


I190625 07:07:17.818843 498241 storage/replica_command.go:259  [n2,split,s2,r76/1:/{Table/78/1/5-Max}] initiating a split of this range at key /Table/78/2 [r77] (zone config)
I190625 07:07:17.826031 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a < 4
I190625 07:07:17.828886 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a < 4
I190625 07:07:17.831865 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a < 4
I190625 07:07:17.833608 498281 storage/store_snapshot.go:782  [n2,replicate,s2,r76/1:/Table/78/{1/5-2}] sending PREEMPTIVE snapshot 74bd312a at applied index 15
I190625 07:07:17.833715 498281 storage/store_snapshot.go:825  [n2,replicate,s2,r76/1:/Table/78/{1/5-2}] streamed snapshot to (n3,s3):?: kv pairs: 7, log entries: 0, rate-limit: 8.0 MiB/sec, 0.00s
I190625 07:07:17.834196 498284 storage/replica_raftstorage.go:808  [n3,s3,r76/?:{-}] applying PREEMPTIVE snapshot at index 15 (id=74bd312a, encoded size=620, 1 rocksdb batches, 0 log entries)
I190625 07:07:17.834461 498284 storage/replica_raftstorage.go:814  [n3,s3,r76/?:/Table/78/{1/5-2}] applied PREEMPTIVE snapshot in 0ms [clear=0ms batch=0ms entries=0ms commit=0ms]
I190625 07:07:17.834727 498281 storage/replica_command.go:933  [n2,replicate,s2,r76/1:/Table/78/{1/5-2}] change replicas (ADD_REPLICA (n3,s3):2): existing descriptor r76:/Table/78/{1/5-2} [(n2,s2):1, next=2, gen=69]
I190625 07:07:17.835107 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a < 4
I190625 07:07:17.837766 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a < 4
I190625 07:07:17.840482 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a < 4
I190625 07:07:17.840914 498281 storage/replica_raft.go:395  [n2,s2,r76/1:/Table/78/{1/5-2}] proposing ADD_REPLICA((n3,s3):2): updated=[(n2,s2):1 (n3,s3):2] next=3
E190625 07:07:17.841238 498281 storage/queue.go:1033  [n2,replicate,s2,r76/1:/Table/78/{1/5-2}] no removable replicas from range that needs a removal: [1*:17, 2:0]
I190625 07:07:17.843813 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a >= 4 AND a < 5
I190625 07:07:17.847715 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a > 5
I190625 07:07:17.850677 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a > 5
I190625 07:07:17.853245 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a >= 4 AND a < 5
I190625 07:07:17.856475 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a > 5
I190625 07:07:17.859102 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a < 4
I190625 07:07:17.861881 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a < 4
I190625 07:07:17.864365 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a < 4
I190625 07:07:17.867327 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a >= 4 AND a < 5
I190625 07:07:17.869996 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a > 5
I190625 07:07:17.872468 495457 ccl/partitionccl/partition_test.go:206  query: SELECT count(*) FROM "single col range partitioning - MAXVALUE" WHERE a >= 4 AND a < 5

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/5f358ed804af05f8c4b404efc4d8a282d8e0916c

Parameters:

TAGS=
GOFLAGS=-parallel=4

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestRepartitioning PKG=github.com/cockroachdb/cockroach/pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1360489&tab=buildLog


I190626 05:33:27.854791 658047 storage/replica_raft.go:395  [n2,s2,r153/2:/Table/98/{1/4-2}] proposing ADD_REPLICA((n3,s3):6): updated=[(n2,s2):2 (n1,s1):5 (n3,s3):6] next=7
I190626 05:33:27.854920 331180 sql/event_log.go:130  [n1,client=127.0.0.1:47858,user=root] Event: "alter_index", target: 100, info: {TableName:"".""."secondary index - list partitioning" IndexName:b_idx Statement:ALTER INDEX "secondary index - list partitioning"@b_idx PARTITION BY LIST (b) (PARTITION p3 VALUES IN ((3)), PARTITION p4 VALUES IN ((4))) User:root MutationID:0}
I190626 05:33:27.875479 660230 storage/replica_raft.go:395  [n3,s3,r155/2:/Table/{98/2-100}] proposing ADD_REPLICA((n1,s1):4): updated=[(n3,s3):2 (n2,s2):3 (n1,s1):4] next=5
I190626 05:33:27.884469 660433 storage/replica_command.go:933  [n2,replicate,s2,r155/3:/Table/{98/2-100}] change replicas (REMOVE_REPLICA (n3,s3):2): existing descriptor r155:/Table/{98/2-100} [(n3,s3):2, (n2,s2):3, (n1,s1):4, next=5, gen=142]
I190626 05:33:27.896109 660433 storage/replica_raft.go:395  [n2,s2,r155/3:/Table/{98/2-100}] proposing REMOVE_REPLICA((n3,s3):2): updated=[(n2,s2):3 (n1,s1):4] next=5
I190626 05:33:27.896961 660433 storage/replica_command.go:933  [n2,replicate,s2,r155/3:/Table/{98/2-100}] change replicas (REMOVE_REPLICA (n1,s1):4): existing descriptor r155:/Table/{98/2-100} [(n2,s2):3, (n1,s1):4, next=5, gen=142]
I190626 05:33:27.897888 331180 sql/event_log.go:130  [n1,client=127.0.0.1:47858,user=root] Event: "set_zone_config", target: 100, info: {Target:data."secondary index - list partitioning"@b_idx Config: Options:constraints = '[+n1]' User:root}
I190626 05:33:27.901657 661157 storage/replica_command.go:259  [n2,split,s2,r194/2:/{Table/100-Max}] initiating a split of this range at key /Table/100/2 [r195] (zone config)
I190626 05:33:27.905989 660821 storage/store.go:2522  [n3,replicaGC,s3,r155/2:/Table/{98/2-100}] removing replica r155/2
I190626 05:33:27.906438 660821 storage/replica_destroy.go:146  [n3,replicaGC,s3,r155/2:/Table/{98/2-100}] removed 11 (0+11) keys in 0ms [clear=0ms commit=0ms]
I190626 05:33:27.908545 331180 sql/event_log.go:130  [n1,client=127.0.0.1:47858,user=root] Event: "set_zone_config", target: 100, info: {Target:data."secondary index - list partitioning".p3 Config: Options:constraints = '[+n2]' User:root}
I190626 05:33:27.908889 660433 storage/replica_raft.go:395  [n2,s2,r155/3:/Table/{98/2-100}] proposing REMOVE_REPLICA((n1,s1):4): updated=[(n2,s2):3] next=5
I190626 05:33:27.910730 661425 storage/replica_command.go:259  [n2,split,s2,r195/2:/{Table/100/2-Max}] initiating a split of this range at key /Table/100/2/3 [r196] (zone config)
I190626 05:33:27.911950 660889 storage/store_snapshot.go:774  [n2,s2,r155/3:/Table/{98/2-100}] sending PREEMPTIVE snapshot d7cea1f3 at applied index 37
I190626 05:33:27.912115 660889 storage/store_snapshot.go:817  [n2,s2,r155/3:/Table/{98/2-100}] streamed snapshot to (n3,s3):?: kv pairs: 20, log entries: 0, rate-limit: 8.0 MiB/sec, 0.00s
E190626 05:33:27.913360 661630 storage/queue.go:1033  [n3,replicate,s3,r151/2:/Table/98{-/1/4}] no removable replicas from range that needs a removal: [1:0, 2*:25]
I190626 05:33:27.913761 661478 storage/replica_raftstorage.go:808  [n3,s3,r155/?:{-}] applying PREEMPTIVE snapshot at index 37 (id=d7cea1f3, encoded size=2733, 1 rocksdb batches, 0 log entries)
E190626 05:33:27.913789 661544 storage/queue.go:1033  [n3,replicate,s3,r194/1:/Table/100{-/2}] no removable replicas from range that needs a removal: [1*:16, 2:0]
I190626 05:33:27.914210 661478 storage/replica_raftstorage.go:814  [n3,s3,r155/?:/Table/{98/2-100}] applied PREEMPTIVE snapshot in 0ms [clear=0ms batch=0ms entries=0ms commit=0ms]
I190626 05:33:27.914524 660889 storage/replica_command.go:933  [n2,s2,r155/3:/Table/{98/2-100}] change replicas (ADD_REPLICA (n3,s3):5): existing descriptor r155:/Table/{98/2-100} [(n2,s2):3, next=5, gen=142]
I190626 05:33:27.918747 331180 sql/event_log.go:130  [n1,client=127.0.0.1:47858,user=root] Event: "set_zone_config", target: 100, info: {Target:data."secondary index - list partitioning".p4 Config: Options:constraints = '[+n3]' User:root}

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/02f239d810711786174736328e3360dccf7ef412

Parameters:

TAGS=
GOFLAGS=-parallel=4

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestRepartitioning PKG=github.com/cockroachdb/cockroach/pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1439815&tab=buildLog

            cockroach.processorid: 2
              [n2,s2] executing 1 requests
              [n2,s2,r77/2:/Table/68/1/5{-/6}] read-only path
              [n2,s2,r77/2:/Table/68/1/5{-/6}] acquire latches
              [n2,s2,r77/2:/Table/68/1/5{-/6}] waited 3.258µs to acquire latches
              [n2,s2,r77/2:/Table/68/1/5{-/6}] waiting for read lock
              [n2,s2,r77/2:/Table/68/1/5{-/6}] read completed
              [n2,txn=d93e3c0f] querying next range at /Table/68/1/5/6
              [n2,txn=d93e3c0f] r72: sending batch 1 Scan to (n3,s3):2
              [n2,txn=d93e3c0f] sending request to 127.0.0.1:39861
               === SPAN START: /cockroach.roachpb.Internal/Batch ===
              [n2] Consumer sent handshake. Consuming flow scheduled: true
               === SPAN START: /cockroach.roachpb.Internal/Batch ===
              [n3] 1 Scan
              [n3,s3] executing 1 requests
              [n3] *roachpb.RangeNotFoundError
              [n2,txn=d93e3c0f] application error: r72 was not found on s3
              [n2,txn=d93e3c0f] failed to send RPC: sending to all 1 replicas failed; last error: (err: r72 was not found on s3) <nil>
              [n2,txn=d93e3c0f] reply error [txn: d93e3c0f], Scan [/Table/68/1/5/6,/Table/68/1/5/6/#): failed to send RPC: sending to all 1 replicas failed; last error: (err: r72 was not found on s3) <nil>
              [n2,txn=d93e3c0f] evicting range descriptor on *roachpb.SendError and backoff for re-lookup: r72:/Table/68/1/5/{6-7} [(n3,s3):2, next=3, gen=57]
              [n2,txn=d93e3c0f] evicting cached range descriptor
              [n2,txn=d93e3c0f] r72: sending batch 1 Scan to (n2,s2):4
              [n2,txn=d93e3c0f] sending request to local client
               === SPAN START: /cockroach.roachpb.Internal/Batch ===
              [n2] 1 Scan
              [n2,s2] executing 1 requests
              [n2,s2,r72/4:/Table/68/1/5/{6-7}] read-only path
              [n2,s2,r72/4:/Table/68/1/5/{6-7}] acquire latches
              [n2,s2,r72/4:/Table/68/1/5/{6-7}] waited 4.82µs to acquire latches
              [n2,s2,r72/4:/Table/68/1/5/{6-7}] waiting on in-progress merge
              [n2,s2,r72/4:/Table/68/1/5/{6-7}] replica.Send got error: merge in progress
              [n2,s2,r72/4:/Table/68/1/5/{6-7}] read-only path
              [n2,s2,r72/4:/Table/68/1/5/{6-7}] acquire latches
              [n2,s2,r72/4:/Table/68/1/5/{6-7}] waited 7.519µs to acquire latches
              [n2,s2,r72/4:/Table/68/1/5/{6-7}] waiting for read lock
              [n2,s2,r72/4:/Table/68/1/5/{6-7}] read completed
               === SPAN START: count rows ===
            cockroach.processorid: 1
            cockroach.stat.aggregator.input.rows: 0
            cockroach.stat.aggregator.mem.max: 0 B
            cockroach.stat.aggregator.stalltime: 71µs
              [n1,client=127.0.0.1:57696,user=root] execution ends
              [n1,client=127.0.0.1:57696,user=root] rows affected: 1
              [n1,client=127.0.0.1:57696,user=root] AutoCommit. err: <nil>
              [n1,client=127.0.0.1:57696,user=root] releasing 1 tables
               === SPAN START: exec stmt ===
              [n1,client=127.0.0.1:57696,user=root] [NoTxn pos:2951] executing ExecStmt: SET TRACING = off
              [n1,client=127.0.0.1:57696,user=root] executing: SET TRACING = off in state: NoTxn
            goroutine 316088 [running]:
            runtime/debug.Stack(0x41d0d60, 0xc001415b00, 0xc003003e00)
            	/usr/local/go/src/runtime/debug/stack.go:24 +0x9d
            github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x41d0d60, 0xc001415b00, 0xc003003e00)
            	/go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:36 +0x6b
            github.com/cockroachdb/cockroach/pkg/ccl/partitionccl.TestRepartitioning.func1(0xc001415b00)
            	/go/src/github.com/cockroachdb/cockroach/pkg/ccl/partitionccl/partition_test.go:1355 +0x959
            testing.tRunner(0xc001415b00, 0xc002b1d770)
            	/usr/local/go/src/testing/testing.go:865 +0xc0
            created by testing.(*T).Run
            	/usr/local/go/src/testing/testing.go:916 +0x35a
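
The interesting part of this trace is the recovery path in the middle: the scan is sent to (n3,s3):2 from a stale cached descriptor, n3 answers with RangeNotFoundError, the sender evicts the cached range descriptor for r72, re-resolves the range, and retries against (n2,s2):4, which succeeds after waiting out an in-progress merge. A toy model of that evict-and-retry loop (illustrative names only, not the real DistSender API):

package main

import (
	"errors"
	"fmt"
)

var errRangeNotFound = errors.New("range not found on store")

// send models a replica that may have shed the range; only the stale
// target returns RangeNotFound, as n3 does in the trace above.
func send(replica string) error {
	if replica == "(n3,s3):2" {
		return errRangeNotFound
	}
	return nil
}

// sendWithRetry models the behavior visible in the trace: on
// RangeNotFound, evict the cached descriptor, re-resolve, and retry.
func sendWithRetry(cache map[string]string, key string, lookup func(string) string) error {
	for {
		replica, ok := cache[key]
		if !ok {
			replica = lookup(key) // fresh range descriptor lookup
			cache[key] = replica
		}
		err := send(replica)
		if errors.Is(err, errRangeNotFound) {
			delete(cache, key) // evict the stale descriptor
			continue
		}
		return err
	}
}

func main() {
	cache := map[string]string{"/Table/68/1/5/6": "(n3,s3):2"} // stale entry
	err := sendWithRetry(cache, "/Table/68/1/5/6", func(string) string {
		return "(n2,s2):4" // the replica that actually serves r72 now
	})
	fmt.Println("err:", err) // err: <nil>
}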

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/720a6605f182c4459c56f055c23e805d48b89344

Parameters:

TAGS=
GOFLAGS=-parallel=4

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestRepartitioning PKG=github.com/cockroachdb/cockroach/pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1441378&tab=buildLog

              [n1,client=127.0.0.1:42170,user=root] planning starts: SELECT
              [n1,client=127.0.0.1:42170,user=root] generating optimizer plan
              [n1,client=127.0.0.1:42170,user=root] added table 'data.public."secondary index - list partitioning"' to table collection
              [n1,client=127.0.0.1:42170,user=root] query cache hit
              [n1,client=127.0.0.1:42170,user=root] planning ends
              [n1,client=127.0.0.1:42170,user=root] checking distributability
              [n1,client=127.0.0.1:42170,user=root] will distribute plan: true
              [n1,client=127.0.0.1:42170,user=root] execution starts: distributed engine
               === SPAN START: consuming rows ===
              [n1,client=127.0.0.1:42170,user=root] creating DistSQL plan with isLocal=false
              [n1,client=127.0.0.1:42170,user=root] querying next range at /Table/100/2/!NULL
              [n1,client=127.0.0.1:42170,user=root] running DistSQL plan
               === SPAN START: /cockroach.sql.distsqlrun.DistSQL/SetupFlow ===
               === SPAN START: outbox ===
            cockroach.stat.outbox.bytes_sent: ࢞
            cockroach.streamid: 1
               === SPAN START: table reader ===
            cockroach.processorid: 0
            cockroach.stat.tablereader.bytes.read: 0 B
            cockroach.stat.tablereader.input.rows: 0
            cockroach.stat.tablereader.stalltime: 330µs
              [n2] starting scan with limitBatches true
              [n2] Scan /Table/100/2/{!NULL-3}
               === SPAN START: txn coordinator send ===
               === SPAN START: [async] drain ===
               === SPAN START: dist sender send ===
              [n2,txn=eabdc993] querying next range at /Table/100/2/!NULL
              [n2,txn=eabdc993] r196: sending batch 1 Scan to (n2,s2):1
              [n2,txn=eabdc993] sending request to local client
               === SPAN START: /cockroach.roachpb.Internal/Batch ===
              [n2] 1 Scan
               === SPAN START: flow ===
              [n2,s2] executing 1 requests
              [n1,client=127.0.0.1:42170,user=root] starting (0 processors, 0 startables)
              [n2,s2,r196/1:/Table/100/2{-/3}] read-only path
               === SPAN START: noop ===
            cockroach.processorid: 2
              [n2,s2,r196/1:/Table/100/2{-/3}] acquire latches
              [n2,s2,r196/1:/Table/100/2{-/3}] waited 6.49µs to acquire latches
              [n2,s2,r196/1:/Table/100/2{-/3}] waiting for read lock
              [n2,s2,r196/1:/Table/100/2{-/3}] read completed
               === SPAN START: count rows ===
            cockroach.processorid: 1
            cockroach.stat.aggregator.input.rows: 0
            cockroach.stat.aggregator.mem.max: 0 B
            cockroach.stat.aggregator.stalltime: 71µs
              [n1,client=127.0.0.1:42170,user=root] execution ends
              [n1,client=127.0.0.1:42170,user=root] rows affected: 1
              [n1,client=127.0.0.1:42170,user=root] AutoCommit. err: <nil>
              [n1,client=127.0.0.1:42170,user=root] releasing 1 tables
               === SPAN START: exec stmt ===
              [n1,client=127.0.0.1:42170,user=root] [NoTxn pos:11008] executing ExecStmt: SET TRACING = off
              [n1,client=127.0.0.1:42170,user=root] executing: SET TRACING = off in state: NoTxn
            goroutine 536385 [running]:
            runtime/debug.Stack(0x41d2b00, 0xc007645b00, 0xc00a0b0870)
            	/usr/local/go/src/runtime/debug/stack.go:24 +0x9d
            github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x41d2b00, 0xc007645b00, 0xc00a0b0870)
            	/go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:36 +0x6b
            github.com/cockroachdb/cockroach/pkg/ccl/partitionccl.TestRepartitioning.func1(0xc007645b00)
            	/go/src/github.com/cockroachdb/cockroach/pkg/ccl/partitionccl/partition_test.go:1355 +0x959
            testing.tRunner(0xc007645b00, 0xc005b3a870)
            	/usr/local/go/src/testing/testing.go:865 +0xc0
            created by testing.(*T).Run
            	/usr/local/go/src/testing/testing.go:916 +0x35a

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/7ca0a86b8595c097fd8f27581b1509c47f17e8a3

Parameters:

TAGS=
GOFLAGS=-parallel=4

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestRepartitioning PKG=github.com/cockroachdb/cockroach/pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1450725&tab=buildLog


I190823 05:43:16.156893 521322 storage/replica_raft.go:291  [n3,s3,r191/1:/{Table/94/2-Max}] proposing ADD_REPLICA[(n1,s1):2]: after=[(n3,s3):1 (n1,s1):2] next=3
I190823 05:43:16.160616 521781 storage/replica_raft.go:291  [n3,s3,r192/3:/Table/94/1{-/4}] proposing REMOVE_REPLICA[(n1,s1):2]: after=[(n3,s3):3] next=4
W190823 05:43:16.163385 17 storage/engine/rocksdb.go:116  [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
I190823 05:43:16.164739 521033 storage/replica_command.go:598  [n3,merge,s3,r167/1:/Table/94{-/1}] initiating a merge of r192:/Table/94/1{-/4} [(n3,s3):3, next=4, gen=159] into this range (lhs+rhs has (size=0 B+0 B qps=0.00+0.00 --> 0.00qps) below threshold (size=0 B, qps=0.00))
E190823 05:43:16.165689 521861 storage/queue.go:1027  [n1,replicate,s1,r191/2:/{Table/94/2-Max}] no removable replicas from range that needs a removal: [1:0, 2*:19]
I190823 05:43:16.173630 269305 storage/store.go:2593  [n3,s3,r167/1:/Table/94{-/1}] removing replica r192/3
W190823 05:43:16.177399 17 storage/engine/rocksdb.go:116  [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
I190823 05:43:16.191872 268437 sql/sqlbase/structured.go:1511  [n1,client=127.0.0.1:44728,user=root] publish: descID=94 (unpartitioned) version=5 mtime=2019-08-23 05:43:16.152242091 +0000 UTC
I190823 05:43:16.212767 268437 sql/event_log.go:130  [n1,client=127.0.0.1:44728,user=root] Event: "drop_database", target: 93, info: {DatabaseName:data Statement:DROP DATABASE IF EXISTS data User:root DroppedSchemaObjects:[data.public.unpartitioned]}
I190823 05:43:16.220089 268437 sql/lease.go:329  [n1,client=127.0.0.1:44728,user=root,scExec] waiting for 1 leases to expire: desc=[{unpartitioned 94 4}]
I190823 05:43:16.226193 521594 storage/replica_command.go:1264  [n2,s2,r169/2:/Table/94/1/{5-6}] change replicas (add [] remove [(n3,s3):1]): existing descriptor r169:/Table/94/1/{5-6} [(n3,s3):1, (n2,s2):2, next=3, gen=156]
I190823 05:43:16.226815 521598 storage/replica_command.go:1264  [n2,s2,r168/2:/Table/94/1/{4-5}] change replicas (add [(n3,s3):3LEARNER] remove []): existing descriptor r168:/Table/94/1/{4-5} [(n2,s2):2, next=3, gen=156]
I190823 05:43:16.228843 521807 storage/store.go:2593  [n1,replicaGC,s1,r192/2:/Table/94/1{-/4}] removing replica r192/2
I190823 05:43:16.229428 521807 storage/replica_destroy.go:152  [n1,replicaGC,s1,r192/2:/Table/94/1{-/4}] removed 6 (0+6) keys in 0ms [clear=0ms commit=0ms]
I190823 05:43:16.232684 521598 storage/replica_raft.go:291  [n2,s2,r168/2:/Table/94/1/{4-5}] proposing ADD_REPLICA[(n3,s3):3LEARNER]: after=[(n2,s2):2 (n3,s3):3LEARNER] next=4
I190823 05:43:16.233526 521594 storage/replica_raft.go:291  [n2,s2,r169/2:/Table/94/1/{5-6}] proposing REMOVE_REPLICA[(n3,s3):1]: after=[(n2,s2):2] next=3
I190823 05:43:16.234330 521598 storage/store_snapshot.go:995  [n2,s2,r168/2:/Table/94/1/{4-5}] sending LEARNER snapshot d33085a3 at applied index 30
I190823 05:43:16.234500 521598 storage/store_snapshot.go:1038  [n2,s2,r168/2:/Table/94/1/{4-5}] streamed snapshot to (n3,s3):3: kv pairs: 10, log entries: 0, rate-limit: 8.0 MiB/sec, 0.00s
I190823 05:43:16.235559 522563 storage/replica_raftstorage.go:808  [n3,s3,r168/3:{-}] applying LEARNER snapshot [id=d33085a3 index=30]
W190823 05:43:16.235767 522563 storage/engine/rocksdb.go:116  [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
W190823 05:43:16.236566 522563 storage/engine/rocksdb.go:116  [rocksdb] [db/version_set.cc:3086] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
I190823 05:43:16.236968 522563 storage/replica_raftstorage.go:829  [n3,s3,r168/3:/Table/94/1/{4-5}] applied LEARNER snapshot [total=1ms ingestion=4@1ms id=d33085a3 index=30]
I190823 05:43:16.237517 521598 storage/replica_command.go:1264  [n2,s2,r168/2:/Table/94/1/{4-5}] change replicas (add [(n3,s3):3] remove []): existing descriptor r168:/Table/94/1/{4-5} [(n2,s2):2, (n3,s3):3LEARNER, next=4, gen=157]
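
Worth noting in this excerpt: by this SHA the replication changes for r168 use the two-step learner protocol rather than the PREEMPTIVE snapshots seen in the earlier excerpts — the new replica is added as a raft LEARNER, caught up with a LEARNER snapshot, then promoted to a full voter in a second change-replicas call. A toy sketch of that sequence (function names are illustrative stand-ins, not the real replicate-queue API):

package main

import "fmt"

// changeReplicas and sendLearnerSnapshot are illustrative stand-ins.
func changeReplicas(desc string) error {
	fmt.Println("change replicas:", desc)
	return nil
}

func sendLearnerSnapshot(target string) error {
	fmt.Println("sending LEARNER snapshot to", target)
	return nil
}

// addVoter mirrors the three log lines above for r168.
func addVoter(target string) error {
	// Step 1: add the new replica as a raft learner (no vote yet).
	if err := changeReplicas("add " + target + ":LEARNER"); err != nil {
		return err
	}
	// Step 2: catch the learner up with a LEARNER snapshot.
	if err := sendLearnerSnapshot(target); err != nil {
		return err
	}
	// Step 3: promote the learner to a full voter.
	return changeReplicas("add " + target)
}

func main() {
	if err := addVoter("(n3,s3):3"); err != nil {
		fmt.Println("error:", err)
	}
}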

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/a4b16706455609075c07ddd382907acc50be3c21

Parameters:

TAGS=
GOFLAGS=-parallel=4

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestRepartitioning PKG=github.com/cockroachdb/cockroach/pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1459685&tab=buildLog


I190829 05:44:44.935121 542815 storage/replica_raft.go:291  [n1,s1,r203/2:/Table/94/{1/6-2}] proposing ADD_REPLICA[(n3,s3):3]: after=[(n1,s1):2 (n3,s3):3] next=4
E190829 05:44:44.941019 543353 storage/queue.go:1027  [n3,replicate,s3,r203/3:/Table/94/{1/6-2}] no removable replicas from range that needs a removal: [2:0, 3*:38]
I190829 05:44:44.941363 543374 storage/replica_command.go:1430  [n3,s3,r203/3:/Table/94/{1/6-2}] change replicas (add [] remove [(n1,s1):2]): existing descriptor r203:/Table/94/{1/6-2} [(n1,s1):2, (n3,s3):3, next=4, gen=147]
I190829 05:44:44.947106 543374 storage/replica_raft.go:291  [n3,s3,r203/3:/Table/94/{1/6-2}] proposing REMOVE_REPLICA[(n1,s1):2]: after=[(n3,s3):3] next=4
I190829 05:44:44.953656 543424 storage/replica_command.go:1430  [n1,s1,r204/2:/Table/94/1{-/4}] change replicas (add [(n3,s3):3LEARNER] remove []): existing descriptor r204:/Table/94/1{-/4} [(n1,s1):2, next=3, gen=143]
I190829 05:44:44.959187 543391 storage/store.go:2593  [n1,replicaGC,s1,r203/2:/Table/94/{1/6-2}] removing replica r203/2
I190829 05:44:44.959588 543391 storage/replica_destroy.go:152  [n1,replicaGC,s1,r203/2:/Table/94/{1/6-2}] removed 8 (0+8) keys in 0ms [clear=0ms commit=0ms]
I190829 05:44:44.972059 543424 storage/replica_raft.go:291  [n1,s1,r204/2:/Table/94/1{-/4}] proposing ADD_REPLICA[(n3,s3):3LEARNER]: after=[(n1,s1):2 (n3,s3):3LEARNER] next=4
I190829 05:44:44.973342 543424 storage/store_snapshot.go:995  [n1,s1,r204/2:/Table/94/1{-/4}] sending LEARNER snapshot 88e2665f at applied index 33
I190829 05:44:44.973498 543424 storage/store_snapshot.go:1038  [n1,s1,r204/2:/Table/94/1{-/4}] streamed snapshot to (n3,s3):3LEARNER: kv pairs: 10, log entries: 0, rate-limit: 8.0 MiB/sec, 0.00s
I190829 05:44:44.975782 543512 storage/replica_raftstorage.go:793  [n3,s3,r204/3:{-}] applying LEARNER snapshot [id=88e2665f index=33]
I190829 05:44:44.977271 543512 storage/replica_raftstorage.go:814  [n3,s3,r204/3:/Table/94/1{-/4}] applied LEARNER snapshot [total=1ms ingestion=4@1ms id=88e2665f index=33]
I190829 05:44:44.978076 543424 storage/replica_command.go:1430  [n1,s1,r204/2:/Table/94/1{-/4}] change replicas (add [(n3,s3):3] remove []): existing descriptor r204:/Table/94/1{-/4} [(n1,s1):2, (n3,s3):3LEARNER, next=4, gen=144]
I190829 05:44:44.985647 286213 sql/sqlbase/structured.go:1511  [n1,client=127.0.0.1:52522,user=root] publish: descID=94 (unpartitioned) version=5 mtime=2019-08-29 05:44:44.928621821 +0000 UTC
I190829 05:44:44.989232 543376 storage/replica_command.go:1430  [n2,s2,r202/2:/Table/94/1/{5-6}] change replicas (add [] remove [(n3,s3):1]): existing descriptor r202:/Table/94/1/{5-6} [(n3,s3):1, (n2,s2):2, next=3, gen=143]
I190829 05:44:44.991712 543424 storage/replica_raft.go:291  [n1,s1,r204/2:/Table/94/1{-/4}] proposing ADD_REPLICA[(n3,s3):3]: after=[(n1,s1):2 (n3,s3):3] next=4
E190829 05:44:44.994563 543443 storage/queue.go:1027  [n3,replicate,s3,r204/3:/Table/94/1{-/4}] no removable replicas from range that needs a removal: [2:0, 3*:37]
I190829 05:44:44.995104 543583 storage/replica_command.go:1430  [n3,s3,r204/3:/Table/94/1{-/4}] change replicas (add [] remove [(n1,s1):2]): existing descriptor r204:/Table/94/1{-/4} [(n1,s1):2, (n3,s3):3, next=4, gen=145]
I190829 05:44:44.998272 543376 storage/replica_raft.go:291  [n2,s2,r202/2:/Table/94/1/{5-6}] proposing REMOVE_REPLICA[(n3,s3):1]: after=[(n2,s2):2] next=3
I190829 05:44:45.012592 286213 sql/event_log.go:130  [n1,client=127.0.0.1:52522,user=root] Event: "drop_database", target: 93, info: {DatabaseName:data Statement:DROP DATABASE IF EXISTS data User:root DroppedSchemaObjects:[data.public.unpartitioned]}
I190829 05:44:45.014033 544011 storage/replica_command.go:1430  [n3,s3,r203/3:/Table/94/{1/6-2}] change replicas (add [(n2,s2):4LEARNER] remove []): existing descriptor r203:/Table/94/{1/6-2} [(n3,s3):3, next=4, gen=148]
I190829 05:44:45.015539 543587 storage/store.go:2593  [n3,replicaGC,s3,r202/1:/Table/94/1/{5-6}] removing replica r202/1
I190829 05:44:45.015830 543587 storage/replica_destroy.go:152  [n3,replicaGC,s3,r202/1:/Table/94/1/{5-6}] removed 6 (0+6) keys in 0ms [clear=0ms commit=0ms]

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/66bd279c9aa682c2b7adcec87ec0c639b8039a33

Parameters:

TAGS=
GOFLAGS=-parallel=4

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestRepartitioning PKG=github.com/cockroachdb/cockroach/pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1461711&tab=buildLog


I190830 05:25:29.423149 488847 storage/store_snapshot.go:1038  [n3,s3,r201/1:/Table/9{4/2-6}] streamed snapshot to (n2,s2):2LEARNER: kv pairs: 8, log entries: 0, rate-limit: 8.0 MiB/sec, 0.00s
E190830 05:25:29.423169 488485 storage/queue.go:1027  [n3,replicate,s3,r203/1:/Table/96/1{-/3}] change replicas of r203 failed: storage/replica_raft.go:305: [n3,s3,r203/1:/Table/96/1{-/3}]: received invalid ChangeReplicasTrigger REMOVE_REPLICA[(n3,s3):1]: after=[(n2,s2):2] next=3 to remove self (leaseholder)
I190830 05:25:29.423615 488701 storage/replica_command.go:1430  [n3,replicate,s3,r206/1:/{Table/96/2-Max}] change replicas (add [(n1,s1):2LEARNER] remove []): existing descriptor r206:/{Table/96/2-Max} [(n3,s3):1, next=2, gen=180]
I190830 05:25:29.424113 488905 storage/replica_raftstorage.go:793  [n2,s2,r201/2:{-}] applying LEARNER snapshot [id=d94fcccd index=19]
I190830 05:25:29.425353 488905 storage/replica_raftstorage.go:814  [n2,s2,r201/2:/Table/9{4/2-6}] applied LEARNER snapshot [total=1ms ingestion=4@1ms id=d94fcccd index=19]
I190830 05:25:29.425805 488847 storage/replica_command.go:1430  [n3,s3,r201/1:/Table/9{4/2-6}] change replicas (add [(n2,s2):2] remove []): existing descriptor r201:/Table/9{4/2-6} [(n3,s3):1, (n2,s2):2LEARNER, next=3, gen=177]
I190830 05:25:29.425969 485017 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "single col range partitioning" WHERE a < 3
I190830 05:25:29.429204 488701 storage/replica_raft.go:291  [n3,s3,r206/1:/{Table/96/2-Max}] proposing ADD_REPLICA[(n1,s1):2LEARNER]: after=[(n3,s3):1 (n1,s1):2LEARNER] next=3
I190830 05:25:29.429965 488701 storage/store_snapshot.go:995  [n3,replicate,s3,r206/1:/{Table/96/2-Max}] sending LEARNER snapshot defa0dd4 at applied index 14
I190830 05:25:29.430081 488701 storage/store_snapshot.go:1038  [n3,replicate,s3,r206/1:/{Table/96/2-Max}] streamed snapshot to (n1,s1):2LEARNER: kv pairs: 6, log entries: 0, rate-limit: 8.0 MiB/sec, 0.00s
I190830 05:25:29.430901 485017 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "single col range partitioning" WHERE a < 3
I190830 05:25:29.431362 488867 storage/replica_raftstorage.go:793  [n1,s1,r206/2:{-}] applying LEARNER snapshot [id=defa0dd4 index=14]
I190830 05:25:29.431574 488847 storage/replica_raft.go:291  [n3,s3,r201/1:/Table/9{4/2-6}] proposing ADD_REPLICA[(n2,s2):2]: after=[(n3,s3):1 (n2,s2):2] next=3
I190830 05:25:29.432651 488867 storage/replica_raftstorage.go:814  [n1,s1,r206/2:/{Table/96/2-Max}] applied LEARNER snapshot [total=1ms ingestion=4@1ms id=defa0dd4 index=14]
I190830 05:25:29.433090 488701 storage/replica_command.go:1430  [n3,replicate,s3,r206/1:/{Table/96/2-Max}] change replicas (add [(n1,s1):2] remove []): existing descriptor r206:/{Table/96/2-Max} [(n3,s3):1, (n1,s1):2LEARNER, next=3, gen=181]
I190830 05:25:29.433415 485017 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "single col range partitioning" WHERE a < 3
E190830 05:25:29.433737 489061 storage/queue.go:1027  [n2,replicate,s2,r201/2:/Table/9{4/2-6}] no removable replicas from range that needs a removal: [1:0, 2*:23]
I190830 05:25:29.436480 485017 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "single col range partitioning" WHERE a < 3
I190830 05:25:29.441609 485017 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "single col range partitioning" WHERE a < 3
I190830 05:25:29.443999 485017 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "single col range partitioning" WHERE a < 3
I190830 05:25:29.446012 488467 storage/replica_command.go:1430  [n3,s3,r177/4:/Table/94/1/{4-5}] change replicas (add [] remove [(n1,s1):3]): existing descriptor r177:/Table/94/1/{4-5} [(n1,s1):3, (n3,s3):4, next=5, gen=181]
I190830 05:25:29.446323 488701 storage/replica_raft.go:291  [n3,s3,r206/1:/{Table/96/2-Max}] proposing ADD_REPLICA[(n1,s1):2]: after=[(n3,s3):1 (n1,s1):2] next=3
I190830 05:25:29.446969 488701 storage/replica_command.go:1430  [n3,replicate,s3,r206/1:/{Table/96/2-Max}] change replicas (add [] remove [(n3,s3):1]): existing descriptor r206:/{Table/96/2-Max} [(n3,s3):1, (n1,s1):2, next=3, gen=182]

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/18bdfe1a691fa6d785d510e86d27cecdac9c436e

Parameters:

TAGS=
GOFLAGS=-parallel=4

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestRepartitioning PKG=github.com/cockroachdb/cockroach/pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1468649&tab=buildLog


I190905 06:07:03.433484 577643 storage/store_snapshot.go:1038  [n2,replicate,s2,r236/1:/Table/106/2{-/4}] streamed snapshot to (n1,s1):2LEARNER: kv pairs: 6, log entries: 0, rate-limit: 8.0 MiB/sec, 0.00s
I190905 06:07:03.433510 577733 storage/raft_snapshot_queue.go:122  [n2,raftsnapshot,s2,r236/1:/Table/106/2{-/4}] skipping snapshot; replica is likely a learner in the process of being added: (n1,s1):2LEARNER
I190905 06:07:03.434230 577570 storage/replica_raftstorage.go:795  [n1,s1,r236/2:{-}] applying LEARNER snapshot [id=7f3e3c34 index=14]
I190905 06:07:03.436185 573403 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "secondary index - list partitioning - DEFAULT" WHERE b = 4
I190905 06:07:03.438057 577570 storage/replica_raftstorage.go:816  [n1,s1,r236/2:/Table/106/2{-/4}] applied LEARNER snapshot [total=4ms ingestion=4@3ms id=7f3e3c34 index=14]
I190905 06:07:03.438580 577643 storage/replica_command.go:1559  [n2,replicate,s2,r236/1:/Table/106/2{-/4}] change replicas (add [(n1,s1):2] remove []): existing descriptor r236:/Table/106/2{-/4} [(n2,s2):1, (n1,s1):2LEARNER, next=3, gen=193]
I190905 06:07:03.438996 573403 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "secondary index - list partitioning - DEFAULT" WHERE b = 5
I190905 06:07:03.442545 573403 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "secondary index - list partitioning - DEFAULT" WHERE b > 5
I190905 06:07:03.446819 573403 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "secondary index - list partitioning - DEFAULT" WHERE b < 4
I190905 06:07:03.450565 577849 storage/replica_command.go:1559  [n3,replicate,s3,r230/4:/Table/104/2/{5-6}] change replicas (add [] remove [(n1,s1):3]): existing descriptor r230:/Table/104/2/{5-6} [(n1,s1):3, (n3,s3):4, next=5, gen=200]
I190905 06:07:03.450817 573403 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "secondary index - list partitioning - DEFAULT" WHERE b = 5
I190905 06:07:03.453171 577643 storage/replica_raft.go:291  [n2,s2,r236/1:/Table/106/2{-/4}] proposing ADD_REPLICA[(n1,s1):2]: after=[(n2,s2):1 (n1,s1):2] next=3
I190905 06:07:03.454416 573403 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "secondary index - list partitioning - DEFAULT" WHERE b > 5
I190905 06:07:03.454706 577849 storage/replica_raft.go:291  [n3,s3,r230/4:/Table/104/2/{5-6}] proposing REMOVE_REPLICA[(n1,s1):3]: after=[(n3,s3):4] next=5
I190905 06:07:03.459837 573403 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "secondary index - list partitioning - DEFAULT" WHERE b < 4
I190905 06:07:03.463585 573403 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "secondary index - list partitioning - DEFAULT" WHERE b < 4
I190905 06:07:03.463748 577789 storage/store.go:2577  [n1,replicaGC,s1,r230/3:/Table/104/2/{5-6}] removing replica r230/3
I190905 06:07:03.464207 577789 storage/replica_destroy.go:152  [n1,replicaGC,s1,r230/3:/Table/104/2/{5-6}] removed 5 (0+5) keys in 0ms [clear=0ms commit=0ms]
I190905 06:07:03.468557 573403 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "secondary index - list partitioning - DEFAULT" WHERE b < 4
I190905 06:07:03.475369 573403 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "secondary index - list partitioning - DEFAULT" WHERE b < 4
I190905 06:07:03.486206 573403 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "secondary index - list partitioning - DEFAULT" WHERE b < 4
I190905 06:07:03.506214 573403 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "secondary index - list partitioning - DEFAULT" WHERE b < 4
I190905 06:07:03.542735 573403 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "secondary index - list partitioning - DEFAULT" WHERE b < 4
I190905 06:07:03.612426 573403 ccl/partitionccl/partition_test.go:216  query: SELECT count(*) FROM "secondary index - list partitioning - DEFAULT" WHERE b = 5

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/073999b81ddfed3bbc8409d534912fea12b6d500

Parameters:

TAGS=
GOFLAGS=-parallel=4

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestRepartitioning PKG=github.com/cockroachdb/cockroach/pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1498811&tab=buildLog


I190920 05:54:36.509000 577874 storage/replica_raft.go:291  [n1,s1,r196/3:/Table/98/1/{4-5}] proposing REMOVE_REPLICA[(n2,s2):2]: after=[(n1,s1):3] next=4
I190920 05:54:36.510255 577724 storage/replica_raft.go:291  [n1,s1,r198/1:/{Table/98/2-Max}] proposing ADD_REPLICA[(n3,s3):2]: after=[(n1,s1):1 (n3,s3):2] next=3
I190920 05:54:36.513251 577378 storage/replica_command.go:673  [n1,merge,s1,r195/1:/Table/98/1{-/4}] initiating a merge of r196:/Table/98/1/{4-5} [(n1,s1):3, next=4, gen=165] into this range (lhs+rhs has (size=0 B+0 B qps=0.00+0.00 --> 0.00qps) below threshold (size=0 B, qps=0.00))
I190920 05:54:36.518801 578114 storage/store.go:2613  [n2,replicaGC,s2,r196/2:/Table/98/1/{4-5}] removing replica r196/2
I190920 05:54:36.519186 578114 storage/replica_destroy.go:152  [n2,replicaGC,s2,r196/2:/Table/98/1/{4-5}] removed 8 (0+8) keys in 0ms [clear=0ms commit=0ms]
I190920 05:54:36.528528 577378 storage/replica_command.go:673  [n1,merge,s1,r195/1:/Table/98/1{-/4}] initiating a merge of r196:/Table/98/1/{4-5} [(n1,s1):3, next=4, gen=165] into this range (lhs+rhs has (size=0 B+0 B qps=0.00+0.00 --> 0.00qps) below threshold (size=0 B, qps=0.00))
I190920 05:54:36.533335 284496 sql/sqlbase/structured.go:1516  [n1,client=127.0.0.1:49208,user=root] publish: descID=98 (unpartitioned) version=5 mtime=1970-01-01 00:00:00 +0000 UTC
I190920 05:54:36.533861 283686 storage/store.go:2613  [n1,s1,r195/1:/Table/98/1{-/4}] removing replica r196/3
I190920 05:54:36.534824 578266 storage/replica_command.go:673  [n1,merge,s1,r124/1:/Table/98{-/1}] initiating a merge of r195:/Table/98/1{-/5} [(n1,s1):1, next=2, gen=166] into this range (lhs+rhs has (size=0 B+0 B qps=0.00+0.00 --> 0.00qps) below threshold (size=0 B, qps=0.00))
I190920 05:54:36.541460 283699 storage/store.go:2613  [n1,s1,r124/1:/Table/98{-/1}] removing replica r195/1
I190920 05:54:36.544787 577789 storage/replica_command.go:1580  [n3,s3,r197/2:/Table/98/{1/5-2}] change replicas (add [] remove [(n3,s3):2]): existing descriptor r197:/Table/98/{1/5-2} [(n3,s3):2, (n2,s2):3, next=4, gen=165]
I190920 05:54:36.548472 284496 sql/event_log.go:130  [n1,client=127.0.0.1:49208,user=root] Event: "drop_database", target: 97, info: {DatabaseName:data Statement:DROP DATABASE IF EXISTS data User:root DroppedSchemaObjects:[data.public.unpartitioned]}
I190920 05:54:36.548687 577789 storage/replica_raft.go:291  [n3,s3,r197/2:/Table/98/{1/5-2}] proposing REMOVE_REPLICA[(n3,s3):2]: after=[(n2,s2):3] next=4
E190920 05:54:36.548724 577789 storage/replica_raft.go:304  [n3,s3,r197/2:/Table/98/{1/5-2}] received invalid ChangeReplicasTrigger REMOVE_REPLICA[(n3,s3):2]: after=[(n2,s2):3] next=4 to remove self (leaseholder)
W190920 05:54:36.549408 577789 storage/replica_command.go:2193  [n3,s3,r197/2:/Table/98/{1/5-2}] while removing target n3,s3: change replicas of r197 failed: storage/replica_raft.go:305: [n3,s3,r197/2:/Table/98/{1/5-2}]: received invalid ChangeReplicasTrigger REMOVE_REPLICA[(n3,s3):2]: after=[(n2,s2):3] next=4 to remove self (leaseholder)
E190920 05:54:36.549690 577771 storage/queue.go:1032  [n2,merge,s2,r196/2:/Table/98/1/{4-5}] change replicas of r197 failed: storage/replica_raft.go:305: [n3,s3,r197/2:/Table/98/{1/5-2}]: received invalid ChangeReplicasTrigger REMOVE_REPLICA[(n3,s3):2]: after=[(n2,s2):3] next=4 to remove self (leaseholder)
W190920 05:54:36.551934 578758 storage/replica_command.go:2077  [n3,s3,r197/2:/Table/98/{1/5-2}] while transferring lease: unable to find store 1 in range r197:/Table/98/{1/5-2} [(n3,s3):2, (n2,s2):3, next=4, gen=165]
I190920 05:54:36.552278 578758 storage/replica_command.go:1580  [n3,s3,r197/2:/Table/98/{1/5-2}] change replicas (add [] remove [(n2,s2):3]): existing descriptor r197:/Table/98/{1/5-2} [(n3,s3):2, (n2,s2):3, next=4, gen=165]
I190920 05:54:36.555250 284496 sql/lease.go:330  [n1,client=127.0.0.1:49208,user=root,scExec] waiting for 1 leases to expire: desc=[{unpartitioned 98 4}]
I190920 05:54:36.557708 578751 storage/replica_command.go:1580  [n3,replicate,s3,r123/2:/Table/9{6/2-8}] change replicas (add [] remove [(n1,s1):1]): existing descriptor r123:/Table/9{6/2-8} [(n1,s1):1, (n3,s3):2, next=3, gen=158]

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/205330426e50b966ede8a4644d77570d8265fe66

Parameters:

TAGS=
GOFLAGS=-parallel=4

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestRepartitioning PKG=github.com/cockroachdb/cockroach/pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1602207&tab=buildLog

              [n1,client=127.0.0.1:53842,user=root] planning starts: SELECT
              [n1,client=127.0.0.1:53842,user=root] generating optimizer plan
              [n1,client=127.0.0.1:53842,user=root] added table 'data.public."multi col list partitioning - DEFAULT"' to table collection
              [n1,client=127.0.0.1:53842,user=root] query cache hit
              [n1,client=127.0.0.1:53842,user=root] planning ends
              [n1,client=127.0.0.1:53842,user=root] checking distributability
              [n1,client=127.0.0.1:53842,user=root] will distribute plan: true
              [n1,client=127.0.0.1:53842,user=root] execution starts: distributed engine
               === SPAN START: consuming rows ===
              [n1,client=127.0.0.1:53842,user=root] creating DistSQL plan with isLocal=false
              [n1,client=127.0.0.1:53842,user=root] querying next range at /Table/68/1/5/9
              [n1,client=127.0.0.1:53842,user=root] running DistSQL plan
               === SPAN START: flow ===
              [n1,client=127.0.0.1:53842,user=root] starting (0 processors, 0 startables)
               === SPAN START: table reader ===
            cockroach.processorid: 0
            cockroach.stat.tablereader.bytes.read: 0 B
            cockroach.stat.tablereader.input.rows: 0
            cockroach.stat.tablereader.stalltime: 169µs
              [n1,client=127.0.0.1:53842,user=root] starting scan with limitBatches true
              [n1,client=127.0.0.1:53842,user=root] Scan /Table/68/1/{5/9-6}
               === SPAN START: txn coordinator send ===
               === SPAN START: dist sender send ===
              [n1,client=127.0.0.1:53842,user=root,txn=03c2f6eb] querying next range at /Table/68/1/5/9
              [n1,client=127.0.0.1:53842,user=root,txn=03c2f6eb] r95: sending batch 1 Scan to (n1,s1):1
              [n1,client=127.0.0.1:53842,user=root,txn=03c2f6eb] sending request to local client
               === SPAN START: /cockroach.roachpb.Internal/Batch ===
              [n1] 1 Scan
              [n1,s1] executing 1 requests
              [n1,s1,r95/1:/Table/68/1/{5/9-6}] read-only path
              [n1,s1,r95/1:/Table/68/1/{5/9-6}] read has no clock uncertainty
              [n1,s1,r95/1:/Table/68/1/{5/9-6}] acquire latches
              [n1,s1,r95/1:/Table/68/1/{5/9-6}] waited 3.774µs to acquire latches
              [n1,s1,r95/1:/Table/68/1/{5/9-6}] waiting for read lock
              [n1,s1,r95/1:/Table/68/1/{5/9-6}] read completed
               === SPAN START: count rows ===
            cockroach.processorid: 1
            cockroach.stat.aggregator.input.rows: 0
            cockroach.stat.aggregator.mem.max: 0 B
            cockroach.stat.aggregator.stalltime: 33µs
              [n1,client=127.0.0.1:53842,user=root] execution ends
              [n1,client=127.0.0.1:53842,user=root] rows affected: 1
              [n1,client=127.0.0.1:53842,user=root] AutoCommit. err: <nil>
              [n1,client=127.0.0.1:53842,user=root] releasing 1 tables
               === SPAN START: exec stmt ===
              [n1,client=127.0.0.1:53842,user=root] [NoTxn pos:4361] executing ExecStmt: SET TRACING = off
              [n1,client=127.0.0.1:53842,user=root] executing: SET TRACING = off in state: NoTxn
            goroutine 337249 [running]:
            runtime/debug.Stack(0xc0055e5da0, 0x445a160, 0xc002c08580)
            	/usr/local/go/src/runtime/debug/stack.go:24 +0x9d
            github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x4542d80, 0xc00157dc00, 0xc0055e5da0)
            	/go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:37 +0x6b
            github.com/cockroachdb/cockroach/pkg/ccl/partitionccl.TestRepartitioning.func1(0xc00157dc00)
            	/go/src/github.com/cockroachdb/cockroach/pkg/ccl/partitionccl/partition_test.go:1359 +0x973
            testing.tRunner(0xc00157dc00, 0xc004b0de30)
            	/usr/local/go/src/testing/testing.go:865 +0xc0
            created by testing.(*T).Run
            	/usr/local/go/src/testing/testing.go:916 +0x35a
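For orientation: the trace above ends in `testutils.SucceedsSoon`, the retry helper that TestRepartitioning fails in at partition_test.go:1359. It repeatedly invokes a closure until the closure returns nil or the retry budget expires, at which point it fails the test and prints the goroutine stack seen here. A minimal sketch of the pattern, assuming a hypothetical `countRows` condition in place of the test's real repartitioning check:

```go
package partitionccl

import (
	"fmt"
	"testing"

	"github.com/cockroachdb/cockroach/pkg/testutils"
)

// waitForExpectedScan is a sketch, not the test's actual code: it polls
// a condition via testutils.SucceedsSoon, which retries the closure
// with backoff and fatals the test if it never returns nil in time.
func waitForExpectedScan(t *testing.T, countRows func() (int, error)) {
	testutils.SucceedsSoon(t, func() error {
		n, err := countRows() // hypothetical helper for this sketch
		if err != nil {
			return err
		}
		if n != 0 {
			return fmt.Errorf("expected 0 rows, found %d", n)
		}
		return nil
	})
}
```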

@cockroach-teamcity
Member Author

(ccl/partitionccl).TestRepartitioning failed on master@59acbdd28072e90aed5d107416ac309c1793d339:


I191210 20:34:43.749289 1103236 storage/replica_raft.go:290  [n1,s1,r191/2:/Table/94/1/{4-5}] proposing SIMPLE(r1) REMOVE_REPLICA[(n2,s2):1LEARNER]: after=[(n1,s1):2] next=3
I191210 20:34:43.765236 535465 storage/store_remove_replica.go:129  [n2,s2,r191/1:/Table/94/1/{4-5}] removing replica r191/1
I191210 20:34:43.830201 536302 storage/queue.go:520  [n3,s3] rate limited in MaybeAdd (gc): throttled on async limiting semaphore
I191210 20:34:43.830788 536302 storage/queue.go:520  [n3,s3] rate limited in MaybeAdd (replicate): throttled on async limiting semaphore
I191210 20:34:43.981899 1103873 storage/replica_raft.go:290  [n3,s3,r193/3:/Table/94/{1/6-2}] proposing SIMPLE(r2) REMOVE_REPLICA[(n1,s1):2LEARNER]: after=[(n3,s3):3] next=4
I191210 20:34:44.009976 533308 storage/store_remove_replica.go:129  [n1,s1,r193/2:/Table/94/{1/6-2}] removing replica r193/2
I191210 20:34:44.027002 1101858 storage/replica_command.go:672  [n3,merge,s3,r192/2:/Table/94/1/{5-6}] initiating a merge of r193:/Table/94/{1/6-2} [(n3,s3):3, next=4, gen=157] into this range (lhs+rhs has (size=0 B+0 B qps=0.00+0.00 --> 0.00qps) below threshold (size=0 B, qps=0.00))
I191210 20:34:44.050783 1105306 storage/replica_command.go:1706  [n1,replicate,s1,r195/2:/Table/94/1{-/4}] change replicas (add [] remove [(n2,s2):3VOTER_DEMOTING]): existing descriptor r195:/Table/94/1{-/4} [(n1,s1):2, (n2,s2):3, next=4, gen=152]
I191210 20:34:44.233127 536509 sql/event_log.go:132  [n1,client=127.0.0.1:54928,user=root] Event: "drop_database", target: 93, info: {DatabaseName:data Statement:DROP DATABASE IF EXISTS data User:root DroppedSchemaObjects:[data.public.unpartitioned]}
I191210 20:34:44.259683 1105306 storage/replica_raft.go:290  [n1,s1,r195/2:/Table/94/1{-/4}] proposing ENTER_JOINT(r3 l3) REMOVE_REPLICA[(n2,s2):3VOTER_DEMOTING]: after=[(n1,s1):2 (n2,s2):3VOTER_DEMOTING] next=4
I191210 20:34:44.378172 1105306 storage/replica_command.go:1706  [n1,replicate,s1,r195/2:/Table/94/1{-/4}] change replicas (add [] remove []): existing descriptor r195:/Table/94/1{-/4} [(n1,s1):2, (n2,s2):3VOTER_DEMOTING, next=4, gen=153]
I191210 20:34:44.477739 536222 storage/store_remove_replica.go:129  [n3,s3,r192/2:/Table/94/1/{5-6}] removing replica r193/3
I191210 20:34:44.539525 1105306 storage/replica_raft.go:290  [n1,s1,r195/2:/Table/94/1{-/4}] proposing LEAVE_JOINT: after=[(n1,s1):2 (n2,s2):3LEARNER] next=4
I191210 20:34:44.553206 1105306 storage/replica_command.go:1706  [n1,replicate,s1,r195/2:/Table/94/1{-/4}] change replicas (add [] remove [(n2,s2):3LEARNER]): existing descriptor r195:/Table/94/1{-/4} [(n1,s1):2, (n2,s2):3LEARNER, next=4, gen=154]
I191210 20:34:44.595148 536509 sql/lease.go:326  [n1,client=127.0.0.1:54928,user=root,scExec] waiting for 1 leases to expire: desc=[{unpartitioned 94 4}]
I191210 20:34:44.707258 536509 sql/sqlbase/structured.go:1425  [n1,client=127.0.0.1:54928,user=root,scExec] publish: descID=94 (unpartitioned) version=6 mtime=1970-01-01 00:00:00 +0000 UTC
I191210 20:34:44.832172 1105230 storage/replica_command.go:1706  [n1,s1,r191/2:/Table/94/1/{4-5}] change replicas (add [(n2,s2):3LEARNER] remove []): existing descriptor r191:/Table/94/1/{4-5} [(n1,s1):2, next=3, gen=150]
I191210 20:34:45.094349 1105230 storage/replica_raft.go:290  [n1,s1,r191/2:/Table/94/1/{4-5}] proposing SIMPLE(l3) ADD_REPLICA[(n2,s2):3LEARNER]: after=[(n1,s1):2 (n2,s2):3LEARNER] next=4
I191210 20:34:45.148168 1105306 storage/replica_raft.go:290  [n1,s1,r195/2:/Table/94/1{-/4}] proposing SIMPLE(r3) REMOVE_REPLICA[(n2,s2):3LEARNER]: after=[(n1,s1):2] next=4
I191210 20:34:45.157097 1106877 storage/raft_snapshot_queue.go:125  [n1,raftsnapshot,s1,r191/2:/Table/94/1/{4-5}] skipping snapshot; replica is likely a learner in the process of being added: (n2,s2):3LEARNER
I191210 20:34:45.162680 1105230 storage/store_snapshot.go:977  [n1,s1,r191/2:/Table/94/1/{4-5}] sending LEARNER snapshot f23f8a02 at applied index 47
I191210 20:34:45.164592 1105230 storage/store_snapshot.go:1020  [n1,s1,r191/2:/Table/94/1/{4-5}] streamed snapshot to (n2,s2):3LEARNER: kv pairs: 14, log entries: 0, rate-limit: 8.0 MiB/sec, 0.04s
I191210 20:34:45.171647 535414 storage/store_remove_replica.go:129  [n2,s2,r195/3:/Table/94/1{-/4}] removing replica r195/3
I191210 20:34:45.179125 1106866 storage/replica_raftstorage.go:798  [n2,s2,r191/3:{-}] applying LEARNER snapshot [id=f23f8a02 index=47]
I191210 20:34:45.442194 1106841 storage/replica_command.go:1706  [n2,s2,r194/1:/{Table/94/2-Max}] change replicas (add [(n3,s3):2LEARNER] remove []): existing descriptor r194:/{Table/94/2-Max} [(n2,s2):1, next=2, gen=147]
I191210 20:34:45.486963 1106866 storage/replica_raftstorage.go:819  [n2,s2,r191/3:/Table/94/1/{4-5}] applied LEARNER snapshot [total=307ms ingestion=4@22ms id=f23f8a02 index=47]

Parameters:

  • GOFLAGS=-json

make stressrace TESTS=TestRepartitioning PKG=./pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(ccl/partitionccl).TestRepartitioning failed on master@ac49d1323b105ce52b2b0980384f12c2f584ef92:

Fatal error:

F191216 21:58:13.597607 564537 storage/replica_init.go:276  [n2,s2,r9/3:/Table/1{3-4}] range descriptor ID (186) does not match replica's range ID (9)

Stack:

goroutine 564537 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xa22c601, 0x0, 0x0, 0xda07a7)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xbf
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0x985ede0, 0xc000000004, 0x8faf459, 0x17, 0x114, 0xc005ced6e0, 0x5a)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:214 +0xc1c
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x69ec8e0, 0xc005723ce0, 0xc000000004, 0x2, 0x5ba8f39, 0x3f, 0xc003bd0330, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x291
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x69ec8e0, 0xc005723ce0, 0x1, 0x4, 0x5ba8f39, 0x3f, 0xc003bd0330, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:44 +0x9a
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:155
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).setDesc(0xc00231e900, 0x69ec8e0, 0xc005723ce0, 0xc00296bf10)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_init.go:276 +0x907
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleDescResult(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_result.go:263
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).handleNonTrivialReplicatedEvalResult(0xc00231e9d0, 0x69ec8e0, 0xc005723ce0, 0xc001da56d0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:1118 +0xa6f
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).ApplySideEffects(0xc00231e9d0, 0x6a290c0, 0xc001210008, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:1001 +0xb60
github.com/cockroachdb/cockroach/pkg/storage/apply.mapCheckedCmdIter(0x153ed6225380, 0xc00231ebe8, 0xc003bd1048, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/cmd.go:182 +0x129
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).applyOneBatch(0xc003bd1578, 0x69ec8e0, 0xc007833ce0, 0x6a29180, 0xc00231eb88, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:281 +0x272
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).ApplyCommittedEntries(0xc003bd1578, 0x69ec8e0, 0xc007833ce0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:247 +0x103
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc00231e900, 0x69ec8e0, 0xc007833ce0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:721 +0x10fa
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReady(0xc00231e900, 0x69ec8e0, 0xc007833ce0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:395 +0x1a0
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processReady(0xc000a1a700, 0x69ec8e0, 0xc001e64420, 0x9)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store_raft.go:487 +0x1a0
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc001b186c0, 0x69ec8e0, 0xc001e64420)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:226 +0x31a
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x69ec8e0, 0xc001e64420)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:166 +0x56
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc00039ee90, 0xc003794b40, 0xc00039ee80)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:197 +0x160
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:190 +0xc4
Log preceding fatal error

I191216 21:58:08.910366 1281927 storage/replica_raft.go:247  [n1,s1,r186/3:/Table/98/1/{4-5}] proposing SIMPLE(l4) ADD_REPLICA[(n3,s3):4LEARNER]: after=[(n1,s1):3 (n3,s3):4LEARNER] next=5
I191216 21:58:08.980140 1282388 storage/raft_snapshot_queue.go:169  [n1,raftsnapshot,s1,r186/3:/Table/98/1/{4-5}] skipping snapshot; replica is likely a learner in the process of being added: (n3,s3):4LEARNER
I191216 21:58:09.023713 1281927 storage/store_snapshot.go:977  [n1,s1,r186/3:/Table/98/1/{4-5}] sending LEARNER snapshot 204b06a7 at applied index 66
I191216 21:58:09.025533 1281927 storage/store_snapshot.go:1020  [n1,s1,r186/3:/Table/98/1/{4-5}] streamed snapshot to (n3,s3):4LEARNER: kv pairs: 21, log entries: 0, rate-limit: 8.0 MiB/sec, 0.08s
I191216 21:58:09.045173 1282259 storage/replica_raftstorage.go:798  [n3,s3,r186/4:{-}] applying LEARNER snapshot [id=204b06a7 index=66]
I191216 21:58:09.125718 1282259 storage/replica_raftstorage.go:819  [n3,s3,r186/4:/Table/98/1/{4-5}] applied LEARNER snapshot [total=80ms ingestion=4@77ms id=204b06a7 index=66]
I191216 21:58:09.128556 1281532 storage/replica_raft.go:247  [n3,s3,r187/1:/{Table/98/1/5-Max}] proposing SIMPLE(v2) ADD_REPLICA[(n1,s1):2]: after=[(n3,s3):1 (n1,s1):2] next=3
I191216 21:58:09.156660 1281927 storage/replica_command.go:1706  [n1,s1,r186/3:/Table/98/1/{4-5}] change replicas (add [(n3,s3):4] remove []): existing descriptor r186:/Table/98/1/{4-5} [(n1,s1):3, (n3,s3):4LEARNER, next=5, gen=171]
I191216 21:58:09.401496 1282766 storage/replica_command.go:1706  [n1,replicate,s1,r187/2:/{Table/98/1/5-Max}] change replicas (add [] remove [(n3,s3):1VOTER_DEMOTING]): existing descriptor r187:/{Table/98/1/5-Max} [(n3,s3):1, (n1,s1):2, next=3, gen=164]
I191216 21:58:09.453412 1282738 storage/replica_command.go:1706  [n1,s1,r187/2:/{Table/98/1/5-Max}] change replicas (add [] remove [(n3,s3):1VOTER_DEMOTING]): existing descriptor r187:/{Table/98/1/5-Max} [(n3,s3):1, (n1,s1):2, next=3, gen=164]
I191216 21:58:09.556362 1281927 storage/replica_raft.go:247  [n1,s1,r186/3:/Table/98/1/{4-5}] proposing SIMPLE(v4) ADD_REPLICA[(n3,s3):4]: after=[(n1,s1):3 (n3,s3):4] next=5
I191216 21:58:09.768216 1282766 storage/replica_raft.go:247  [n1,s1,r187/2:/{Table/98/1/5-Max}] proposing ENTER_JOINT(r1 l1) REMOVE_REPLICA[(n3,s3):1VOTER_DEMOTING]: after=[(n3,s3):1VOTER_DEMOTING (n1,s1):2] next=3
I191216 21:58:09.792620 562466 server/status/runtime.go:498  [n1] runtime stats: 2.1 GiB RSS, 832 goroutines, 112 MiB/57 MiB/205 MiB GO alloc/idle/total, 96 MiB/148 MiB CGO alloc/total, 1971.2 CGO/sec, 254.3/29.4 %(u/s)time, 0.6 %gc (6x), 3.0 MiB/3.0 MiB (r/w)net
I191216 21:58:09.867427 1282766 storage/replica_command.go:1706  [n1,replicate,s1,r187/2:/{Table/98/1/5-Max}] change replicas (add [] remove []): existing descriptor r187:/{Table/98/1/5-Max} [(n3,s3):1VOTER_DEMOTING, (n1,s1):2, next=3, gen=165]
I191216 21:58:09.954194 1283384 storage/replica_command.go:1706  [n3,s3,r186/4:/Table/98/1/{4-5}] change replicas (add [] remove [(n1,s1):3VOTER_DEMOTING]): existing descriptor r186:/Table/98/1/{4-5} [(n1,s1):3, (n3,s3):4, next=5, gen=172]
I191216 21:58:10.208828 1282766 storage/replica_raft.go:247  [n1,s1,r187/2:/{Table/98/1/5-Max}] proposing LEAVE_JOINT: after=[(n3,s3):1LEARNER (n1,s1):2] next=3
I191216 21:58:10.274317 1282766 storage/replica_command.go:1706  [n1,replicate,s1,r187/2:/{Table/98/1/5-Max}] change replicas (add [] remove [(n3,s3):1LEARNER]): existing descriptor r187:/{Table/98/1/5-Max} [(n3,s3):1LEARNER, (n1,s1):2, next=3, gen=166]
I191216 21:58:10.432737 565285 storage/queue.go:524  [n3,s3] rate limited in MaybeAdd (merge): throttled on async limiting semaphore
I191216 21:58:10.552835 1283384 storage/replica_command.go:1706  [n3,s3,r186/4:/Table/98/1/{4-5}] change replicas (add [] remove [(n1,s1):3VOTER_DEMOTING]): existing descriptor r186:/Table/98/1/{4-5} [(n1,s1):3, (n3,s3):4, next=5, gen=172]
I191216 21:58:10.813384 1284517 storage/replica_raft.go:247  [n1,s1,r186/3:/Table/98/1/{4-5}] proposing ENTER_JOINT(r3 l3) REMOVE_REPLICA[(n1,s1):3VOTER_DEMOTING]: after=[(n1,s1):3VOTER_DEMOTING (n3,s3):4] next=5
E191216 21:58:10.813886 1284517 storage/replica_raft.go:267  [n1,s1,r186/3:/Table/98/1/{4-5}] received invalid ChangeReplicasTrigger ENTER_JOINT(r3 l3) REMOVE_REPLICA[(n1,s1):3VOTER_DEMOTING]: after=[(n1,s1):3VOTER_DEMOTING (n3,s3):4] next=5 to remove self (leaseholder)
I191216 21:58:10.866582 1282766 storage/replica_raft.go:247  [n1,s1,r187/2:/{Table/98/1/5-Max}] proposing SIMPLE(r1) REMOVE_REPLICA[(n3,s3):1LEARNER]: after=[(n1,s1):2] next=3
I191216 21:58:10.917664 565266 storage/store_remove_replica.go:129  [n3,s3,r187/1:/{Table/98/1/5-Max}] removing replica r187/1
E191216 21:58:10.954170 1281529 storage/queue.go:1035  [n3,merge,s3,r184/1:/Table/98{-/1/4}] while carrying out changes [{REMOVE_REPLICA n1,s1}]: change replicas of r186 failed: storage/replica_raft.go:268: [n1,s1,r186/3:/Table/98/1/{4-5}]: received invalid ChangeReplicasTrigger ENTER_JOINT(r3 l3) REMOVE_REPLICA[(n1,s1):3VOTER_DEMOTING]: after=[(n1,s1):3VOTER_DEMOTING (n3,s3):4] next=5 to remove self (leaseholder)
E191216 21:58:11.048554 1281768 storage/queue.go:1035  [n1,merge,s1,r186/3:/Table/98/1/{4-5}] while carrying out changes [{REMOVE_REPLICA n3,s3}]: change replicas of r187 failed: descriptor changed: [expected] r187:/{Table/98/1/5-Max} [(n3,s3):1, (n1,s1):2, next=3, gen=164] != [actual] r187:/{Table/98/1/5-Max} [(n1,s1):2, next=3, gen=167]
I191216 21:58:11.147611 565621 sql/event_log.go:132  [n1,client=127.0.0.1:46088,user=root] Event: "create_database", target: 99, info: {DatabaseName:data Statement:CREATE DATABASE data User:root}
I191216 21:58:11.250358 1284731 storage/replica_command.go:1706  [n1,replicate,s1,r186/3:/Table/98/1/{4-5}] change replicas (add [] remove [(n3,s3):4VOTER_DEMOTING]): existing descriptor r186:/Table/98/1/{4-5} [(n1,s1):3, (n3,s3):4, next=5, gen=172]
I191216 21:58:11.623236 564564 server/status/runtime.go:498  [n2] runtime stats: 2.2 GiB RSS, 894 goroutines, 120 MiB/54 MiB/205 MiB GO alloc/idle/total, 100 MiB/151 MiB CGO alloc/total, 1978.1 CGO/sec, 248.7/27.2 %(u/s)time, 0.5 %gc (7x), 3.0 MiB/3.0 MiB (r/w)net
I191216 21:58:11.715362 564504 storage/queue.go:524  [n2,s2,r1/3:/{Min-System/NodeL…}] rate limited in MaybeAdd (merge): throttled on async limiting semaphore
I191216 21:58:11.925656 1285134 storage/replica_command.go:1706  [n1,s1,r187/2:/{Table/98/1/5-Max}] change replicas (add [(n3,s3):3LEARNER] remove []): existing descriptor r187:/{Table/98/1/5-Max} [(n1,s1):2, next=3, gen=167]
I191216 21:58:12.011619 1285565 storage/replica_command.go:1706  [n3,s3,r186/4:/Table/98/1/{4-5}] change replicas (add [] remove [(n1,s1):3VOTER_DEMOTING]): existing descriptor r186:/Table/98/1/{4-5} [(n1,s1):3, (n3,s3):4, next=5, gen=172]
I191216 21:58:12.038516 1284731 storage/replica_command.go:1706  [n1,replicate,s1,r186/3:/Table/98/1/{4-5}] change replicas (add [] remove [(n3,s3):4VOTER_DEMOTING]): existing descriptor r186:/Table/98/1/{4-5} [(n1,s1):3, (n3,s3):4, next=5, gen=172]
I191216 21:58:12.128909 565621 sql/event_log.go:132  [n1,client=127.0.0.1:46088,user=root] Event: "create_table", target: 100, info: {TableName:data.public."secondary index - unpartitioned" Statement:CREATE TABLE "secondary index - unpartitioned" (a INT8 PRIMARY KEY, b INT8, INDEX b_idx (b)) User:root}
I191216 21:58:12.424385 562473 storage/queue.go:524  [n1,s1] rate limited in MaybeAdd (merge): throttled on async limiting semaphore
I191216 21:58:12.462383 565337 server/status/runtime.go:498  [n3] runtime stats: 2.2 GiB RSS, 936 goroutines, 101 MiB/68 MiB/205 MiB GO alloc/idle/total, 104 MiB/152 MiB CGO alloc/total, 1984.7 CGO/sec, 249.0/27.4 %(u/s)time, 0.5 %gc (6x), 3.0 MiB/3.0 MiB (r/w)net
I191216 21:58:12.542387 565621 sql/sqlbase/structured.go:1425  [n1,client=127.0.0.1:46088,user=root] publish: descID=100 (secondary index - list partitioning) version=2 mtime=1970-01-01 00:00:00 +0000 UTC
I191216 21:58:12.856143 1285565 storage/replica_command.go:1706  [n3,s3,r186/4:/Table/98/1/{4-5}] change replicas (add [] remove [(n1,s1):3VOTER_DEMOTING]): existing descriptor r186:/Table/98/1/{4-5} [(n1,s1):3, (n3,s3):4, next=5, gen=172]
I191216 21:58:12.865184 1285134 storage/replica_raft.go:247  [n1,s1,r187/2:/{Table/98/1/5-Max}] proposing SIMPLE(l3) ADD_REPLICA[(n3,s3):3LEARNER]: after=[(n1,s1):2 (n3,s3):3LEARNER] next=4
I191216 21:58:13.034560 1286952 storage/raft_snapshot_queue.go:169  [n1,raftsnapshot,s1,r187/2:/{Table/98/1/5-Max}] skipping snapshot; replica is likely a learner in the process of being added: (n3,s3):3LEARNER
I191216 21:58:13.039718 1285134 storage/store_snapshot.go:977  [n1,s1,r187/2:/{Table/98/1/5-Max}] sending LEARNER snapshot 7c8a45e7 at applied index 51
I191216 21:58:13.044633 1285134 storage/store_snapshot.go:1020  [n1,s1,r187/2:/{Table/98/1/5-Max}] streamed snapshot to (n3,s3):3LEARNER: kv pairs: 18, log entries: 0, rate-limit: 8.0 MiB/sec, 0.10s
I191216 21:58:13.113704 1286809 storage/replica_command.go:406  [n1,split,s1,r187/2:/{Table/98/1/5-Max}] initiating a split of this range at key /Table/100 [r150] (zone config); r187/3 is being probed (may or may not need a Raft snapshot)
I191216 21:58:13.153661 1286986 storage/replica_raftstorage.go:798  [n3,s3,r187/3:{-}] applying LEARNER snapshot [id=7c8a45e7 index=51]
I191216 21:58:13.159030 1286809 storage/split_queue.go:149  [n1,split,s1,r187/2:/{Table/98/1/5-Max}] split saw concurrent descriptor modification; maybe retrying
I191216 21:58:13.530618 565285 storage/queue.go:524  [n3,s3] rate limited in MaybeAdd (gc): throttled on async limiting semaphore
I191216 21:58:13.535708 565285 storage/queue.go:524  [n3,s3] rate limited in MaybeAdd (replicate): throttled on async limiting semaphore
I191216 21:58:13.536393 564573 storage/queue.go:524  [n2,s2] rate limited in MaybeAdd (gc): throttled on async limiting semaphore
I191216 21:58:13.540193 564573 storage/queue.go:524  [n2,s2] rate limited in MaybeAdd (replicate): throttled on async limiting semaphore
I191216 21:58:13.541663 1287525 storage/replica_raft.go:247  [n2,s2,r9/3:/Table/1{3-4}] proposing ENTER_JOINT(r4 l4) REMOVE_REPLICA[(n3,s3):4VOTER_DEMOTING]: after=[(n1,s1):3 (n3,s3):4VOTER_DEMOTING] next=5
I191216 21:58:13.591451 1286986 storage/replica_raftstorage.go:819  [n3,s3,r187/3:/{Table/98/1/5-Max}] applied LEARNER snapshot [total=437ms ingestion=4@55ms id=7c8a45e7 index=51]

Repro

Parameters:

  • GOFLAGS=-json

make stressrace TESTS=TestRepartitioning PKG=./pkg/ccl/partitionccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

powered by pkg/cmd/internal/issues

@danhhz
Contributor

danhhz commented Dec 16, 2019

This last failure seems to be the same thing being investigated in #43067

craig bot pushed a commit that referenced this issue Dec 17, 2019
43145: storage/diskmap: remove SortedDiskMapIterator.{Key,Value} r=petermattis a=petermattis

Remove `SortedDiskMapIterator.{Key,Value}`, as these accessors are
horribly slow due to performing an allocation on every call. Change the
existing uses of these methods to `Unsafe{Key,Value}`, adding copying
where necessary. Most of the use cases were easy to fix, though note that
`diskRowIterator.Row()` contains a TODO because removing the
allocation there caused many test failures.

The `PebbleMapIteration` benchmarks still show a regression in
comparison to f5009c8. That regression
is entirely due to Pebble's new memtable sizing heuristics, which start
with a small memtable size and dynamically grow it up to the
configured limit. Adding a knob to disable that behavior for the
purposes of a benchmark does not seem worthwhile.

```
name                                     old time/op    new time/op    delta
RocksDBMapIteration/InputSize4096-16       1.18ms ± 3%    0.81ms ± 0%   -31.56%  (p=0.000 n=10+8)
RocksDBMapIteration/InputSize16384-16      5.83ms ± 1%    4.14ms ± 3%   -29.13%  (p=0.000 n=9+10)
RocksDBMapIteration/InputSize65536-16      24.8ms ± 1%    17.7ms ± 3%   -28.54%  (p=0.000 n=9+10)
RocksDBMapIteration/InputSize262144-16      137ms ± 0%     105ms ± 2%   -23.65%  (p=0.000 n=9+9)
RocksDBMapIteration/InputSize1048576-16     547ms ± 1%     430ms ± 1%   -21.44%  (p=0.000 n=8+9)
PebbleMapIteration/InputSize4096-16         594µs ± 1%     323µs ± 2%   -45.65%  (p=0.000 n=9+9)
PebbleMapIteration/InputSize16384-16       3.29ms ± 1%    2.15ms ± 1%   -34.70%  (p=0.000 n=10+9)
PebbleMapIteration/InputSize65536-16       16.0ms ± 7%    11.2ms ±11%   -30.26%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize262144-16      96.7ms ± 3%    76.5ms ± 5%   -20.88%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize1048576-16      267ms ± 0%     185ms ± 1%   -30.60%  (p=0.000 n=9+10)

name                                     old alloc/op   new alloc/op   delta
RocksDBMapIteration/InputSize4096-16        262kB ± 0%       0kB ± 0%   -99.97%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize16384-16      1.31MB ± 0%    0.00MB ± 0%   -99.99%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize65536-16      5.51MB ± 0%    0.00MB ± 3%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize262144-16     22.3MB ± 0%     0.0MB ± 0%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize1048576-16    89.4MB ± 0%     0.0MB ± 0%  -100.00%  (p=0.000 n=10+9)
PebbleMapIteration/InputSize4096-16         263kB ± 0%       0kB ± 0%   -99.91%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize16384-16       1.31MB ± 0%    0.00MB ± 0%   -99.98%  (p=0.000 n=10+8)
PebbleMapIteration/InputSize65536-16       5.50MB ± 0%    0.00MB ± 3%   -99.99%  (p=0.000 n=10+9)
PebbleMapIteration/InputSize262144-16      22.3MB ± 0%     0.0MB ± 0%   -99.99%  (p=0.000 n=10+7)
PebbleMapIteration/InputSize1048576-16     89.3MB ± 0%     0.0MB ±26%  -100.00%  (p=0.000 n=10+10)

name                                     old allocs/op  new allocs/op  delta
RocksDBMapIteration/InputSize4096-16        8.20k ± 0%     0.00k ± 0%   -99.96%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize16384-16       41.0k ± 0%      0.0k ± 0%   -99.99%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize65536-16        172k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize262144-16       696k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
RocksDBMapIteration/InputSize1048576-16     2.79M ± 0%     0.00M ± 0%  -100.00%  (p=0.000 n=9+9)
PebbleMapIteration/InputSize4096-16         8.20k ± 0%     0.01k ± 0%   -99.94%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize16384-16        41.0k ± 0%      0.0k ± 0%   -99.99%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize65536-16         172k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize262144-16        696k ± 0%        0k ± 0%  -100.00%  (p=0.000 n=10+10)
PebbleMapIteration/InputSize1048576-16      2.79M ± 0%     0.00M ± 9%  -100.00%  (p=0.000 n=10+10)
```

Release note: None
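
As a hedged aside on the copy-on-retention pattern this commit describes: `UnsafeKey`/`UnsafeValue` are the accessors named above, but the iterator interface and helper below are assumptions for illustration, not the diskmap package's actual code.

```go
// iter captures just the methods this sketch needs; the names mirror a
// sorted-disk-map iterator, but the exact interface is an assumption.
type iter interface {
	Rewind()
	Valid() (bool, error)
	Next()
	UnsafeKey() []byte
}

// collectMatching shows why copying is needed: UnsafeKey returns a
// slice that is only valid until the next call to Next, so any key
// retained past the current step must be copied explicitly.
func collectMatching(it iter, match func([]byte) bool) ([][]byte, error) {
	var keep [][]byte
	for it.Rewind(); ; it.Next() {
		if ok, err := it.Valid(); err != nil {
			return nil, err
		} else if !ok {
			return keep, nil
		}
		k := it.UnsafeKey() // no per-call allocation; invalidated by Next
		if match(k) {
			keep = append(keep, append([]byte(nil), k...)) // explicit copy
		}
	}
}
```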

43207: roachprod/gce: use 2 SSDs for c2 machine types r=nvanbenschoten a=nvanbenschoten

This is required to spin up c2 instances, just as it is to spin up
n2 instances.

Release note: None

43214: sql: stop swallowing errors from UncachedPhysicalAccessor.IsValidSchema r=andreimatei a=ajwerner

Before this commit we'd swallow retriable errors during execution. This can
be very problematic, as it can lead to lost writes and other general funkiness.
We're opting not to write a test for this specific case, as there is ongoing
work to change the interfaces to preclude this sort of bug.

Fixes #43067
Fixes #37883 (comment)

Release note: None
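
To illustrate the bug class this commit fixes, a hedged sketch (the `lookupSchemaID` helper and both signatures are invented for illustration, not `UncachedPhysicalAccessor`'s actual code): collapsing a lookup error into a boolean hides retriable errors from the transaction retry machinery, while returning the error lets it propagate.

```go
import "context"

// lookupSchemaID is a hypothetical stand-in for the descriptor lookup.
var lookupSchemaID func(ctx context.Context, name string) (uint32, error)

// Buggy shape: a retriable error from the lookup is swallowed and
// reported as "schema does not exist", so execution proceeds on stale
// state instead of retrying, which is how writes can get lost.
func isValidSchemaBuggy(ctx context.Context, name string) bool {
	_, err := lookupSchemaID(ctx, name)
	return err == nil
}

// Fixed shape: return the error alongside the result so callers can
// distinguish "not found" from "lookup failed, retry".
func isValidSchema(ctx context.Context, name string) (bool, error) {
	id, err := lookupSchemaID(ctx, name)
	if err != nil {
		return false, err
	}
	return id != 0, nil
}
```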

43219: pgwire/pgcode: fix DeprecatedInternalConnectionFailure r=nvanbenschoten a=nvanbenschoten

This was moved from "08006" in #41451, not "XXC03".

Release note: None

Co-authored-by: Peter Mattis <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
@craig (bot) closed this as completed in 892ab04 on Dec 17, 2019