kvserver: leases appear stuck when draining a node #101885
Comments
It's possible that this is #100761, but there is some other weirdness that might be worth digging into. In the latest instance of this, we noticed the following (this is after three attempts to drain):
Also worth noting that the patch isn't in 23.1.0-beta.2, and it's unlikely to make 23.1.0 either. @yuzefovich - will defer to you to figure out whether we should slap a GA-blocker label on this or not.
I'll add a GA-blocker to justify backporting #101884 to 23.1.0.
Leases being stuck is a KV issue, though, so I'll add it to the KV board too, cc @kvoli
From Austen:
Looks like the incoming voter was transient. Still stuck, but with a leader/lease split only. When running manually via the replicate queue (RQ):
The full status entry for `r158617`:
Another range, r144488, had the same issue but resolved itself after 5 minutes:
The issue is that when we are in a configuration where the leader and leaseholder are different, the replicate queue is unable to fix it. It has a check […]. A reasonable fix is to reverse the check here to only exclude replicas if the state is […]. There is a separate question of why we have a leader/leaseholder split for this range in the first place, and it may be worth tracking that down as well, but it's possible that these just "organically appear" and this bug would prevent them from recovering.
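As a rough sketch of the shape of that fix (the elided check and state above weren't captured here, so the types, names, and probe/replicate/snapshot states below are assumptions, not CockroachDB's actual code): when the check excludes every replica whose progress is unknown, a non-leader leaseholder can never pick a candidate; reversing it to exclude only replicas known to be in a bad state keeps the queue unstuck.

```go
package main

import "fmt"

// progressState loosely mirrors Raft's probe/replicate/snapshot tracker states.
// All names here are illustrative, not CockroachDB identifiers.
type progressState int

const (
	stateProbe progressState = iota
	stateReplicate
	stateSnapshot
)

type candidate struct {
	replicaID int
	state     progressState
	known     bool // false when the local replica is not the Raft leader
}

// excludeStrict admits only replicas known to be in stateReplicate. A
// non-leader leaseholder knows nothing about peer progress, so every
// candidate is excluded and the queue can never repair the range.
func excludeStrict(c candidate) bool {
	return !c.known || c.state != stateReplicate
}

// excludeRelaxed reverses the check: only exclude replicas that are known to
// be in a problematic state (here, needing a snapshot). Unknown progress is
// no longer disqualifying.
func excludeRelaxed(c candidate) bool {
	return c.known && c.state == stateSnapshot
}

func main() {
	// Progress is unknown because the leaseholder is not the Raft leader.
	c := candidate{replicaID: 4, known: false}
	fmt.Println("strict check excludes it:", excludeStrict(c))   // true: queue is stuck
	fmt.Println("relaxed check excludes it:", excludeRelaxed(c)) // false: queue can proceed
}
```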
If a leaseholder is not the raft leader then it does not track the progress of peers. It would not know whether the lease transfer target is caught up on its log or whether it requires a snapshot. See #81561 for more detail about why this would be unsafe and the consequences that allowing such lease transfers could have.
Which bug are you referring to? Why would it prevent the leaseholder from receiving leadership through […]?
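To make the progress-tracking argument above concrete, here is a minimal sketch under assumed types and names (not CockroachDB's API) of why only the Raft leader can tell whether a transfer target is caught up on the log:

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative model only; names do not correspond to CockroachDB's API.
type progress struct {
	match uint64 // highest log index known to be replicated to this follower
}

type replica struct {
	isLeader  bool
	lastIndex uint64
	followers map[int]progress // populated only on the Raft leader
}

// canTransferLease sketches the safety condition: only the Raft leader can
// tell whether the target is caught up on the log or would need a snapshot.
func (r *replica) canTransferLease(target int) error {
	if !r.isLeader {
		return errors.New("not the Raft leader: peer progress unknown, refusing lease transfer")
	}
	p, ok := r.followers[target]
	if !ok || p.match < r.lastIndex {
		return errors.New("target is behind on the log or may need a snapshot")
	}
	return nil
}

func main() {
	// A leaseholder that is not the Raft leader tracks no follower progress.
	follower := &replica{isLeader: false, lastIndex: 100}
	fmt.Println(follower.canTransferLease(4))

	// The leader can verify the target is caught up before transferring.
	leader := &replica{
		isLeader:  true,
		lastIndex: 100,
		followers: map[int]progress{4: {match: 100}},
	}
	fmt.Println(leader.canTransferLease(4)) // <nil>
}
```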
We see a split leader/leaseholder range that's quiescent:

➜ jq '.responsesByNodeId."3".infos[0].raftState.lead' status.json
"5"
➜ jq '.responsesByNodeId."3".infos[0].state.state.lease.replica.replicaId' status.json
4
➜ jq '.responsesByNodeId."3".infos[0].quiescent' status.json
true

This would explain why […]

Is this racy? Did we break things when we touched them earlier this release in #97289?
n6's lease has remained valid since 2023-04-19 21:46:03. The n3:5 replica (leader) only came into existence at 22:13:40:
So the lease has existed for as long as n3 has existed, and thus for as long as it can have been the leader, and it's still valid. The logic in #97289 should not allow it to quiesce. This shouldn't be racy, because if there are any outstanding proposals or unapplied log entries we don't quiesce either, so there can't be another lease in flight that we haven't seen. However, what would happen if n6 had a liveness blip, either globally or from n3 gossip's point of view? There are two conditions that mark an epoch-based lease as expired:
I think the second condition is problematic. The assumption we make when we allow expired leases to quiesce is that someone has to reacquire the expired lease, and this will wake up the Raft group. But I think this assumption is flawed in two ways:
It's still not clear that this is what happened here (I'll see if I can find any indications that n6's liveness may have blipped on n3), but I think 1 above is reason enough to revert this change. We no longer strictly need this logic anyway, now that we eagerly extend expiration leases.
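The two conditions themselves didn't survive in this excerpt. As an assumption consistent with the revert description below, here is a minimal Go sketch of an epoch-based lease that reads as expired either because the epoch was bumped or because the locally observed liveness record lapsed, and that can flip back to valid without anyone reacquiring it:

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative types only; these are assumptions, not CockroachDB's.
type liveness struct {
	epoch      int64
	expiration time.Time
}

type epochLease struct {
	epoch int64
}

// expired sketches the two assumed conditions: the liveness epoch has been
// bumped past the lease's epoch, or the locally observed liveness record has
// lapsed (no recent heartbeat seen via gossip).
func expired(l epochLease, lv liveness, now time.Time) bool {
	return lv.epoch > l.epoch || !lv.expiration.After(now)
}

func main() {
	lease := epochLease{epoch: 5}
	now := time.Now()

	// During a heartbeat blip the lease reads as expired from this node's
	// point of view, so (before the revert) the range could quiesce.
	stale := liveness{epoch: 5, expiration: now.Add(-time.Second)}
	fmt.Println("during blip, expired:", expired(lease, stale, now)) // true

	// A later heartbeat arrives without any epoch increment: the very same
	// lease is valid again, nobody reacquires it, and nothing wakes the range.
	fresh := liveness{epoch: 5, expiration: now.Add(9 * time.Second)}
	fmt.Println("after heartbeat, expired:", expired(lease, fresh, now)) // false
}
```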
Submitted a revert in #101899. I'm going to keep this issue open and try to figure out if this is what happened, or if there's more going on. I have to deal with some L2 stuff first though.
101899: Revert "kvserver: allow expired leases to quiesce" r=erikgrinaker a=erikgrinaker

This reverts commit 76afb00.

This change was flawed, because it assumed an expired lease would have to be reacquired which would wake up the range, but that's not necessarily true. In cases where the Raft leader is not colocated with the leaseholder, the Raft leader may have a stale liveness record, or the leaseholder may temporarily fail to heartbeat liveness without anyone bumping its epoch. If this happens, it's possible for an expired lease to revert to a valid lease from the leader's point of view, but at that point it's too late: the range is already quiesced. This can prevent e.g. lease transfers, because we don't allow leaseholders to transfer their lease if they're not the Raft leader (they need to make sure the target is not behind on the Raft log). We won't attempt to colocate the Raft leader with the leaseholder for a quiesced range, since we don't tick it.

Touches #101885.
Touches #97289.

Epic: none

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
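A small sketch of the mechanism in that last sentence, with assumed names rather than the actual replica code: leadership-to-leaseholder colocation happens on the tick path, and quiesced replicas skip it entirely, so a quiesced range with a leader/leaseholder split stays that way.

```go
package main

import "fmt"

// Illustrative only; assumed names rather than the actual replica code.
type rangeState struct {
	rangeID     int64
	quiescent   bool
	isLeader    bool
	selfReplica int
	leaseholder int
}

// tick models the per-range tick: quiesced replicas are skipped, so the
// leadership-to-leaseholder handoff below never runs for them.
func tick(r *rangeState) {
	if r.quiescent {
		return
	}
	if r.isLeader && r.leaseholder != r.selfReplica {
		fmt.Printf("r%d: transferring Raft leadership to replica %d\n", r.rangeID, r.leaseholder)
	}
}

func main() {
	r := &rangeState{rangeID: 1, quiescent: true, isLeader: true, selfReplica: 5, leaseholder: 4}
	tick(r) // quiesced: nothing happens, the split persists

	// Once the range no longer quiesces on an invalid lease, ticking resumes
	// and leadership can move to the leaseholder.
	r.quiescent = false
	tick(r)
}
```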
Do we want to put a big warning or a Fatal in the replicate queue where it detects a quiescent range whose leaseholder is not the Raft leader? It would make this sort of issue much easier to find. It didn't register to me that this state was so problematic.
Sure, if we can do so without a risk of races. Let's not do fatal though, at least not in production builds.
For 23.1 we can just add a warning log for this. I agree it is hard to avoid the risk of races here. I can take a look and add something if there is an easy change. I created #101902 to track this.
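For illustration only (this is not the change tracked in #101902), a sketch of the kind of warning being discussed: when the replicate queue sees a quiescent range whose leaseholder is not the Raft leader, it logs instead of fataling.

```go
package main

import "log"

// Illustrative only; not the actual change tracked in #101902.
type rangeInfo struct {
	rangeID     int64
	quiescent   bool
	leader      int
	leaseholder int
}

// maybeWarnQuiescedSplit surfaces the suspicious state instead of silently
// skipping the range. A fatal would be too aggressive for production builds,
// per the discussion above, so this only logs.
func maybeWarnQuiescedSplit(r rangeInfo) {
	if !r.quiescent || r.leader == r.leaseholder {
		return
	}
	log.Printf("warning: r%d is quiescent but its leader (replica %d) is not the leaseholder (replica %d); "+
		"the replicate queue cannot repair this state", r.rangeID, r.leader, r.leaseholder)
}

func main() {
	maybeWarnQuiescedSplit(rangeInfo{rangeID: 158617, quiescent: true, leader: 5, leaseholder: 4})
}
```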
cc @cockroachdb/replication
I'm a bit stuck here. I only have time series to go by, since the debug.zip I grabbed only had 48 hours of logs, and the necessary logs are now rotated out. The timeline is something like this:
There were no liveness failures nor epoch increments in the interval, so the hypothesis above about n3's view of the lease flipping to […]. Over the next 20 minutes, we see n3 moving Raft leaderships to leaseholders, as we would expect for unquiesced leaders. I've reviewed the quiescence code, and I don't see any other explanation for why we fail to quiesce except the hypothesis above, but I also haven't been able to confirm that that's the bug we hit here. I don't know if I'll be able to make much more progress here, so I'll remove the GA blocker, but I will try some more restarts and drains on the test cluster to see if it reproduces.
Did another rolling drain+restart of the 23.1 test cluster. Except for #102574, the drains all went through successfully, and […]
Describe the problem
When draining a node in a 23.1.0-beta.2 cluster, I've noticed that occasionally a node will get stuck draining:
To Reproduce
We have a test cluster with instructions on the topology and workloads here.
This doesn't reproduce reliably, but I've been doing the following:
1. `roachprod ssh $CLUSTER:$NODE`
2. `./cockroach node drain --self --insecure`
3. `roachprod stop $CLUSTER:$NODE`, then `start`
Expected behavior
We expect the drain to complete in a timely fashion, rather than hanging.
Environment:
23.1.0-beta.2
Additional context
See this internal thread (internal).
Jira issue: CRDB-27173
Epic CRDB-27235