Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: leases appear stuck when draining a node #101885

Closed
nicktrav opened this issue Apr 19, 2023 · 18 comments
Closed

kvserver: leases appear stuck when draining a node #101885

nicktrav opened this issue Apr 19, 2023 · 18 comments
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster

Comments

@nicktrav
Copy link
Collaborator

nicktrav commented Apr 19, 2023

Describe the problem

When draining a node in a 23.1.0-beta.2 cluster, I've noticed that occasionally a node will get stuck draining:

$ ./cockroach node drain --insecure --self
node is draining... remaining: 1766
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining... remaining: 2
node is draining...

To Reproduce

We have a test cluster with instructions on the topology and workloads here.

This doesn't reproduce reliably, but I've been doing the following:

  • Shell to a node - roachprod ssh $CLUSTER:$NODE
  • Run a drain command $ ./cockroach node drain --self --insecure
  • If the drain completes, cycle the node - roachprod stop $CLUSTER:$NODE, then start
  • Wait for the node to come back up, wait for underreplicated nodes to reduce to zero, etc.
  • Move to the next node.

Expected behavior

We expect the drain to complete in a timely fashion, rather than hanging.

Environment:

23.1.0-beta.2

Additional context

See this internal thread (internal).

Jira issue: CRDB-27173

Epic CRDB-27235

@nicktrav nicktrav added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team labels Apr 19, 2023
@nicktrav
Copy link
Collaborator Author

It's possible that this is #100761, but there are is some other weirdness that might be worth digging into.

In the latest instance of this, we noticed the following (this is after three attempts to drain):

ubuntu@grinaker-231-0006:~$ ./cockroach node drain --self --insecure
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining... remaining: 13
node is draining...
I230419 23:05:09.211117 1359047790 1@server/drain.go:295 ⋮ [T1,n6] 226463  drain details: range lease iterations: 12, distSQL execution flows: 1
root@localhost:26257/defaultdb> SELECT * FROM crdb_internal.cluster_distsql_flows WHERE now() - since > '1m'::INTERVAL;
                flow_id                | node_id | stmt |           since            | status
---------------------------------------+---------+------+----------------------------+----------
  1b7142fb-e009-4026-9341-7dfc7e4d823e |       1 |      | 2023-04-19 22:53:08.606+00 | running
  1b7142fb-e009-4026-9341-7dfc7e4d823e |       2 |      | 2023-04-19 22:53:08.606+00 | running
  1b7142fb-e009-4026-9341-7dfc7e4d823e |       3 |      | 2023-04-19 22:53:08.606+00 | running
  1b7142fb-e009-4026-9341-7dfc7e4d823e |       4 |      | 2023-04-19 22:53:08.606+00 | running
  1b7142fb-e009-4026-9341-7dfc7e4d823e |       5 |      | 2023-04-19 22:53:08.606+00 | running
  1b7142fb-e009-4026-9341-7dfc7e4d823e |       7 |      | 2023-04-19 22:53:08.606+00 | running
(6 rows)

Time: 20ms total (execution 20ms / network 0ms)

@nicktrav
Copy link
Collaborator Author

Also worth noting that the patch isn't in 23.1.0-beta.2. It will unlikely make beta.3 either. Ideally we'd confirm that this isn't an issue before GA, which would require either an RC or another beta.

@yuzefovich - will defer to you to figure out if we should slap a GA blocker label on this or not.

@yuzefovich
Copy link
Member

I'll add a GA-blocker to justify backporting #101884 to 23.1.0.

@yuzefovich
Copy link
Member

Although leases being stuck is a KV issue, so I'll add to the KV board too, cc @kvoli

@blathers-crl blathers-crl bot added the T-kv KV Team label Apr 19, 2023
@nicktrav
Copy link
Collaborator Author

From Austen:

Its a voter incoming - which is a joint config state… It needs to exit the joint config but doesn’t appear to be doing it

I230419 23:19:36.501199 1363779280 kv/kvserver/store.go:1701 ⋮ [T1,drain,n6,s6,r158617/4:‹/Table/633/1/3{38159…-40002…}›] 231384  attempting to transfer lease repl=(n6,s6):4VOTER_INCOMING seq=2315 start=1681940763.877170751,0 epo=2 pro=1681940763.886123978,0 for range r158617:‹/Table/633/1/3{381595941584118592-400024257342070192}› [(n8,s8):1, (n3,s3):5, (n6,s6):4, next=6, gen=10381]

@kvoli
Copy link
Collaborator

kvoli commented Apr 19, 2023

Looks like the voter incoming was transient. Still stuck but with leader/lease split only.
Before I lose the context, one of the offending ranges has this status entry:

image

When running manually via the RQ:
2023-04-19 23:41:02 kv/kvserver/store.go:3500 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] running replicate.process
2023-04-19 23:41:02 kv/kvserver/store.go:3502 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] processed: true (err: )
2023-04-19 23:41:02 kv/kvserver/replicate_queue.go:1054 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] next replica action: consider rebalance
2023-04-19 23:41:02 kv/kvserver/allocator/storepool/store_pool.go:1428 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] s1 is a live target, candidate for rebalancing
2023-04-19 23:41:02 kv/kvserver/allocator/storepool/store_pool.go:1428 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] s5 is a live target, candidate for rebalancing
2023-04-19 23:41:02 kv/kvserver/allocator/storepool/store_pool.go:1428 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] s7 is a live target, candidate for rebalancing
2023-04-19 23:41:02 kv/kvserver/allocator/storepool/store_pool.go:1428 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] s2 is a live target, candidate for rebalancing
2023-04-19 23:41:02 kv/kvserver/allocator/storepool/store_pool.go:1428 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] s8 is a live target, candidate for rebalancing
2023-04-19 23:41:02 kv/kvserver/allocator/storepool/store_pool.go:1428 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] s4 is a live target, candidate for rebalancing
2023-04-19 23:41:02 kv/kvserver/allocator/storepool/store_pool.go:1428 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] s1 is a live target, candidate for rebalancing
2023-04-19 23:41:02 kv/kvserver/allocator/storepool/store_pool.go:1428 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] s5 is a live target, candidate for rebalancing
2023-04-19 23:41:02 kv/kvserver/allocator/storepool/store_pool.go:1428 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] s7 is a live target, candidate for rebalancing
2023-04-19 23:41:02 kv/kvserver/allocator/storepool/store_pool.go:1428 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] s2 is a live target, candidate for rebalancing
2023-04-19 23:41:02 kv/kvserver/allocator/storepool/store_pool.go:1428 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] s3 is a live target, candidate for rebalancing
2023-04-19 23:41:02 kv/kvserver/allocator/storepool/store_pool.go:1428 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] s4 is a live target, candidate for rebalancing
2023-04-19 23:41:02 kv/kvserver/allocator/allocatorimpl/allocator.go:2860 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] not considering [n8, s8] as a potential candidate for a lease transfer because the replica may be waiting for a snapshot
2023-04-19 23:41:02 kv/kvserver/allocator/allocatorimpl/allocator.go:2860 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] not considering [n3, s3] as a potential candidate for a lease transfer because the replica may be waiting for a snapshot
2023-04-19 23:41:02 kv/kvserver/allocator/allocatorimpl/allocator.go:2252 [n6,s6,r158617/4:/Table/633/1/3{38159…-40002…}] no lease transfer target found for r158617
I230419 23:19:33.997372 1363710931 kv/kvserver/store.go:1701 ⋮ [T1,drain,n6,s6,r158617/4:‹/Table/633/1/3{38159…-40002…}›] 231351  attempting to transfer lease repl=(n6,s6):4VOTER_INCOMING seq=2315 start=1681940763.877170751,0 epo=2 pro=1681940763.886123978,0 for range r158617:‹/Table/633/1/3{381595941584118592-400024257342070192}› [(n8,s8):1, (n3,s3):5, (n6,s6):4, next=6, gen=10381]
I230419 23:19:33.997606 1363710931 kv/kvserver/store.go:1729 ⋮ [T1,drain,n6,s6,r158617/4:‹/Table/633/1/3{38159…-40002…}›] 231352  failed to transfer lease repl=(n6,s6):4VOTER_INCOMING seq=2315 start=1681940763.877170751,0 epo=2 pro=1681940763.886123978,0 for range r158617:‹/Table/633/1/3{381595941584118592-400024257342070192}› [(n8,s8):1, (n3,s3):5, (n6,s6):4, next=6, gen=10381] when draining: ‹no suitable transfer target found›
The full status entry for `r158617`
{
  "nodeId": 2,
  "rangeId": "158617",
  "responsesByNodeId": {
    "1": {
      "response": true,
      "errorMessage": "",
      "infos": [
      ]
    },
    "2": {
      "response": true,
      "errorMessage": "",
      "infos": [
      ]
    },
    "3": {
      "response": true,
      "errorMessage": "",
      "infos": [
        {
          "span": {
            "startKey": "/Table/633/1/3381595941584118592",
            "endKey": "/Table/633/1/3400024257342070192"
          },
          "raftState": {
            "replicaId": "5",
            "hardState": {
              "term": "10",
              "vote": "5",
              "commit": "51"
            },
            "lead": "5",
            "state": "StateLeader",
            "applied": "51",
            "progress": {
              "1": {
                "match": "51",
                "next": "52",
                "state": "StateReplicate",
                "paused": false,
                "pendingSnapshot": "0"
              },
              "4": {
                "match": "51",
                "next": "52",
                "state": "StateReplicate",
                "paused": false,
                "pendingSnapshot": "0"
              },
              "5": {
                "match": "51",
                "next": "52",
                "state": "StateReplicate",
                "paused": false,
                "pendingSnapshot": "0"
              }
            },
            "leadTransferee": "0"
          },
          "state": {
            "state": {
              "raftAppliedIndex": "51",
              "leaseAppliedIndex": "27",
              "desc": {
                "rangeId": "158617",
                "startKey": "9wJ5if0u7deXqAYjQA==",
                "endKey": "9wJ5if0vL1AMRpXdsA==",
                "internalReplicas": [
                  {
                    "nodeId": 8,
                    "storeId": 8,
                    "replicaId": 1,
                    "type": 0
                  },
                  {
                    "nodeId": 3,
                    "storeId": 3,
                    "replicaId": 5,
                    "type": 0
                  },
                  {
                    "nodeId": 6,
                    "storeId": 6,
                    "replicaId": 4,
                    "type": 0
                  }
                ],
                "nextReplicaId": 6,
                "generation": "10381",
                "stickyBit": {
                  "wallTime": "0",
                  "logical": 0,
                  "synthetic": false
                }
              },
              "lease": {
                "start": {
                  "wallTime": "1681940763877170751",
                  "logical": 0,
                  "synthetic": false
                },
                "expiration": null,
                "replica": {
                  "nodeId": 6,
                  "storeId": 6,
                  "replicaId": 4,
                  "type": 2
                },
                "deprecatedStartStasis": null,
                "proposedTs": {
                  "wallTime": "1681940763886123978",
                  "logical": 0,
                  "synthetic": false
                },
                "epoch": "2",
                "sequence": "2315",
                "acquisitionType": 2
              },
              "truncatedState": {
                "index": "42",
                "term": "9"
              },
              "gcThreshold": {
                "wallTime": "0",
                "logical": 0,
                "synthetic": false
              },
              "stats": {
                "containsEstimates": "0",
                "lastUpdateNanos": "1681947855897427979",
                "intentAge": "0",
                "gcBytesAge": "0",
                "liveBytes": "0",
                "liveCount": "0",
                "keyBytes": "0",
                "keyCount": "0",
                "valBytes": "0",
                "valCount": "0",
                "intentBytes": "0",
                "intentCount": "0",
                "separatedIntentCount": "0",
                "rangeKeyCount": "0",
                "rangeKeyBytes": "0",
                "rangeValCount": "0",
                "rangeValBytes": "0",
                "sysBytes": "1571",
                "sysCount": "8",
                "abortSpanBytes": "180"
              },
              "deprecatedUsingAppliedStateKey": false,
              "version": {
                "majorVal": 22,
                "minorVal": 2,
                "patch": 0,
                "internal": 0
              },
              "raftClosedTimestamp": {
                "wallTime": "1681942418509355273",
                "logical": 0,
                "synthetic": false
              },
              "raftAppliedIndexTerm": "10",
              "gcHint": {
                "latestRangeDeleteTimestamp": {
                  "wallTime": "1681940783499795597",
                  "logical": 0,
                  "synthetic": false
                }
              }
            },
            "lastIndex": "51",
            "numPending": "0",
            "numDropped": "0",
            "raftLogSize": "4164",
            "raftLogSizeTrusted": true,
            "approximateProposalQuota": "8388608",
            "proposalQuotaBaseIndex": "51",
            "proposalQuotaReleaseQueue": [
            ],
            "rangeMaxBytes": "536870912",
            "activeClosedTimestamp": {
              "wallTime": "1681947852826909749",
              "logical": 0,
              "synthetic": false
            },
            "rangefeedRegistrations": "0",
            "tenantId": "1",
            "closedTimestampPolicy": 0,
            "closedTimestampSidetransportInfo": {
              "replicaClosed": {
                "wallTime": "1681947852826909749",
                "logical": 0,
                "synthetic": false
              },
              "replicaLai": "27",
              "centralClosed": {
                "wallTime": "1681947852826909749",
                "logical": 0,
                "synthetic": false
              },
              "centralLai": "27"
            },
            "circuitBreakerError": "",
            "pausedReplicas": [
            ]
          },
          "sourceNodeId": 3,
          "sourceStoreId": 3,
          "errorMessage": "",
          "leaseHistory": [
          ],
          "problems": {
            "unavailable": false,
            "leaderNotLeaseHolder": true,
            "noRaftLeader": false,
            "underreplicated": false,
            "overreplicated": false,
            "noLease": false,
            "quiescentEqualsTicking": false,
            "raftLogTooLarge": false,
            "circuitBreakerError": false,
            "pausedFollowers": false
          },
          "stats": {
            "queriesPerSecond": 0,
            "writesPerSecond": 0,
            "requestsPerSecond": 0,
            "readsPerSecond": 0,
            "writeBytesPerSecond": 0,
            "readBytesPerSecond": 0,
            "cpuTimePerSecond": 0
          },
          "leaseStatus": {
            "lease": {
              "start": {
                "wallTime": "1681940763877170751",
                "logical": 0,
                "synthetic": false
              },
              "expiration": null,
              "replica": {
                "nodeId": 6,
                "storeId": 6,
                "replicaId": 4,
                "type": 2
              },
              "deprecatedStartStasis": null,
              "proposedTs": {
                "wallTime": "1681940763886123978",
                "logical": 0,
                "synthetic": false
              },
              "epoch": "2",
              "sequence": "2315",
              "acquisitionType": 2
            },
            "now": {
              "wallTime": "1681947855908742749",
              "logical": 0,
              "synthetic": false
            },
            "requestTime": {
              "wallTime": "1681947855908742749",
              "logical": 0,
              "synthetic": false
            },
            "state": 1,
            "errInfo": "",
            "liveness": {
              "nodeId": 6,
              "epoch": "2",
              "expiration": {
                "wallTime": "1681947860935702784",
                "logical": 0,
                "synthetic": null
              },
              "draining": true,
              "membership": 0
            },
            "minValidObservedTimestamp": {
              "wallTime": "0",
              "logical": 0,
              "synthetic": false
            }
          },
          "quiescent": true,
          "ticking": false,
          "readLatches": "0",
          "writeLatches": "0",
          "locks": "0",
          "locksWithWaitQueues": "0",
          "lockWaitQueueWaiters": "0",
          "topKLocksByWaitQueueWaiters": [
          ],
          "locality": {
            "tiers": [
              {
                "key": "cloud",
                "value": "gce"
              },
              {
                "key": "region",
                "value": "us-east1"
              },
              {
                "key": "zone",
                "value": "us-east1-b"
              }
            ]
          },
          "isLeaseholder": false,
          "leaseValid": true
        }
      ]
    },
    "4": {
      "response": true,
      "errorMessage": "",
      "infos": [
      ]
    },
    "5": {
      "response": true,
      "errorMessage": "",
      "infos": [
      ]
    },
    "6": {
      "response": true,
      "errorMessage": "",
      "infos": [
        {
          "span": {
            "startKey": "/Table/633/1/3381595941584118592",
            "endKey": "/Table/633/1/3400024257342070192"
          },
          "raftState": {
            "replicaId": "4",
            "hardState": {
              "term": "10",
              "vote": "5",
              "commit": "51"
            },
            "lead": "5",
            "state": "StateFollower",
            "applied": "51",
            "progress": {
            },
            "leadTransferee": "0"
          },
          "state": {
            "state": {
              "raftAppliedIndex": "51",
              "leaseAppliedIndex": "27",
              "desc": {
                "rangeId": "158617",
                "startKey": "9wJ5if0u7deXqAYjQA==",
                "endKey": "9wJ5if0vL1AMRpXdsA==",
                "internalReplicas": [
                  {
                    "nodeId": 8,
                    "storeId": 8,
                    "replicaId": 1,
                    "type": 0
                  },
                  {
                    "nodeId": 3,
                    "storeId": 3,
                    "replicaId": 5,
                    "type": 0
                  },
                  {
                    "nodeId": 6,
                    "storeId": 6,
                    "replicaId": 4,
                    "type": 0
                  }
                ],
                "nextReplicaId": 6,
                "generation": "10381",
                "stickyBit": {
                  "wallTime": "0",
                  "logical": 0,
                  "synthetic": false
                }
              },
              "lease": {
                "start": {
                  "wallTime": "1681940763877170751",
                  "logical": 0,
                  "synthetic": false
                },
                "expiration": null,
                "replica": {
                  "nodeId": 6,
                  "storeId": 6,
                  "replicaId": 4,
                  "type": 2
                },
                "deprecatedStartStasis": null,
                "proposedTs": {
                  "wallTime": "1681940763886123978",
                  "logical": 0,
                  "synthetic": false
                },
                "epoch": "2",
                "sequence": "2315",
                "acquisitionType": 2
              },
              "truncatedState": {
                "index": "13",
                "term": "6"
              },
              "gcThreshold": {
                "wallTime": "0",
                "logical": 0,
                "synthetic": false
              },
              "stats": {
                "containsEstimates": "0",
                "lastUpdateNanos": "1681947855897427979",
                "intentAge": "0",
                "gcBytesAge": "0",
                "liveBytes": "0",
                "liveCount": "0",
                "keyBytes": "0",
                "keyCount": "0",
                "valBytes": "0",
                "valCount": "0",
                "intentBytes": "0",
                "intentCount": "0",
                "separatedIntentCount": "0",
                "rangeKeyCount": "0",
                "rangeKeyBytes": "0",
                "rangeValCount": "0",
                "rangeValBytes": "0",
                "sysBytes": "1571",
                "sysCount": "8",
                "abortSpanBytes": "180"
              },
              "deprecatedUsingAppliedStateKey": false,
              "version": {
                "majorVal": 22,
                "minorVal": 2,
                "patch": 0,
                "internal": 0
              },
              "raftClosedTimestamp": {
                "wallTime": "1681942418509355273",
                "logical": 0,
                "synthetic": false
              },
              "raftAppliedIndexTerm": "10",
              "gcHint": {
                "latestRangeDeleteTimestamp": {
                  "wallTime": "1681940783499795597",
                  "logical": 0,
                  "synthetic": false
                }
              }
            },
            "lastIndex": "51",
            "numPending": "0",
            "numDropped": "0",
            "raftLogSize": "13771",
            "raftLogSizeTrusted": true,
            "approximateProposalQuota": "0",
            "proposalQuotaBaseIndex": "0",
            "proposalQuotaReleaseQueue": [
            ],
            "rangeMaxBytes": "536870912",
            "activeClosedTimestamp": {
              "wallTime": "1681947852826909749",
              "logical": 0,
              "synthetic": false
            },
            "rangefeedRegistrations": "0",
            "tenantId": "1",
            "closedTimestampPolicy": 0,
            "closedTimestampSidetransportInfo": {
              "replicaClosed": {
                "wallTime": "1681947852826909749",
                "logical": 0,
                "synthetic": false
              },
              "replicaLai": "27",
              "centralClosed": {
                "wallTime": "0",
                "logical": 0,
                "synthetic": false
              },
              "centralLai": "0"
            },
            "circuitBreakerError": "",
            "pausedReplicas": [
            ]
          },
          "sourceNodeId": 6,
          "sourceStoreId": 6,
          "errorMessage": "",
          "leaseHistory": [
            {
              "start": {
                "wallTime": "1681940760435397319",
                "logical": 0,
                "synthetic": false
              },
              "expiration": null,
              "replica": {
                "nodeId": 5,
                "storeId": 5,
                "replicaId": 3,
                "type": 0
              },
              "deprecatedStartStasis": null,
              "proposedTs": {
                "wallTime": "1681940760437956200",
                "logical": 0,
                "synthetic": false
              },
              "epoch": "3",
              "sequence": "2313",
              "acquisitionType": 2
            },
            {
              "start": {
                "wallTime": "1681940763877170751",
                "logical": 0,
                "synthetic": false
              },
              "expiration": {
                "wallTime": "1681940769877073340",
                "logical": 0,
                "synthetic": false
              },
              "replica": {
                "nodeId": 6,
                "storeId": 6,
                "replicaId": 4,
                "type": 2
              },
              "deprecatedStartStasis": null,
              "proposedTs": {
                "wallTime": "1681940763877073340",
                "logical": 0,
                "synthetic": false
              },
              "epoch": "0",
              "sequence": "2314",
              "acquisitionType": 1
            },
            {
              "start": {
                "wallTime": "1681940763877170751",
                "logical": 0,
                "synthetic": false
              },
              "expiration": null,
              "replica": {
                "nodeId": 6,
                "storeId": 6,
                "replicaId": 4,
                "type": 2
              },
              "deprecatedStartStasis": null,
              "proposedTs": {
                "wallTime": "1681940763886123978",
                "logical": 0,
                "synthetic": false
              },
              "epoch": "2",
              "sequence": "2315",
              "acquisitionType": 2
            }
          ],
          "problems": {
            "unavailable": false,
            "leaderNotLeaseHolder": false,
            "noRaftLeader": false,
            "underreplicated": false,
            "overreplicated": false,
            "noLease": false,
            "quiescentEqualsTicking": false,
            "raftLogTooLarge": false,
            "circuitBreakerError": false,
            "pausedFollowers": false
          },
          "stats": {
            "queriesPerSecond": 0,
            "writesPerSecond": 0,
            "requestsPerSecond": 0,
            "readsPerSecond": 0,
            "writeBytesPerSecond": 0,
            "readBytesPerSecond": 0,
            "cpuTimePerSecond": 0
          },
          "leaseStatus": {
            "lease": {
              "start": {
                "wallTime": "1681940763877170751",
                "logical": 0,
                "synthetic": false
              },
              "expiration": null,
              "replica": {
                "nodeId": 6,
                "storeId": 6,
                "replicaId": 4,
                "type": 2
              },
              "deprecatedStartStasis": null,
              "proposedTs": {
                "wallTime": "1681940763886123978",
                "logical": 0,
                "synthetic": false
              },
              "epoch": "2",
              "sequence": "2315",
              "acquisitionType": 2
            },
            "now": {
              "wallTime": "1681947855908740696",
              "logical": 0,
              "synthetic": false
            },
            "requestTime": {
              "wallTime": "1681947855908740696",
              "logical": 0,
              "synthetic": false
            },
            "state": 1,
            "errInfo": "",
            "liveness": {
              "nodeId": 6,
              "epoch": "2",
              "expiration": {
                "wallTime": "1681947860935702784",
                "logical": 0,
                "synthetic": null
              },
              "draining": true,
              "membership": 0
            },
            "minValidObservedTimestamp": {
              "wallTime": "1681940763877170751",
              "logical": 0,
              "synthetic": false
            }
          },
          "quiescent": true,
          "ticking": false,
          "readLatches": "0",
          "writeLatches": "0",
          "locks": "0",
          "locksWithWaitQueues": "0",
          "lockWaitQueueWaiters": "0",
          "topKLocksByWaitQueueWaiters": [
          ],
          "locality": {
            "tiers": [
              {
                "key": "cloud",
                "value": "gce"
              },
              {
                "key": "region",
                "value": "us-east1"
              },
              {
                "key": "zone",
                "value": "us-east1-c"
              }
            ]
          },
          "isLeaseholder": true,
          "leaseValid": true
        }
      ]
    },
    "7": {
      "response": true,
      "errorMessage": "",
      "infos": [
      ]
    },
    "8": {
      "response": true,
      "errorMessage": "",
      "infos": [
        {
          "span": {
            "startKey": "/Table/633/1/3381595941584118592",
            "endKey": "/Table/633/1/3400024257342070192"
          },
          "raftState": {
            "replicaId": "1",
            "hardState": {
              "term": "10",
              "vote": "5",
              "commit": "51"
            },
            "lead": "5",
            "state": "StateFollower",
            "applied": "51",
            "progress": {
            },
            "leadTransferee": "0"
          },
          "state": {
            "state": {
              "raftAppliedIndex": "51",
              "leaseAppliedIndex": "27",
              "desc": {
                "rangeId": "158617",
                "startKey": "9wJ5if0u7deXqAYjQA==",
                "endKey": "9wJ5if0vL1AMRpXdsA==",
                "internalReplicas": [
                  {
                    "nodeId": 8,
                    "storeId": 8,
                    "replicaId": 1,
                    "type": 0
                  },
                  {
                    "nodeId": 3,
                    "storeId": 3,
                    "replicaId": 5,
                    "type": 0
                  },
                  {
                    "nodeId": 6,
                    "storeId": 6,
                    "replicaId": 4,
                    "type": 0
                  }
                ],
                "nextReplicaId": 6,
                "generation": "10381",
                "stickyBit": {
                  "wallTime": "0",
                  "logical": 0,
                  "synthetic": false
                }
              },
              "lease": {
                "start": {
                  "wallTime": "1681940763877170751",
                  "logical": 0,
                  "synthetic": false
                },
                "expiration": null,
                "replica": {
                  "nodeId": 6,
                  "storeId": 6,
                  "replicaId": 4,
                  "type": 2
                },
                "deprecatedStartStasis": null,
                "proposedTs": {
                  "wallTime": "1681940763886123978",
                  "logical": 0,
                  "synthetic": false
                },
                "epoch": "2",
                "sequence": "2315",
                "acquisitionType": 2
              },
              "truncatedState": {
                "index": "10",
                "term": "5"
              },
              "gcThreshold": {
                "wallTime": "0",
                "logical": 0,
                "synthetic": false
              },
              "stats": {
                "containsEstimates": "0",
                "lastUpdateNanos": "1681947855897427979",
                "intentAge": "0",
                "gcBytesAge": "0",
                "liveBytes": "0",
                "liveCount": "0",
                "keyBytes": "0",
                "keyCount": "0",
                "valBytes": "0",
                "valCount": "0",
                "intentBytes": "0",
                "intentCount": "0",
                "separatedIntentCount": "0",
                "rangeKeyCount": "0",
                "rangeKeyBytes": "0",
                "rangeValCount": "0",
                "rangeValBytes": "0",
                "sysBytes": "1571",
                "sysCount": "8",
                "abortSpanBytes": "180"
              },
              "deprecatedUsingAppliedStateKey": false,
              "version": {
                "majorVal": 22,
                "minorVal": 2,
                "patch": 0,
                "internal": 0
              },
              "raftClosedTimestamp": {
                "wallTime": "1681942418509355273",
                "logical": 0,
                "synthetic": false
              },
              "raftAppliedIndexTerm": "10",
              "gcHint": {
                "latestRangeDeleteTimestamp": {
                  "wallTime": "1681940783499795597",
                  "logical": 0,
                  "synthetic": false
                }
              }
            },
            "lastIndex": "51",
            "numPending": "0",
            "numDropped": "0",
            "raftLogSize": "14944",
            "raftLogSizeTrusted": false,
            "approximateProposalQuota": "0",
            "proposalQuotaBaseIndex": "0",
            "proposalQuotaReleaseQueue": [
            ],
            "rangeMaxBytes": "536870912",
            "activeClosedTimestamp": {
              "wallTime": "1681947852826909749",
              "logical": 0,
              "synthetic": false
            },
            "rangefeedRegistrations": "0",
            "tenantId": "1",
            "closedTimestampPolicy": 0,
            "closedTimestampSidetransportInfo": {
              "replicaClosed": {
                "wallTime": "1681947852826909749",
                "logical": 0,
                "synthetic": false
              },
              "replicaLai": "27",
              "centralClosed": {
                "wallTime": "1681947852826909749",
                "logical": 0,
                "synthetic": false
              },
              "centralLai": "27"
            },
            "circuitBreakerError": "",
            "pausedReplicas": [
            ]
          },
          "sourceNodeId": 8,
          "sourceStoreId": 8,
          "errorMessage": "",
          "leaseHistory": [
            {
              "start": {
                "wallTime": "1681940760435397319",
                "logical": 0,
                "synthetic": false
              },
              "expiration": null,
              "replica": {
                "nodeId": 5,
                "storeId": 5,
                "replicaId": 3,
                "type": 0
              },
              "deprecatedStartStasis": null,
              "proposedTs": {
                "wallTime": "1681940760437956200",
                "logical": 0,
                "synthetic": false
              },
              "epoch": "3",
              "sequence": "2313",
              "acquisitionType": 2
            },
            {
              "start": {
                "wallTime": "1681940763877170751",
                "logical": 0,
                "synthetic": false
              },
              "expiration": {
                "wallTime": "1681940769877073340",
                "logical": 0,
                "synthetic": false
              },
              "replica": {
                "nodeId": 6,
                "storeId": 6,
                "replicaId": 4,
                "type": 2
              },
              "deprecatedStartStasis": null,
              "proposedTs": {
                "wallTime": "1681940763877073340",
                "logical": 0,
                "synthetic": false
              },
              "epoch": "0",
              "sequence": "2314",
              "acquisitionType": 1
            },
            {
              "start": {
                "wallTime": "1681940763877170751",
                "logical": 0,
                "synthetic": false
              },
              "expiration": null,
              "replica": {
                "nodeId": 6,
                "storeId": 6,
                "replicaId": 4,
                "type": 2
              },
              "deprecatedStartStasis": null,
              "proposedTs": {
                "wallTime": "1681940763886123978",
                "logical": 0,
                "synthetic": false
              },
              "epoch": "2",
              "sequence": "2315",
              "acquisitionType": 2
            }
          ],
          "problems": {
            "unavailable": false,
            "leaderNotLeaseHolder": false,
            "noRaftLeader": false,
            "underreplicated": false,
            "overreplicated": false,
            "noLease": false,
            "quiescentEqualsTicking": false,
            "raftLogTooLarge": false,
            "circuitBreakerError": false,
            "pausedFollowers": false
          },
          "stats": {
            "queriesPerSecond": 0,
            "writesPerSecond": 0,
            "requestsPerSecond": 0,
            "readsPerSecond": 0,
            "writeBytesPerSecond": 0,
            "readBytesPerSecond": 0,
            "cpuTimePerSecond": 0
          },
          "leaseStatus": {
            "lease": {
              "start": {
                "wallTime": "1681940763877170751",
                "logical": 0,
                "synthetic": false
              },
              "expiration": null,
              "replica": {
                "nodeId": 6,
                "storeId": 6,
                "replicaId": 4,
                "type": 2
              },
              "deprecatedStartStasis": null,
              "proposedTs": {
                "wallTime": "1681940763886123978",
                "logical": 0,
                "synthetic": false
              },
              "epoch": "2",
              "sequence": "2315",
              "acquisitionType": 2
            },
            "now": {
              "wallTime": "1681947855908782507",
              "logical": 0,
              "synthetic": false
            },
            "requestTime": {
              "wallTime": "1681947855908782507",
              "logical": 0,
              "synthetic": false
            },
            "state": 1,
            "errInfo": "",
            "liveness": {
              "nodeId": 6,
              "epoch": "2",
              "expiration": {
                "wallTime": "1681947860935702784",
                "logical": 0,
                "synthetic": null
              },
              "draining": true,
              "membership": 0
            },
            "minValidObservedTimestamp": {
              "wallTime": "0",
              "logical": 0,
              "synthetic": false
            }
          },
          "quiescent": true,
          "ticking": false,
          "readLatches": "0",
          "writeLatches": "0",
          "locks": "0",
          "locksWithWaitQueues": "0",
          "lockWaitQueueWaiters": "0",
          "topKLocksByWaitQueueWaiters": [
          ],
          "locality": {
            "tiers": [
              {
                "key": "cloud",
                "value": "gce"
              },
              {
                "key": "region",
                "value": "us-east1"
              },
              {
                "key": "zone",
                "value": "us-east1-d"
              }
            ]
          },
          "isLeaseholder": false,
          "leaseValid": true
        }
      ]
    }
  }
}

@kvoli
Copy link
Collaborator

kvoli commented Apr 19, 2023

Another 144488 had the same issue but resolved itself after 5 minutes:

I230419 22:59:46.494116 1357388647 kv/kvserver/store.go:1729 ⋮ [T1,drain,n6,s6,r144488/4:‹/Table/576/1/-34{5530…-3688…}›] 224593  failed to transfer lease repl=(n6,s6):4VOTER_INCOMING seq=2071 start=1681911361.105018268,0 epo=2 pro=1681911361.222536052,0 for range r144488:‹/Table/576/1/-34{55309204615925008-36880888857973408}› [(n8,s8):1, (n3,s3):2, (n6,s6):4, next=5, gen=9438] when draining: ‹no suitable transfer target found›

...
# tries again
I230419 22:59:57.617015 1357394497 kv/kvserver/store.go:1701 ⋮ [T1,drain,n6,s6,r144488/4:‹/Table/576/1/-34{5530…-3688…}›] 224638  attempting to transfer lease repl=(n6,s6):4VOTER_INCOMING seq=2071 start=1681911361.105018268,0 epo=2 pro=1681911361.222536052,0 for range r144488:‹/Table/576/1/-34{55309204615925008-36880888857973408}› [(n8,s8):1, (n3,s3):2, (n6,s6):4, next=5, gen=9438]
# succeeds
I230419 22:59:57.617259 1357394497 13@kv/kvserver/replicate_queue.go:2034 ⋮ [T1,drain,n6,s6,r144488/4:‹/Table/576/1/-34{5530…-3688…}›] 153689  transferring lease to s3

@andrewbaptist
Copy link
Collaborator

The issue is when we are in a configuration where the leader and leaseholder are different, the replicate queue is unable to fix the issue. It has a check excludeReplicasInNeedOfSnapshots which attempts to exclude behind replicas. However the problems is that when the leader is not the leaseholder, the ReplicaMayNeedSnapshot method returns LocalReplicaNotLeader. This prevents the replicate queue from moving the lease.

A reasonable fix is to reverse the check here to only exclude replicas if the state is ReplicaStateSnapshot

There is a potential issue of why we have a leader/leaseholder split for this range, and it may be worth tracking that down as well, but its possible that they just "organically appear" and this bug would prevent them from recovering.

@nvanbenschoten
Copy link
Member

A reasonable fix is to reverse the check here to only exclude replicas if the state is ReplicaStateSnapshot

If a leaseholder is not the raft leader then it does not track the progress of peers. It would not know whether the lease transfer target is caught up on its log or whether it requires a snapshot. See #81561 for more detail about why this would be unsafe and the consequences that allowing such lease transfers could have.

and this bug would prevent them from recovering

Which bug are you referring to? Why would it prevent the leaseholder from receiving leadership through maybeTransferRaftLeadershipToLeaseholderLocked?

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Apr 20, 2023

In #101885 (comment)

The full status entry for r158617

We see a split leader/leaseholder range that's quiescent.

➜ jq '.responsesByNodeId."3".infos[0].raftState.lead' status.json
"5"
➜ jq '.responsesByNodeId."3".infos[0].state.state.lease.replica.replicaId' status.json
4
➜ jq '.responsesByNodeId."3".infos[0].quiescent' status.json
true

This would explain why maybeTransferRaftLeadershipToLeaseholderLocked is not collocating the leader and leaseholder to unblock the drain. But it also should not be allowed by shouldReplicaQuiesce:

log.Infof(ctx, "not quiescing: not leader")

log.Infof(ctx, "not quiescing: not leaseholder")

Is this racy? Did we break things when we touched them earlier this release in #97289?

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Apr 20, 2023

n6's lease has remained valid since 2023-04-19 21:46:03. The n3:5 replica (leader) only came into existence at 22:13:40:

I230419 22:13:40.559828 1233371706 kv/kvserver/replica_raftstorage.go:489 ⋮ [T1,n3,s3,r158617/5:‹/Table/633/1/3{38159…-40002…}›] 63037  applied INITIAL snapshot 0be85246 from (n6,s6):4 at applied index 42 (total=25ms data=2.0 KiB ingestion=6@20ms)

So the lease has existed for as long as n3 has existed, and thus for as long as it can have been the leader, and it's still valid. The logic in #97289 should not allow it to quiesce. This shouldn't be racy, because if there are any outstanding proposals or unapplied log entries we don't quiesce either, so there can't be another lease in flight that we haven't seen.

However, what would happen if n6 had a liveness blip, either globally or from n3 gossip's point of view? There are two conditions that mark an epoch-based lease as expired:

  1. The liveness record's epoch has been bumped.

    if status.Liveness.Epoch > lease.Epoch {
    status.State = kvserverpb.LeaseState_EXPIRED
    return status
    }

  2. The liveness record has expired:

    if expiration.LessEq(now.ToTimestamp()) {
    status.State = kvserverpb.LeaseState_EXPIRED

I think the second condition is problematic. The assumption we make when we allow expired leases to quiesce is that someone has to reacquire the expired lease, and this will wake up the Raft group. But I think this assumption is flawed in two ways:

  1. If n6's liveness record expired from n3's point of view, then n3 will quiesce, but if noone goes on to bump n6's epoch then n6 can still hold on to the lease. From n3's point of view, an expired lease can revert back to a valid lease when the liveness record is updated, but at that point the Raft group is already quiesced.

  2. We assume that anyone can wake up the Raft group by sending a lease request. But they can't always, like in this case, because we require them to be the Raft leader. For lease acquisitions this presumably isn't problematic, since the current Raft leader can acquire it, but for transfers it is.

It's still not clear that this is what happened here (I'll see if I can find any indications that n6's liveness may have blipped on n3), but I think 1 above is reason enough to revert this change. We no longer strictly need this logic anyway, now that we eagerly extend expiration leases.

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Apr 20, 2023

Submitted a revert in #101899. I'm going to keep this issue open and try to figure out if this is what happened, or if there's more going on. I have to deal with some L2 stuff first though.

craig bot pushed a commit that referenced this issue Apr 20, 2023
101899: Revert "kvserver: allow expired leases to quiesce" r=erikgrinaker a=erikgrinaker

This reverts commit 76afb00.

This change was flawed, because it assumed an expired lease would have to be reacquired which would wake up the range, but that's not necessarily true. In cases where the Raft leader is not colocated with the leaseholder, the Raft leader may have a stale liveness record, or the leaseholder may temporarily fail to heartbeat liveness without anyone bumping its epoch. If this happens, it's possible for an expired lease to revert to a valid lease from the leader's point of view, but at that point it's too late: the range is already quiesced.

This can prevent e.g. lease transfers, because we don't allow leaseholders to transfer their lease if they're not the Raft leader (they need to make sure the target is not behind on the Raft log). We won't attempt to colocate the Raft leader with the leaseholder for a quiesced range, since we don't tick it.

Touches #101885.
Touches #97289.

Epic: none
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
@andrewbaptist
Copy link
Collaborator

Do we want to put a big warning or a Fatal in the replicate queue where it detects quiescent lease in the state leaseholder not leader? It would be much easier to find this sort of issue.

It didn't register to me that this state was so problematic.

@erikgrinaker
Copy link
Contributor

Sure, if we can do so without a risk of races. Let's not do fatal though, at least not in production builds.

@andrewbaptist
Copy link
Collaborator

For 23.1 we can just add an warning log to this. I agree it is hard to avoid the risk of races here. I can take a look and add something if there is an easy change. I created #101902 to track this.

@erikgrinaker erikgrinaker added T-kv-replication and removed T-kv KV Team labels Apr 20, 2023
@blathers-crl
Copy link

blathers-crl bot commented Apr 20, 2023

cc @cockroachdb/replication

@erikgrinaker erikgrinaker changed the title sql: leases appear stuck when draining a node kvserver: leases appear stuck when draining a node Apr 20, 2023
@nicktrav nicktrav added the O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster label Apr 24, 2023
@erikgrinaker
Copy link
Contributor

I'm a bit stuck here. I only have time series to go by, since the debug.zip I grabbed only had 48 hours of logs, and the necessary logs are now rotated out.

The timeline is something like this:

  • 21:46:03 n6 acquires lease and retains it throughout.
  • 22:13:40 n3 replica is created.
  • 22:36:00 n3 is restarted, starts picking up Raft leaderships (due to incomplete prevote and idle ranges).
  • 22:53:00 n6 drains, shedding all but 2 leases.
  • 23:00:00 n3 leaders_not_leaseholders drops to 1 and remains there.

There were no liveness failures nor epoch increments in the interval, so the hypothesis above about n3's view of the lease flipping to EXPIRED can't be confirmed. If it hasn't received any liveness gossip for n6 yet, it'll see the lease as ERROR and not quiesce either.

Over the next 20 minutes, we see n3 moving Raft leaderships to leaseholders, as we would expect for unquiesced leaders.

I've reviewed the quiescence code, and I don't see any other explanation for why we fail to quiesce except the hypothesis above, but I also haven't been able to confirm that that's the bug we hit here. I don't know if I'll be able to make much more progress here, so I'll remove the GA blocker, but will try some more restarts and drains on the test cluster to see if it reproduces.

Screenshot 2023-04-27 at 12 58 21

Screenshot 2023-04-27 at 14 41 04

@erikgrinaker
Copy link
Contributor

Did another rolling drain+restart of the 23.1 test cluster. Except for #102574, the drains all went through successfully, and leaders_not_leaseholders always dropped to 0 within a reasonable time. Given that we've also ran several other successful drain/restart cycles, I'm going to close this out for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster
Projects
None yet
Development

No branches or pull requests

7 participants