roachtest: transfer-leases/signal failed [presumed same as #83261] #83372

Closed
cockroach-teamcity opened this issue Jun 25, 2022 · 21 comments · Fixed by #90106
Labels
branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-kv KV Team

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Jun 25, 2022

roachtest.transfer-leases/signal failed with artifacts on master @ fc7ae339c3f85ad2a2b28643e47d7c20768fe237:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:896: (1) ranges with no lease outside of node 1: []string{"11"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

Jira issue: CRDB-17030

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jun 25, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Jun 25, 2022
@tbg tbg changed the title roachtest: transfer-leases/signal failed roachtest: transfer-leases/signal failed [presumed same as #83261] Jun 27, 2022
@tbg
Member

tbg commented Jun 27, 2022

Going to assume this is like #83261

@tbg tbg removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Jun 27, 2022
@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ 3c9b17113488d2ee6929936aa6ec48396f3ed71c:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:896: (1) ranges with no lease outside of node 2: []string{"14"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ b9a165c03643ed43a0d45b16c0a795244543a2fe:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:896: (1) ranges with no lease outside of node 2: []string{"44"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@nvanbenschoten nvanbenschoten self-assigned this Jul 7, 2022
@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ eabdc49383a19d5731b765dbb0d8b45bd9e24404:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:896: (1) ranges with no lease outside of node 1: []string{"44"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ f53c21f9fe0fe07ba32d2f28ac4bd2c4bc2ed50b:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:896: (1) ranges with no lease outside of node 3: []string{"4"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ e6a7dc2f8ee39549e186bd05626c4c375b76fd04:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:896: (1) ranges with no lease outside of node 1: []string{"30", "507"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ 524fd14da3fefcd849f44a835cc5f88f5dbdadcc:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:896: (1) ranges with no lease outside of node 3: []string{"47"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ f59620ec646d1181d358d0dc41ab60815ecf59c9:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:897: (1) ranges with no lease outside of node 3: []string{"34"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ 3b16435371a43d603d193a1e2b480a23fba3f07a:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:897: (1) ranges with no lease outside of node 1: []string{"430", "502", "492", "486"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ 2372698da1dfacb90f60c6a63f2c1298d1db16b8:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:897: (1) ranges with no lease outside of node 1: []string{"416", "519", "444", "455", "314"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ e39111b2e714375faa0facc05e51e8f619a55b21:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:908: (1) ranges with no lease outside of node 3: []string{"8"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ a82711442c65cf14489c55041b45b11a1e38415b:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:906: (1) ranges with no lease outside of node 3: []string{"34", "24"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ bc2e47da0523b347c28cf024707e80cd35d6c98a:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:917: (1) ranges with no lease outside of node 3: []string{"52", "29", "37", "8"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ 773568fbda06ba9be9fb1bc34a331f21c8891ffa:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:917: (1) ranges with no lease outside of node 3: []string{"19"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ 95677eb5f8d006629b16024fb7d87d55344c1470:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:917: (1) ranges with no lease outside of node 3: []string{"46", "34", "52", "20", "30", "39"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ 9a05046ce19e7678340e82c70d61e928be95bc72:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:917: (1) ranges with no lease outside of node 2: []string{"17", "19"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ bd97ad5b8c9f537a89492a051574d867469bef33:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:917: (1) ranges with no lease outside of node 3: []string{"14"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ 3fa1a1600898d7b78b9e39d07132a387a2f9a1b6:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:917: (1) ranges with no lease outside of node 2: []string{"519", "488", "553", "499", "551"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ 87ed064dc23eab6948ee8a07e8507f150bda0e44:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:917: (1) ranges with no lease outside of node 3: []string{"42", "8", "6", "17"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ 5862e9b3c1e24b3f644716789c96108a8e92fd71:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:207,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:917: (1) ranges with no lease outside of node 3: []string{"45", "41", "19"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.transfer-leases/signal failed with artifacts on master @ 801bfc62afd7128be180e3396d21a1e0b2daa227:

test artifacts and logs in: /artifacts/transfer-leases/signal/run_1
	quit.go:72,quit.go:323,soon.go:69,retry.go:208,soon.go:75,soon.go:48,quit.go:228,quit.go:95,quit.go:154,context.go:91,quit.go:153,quit.go:95,quit.go:54,quit.go:359,test_runner.go:928: (1) ranges with no lease outside of node 2: []string{"11"}

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 17, 2022
Fixes cockroachdb#83372.
Fixes cockroachdb#90022.
Fixes cockroachdb#89963.
Fixes cockroachdb#89962.

This commit instructs stores to reacquire proscribed leases when draining in
order to subsequently transfer them away. This addresses a source of flakiness
in `transfer-lease` roachtests where some lease would not be transferred away
before the drain completed. This could leave a range unavailable for up to 9
seconds while other replicas waited out the lease's expiration. This is because
only the previous leaseholder knows that a proscribed lease is invalid. All
other replicas still consider the lease to be valid.

This failure mode was always present if a lease transfer failed during a drain.
However, it became more likely with 034611b. With that change, we began
rejecting lease transfers that were deemed to be "unsafe" more frequently.
034611b introduced a best-effort, graceful version of this check and an
airtight, ungraceful version of the check. The former performs the check before
revoking the outgoing leaseholder's lease, while the latter performs the check
after revoking the outgoing leaseholder's lease. In rare cases, it was possible
to hit the airtight, ungraceful check and cause the lease to be proscribed. See
cockroachdb#83261 (comment)
for more details on how this led to test flakiness in the `transfer-lease`
roachtest suite.

Release notes: None.

Release justification: Avoids GA-blocking roachtest failures.
craig bot pushed a commit that referenced this issue Oct 18, 2022
90007: execbuilder: enforce_home_region should only apply to DML r=rytaft a=msirek

Fixes #89875
Fixes #88789

This fixes a problem where the enforce_home_region session flag might
cause non-DML statements to error out, such as SHOW CREATE, if those
statements utilize scans or joins of multiregion tables.

This also fixes mutation DML statements such as UPDATE and DELETE not
erroring out when they should. For example, the following previously did not error:
```
CREATE TABLE messages_rbr (
    account_id INT NOT NULL,
    message_id   UUID DEFAULT gen_random_uuid(),
    message    STRING NOT NULL,
    PRIMARY KEY (account_id),
    INDEX msg_idx(message)
)
LOCALITY REGIONAL BY ROW;

SET enforce_home_region = true;

DELETE FROM messages_rbr WHERE message = 'Hello World!';
ERROR: Query has no home region. Try adding a filter on messages_rbr.crdb_region and/or on key column (messages_rbr.account_id).
SQLSTATE: XCHR2
```

Release note (bug fix): This patch fixes an issue with the
enforce_home_region session setting which may cause SHOW CREATE TABLE or
other non-DML statements to error out if the optimizer plan for the
statement involves accessing a multiregion table.


90106: kv: reacquire proscribed leases on drain, then transfer r=shralex a=nvanbenschoten

90107: execbuilder: fix enforce_home_region erroring of input table to LOJ r=rytaft a=msirek

Fixes #88788

This fixes erroring out of locality-optimized join when the input table's home region does not match the gateway region and session flag `enforce_home_region` is true.

Release note (bug fix): This patch fixes detection and erroring out of queries using locality-optimized join when session setting enforce_home_region is true and the input table to the join has no home region or its home region does not match the gateway region.

90165: sql,server: increase severity of upgraded-related logging r=ajwerner a=knz

Informs #90148.

This increases the severity from INFO in the following cases:

- in the case when `SET CLUSTER SETTING version` is issued from a SQL client (WARNING in case of failure).
- in the case when the server spontaneously decides to upgrade in the background (ERROR in case of failure).

Release note: None

90166: sql/rowenc: remove leftover log in test r=mgartner a=mgartner

Epic: None
Release note: None

Co-authored-by: Mark Sirek <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
@craig craig bot closed this as completed in fd3b0b9 Oct 18, 2022
blathers-crl bot pushed a commit that referenced this issue Oct 18, 2022
blathers-crl bot pushed a commit that referenced this issue Oct 18, 2022
@lunevalex lunevalex added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Dec 9, 2022