kvserver: inconsistent span configs between outgoing and incoming leaseholders causes temporary violations #103086

Open
kvoli opened this issue May 11, 2023 · 0 comments
Labels
A-kv-distribution Relating to rebalancing and leasing. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team

Comments

@kvoli
Collaborator

kvoli commented May 11, 2023

There is no causal consistency guarantee that, once the previous leaseholder has seen an updated span config (v2), the new leaseholder for the range will also see v2 or a later version.

This state is usually short-lived. However, it can last long enough for the new leaseholder to remove several replicas from a range if the old configuration had a lower replication factor (RF) or different constraints.

The impact is potential (temporary) under-replication and constraint violations. It also causes redundant work: once the new leaseholder receives the updated span config, it has to undo the replication changes it made while acting on the stale one.
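
A minimal Go simulation of the race, under stated assumptions: `spanConfig`, `store`, `voterDelta`, and `simulate` are hypothetical and heavily simplified, and do not reflect the real `kvserver` or `spanconfig` APIs. The outgoing leaseholder's store has already received v2 (5 voters) while the incoming leaseholder's store still holds v1 (3 voters), so the incoming leaseholder removes two voters and later adds them back.

```go
package main

import "fmt"

// spanConfig is a hypothetical, heavily simplified span config: only a
// version and the desired number of voters. Real span configs carry far
// more (constraints, lease preferences, and so on).
type spanConfig struct {
	version   int
	numVoters int
}

// store is a hypothetical store holding the latest span config it has
// received. Updates reach each store independently, so stores can disagree.
type store struct {
	name string
	conf spanConfig
}

// voterDelta returns how many voters a leaseholder on s would add (positive)
// or remove (negative) to move a range with `voters` voters toward the span
// config that s currently knows about.
func voterDelta(s store, voters int) int {
	return s.conf.numVoters - voters
}

// simulate replays the race: s1 (outgoing leaseholder) has already seen v2,
// while s2 (incoming leaseholder) still holds v1 when the lease moves.
// It returns the voter-count delta applied at each step.
func simulate() []int {
	v1 := spanConfig{version: 1, numVoters: 3}
	v2 := spanConfig{version: 2, numVoters: 5}
	outgoing := store{name: "s1", conf: v2}
	incoming := store{name: "s2", conf: v1}

	voters := 5 // s1 already up-replicated the range to match v2
	var deltas []int

	step := func(s store) {
		d := voterDelta(s, voters)
		voters += d
		deltas = append(deltas, d)
		fmt.Printf("%s acting on v%d: delta %+d, now %d voters\n",
			s.name, s.conf.version, d, voters)
	}

	step(outgoing) // s1 sees v2: nothing to do
	step(incoming) // lease moved; s2 on stale v1 removes 2 voters (under-replicated vs v2)
	incoming.conf = v2
	step(incoming) // s2 finally receives v2 and re-adds the 2 voters (redundant work)
	return deltas
}

func main() {
	fmt.Println(simulate()) // prints [0 -2 2] after the per-step lines
}
```

The remove-then-re-add pair in the last two steps is the redundant work described above; a stale config with different constraints, rather than a lower RF, would show up as constraint violations instead.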

A reproduction can be found in the comment thread of #101519.

Jira issue: CRDB-27826

@kvoli kvoli added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-distribution Relating to rebalancing and leasing. labels May 11, 2023
@blathers-crl blathers-crl bot added the T-kv KV Team label May 11, 2023
craig bot pushed a commit that referenced this issue May 15, 2023
103077: docgen: add EXPLAIN, EXPLAIN ANALYZE options to diagrams r=michae2 a=taroface

Add various `EXPLAIN` and `EXPLAIN ANALYZE` options that were documented but not exposed in the SQL diagrams. Also add `REDACT`, which will be documented with cockroachdb/docs#16929.

Epic: none

Release note: none

Release justification: non-production code change

103083: kvserver: deflake TestPromoteNonVoterInAddVoter r=andrewbaptist a=kvoli

Span config updates can arrive at different stores at different times.
`TestPromoteNonVoterInAddVoter` was flaking when the incoming
leaseholder acted on a stale span config before receiving the updated
one that the outgoing leaseholder had used.

This caused the test to fail because more than the two expected
promotion events appeared in the range log: the incoming leaseholder
removed voters, then added them back once it received the up-to-date
span configuration.

#103086 tracks this issue.

This PR checks only the prefix of the range log's add-voter events, so
the test no longer fails when this untimely behavior occurs.
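
A minimal Go sketch of a prefix-based assertion, assuming the test collects the range log events into a slice. `rangeLogEvent`, `hasEventPrefix`, and the event-type strings are hypothetical and are not the actual helpers used in `TestPromoteNonVoterInAddVoter`.

```go
package main

import "fmt"

// rangeLogEvent is a hypothetical, simplified range log entry.
type rangeLogEvent struct {
	eventType string // hypothetical labels, e.g. "add_voter", "remove_voter"
	storeID   int
}

// hasEventPrefix reports whether got begins with want. Events after the
// prefix (e.g. an incoming leaseholder removing voters under a stale span
// config and later re-adding them) are tolerated instead of failing the check.
func hasEventPrefix(got, want []rangeLogEvent) bool {
	if len(got) < len(want) {
		return false
	}
	for i := range want {
		if got[i] != want[i] {
			return false
		}
	}
	return true
}

func main() {
	want := []rangeLogEvent{{"add_voter", 4}, {"add_voter", 5}}
	// The two expected promotions, followed by churn from a stale span config.
	got := []rangeLogEvent{
		{"add_voter", 4}, {"add_voter", 5},
		{"remove_voter", 4}, {"add_voter", 4},
	}
	fmt.Println(hasEventPrefix(got, want))     // true: an exact-length check would fail here
	fmt.Println(hasEventPrefix(got[:1], want)) // false: an expected promotion is missing
}
```

The trade-off is that the prefix check no longer detects unexpected trailing events, which is acceptable here because those events are the known behavior tracked by #103086.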

Stressed overnight with the skip-under-stress flag removed:

```
dev test pkg/kv/kvserver -f TestPromoteNonVoterInAddVoter -v --stress --stress-args="-p 4"
...
27158 runs so far, 0 failures, over 12h56m55s
```

This PR also adds verbose (v=6) logging of the range descriptor and span config,
which is useful when debugging failures such as this one.

Fixes: #101519

Release note: None

103332: cloud/kubernetes: bump version to v23.1.0 r=ZhouXing19 a=ZhouXing19

Epic: None
Release note: None

103335: util/version: update roachtest version for 23.2 to 23.1.0 r=ZhouXing19 a=ZhouXing19

Epic: None
Release note: None

103338: ci: additionally build `workload` and `dev` for all Unix systems r=rail,srosenberg a=rickystewart

This includes ARM machines and macOS systems.

Epic: none
Release note: None

Co-authored-by: Ryan Kuo <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Jane Xing <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>