kvserver: eager lease preference enforcement occasionally fails when acquiring node fails liveness #108512
Labels
A-kv-distribution
Relating to rebalancing and leasing.
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
T-kv
KV Team
Comments
kvoli added the C-bug and A-kv-distribution labels on Aug 10, 2023
kvoli added a commit to kvoli/cockroach that referenced this issue on Aug 10, 2023:
The lease preferences roachtest could occasionally fail if the liveness leaseholder was on a stopped node. We should address this issue; for now, pin the liveness lease to a live node to prevent flakes.
Informs: cockroachdb#108512
Resolves: cockroachdb#108425
Release note: None
craig bot pushed a commit that referenced this issue on Aug 10, 2023:

107302: storage: add method to ingest external files, rename IngestExternalFiles r=RaduBerinde a=itsbilal
Requires cockroachdb/pebble#2753
This change renames the existing IngestExternalFiles method on storage.Engine to IngestLocalFiles, and adds a new IngestExternalFiles that ingests pebble.ExternalFile, for use with online restore. Depends on cockroachdb/pebble#2753.
Epic: none
Release note: None

108402: serverutils: remove ad-hoc code from StartNewTestCluster r=yuzefovich a=knz
This function is a convenience alias for NewTestCluster+Start. This should not contain custom logic specific to certain tests. Any custom logic should be conditional on testing knobs and put inside `(*testcluster.TestCluster).Start()` instead. (The code removed here was mistakenly added in the wrong place in 70f85cd.)
Release note: None
Needed for #107986.
Epic: CRDB-18499

108446: kv: skip TestConstraintConformanceReportIntegration under deadlock r=erikgrinaker a=nvanbenschoten
Fixes #108430.
This commit avoids flakiness in `TestConstraintConformanceReportIntegration` by skipping the test under deadlock builds. It has been observed to run slowly and flake under stress, and we see the same kinds of behavior under deadlock builds.
Release notes: None

108451: schemachanger: Refactor tests for concurrent schema changer behaviors r=Xiang-Gu a=Xiang-Gu
1. It cleans up some redundant tests about concurrent schema changer behavior and refactors them into a new simpler, cleaner test.
2. It adds an integration style test for testing concurrent schema change behaviors where we run many schema changes for an extended period of time and assert that all of them eventually succeed and the descriptors end up in the expected state.
Fix #108140
Fix #107223
Epic: None
Release note: None

108492: kv: remove errSavepointInvalidAfterTxnRestart r=knz a=nvanbenschoten
This commit simplifies logic in `checkSavepointLocked`.
Epic: None
Release note: None

108497: sql: don't start default test tenant in MT admin function tests r=yuzefovich a=yuzefovich
These tests themselves start multiple tenants, so there is no need to create a default test tenant (doing that also makes it a bit more confusing because the default tenant as well as the first test tenant share the same TenantID, effectively making it two SQL pod configs). Starting the default test tenant was enabled recently in c899661 when we enabled the CCL license, and we have seen at least one confusing failure that is possibly related to this. Starting the default test tenant was originally added in cfa4375, but I don't see a good reason for it. This PR is an opportunistic fix of #108081.
Fixes: #108081.
Release note: None

108502: kvstreamer: add more assertions to RequestsProvider.enqueue r=yuzefovich a=michae2
If we ever enqueue zero-length requests, it could cause a deadlock where the `workerCoordinator` is waiting for more requests and the enqueuer is waiting for results. Add assertions that we never do this.
Informs: #101823
Release note: None

108517: roachtest: pin liveness lease to live node in lease prefs test r=erikgrinaker a=kvoli
The lease preferences roachtest could occasionally fail if the liveness leaseholder was on a stopped node. We should address this issue; for now, pin the liveness lease to a live node to prevent flakes.
Informs: #108512
Resolves: #108425
Release note: None

Co-authored-by: Bilal Akhtar
Co-authored-by: Raphael 'kena' Poss
Co-authored-by: Nathan VanBenschoten
Co-authored-by: Xiang Gu
Co-authored-by: Yahor Yuzefovich
Co-authored-by: Michael Erickson
Co-authored-by: Austen McClernon
Describe the problem
#107507 added eager lease transfers for the case where the node acquiring a lease violates the lease preferences. However, the mechanism can fail if the acquiring node then fails a liveness heartbeat, as we saw in #108425.
To Reproduce
See the lease-preferences/full-first-preference-down roachtest: #108425.
Expected behavior
If and when the acquiring node successfully heartbeats again, it should transfer the leases at that point. Otherwise, the leases should be acquired by another node.
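To make the expected behavior concrete, here is a minimal toy model in Go. It is not kvserver code: `Node`, `Outcome`, and `enforceLeasePreference` are hypothetical names invented for illustration, and the model assumes that a holder which failed its last liveness heartbeat cannot transfer the lease and should retry once it heartbeats again. The "acquired by another node" path is not modeled.

```go
package main

import "fmt"

// NodeID identifies a node in this toy model (hypothetical type).
type NodeID int

// Node is a simplified view of a store; it is not a kvserver type.
type Node struct {
	ID        NodeID
	Live      bool // last liveness heartbeat succeeded
	Preferred bool // node satisfies the range's lease preference
}

// Outcome describes what one enforcement attempt decided.
type Outcome int

const (
	Keep                Outcome = iota // lease stays where it is
	Transfer                           // lease moves to a preferred node
	RetryAfterHeartbeat                // holder is not live; try again later
)

// enforceLeasePreference models a single enforcement attempt made by the
// node that just acquired the lease. It returns the decision and the node
// that should hold the lease afterwards.
func enforceLeasePreference(holder Node, others []Node) (Outcome, NodeID) {
	if holder.Preferred {
		return Keep, holder.ID
	}
	if !holder.Live {
		// Assumption of this sketch: a holder that failed its heartbeat
		// cannot transfer, so the attempt must be retried rather than dropped.
		return RetryAfterHeartbeat, holder.ID
	}
	for _, n := range others {
		if n.Live && n.Preferred {
			return Transfer, n.ID
		}
	}
	// No live preferred node exists, so the lease stays put.
	return Keep, holder.ID
}

func main() {
	holder := Node{ID: 1, Live: false, Preferred: false}
	others := []Node{{ID: 2, Live: true, Preferred: true}}

	// First attempt: the holder failed its heartbeat, so nothing transfers.
	outcome, target := enforceLeasePreference(holder, others)
	fmt.Println(outcome == RetryAfterHeartbeat, target)

	// The holder heartbeats successfully; the retry now transfers the lease.
	holder.Live = true
	outcome, target = enforceLeasePreference(holder, others)
	fmt.Println(outcome == Transfer, target)
}
```

In this model, the failure described above corresponds to treating `RetryAfterHeartbeat` as final. The expected behavior is to re-run the attempt after the next successful heartbeat.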
Environment:
Jira issue: CRDB-30503